gpt-json provides schema definition and validation for
the GPT line of models, for use in data pipelines and
structured reasoning.
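
To illustrate what schema definition and validation for model output can look like, here is a minimal sketch using only the Python standard library. It is not gpt-json's API: `SentimentSchema` and `parse_response` are hypothetical names invented for this example, and the raw string stands in for a model reply.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class SentimentSchema:
    # Hypothetical schema for illustration; not part of gpt-json.
    sentiment: str
    confidence: float


def parse_response(raw: str, schema):
    """Parse a raw JSON reply and check it against a dataclass schema."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    values = {}
    for field in fields(schema):
        if field.name not in payload:
            raise ValueError(f"missing field: {field.name}")
        value = payload[field.name]
        # JSON does not distinguish ints from floats, so widen ints when a
        # float is expected (bool is excluded because it subclasses int).
        if field.type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, field.type):
            raise ValueError(f"field {field.name!r} should be {field.type.__name__}")
        values[field.name] = value
    return schema(**values)
```

Calling `parse_response('{"sentiment": "positive", "confidence": 1}', SentimentSchema)` returns a typed object, while a reply with a missing or mistyped field raises `ValueError` before it can reach the rest of a pipeline.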
vectordb-orm is a small ORM wrapper on top of vector
databases. It lets you define models as Python objects
and abstracts away the backend details. It currently
supports Milvus and Pinecone.
Dagorama is a simple daemon library for Python. It's a
hybrid between Celery and Dask, where tasks can be
chained together but run on different machines in
parallel. It is a work in progress.
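
The idea of chained tasks that run in parallel can be sketched with the standard library alone. This is not Dagorama's API: it uses local threads from `concurrent.futures` as a stand-in for separate machines, and `load`, `square_all`, `total`, `run_chain`, and `run_parallel` are hypothetical names for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def load(n: int) -> list[int]:
    # First step of the chain: produce some input data.
    return list(range(n))


def square_all(values: list[int]) -> list[int]:
    # Second step: transform the output of the previous step.
    return [v * v for v in values]


def total(values: list[int]) -> int:
    # Final step: reduce the transformed values to one result.
    return sum(values)


def run_chain(n: int) -> int:
    # Each step consumes the result of the step before it.
    return total(square_all(load(n)))


def run_parallel(inputs: list[int]) -> list[int]:
    # Independent chains run concurrently; results keep the input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_chain, inputs))
```

Steps inside one chain stay sequential because each depends on the previous result, while separate chains have no dependency on each other and can be scheduled side by side.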
Groove is a MITM proxy specifically optimized for web
crawling and unit test construction. The core logic is
written in Go, with an API client available in Python.
It allows customization of cache handling, request
recording, third-party routing, and TLS fingerprinting.
Headfull Chrome is a Docker image for easier web
crawling that integrates font packages, display
virtualization, and remote control. It makes it easier
to develop web automation that mirrors how you use a
browser. For more background, see the post on
motivations.