[Read time: 5 minutes] May 26, 2023

The most compelling bearish case I see for LLMs is that they'll plateau in performance, either because we'll saturate novel data from large scale crawling or because of some fundamental logical reasoning limitation of existing transformer architecture. "Hooray," the critics will cheer. We can go back to humans being in the driver's seat. Skynet will have to wait until another day.

That last part might be true. But even a bearish case for performance can still change a lot. Let's make three conservative assumptions:

  1. LLMs will be trained in largely the same way as they are today
  2. LLMs will only have knowledge of public information through large-scale text corpora
  3. Experts (either individually or a consortium) will still perform better than LLMs for professional grade tasks

In this world, the real breakthrough with large language models might not be exceeding human performance on any discrete task. Perhaps it's enough that they can approach human-level performance across a variety of tasks. There might be more whitespace in intersectional disciplines than in chasing true expert status in any one of them.

Jack of all trades

There's a good book by David Epstein that argues for breadth over depth for most pursuits in life: education, sports, business. The subtitle of the book sums it up: "Why Generalists Triumph in a Specialized World." His arguments mostly boil down to two key observations:

  1. The majority of successful adults took a circuitous route to their professions. They almost all built up general skills in childhood before focusing on niche domains later.
  2. More problems in the world today require some intersectionality than ever before. They require people to take novel information or signal from across domains and apply them to new problems (think: data science x product, surgery x robotics, law x net neutrality).

Extrapolating a bit from these two points: Learning general skills allowed people to build up a mental model of the world. This applied both physiologically (precision while throwing a ball, fast twitch muscle fibers, etc) and neurologically (mental models for thinking about problems, collaboration, creativity). These skills were broad enough to be applied to their more specific field.

Large capacity generative models are by definition generalists. They aren't confined to training data from a single discipline; instead, it's the act of being exposed to a whole variety of domains that seems to give them their emergent reasoning abilities.

This broad training also means individuals can ask them about a whole range of areas, and they'll usually give an answer with some compelling validity. Even when hallucinating, the answers often sound plausible, because the models have internalized a lot of the vocabulary and facts of each domain. LLMs are in many ways a programmatic Jack of all trades.

The rare connective tissue between disciplines

Back when I was at university I was surprised by how little collaboration happened between departments. Even within one department people are often so separated by philosophical schools of thought that they continue to pursue their own research interests in shallow lanes.1 This is despite clear applicability and relevancy of external domains.

Even in ML, a lot of the most innovative work of the past ten years came from other fields. Optimization momentum in Adam came from an interpretation of physics; word embeddings came from linguistics and lexical semantics going back to 1957.

These are anecdotal experiences but it seems like the trend rings true. Most experts have spent their lives honing a skill in one particular discipline. They went deep to pursue some new course. And they needed to; there are enough smart people working on hard problems in each discipline that to truly do something novel, you have to go into the weeds.

But there are just not enough hours in the day to get a grasp of everything. An expert can't wade through arxiv to pattern match across multiple disparate disciplines in the hope of stumbling upon something that might help their own pursuit. And if they try, they're often faced with such differing terminology for the same concepts that they can miss a pattern that's otherwise hiding in plain sight. I'm reminded of the overloaded meanings of the k constant: everything from the spring constant in physics to the Boltzmann constant in thermodynamics.

Enter language models

Where humans can't pursue that breadth, neural networks might be able to. They have enough of that breadth codified into weights that they're able to regurgitate them on command. And if they're able to summarize facts from a particular discipline, there's no reason to think they might not be able to automatically mine similarities between two of them.

I'm convinced there is a sea of research questions (perhaps some quite meaningful) that are relatively low-hanging fruit today. They've just been historically overlooked because they sit at the intersection of two disciplines, or require pulling on threads from the far ranges of two unrelated fields. Connecting these dots likely doesn't require a PhD. It doesn't require being a true world class expert. But it does require enough understanding of two subjects to pattern match in novel ways.

This fits with the above theses on the current limitations of LLMs. They might not exceed human performance but still could develop some breakthroughs - even if that breakthrough is just reframing the problem in understandable terms, and having a person run from there.

The key lifecycle of research in its early stages is:

  1. Need identification: What are the biggest problems that are facing a given research area today?
  2. Literature review: Why has existing prior art in your field not solved for this problem?
  3. (External) What relevant streams of research exist from other disciplines?
  4. Frame hypotheses and experiment.

The 3rd part of the cycle seems like the most obvious usage of these broad models today. But with a bit more sophistication, there's no reason to think they can't run a broader loop of this lifecycle: research literature, develop additional questions, perform additional research, and hone from there.
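As a loose sketch of what that broader loop might look like in code (the prompts, the model choice, and the use of the pre-1.0 openai client are illustrative assumptions rather than a working research agent):

# Sketch of an LLM-driven pass through the research lifecycle above.
# Requires `openai.api_key` to be set; prompts are illustrative only.
import openai

def ask(prompt: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]

field = "protein folding"
needs = ask(f"What are the biggest open problems in {field} today?")
gaps = ask(f"Why hasn't prior art in {field} solved these problems?\n\n{needs}")
analogies = ask(
    "Which research streams from unrelated disciplines might apply to these gaps? "
    f"Explain the terminology mapping between fields.\n\n{gaps}"
)
hypotheses = ask(f"Frame testable hypotheses from these connections:\n\n{analogies}")
print(hypotheses)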

Conclusion

LLMs may not be the best at everything, but their very nature as broad, generalist systems might just prove to be their biggest strength. And this isn't even a pie-in-the-sky idea reserved for the next generation of architectures. I'd be surprised if these models aren't already working behind the scenes in some PhD theses.

As the original saying goes, "A jack of all trades is a master of none, but oftentimes better than a master of one."


  1. This is one phenomenon that even artificial intelligence research does not escape. The school of ML that comes from a more classical logician heritage dismissed end-to-end neural networks for years. Now the tables have turned and it's the neural network practitioners that are largely ignoring the logicians. 

[Read time: 3 minutes] May 11, 2023

I took a computer vision course in college with a rather provocative professor. One of the more memorable lectures opened with a declaration: representations in machine learning are everything1. With a good enough representation, everything is basically a linear classifier. Focus on the representation, not on the network.

When we were simultaneously training neural networks with millions of parameters, I thought it a rather insane thing to say.

Technically he's right - of course. If you can push most of the difficult work into taking an input and projecting it into a numerical representation (one conditioned on the task you want to accomplish), you have by definition turned your problem into a separable one. Once you've solved your problem, you've basically... solved your problem. Technically speaking, the representations could collapse into one common representation for True and one common representation for False.

It's interesting to think about this again with the recent wave of autoregressive models. The latency in the current generation of generative models comes from feeding every newly generated token back into the decoder - one by one by one. The longer your desired output, the longer it's going to take the model to generate the answer.

But if you can fit a lot of data into the initial request to the model, and manage to frame the problem as a binary classification, results can be delivered almost instantly. You still need the model to encode the initial string, but this is lightning fast: a few big tensor multiplications that can be parallelized accordingly. That's it. Results arrive in one decoder step.
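A minimal sketch of that framing, assuming a HuggingFace causal LM (gpt2 is just a stand-in and the prompt is illustrative): instead of generating text, compare the next-token logits for True and False.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Review: 'The battery died after a day.' This review is positive:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # Logits for the very next token - a single decoder step.
    next_token_logits = model(**inputs).logits[0, -1]

true_id = tokenizer.encode(" True")[0]
false_id = tokenizer.encode(" False")[0]
print("True" if next_token_logits[true_id] > next_token_logits[false_id] else "False")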

Representations are hidden to the average user of a language model but they're hiding just underneath the surface. It's the representation that lives within a layer or two short of the projection head, right before the network has to decide the next word to output. It has a sense of what you're intending to do; at least in so far as that intent minimizes the perplexity of what comes next.

Interestingly, encouraging the model to "think" or "criticize" itself in writing helps to refine these representations further. Encouraging some text-based linear reasoning can be enough to arrive at a right answer. This is true even though jumping straight to predicting the output might result in a wrong one.

Another perspective on LLMs is that they're universal representation agents, able to condition a good representation on the goal you want them to achieve. Once the model has that internal state, the eventual projection to [True, False] is the easy part. Representations might not be everything but they come pretty close.


  1. Representations being the numerical equivalents of real world objects. Neural networks operate on numbers; not words or images. So before a network can even start learning, you need to make some choices about how to turn those words into floating points (bigram, character-level, wordpiece, byte-pair encoding, etc). 

[Read time: 7 minutes] April 28, 2023

I was considering a quick weekend project to route Siri requests to ChatGPT. My immediate thought was pretty simple based on my existing mental model for Siri as a local<->server pipeline:

  • Set up a MITM proxy or custom DNS server
  • Intercept outgoing Siri requests and route them to a backend LLM
  • Respond with a correctly formatted payload, potentially with some additional utilization of the widgets that are bundled in iOS for weather or calculated responses

That required me seriously poking around with Siri for the first time in a couple years. A lot has changed since I last took a look.

Everything's Local

Since iOS 15, all Siri processing is done locally on device. There's a local speech-to-text model, a local natural-language-understanding module, and a local text-to-speech model. The local NLU module surprised me the most. All logic appears hard-coded and baked into the current iOS version. It'll respond to known tasks like the weather, setting a timer, converting weight, looking up a word, and sending a text message. For all other requests it will open up a web view and search your default search engine for a result. It doesn't attempt to create a text response to queries that fall out of its known action space.

To confirm this behavior I set up a proxy server and started capturing requests. Making a Siri request indeed issued no external requests until I asked for something that requires current world knowledge. Asking "What's the weather?" routes a specific weather request to Apple's SiriSearch backend through PegasusKit, a private framework that contains some miscellaneous utilities for image search and server communication.
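For reference, this kind of capture can be done with a small mitmproxy addon along these lines; the host substrings below are assumptions for illustration, not a definitive list of Siri endpoints.

# siri_sniffer.py - run with: mitmdump -s siri_sniffer.py
# Logs outbound requests that look Siri/Apple related. The substrings are
# illustrative guesses, not a definitive list of Siri endpoints.
from mitmproxy import ctx, http

WATCHED_SUBSTRINGS = ("siri", "pegasus", "apple.com")

def request(flow: http.HTTPFlow) -> None:
    host = flow.request.pretty_host
    if any(fragment in host for fragment in WATCHED_SUBSTRINGS):
        ctx.log.info(f"{flow.request.method} {flow.request.pretty_url}")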

No Visible Conversation History

One of the original Siri announcement demos was reading a text message, checking for conflicting appointments, and responding to the text message. This demonstrated some contextual understanding - discussing a topic, having a sidebar, and going back to the same topic again. It was impressive because it was similar to how humans communicate. Because we have a robust context memory, we can use sentences that drop the subject, object, or verb because the meaning can still be inferred from what was said before.

On previous versions of iOS, the logical object in Siri was one of these conversations. You'd hold down the home button and Siri would take over the screen. New requests would pop to the top, but you could scroll up to reveal past requests in the same session. The new Siri removed support for these conversation flows. But the underlying logic is still there, as evidenced by requests that do reference previous context:

How's the weather in San Francisco?
How about in Oakland?

This works - it successfully knows we're asking about weather. It's just that the interface for previous prompts is hidden. The new logical object in Siri is intended to be ad hoc questions.

An aside on old NLU

The previous generation of personal assistants had control logic that was largely hard-coded. They revolved around the idea of an intent - a known task that a user wanted to do like sending a message, searching for weather, etc. Detecting this intent might be keyword based or trained into a model that converts a sequence to a one-hot class space. But generally speaking there were discrete tasks and the job of the NLU pipeline was to delegate it to sub-modules. If it believes you're looking for weather, a sub-module would attempt to detect what city you're asking about. This motivated a lot of the research into NER (named entity recognition) to detect the more specific objects of interest and map them to real world quantities. city:San Francisco and city:SF to id:4467 for instance.

Conversational history was implemented by keeping track of what the user had wanted in previous steps. If a new message is missing some intent, it would assume that a previous message in the flow had a relevant intent. This process of back-detecting the relevant intent was mostly hard-coded or involved a shallow model.
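A toy sketch of that style of pipeline, where the keyword lists, the extra city ID, and the fallback rule are all invented for illustration:

# A toy, hard-coded NLU pipeline in the style described above: keyword-based
# intent detection, NER-ish slot mapping, and intent carry-over from history.
# The keyword lists and the Oakland ID are illustrative assumptions.
from typing import Optional

CITY_IDS = {"san francisco": 4467, "sf": 4467, "oakland": 4468}
INTENT_KEYWORDS = {"weather": ["weather", "temperature"], "timer": ["timer"]}

def detect_intent(text: str) -> Optional[str]:
    lowered = text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return None

def detect_city(text: str) -> Optional[int]:
    lowered = text.lower()
    for name, city_id in CITY_IDS.items():
        if name in lowered:
            return city_id
    return None

def handle(text: str, history: list) -> dict:
    # If the new utterance has no detectable intent, back off to the most
    # recent intent seen earlier in the conversation.
    intent = detect_intent(text) or next(
        (turn["intent"] for turn in reversed(history) if turn["intent"]), None
    )
    parsed = {"intent": intent, "city_id": detect_city(text)}
    history.append(parsed)
    return parsed

history = []
print(handle("How's the weather in San Francisco?", history))  # weather, 4467
print(handle("How about in Oakland?", history))  # weather (carried over), 4468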

With increasing device processing and neural inference, all of these models could be brought to the edge. So why not?

Motivation

I don't know the internal reason why Apple chose to roll out local Siri processing in iOS 15 - but we can loosely speculate. The first beta was released at WWDC in June 2021, which means work on a local migration probably started around a year prior, in June 2020. GPT-3 was released at nearly the same time: June 2020. Prior to that point generative models were still pretty niche; their main strength was generating cohesive text, not logical reasoning or reliable output. The risk of malicious output was too high and there was no clear roadmap for decreasing hallucinations and increasing logical ability.

So, given this landscape, I imagine Apple had two key motivations:

  1. Getting Siri on a local device would decrease latency and increase its ability to function offline.

    Those are big wins for a platform that often forced users to wait longer for a server response than it would have taken them to do the task themselves. Speech-to-text and text-to-speech models were getting good enough to deploy on the edge, with inference fast enough to run in realtime. And Siri's business logic was always a relatively simple control system, so it would be easy enough to implement locally. There was no need to keep this pipeline on the server.

  2. Privacy

    Apple has long tried to push more processing to the edge to avoid sending data to its servers where possible. Object detection in photos happens locally; encrypted iMessages are routed through a central routing system but otherwise sent directly to devices for storage; and so on. Siri was a hole in this paradigm - so if Apple could push it to the edge, why wouldn't they?

Future

The new generation of self-supervised LLMs has almost nothing in common with the previous generation of NLU models. They may support task delegation through something like ChatGPT Plugins or LangChain, but their control logic and subsequent follow-ups are all emergent properties of the training data. They don't limit their universe of responses to known intents, which has proven incredibly powerful both for responding in natural language and for bridging logic across multiple sub-systems.

Apple's in somewhat of a bind here. On one hand - they made a switch to local devices to improve offline support and improve privacy. On the other - the new generation of LLM models are drastically better than the NLU approaches of previous years. They support more functionality and better reasoning than the systems that came before.

Can't Apple just implement a new backend to Siri using LLMs? There's been a lot of movement in compressing LLMs onto laptops and phones using bit quantization. The phone POCs have focused on the 7B and 13B Alpaca models because of memory requirements (and almost certainly inference speeds). This is in the ballpark of the GPT-3.5 model powering ChatGPT (at 20B) but a far cry from GPT-4's reported 1T parameters1.

At least until we improve model distillation and quantization, we can assume local models will always be a generation behind server-hosted versions. And people are perfectly willing to use server processing to access the latest and greatest models, for both personal and business use2. 13B models are useful; 1T models are super useful; 5T models will probably be even more so - although with some diminishing returns to scale. Privacy might take a backseat to processing performance.

I have no doubt that Apple is working on a local generative architecture that can back future versions of Siri. I'd actually put money on them rebranding Siri in iOS 17 or iOS 18 and dropping the legacy baggage. The real question in my mind is how Apple will weigh higher cognitive performance (server-side only) against more privacy (local only).

This is how I'd roadmap a feature rollout like this:

  1. V1. Re-introduce a server-side processing model. Speech can be converted into text on-device for speed of device->server text streaming, but the LLM processing logic should be on the server.
  2. V2. Allow 3rd party applications to provide a manifest with their own API contracts. Define what each API endpoint does and the data it requires to work. If the LLM detects that these applications are relevant to the current query, route the parsed data into the application payload and send it back to the device (a rough sketch of such a manifest follows after this list).
  3. V3. Add a local model on the device that's used when offline, routing to the server-side model when users have the bandwidth.
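To make the V2 step concrete, here's a rough sketch of what such a manifest might look like, expressed here as a Python dict; the app id, endpoint, and parameters are all hypothetical rather than any real contract.

# A hypothetical 3rd-party manifest for the V2 step above. Every name here
# (app id, endpoint path, parameters) is invented for illustration.
ride_app_manifest = {
    "app_id": "com.example.rides",
    "description": "Book and track rides around the city",
    "endpoints": [
        {
            "path": "/v1/request_ride",
            "description": "Request a ride to a destination",
            "parameters": {
                "destination": {"type": "string", "required": True},
                "pickup_time": {"type": "datetime", "required": False},
            },
        }
    ],
}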

OS integration is certainly where we're headed. I'll hold my breath for WWDC this year to see if any of these dreams are realized, and what it looks like when they are.


  1. At least according to Semafor. There's been some public debate about how many parameters GPT-4 actually contains. 

  2. Quoting the linked, "ChatGPT is the fastest growing service in the history of the internet. In February 2023, it reached the 100 million user mark." 

[Read time: 7 minutes] April 27, 2023

Minimum Viable Products (MVPs) are popular in startups because they allow testing of underlying risks without needing to build a full solution. In hardware companies, MVPs demonstrate that a solution can be physically built given specification constraints. In software companies, they prove that a problem is legitimate and there is consumer interest in the solution. Once you get the initial signal, iterate. Move fast and break things. The other clichés also apply.

Public infrastructure obviously isn't treated the same way. You can't iterate when you're building huge things. You also can't tolerate failure in the same way. You don't want a bridge constructed in a month only to fall down the year after. The bulk of the bureaucracy for infrastructure is making sure projects meet this bar of safety; safe to use, safe to be around, and safe for the environment.

But we do have minimum viable infrastructure: situations where we have some basic infrastructure but it's simply not good. You can check the boxes on a tourism advertisement and that's about it. It doesn't actually solve for the key needs that the infrastructure seeks to address but it's typically visible enough for people to be aware of its existence.

As one representative case, a staggering number of people I speak with in San Francisco categorically refuse to ride the Muni, which services the bus and lightrail lines in the city. Common complaints include reliability, noise, safety, and the inconvenience of switching lines. The drawbacks are so severe they've opted to avoid public transit altogether. If their destination is nearby, they'll walk; otherwise, they'll call an Uber or buy a car1.

But as someone who still rides Muni despite its problems, I'll be the first to say it's really not that bad. And given my own hesitations, I'm always surprised that I often end up being its strongest defender. But the reputation of public transit in the Bay Area is unfortunately so deeply buried in the trash can that it would be a Herculean task to lift it out. Only 57 percent of Muni riders rate its overall service positively; and that's only considering people that actually ride the Muni.

That's bad. It causes a negative flywheel - fewer people ride transit, resulting in less public support for funding transit, which in turn leads to service interruptions or cuts, and even fewer riders. The cycle continues.

Irritatingly, nothing's actually being done about it. There needs to be more investment in public transportation around US metro centers to make it legitimately useful, but I'm sure voters would balk at the price tag to fund the changes that are actually required. For comparison, here's a non-scientific comparison of SF and a few other cities2:

All figures in USD.

  • San Francisco: 1.3B operating budget; 219M revenue from usage fares plus 361M from parking fees. The shortfall is made up by the city general fund and state operating grants, with large capital projects funded by proposition funds. Latest projects: 300M (Van Ness Bus Lanes), 1.95B (SF Central Subway).
  • London: 9.82B operating budget; 11.32B revenue, with 925M in capital renewals and 518M in net interest costs. Latest projects: 25B (Elizabeth line), 435M (DLR Train Upgrade).
  • Copenhagen: operating budget and revenue unknown. Latest projects: 2.3B (M1 & M2 Subways), 3.8B (M3 Subway), 492M (M4 Subway).
  • Auckland: 2.29B operating budget (amortized budget for the decade); 2B revenue.

Muni's operating budget is $1.3 billion annually. The composition of this budget is relatively rare, however, in having such a large discrepancy between the budget and the revenue brought in by ridership fares. Even when you include parking ticket revenue, the SFMTA is still reliant on the city and state for the $720 million shortfall in additional subsidies. Imagine increasing the Muni budget by 2x to match Auckland's, or 8x to match London's. Based on current revenue, almost all of that increase would have to be funded by additional tax revenue. Most voters would shrug and ask why. They're not going to use it anyway.

Let's take away SF's minimum viable infrastructure for a second. Let's imagine that the city had no public transit. None at all. No buses, no lightrail, no subway, no anything. Just cars on the winding hills for as far as the eye can see. It can still have a CalTrain terminal in Mission Bay; that much I'll give you.

Given San Francisco's progressive tendencies and its wealthy tax base, the absence of public transit would certainly be deemed unacceptable. Task forces would be formed, activists would get involved, and before you know it there would be a proposal on the local ballot. The outcome would be one of two things:

  1. Subsidize ridesharing, like Uber or Waymo
  2. Invest in public transit

I imagine the bureaucratic headache of the first approach would be severe but not entirely insurmountable. More problematic is the congestion: rideshare services would need more cars to service the demand, and those cars would necessarily make traffic worse. Public transit has the benefit of being able to bypass individual cars - either in dedicated lanes or underground - so it can be a net benefit for reducing traffic and getting people somewhere faster.

So let's assume SF chooses to invest in a serious public transit works project. How do you pay for it? Levy additional property/sales tax most likely, or sell bonds for a future return on investment. I'd bet you voters would pass that proposition in a heartbeat - even if it ended up being a 2x, 5x, or 10x multiplier of what Muni's budget is today. The unknown whitespace of something new (dare I say - going from 0 to 1) creates excitement. The promise of a better future for a currently broken system just doesn't deliver in the same way.

The danger of having infrastructure offerings at all when they're bad is that people think they can never get good. There's nothing physically intractable about getting good infrastructure in San Francisco or in the US more broadly. It's a question of policy and funding. But I do believe those policy issues are intractable if there's not a hard fork with the current system.

One proposal:

  1. Modify California's stringent (and often unnecessarily litigious) environmental review for certain public transit projects. Have a fixed public comment window and then stop accepting roadblocks.
  2. Take two billion dollars (the same amount that SF just spent on one 2 mile extension of a subway) and put a public competition out for digging tunnels and building stations. The Boring Company is obviously a candidate but let the city put its innovation where its mouth is. I'm sure the proptech ecosystem would be thrilled to throw their hat into the ring.
  3. Create a new public-private company to manage the new effort. Cap the staffing at 20 of the best people you can. Pay them way above top of market, assuming they meet certain project milestones.
  4. Call it something sexy. Don't focus-group names to only come up with Muni, Bart, or CalTrain. And give it a clean and recessed color scheme; small splashes of color, mostly neutral backgrounds. I know I'm nostalgic but you can't beat these old subway designs.
  5. Make it go where people want to go. Connect Chrissy Field with the Financial District. Connect the Mission with the Sunset. Connect the Richmond with the Embarcadero.
  6. Dig. Dig out of the limelight. Dig without having to close Van Ness for 5 years. But whatever you do, just dig.
  7. Don't open it prematurely. Per the point of this whole article, it's better to exceed expectations than to fall flat. First impressions are lasting. Build momentum, keep building excitement.

Then finally, in 5 years, or 10 years, release it to the public. Maybe coincide it with the first flower blooms of spring. Bypass rush hour traffic and get to the ocean shore from your office building. Go out in Hayes when you live in Potrero and pop back home. For that vision 2028 or 2033 doesn't even sound that far away.

The core of my point here - and unfortunately I fear it may be true - is that we need a reset on public transit. For all the promises of finally reinvesting in infrastructure development, we're not doing the best job. Voters need to be handed a more comprehensive choice: a package of law changes that make it easier to build, the establishment of leaner agencies to conduct that building, and a better marketing message to the people: "If you fund this, we're going all in. It'll be worth the wait."

There's nothing minimum viable about that.


  1. When I really pushed, most people said they last tried to ride the Muni 5 years ago. Some admitted to never trying to ride it at all. Its damaged reputation so far preceded it that they just wrote it off entirely. 

  2. I looked around for a centralized source that aggregated funding, revenue, and large projects but came up short. If you dig into the raw data I was surprised to find how variable accounting models are for transit across the world. There are a lot of subtle differences in how revenue sources are reported, what is drawn from the city budget versus increases in tax rates, and how large infrastructure projects are green-lit and funded. If anyone has more updated or consistent data, please shoot it my way. 

[Read time: 4 minutes] April 22, 2023

This week I wrote an initial version of an ORM for vector databases. It lets you define indexes as Python objects and search for them using method chaining. The API aligns closely with existing SQL ORMs like SQLAlchemy or Peewee so the learning curve to getting started with this library should be relatively minimal.

Read on for a quick introduction to vectordb-orm, or hop into the source code here.

Introducing vectordb-orm

vectordb-orm offers an opinionated way to define and query for objects that have vector embeddings. Everything is oriented around the declared schema of the objects that you're looking to store. Typehints specify what kind of data these fields should accept and the ORM takes care of synchronizing the database to this schema definition. To define an example object that has a unique identifier alongside purchase, tag, and embedding fields, do:

import numpy as np
# Assuming the package exposes these helpers at the top level:
from vectordb_orm import (VectorSchemaBase, PrimaryKeyField, VarCharField,
                          EmbeddingField, ConsistencyType, Milvus_IVF_FLAT)

class MyObject(VectorSchemaBase):
    __collection_name__ = 'my_collection'
    __consistency_type__ = ConsistencyType.STRONG

    id: int = PrimaryKeyField()
    purchase: int
    tag: str = VarCharField(max_length=128)
    embedding: np.ndarray = EmbeddingField(dim=128, index=Milvus_IVF_FLAT(cluster_units=128))

Each key is optionally configured by a constructor that gives additional options. Some of these are required to give additional metadata about what the database expects (like in the case of embedding dimensions). The type annotations themselves indicate what form the values will take, and are used for casting and validation from the backend storage systems.

Querying also makes use of these type definitions to define the fields that you can search. Searching relies on native Python operations so requests can filter for values:

results = (
    session
    .query(MyObject)
    .filter(MyObject.tag == 'in-store', MyObject.purchase > 5)
    .order_by_similarity(MyObject.embedding, search_vector)
    .limit(2)
    .all()
)

Once the query executes, it'll cast the found database objects into instances of MyObject. It will also return the relevancy score returned by the vector similarity method. This lets you pass these ORM objects around your application logic, complete with IDE typehinting:

print(results[0].result.tag, results[0].score)

> in-store 0.05

The ORM masks a good amount of complexity on the backend for each provider, like casting types, field validation, and constructing the correct queries to the backend providers.

Why VectorDBs

Rather severe context length limitations in the current generation of LLMs have given rise to approaches like the ReAct model. In this design pattern you embed a user's query or the current context into an embedding, then retrieve the most semantically similar pieces of content from a vector database. These can either be documents in a search system or memories in a more general purpose chatbot.
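As a rough sketch of that retrieval step, reusing the MyObject schema from above (the embed() helper is a hypothetical stand-in for whatever embedding model you use, and session is the vectordb-orm session configured in your application):

import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: swap in a real embedding model that outputs 128 dims to
    # match MyObject.embedding.
    return np.random.rand(128).astype(np.float32)

query = "What has this user bought in-store recently?"
memories = (
    session
    .query(MyObject)
    .order_by_similarity(MyObject.embedding, embed(query))
    .limit(5)
    .all()
)

# Stuff the most similar records into the prompt of a downstream LLM call.
context = "\n".join(f"{match.result.tag} ({match.score:.2f})" for match in memories)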

There's a lot of movement in building the ideal vector database. Like most distributed databases, there's usually some fundamental tradeoff between consistency, recall, and query speed. The most popular right now are Pinecone, Weaviate, and Milvus, but new ones are popping up all the time, each with its own claim about how it weighs those core tradeoffs.

Given different requirements as deployments grow, I see the actual database in large part as an implementation detail. As it stands right now the switching costs between databases are pretty high.

Why an ORM

The mental model for different vector databases is effectively the same, and very similar to conventional relational databases. You have a datapoint that has some metadata and is enriched with a vector embedding. You want to do some combination of INSERT and SELECT from this table, where SELECT queries involve both filtering for exact match data and finding similar vectors to some new embedding input.

Despite these common similarities, each of the vector database providers has its own API structure, largely incompatible with the others. As a result, each major project ends up re-implementing these backends manually within its own business logic so the community can plug and play with their favorite vector databases.

An ORM naturally makes this easier by abstracting the complexities of backends from user-written application code. And so vectordb-orm was born. Like traditional ORMs it also allows for:

  • Improved code maintainability and readability by abstracting low-level database operations
  • Easy switching between different vector database providers without changing the application logic
  • Encouraging best practices and design patterns for working with vector data
  • Native typehints in your IDE when developing
  • (Future) Centralized optimizations for insert batching and search pagination

The Future

vectordb-orm is still quite new so it only supports Milvus and Pinecone backends at the moment. A few items on the roadmap for future versions:

  • Add support for additional databases. Weaviate and Redis are the next two on my priority list.
  • Support bulk insertion of input vectors for the providers that support them. This can significantly speed up the initial upsert time for requests that go over the wire.
  • Support more complex chaining of filters as backends allow. Allow or, and chaining to create more complicated predicates. For the providers that don't support these commands natively, provide a local implementation that fetches data and then post-processes locally.
  • Enhanced documentation and community support, including sample projects and tutorials.

If you give vectordb-orm a spin and have some thoughts on the API contract or missing functionality, I'm all ears.