The most compelling bearish case I see for LLMs is that they'll plateau in performance, either because we'll exhaust the novel data available from large-scale crawling or because of some fundamental logical reasoning limitation of the existing transformer architecture. "Hooray," the critics will cheer. We can go back to humans being in the driver's seat. Skynet will have to wait until another day.
That last part might be true. But even a bearish case for performance can still change a lot. Let's make three conservative assumptions:
- LLMs will be trained in largely the same way as they are today
- LLMs will only have knowledge of public information through large-scale text corpora
- Experts (either individually or as a consortium) will still perform better than LLMs for professional-grade tasks
In this world the real breakthrough with large language models might not be exceeding human levels of performance in a discrete task. Perhaps it's enough that they can approach human-level performance in a variety of tasks. There might be more whitespace at the intersection of disciplines than in aiming for true expert status in any one.
Jack of all trades
There's a good book by David Epstein, Range, that argues for breadth over depth for most pursuits in life: education, sports, business. The subtitle of the book sums it up: "Why Generalists Triumph in a Specialized World." His arguments mostly boil down to two key observations:
- The majority of successful adults took a circuitous route to their professions. They almost all built up general skills in childhood before focusing on niche domains later.
- More problems in the world today require some intersectionality than ever before. They require people to take novel information or signal from across domains and apply them to new problems (think: data science x product, surgery x robotics, law x net neutrality).
Extrapolating a bit from these two points: Learning general skills allowed people to build up a mental model of the world. This applied both physiologically (precision while throwing a ball, fast twitch muscle fibers, etc) and neurologically (mental models for thinking about problems, collaboration, creativity). These skills were broad enough to be applied to their more specific field.
Large capacity generative models are by definition generalists. They are not isolated to training data in a specific discipline, but instead it's the act of being exposed to a whole variety of different domains that seems to give them the positive emergent properties of reasoning skills.
This broad training also means people can ask them about a whole range of areas. They'll usually give an answer with some validity. Even when hallucinating, the answers often sound plausible. They have been able to internalize a lot of the vocabulary and facts of each domain. LLMs in many ways are a programmatic Jack of all trades.
The rare connective tissue between disciplines
Back when I was at university I was surprised by how little collaboration happened between departments. Even within one department people are often so separated by philosophical schools of thought that they continue to pursue their own research interests in shallow lanes.1 This is despite the clear applicability and relevance of external domains.
Even in ML, a lot of the most innovative work of the past ten years came from other fields. Optimization momentum in Adam came from an interpretation of physics; word embeddings came from linguistics and lexical semantics going back to 1957.
These are anecdotal experiences but it seems like the trend rings true. Most experts have spent their lives honing a skill in one particular discipline. They went deep to pursue some new course. And they needed to; there are enough smart people working on hard problems in each discipline that to truly do something novel, you have to go into the weeds.
But there are just not enough hours in the day to get a grasp of everything. An expert can't wade through arXiv to pattern match between multiple disparate disciplines in the hope of stumbling upon something that might help their own pursuit. And if they try, they often face terminology so different for the same concepts that they can miss a pattern that's otherwise hiding in plain sight. I'm reminded of the overloaded meanings of the constant k: everything from the spring constant in physics to the Boltzmann constant in thermodynamics.
Enter language models
Where humans can't pursue that breadth, neural networks might be able to. They have enough of it codified into their weights that they can regurgitate it on command. And if they can summarize facts from a particular discipline, there's no reason to think they can't automatically mine similarities between two of them.
I'm convinced there is a sea of research (perhaps some of it quite meaningful) that is relatively low-hanging fruit today. It has just been historically overlooked because it sits at the intersection of two disciplines, or requires pulling on threads from the far ranges of two unrelated disciplines. Connecting these dots likely doesn't require a PhD. It doesn't require being a true world-class expert. But it does require enough understanding of two subjects to pattern match in novel ways.
This fits with the assumptions above about the current limitations of LLMs. They might not exceed human performance but could still produce some breakthroughs - even if that breakthrough is just reframing the problem in understandable terms, and having a person run from there.
The key early stages of the research lifecycle are:
- Need identification: What are the biggest problems that are facing a given research area today?
- Literature review: Why has existing prior art in your field not solved for this problem?
- (External) What relevant streams of research exist from other disciplines?
- Frame hypotheses and experiment.
The third step of the cycle seems like the most obvious use of these broad models today. But with a bit more sophistication, there's no reason to think they can't run a broader loop of this lifecycle: research literature, develop additional questions, perform additional research, and hone from there.
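To make that loop a little more concrete, here is a rough sketch of how it might be wired up. Everything in it is an assumption for illustration: `research_loop`, `ask_model`, and `fake_model` are hypothetical names, the prompts are placeholders, and `fake_model` is a canned stand-in so the sketch runs without calling any real LLM.

```python
from typing import Callable, List


def research_loop(
    problem: str,
    ask_model: Callable[[str], str],
    max_rounds: int = 3,
) -> List[str]:
    """Survey outside fields, derive a follow-up question, and repeat.

    `ask_model` is a hypothetical hook: any function that takes a prompt
    string and returns the model's text response.
    """
    notes: List[str] = []
    question = problem
    for _ in range(max_rounds):
        # The "(External)" step: look for relevant work outside the home field.
        findings = ask_model(
            f"What research from other disciplines is relevant to: {question}"
        )
        notes.append(findings)

        # Develop an additional question from what was just found.
        follow_up = ask_model(
            f"Given these findings, what should we ask next?\n{findings}"
        )
        if not follow_up.strip():
            break  # the model has nothing further to ask
        question = follow_up
    return notes


def fake_model(prompt: str) -> str:
    """Canned stand-in for an LLM call (an assumption, not a real API)."""
    if prompt.startswith("What research"):
        return "Placeholder summary of work from other fields."
    return ""  # no follow-up question, so the loop stops after one round


notes = research_loop("example problem", fake_model)
print(len(notes))
```

The person stays in the loop here: the function only collects notes, and it's still up to the researcher to frame hypotheses and run experiments from them.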
LLMs may not be the best at everything, but their very nature as broad, generalist systems might just prove to be their biggest strength. And this isn't even a pie-in-the-sky idea reserved for the next generation of architectures. I'd be surprised if these models aren't already working behind the scenes in some PhD theses.
As the original saying goes, "A jack of all trades is a master of none, but oftentimes better than a master of one."
1. This is one phenomenon that even artificial intelligence research does not escape. The school of ML that comes from a more classical logician heritage dismissed end-to-end neural networks for years. Now the tables have turned and it's the neural network practitioners that are largely ignoring the logicians.