AI engineering is a different animal

# March 15, 2025

John Gruber has written a lot this week on how Apple missed the Siri deadline:

> When Apple showed a feature, you could bank on that feature being real. When they said something was set to ship in the coming year, it would ship in the coming year. In the worst case, maybe that “year” would have to be stretched to 13 or 14 months. You can stretch the truth and maintain credibility, but you can’t maintain credibility with bullshit. And the “more personalized Siri” features, it turns out, were bullshit.

I have no inside baseball on this particular setback. But I can tell you that Apple wouldn't be the first company to get bitten by slotting AI research into engineering planning windows.

For 30 years, ever since CPUs were pretty well-understood animals and memory was more than a few kilobytes, software engineering has been a deterministic profession. You design the interface, you write the spec, you figure out constraints, then you write the code. Rinse and repeat. Great engineers can see most of the major roadblocks ahead of time because they can visualize how to get from Point A to Point B. You make progress day by day. You pull some all-nighters. And before you know it you have enough features to make a product.

Engineering can show measurable progress week over week and month over month. Deadlines still get missed, but you can typically see them coming. Half of the scrum/agile movement was built around project tracking.[^1] If a team is constantly behind its target launch dates on individual features, it's probably going to miss the final milestone. If it's on track throughout, it can usually wrap things up.

AI is not like that, and deep transformer architectures perhaps least of all. I'm reminded of this xkcd comic, which rings true even now. Back when I was leading an ML research team, I always had to tell other departments: until we've done it, we haven't done it. You can make 0% progress for months, then one day hit the right combination of data, architecture, and training configuration that solves the problem. You might even get to 95% of the solution only for that last 5% to be impossible.

But even that term, impossibility, feels so imprecise, doesn't it? Is it really impossible? Let's say we mean that it's beyond the capabilities of any probabilistic model right now: there's some missing sauce that just isn't there. The problem is you don't know that. You can't know that. It's impossible to prove that models can't do something; it's only possible to prove that they can. The only thing we can do is keep trying, however fruitless that may seem.

You see this when you look at the release cadence of the major AI shops. Most deploy updates incrementally: tweaks to the system prompt or a new tool call land almost daily at OpenAI, Anthropic, and Perplexity. Their researchers modify, A/B test, measure, and go back to the drawing board. These small changes occasionally get bundled into a splashy media release, but the cadence is unannounced and irregular. They release a thing when it's ready. That's it.
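To make that loop concrete, here's a minimal sketch of what gating a single prompt tweak behind an A/B measurement might look like. Everything in it is a hypothetical stand-in, not any lab's actual pipeline: `score_response` simulates whatever quality signal is really tracked (thumbs-up rates, eval-suite scores), and the ship threshold is invented for illustration.

```python
import random
from statistics import mean

def score_response(variant: str) -> float:
    """Simulated per-interaction quality signal (e.g. a thumbs-up).
    In a real pipeline this would come from live traffic or an eval suite."""
    base_rate = 0.70 if variant == "A" else 0.73  # pretend B is slightly better
    return 1.0 if random.random() < base_rate else 0.0

def ab_gate(n_per_arm: int = 5_000, min_lift: float = 0.01) -> bool:
    """Route traffic to both prompt variants, then ship B only if it
    clears a pre-registered lift threshold over the incumbent A."""
    a_scores = [score_response("A") for _ in range(n_per_arm)]
    b_scores = [score_response("B") for _ in range(n_per_arm)]
    lift = mean(b_scores) - mean(a_scores)
    print(f"observed lift: {lift:+.3f}")
    return lift >= min_lift

if __name__ == "__main__":
    if ab_gate():
        print("ship the new prompt")
    else:
        print("back to the drawing board")
```

The point of the loop isn't the statistics; it's that the decision to ship falls out of a measurement, not a date on a calendar.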

WWDC is a yearly deadline. It falls on roughly the same dates every year, and Apple wants to make a splash. It's been in their DNA ever since macOS became a platform: you give developers a sneak peek of things and they help take it to the next level. I get that impulse. A deadline is motivating. And indeed, a WWDC keynote slot is perhaps the forcing function that gets most features to the finish line.

So: what should they have done? They could have waited. Waited until it was really polished, until they had gotten to that 100% (or at least a solid 90). Waited until the next WWDC, or until a new iPhone hardware drop for a joint presentation. I think that's what Gruber is advocating in his piece. Pre-announcing and then delaying indefinitely can crush years of credibility, and that's hard to earn back.

But there's something deeper to this story, something that speaks to the core DNA of a company's culture. If Apple wants to win the AI game, it has to rid itself of the mindset of deadline-based deliveries. It may also have to rid itself of the mindset of immediate perfection. Researchers need the flexibility to ship things periodically, mess up, and try again. LLMs were embarrassing before they got good.[^2] We don't hear nearly as much about hallucinations as we did three years ago.

It's hard to have any tolerance for imperfection when you're one of the most valuable companies in the world. But Apple doesn't have to train frontier foundation models itself. It can partner with research labs that do, or simply provide the framework SDKs for tighter operating-system integration. If they're going to play, they need to play to win. I think that goes deeper than just a missed deadline.


[^1]: Agile tries to move some of the pain of big features into the planning stage. You break big features into chunks both so different people can work on them and so PMs can track percent completion against the overall goal. I'd debate whether it succeeds, but the hours spent on backlog grooming and t-shirt sizing are at least an attempt at proper scoping.

[^2]: They still are embarrassing sometimes. But importantly, the big labs mostly shrug and treat that as a cost of doing state-of-the-art research. The utility outweighs the embarrassment, and their brands never meaningfully suffered for it.
