# Unfortunately SEO still matters

June 1, 2025
Everyone's saying SEO is done and dusted. It's over. GEO (generative engine optimization) is the name of the game now. If more people are using LLMs for product recommendations, having your product rank first there is as important as showing above the fold on Google.
But is it so different really?
You have to look in different places for your performance metrics, sure. Some software startup specific to GEO will undoubtedly replace Google Search Central and Semrush. But when people start talking about tactics to rank more highly on LLMs, the differences seem to fade.
LLMs are doing the same basic authority prioritization that search engines do. They have to. Most language models are trained on the same giant, broad crawl of the web.1 The corpus is the same; the indexing process and the interface at inference time are the main differences.
In the early days, language models trained on all of this content with equal weight. A token was a token; learning to predict one well decreased the loss as much as any other. But as reducing hallucinations became more important, researchers figured out that you can increase accuracy (and shrink corpus requirements) by focusing on high-quality data:
- LIMA: Less Is More for Alignment (Zhou et al. 2023)
- Text Quality-Based Pruning for Efficient Training of Language Models (Sharma et al. 2024)
- AlpaGasus: Training A Better Alpaca with Fewer Data (Chen et al. 2023)
This is true both for pretraining (always data-hungry) and task-specific finetuning (lower data requirements, but demanding enough diversity to generalize). Sharma et al. showed that a 40% reduction in dataset size resulted in a better pretrained model on OpenWebText. Zhou et al. showed that finetuning a model for conversations on a thousand high-quality datapoints beat Alpaca's 52,000-datapoint finetune.
To do this curation at scale, LLM builders have to use heuristics to filter the data. Some are simple hardcoded functions; others use scoring models to prioritize high-quality content. Some even use links to pass authority from site to site. Sound familiar? Yup, welcome back to the party, PageRank.
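To make that concrete, here's a minimal sketch of what a two-stage corpus filter can look like. Everything here is illustrative: the thresholds, the `heuristic_ok` rules (in the spirit of published C4/Gopher-style cleaning rules), and the toy `quality_score` standing in for a learned scoring model are my inventions, not any lab's actual pipeline.

```python
import re

def heuristic_ok(text: str) -> bool:
    """Cheap hardcoded checks: length, repetition, markup junk."""
    words = text.split()
    if len(words) < 50:                        # too short to carry signal
        return False
    if len(set(words)) / len(words) < 0.3:     # heavy repetition
        return False
    symbol_ratio = sum(not c.isalnum() and not c.isspace() for c in text) / len(text)
    return symbol_ratio < 0.2                  # boilerplate / markup debris

def quality_score(text: str) -> float:
    """Stand-in for a learned quality model; here, a toy sentence-length proxy."""
    sentences = re.split(r"[.!?]+", text)
    avg_len = sum(len(s.split()) for s in sentences) / max(len(sentences), 1)
    return min(avg_len / 20, 1.0)

def filter_corpus(docs, threshold=0.5):
    """Keep documents that pass the cheap rules AND the scorer."""
    return [d for d in docs if heuristic_ok(d) and quality_score(d) >= threshold]
```

The design point is the ordering: the cheap hardcoded checks run first so the (expensive) scoring model only sees documents that survive them.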
Site reputations still matter, and page reputations do too. How do you get inbound links from people talking about your content? You use the same strategies that usually apply to SEO and backlink generation. At its most negative, that means people hacking the system and building junk backlinks. At its best, that means writing valuable content and having people cite statistics they can't find elsewhere. Most sites are somewhere in between.2
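That link-based reputation transfer is exactly what PageRank formalizes: a page's score is the stationary share of attention flowing in from the pages that link to it. Here's a toy power-iteration sketch over a hypothetical three-site graph; it's the textbook algorithm, not any engine's production ranker.

```python
def pagerank(links, damping=0.85, iters=50):
    """Power iteration over a link graph: {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}
    for _ in range(iters):
        # Teleport term: every page keeps a baseline share of attention.
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if not outs:                          # dangling page: spread evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:                                 # split authority across outlinks
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

# "docs" is cited by two sites; "forum" is cited by none.
graph = {"blog": ["docs"], "forum": ["docs"], "docs": ["blog"]}
```

Running it, the page with inbound links ends up outranking the one that only links out, which is the whole "authority flows along citations" intuition in one loop.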
More LLM queries are also hitting the web in real time on the backend. o3, Claude 4, and DeepSeek Cloud have all integrated a runtime loop where they can make tool calls out to search engines and read the recommended pages. Whether this search engine is built in-house or licensed from a provider, it's still pulling from pages that rank highly in a classic information-retrieval setting.
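Abstractly, that runtime loop looks something like the sketch below. The `llm`, `search_web`, and `fetch_page` callables are hypothetical placeholders, not any provider's real API; the point is structural: the model only ever reads the top-ranked results, so the retrieval ranking decides what evidence it sees.

```python
def answer_with_search(question, llm, search_web, fetch_page, max_hops=3):
    """Toy agent loop: search, read top results, answer when confident."""
    context = []
    for _ in range(max_hops):
        # The model decides whether it has enough evidence to answer.
        step = llm(question, context)
        if step["action"] == "answer":
            return step["text"]
        # Otherwise it issues a query. Top-ranked pages come back first,
        # which is why classic ranking still gates what the model reads.
        results = search_web(step["query"])
        context.extend(fetch_page(url) for url in results[:3])
    # Out of hops: answer with whatever evidence was gathered.
    return llm(question, context, force_answer=True)["text"]
```

Swap in a real model and a real search backend and this is, roughly, the shape of the loop: the model never sees page eleven of the results.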
Reasoning models are trying to emulate human activity.3 So to picture them at their best, I think about what information I seek out when I'm doing product research online. I notice that I gravitate mostly toward reviews. I prioritize blogs that have good taste (design, typography), a reasonable bio (I skim the about page), and don't stuff their site with tons of reviews. Second to that, I'll usually check the product pages themselves for screenshots or videos. Corporate blogs can be useful for basic feature-level product comparisons or for tutorials on getting something done, but they're so biased that you can't assume any truth in their qualitative claims.
Seeding your product into the market by engaging with real users, and giving them something legitimately of value before they write about their experience, is probably a win for SEO and certainly a win for your product. It's just harder, or at least harder than stuffing keywords into your blog posts.4 But it's way more valuable.
Even as research agents keep getting better, that genuine authority can only help you. If you feel like you're gambling on a soccer game that can spontaneously become a cricket match, you're probably better off not gambling at all. Stick to doing the hard work that will withstand the test of time. That goes for SEO, GEO, or whatever acronym is in vogue right now.
Yes, if GEO is just SEO, you can hack it all the same. Avoid the urge to take the easy way out. Write useful shit and people will respond. That's good for humans and good for machines.
1. The bigger frontier labs have their own bots to do horizontal crawls of the web, with some large seed set. The smaller ones will use a cleaned-up version of CommonCrawl and some enterprise datasets.
2. In my own cynical opinion, most sites lean more toward the former than the latter.
3. That's a large part of how they're trained: data collected from humans doing the same task and describing it in words.
4. Please, please. Don't keyword stuff.