Headfull browsers beat headless

# September 7, 2022

Twenty years ago a simple curl would open up the world. HTML markup was largely hand designed, so id and name attributes were easy to interpret and parse. Now most sites render dynamic content or use template-generated class names to style the page. To handle this richness of rendering, most production crawlers use headless browsers, at least for part of the pipeline. Since you're running a Chromium or WebKit build, these should render sites exactly how users see them. In reality, headless browsers are sometimes quite different:

  1. The headless Chromium build is a different executable: some codepaths are only available in the full build or behave differently in headless mode. Extensions are one example; Chrome supports them but headless does not.
  2. Some JavaScript APIs are missing from the headless implementations, mostly those related to the viewport or other screen features.
  3. Canvas elements can render differently, both because of missing fonts and because of differences in the underlying graphics drivers.
  4. Headless browsers are harder to inspect and debug. By definition they are hidden, so they must be debugged remotely via control tools. And because of the points above, even that doesn't give a full 1:1 comparison with a visible browser.
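Points 1 and 2 are easy to check empirically. The sketch below assumes Playwright's Python package with a Chromium build installed via playwright install chromium (the post doesn't say which driver language it uses). It loads a blank page in each mode and prints the screen-related values that differ. The probed properties are an illustrative choice rather than an exhaustive fingerprint, and the headful half needs a display to run.

```python
"""Compare screen-related browser properties between headless and headful Chromium.

Assumes: pip install playwright && playwright install chromium, plus a display
(real or Xvfb) for the headful run.
"""
from typing import Dict, Tuple

# Evaluated inside the page; Playwright invokes the arrow function automatically.
PROBE_JS = """() => ({
    userAgent: navigator.userAgent,
    outerWidth: window.outerWidth,
    outerHeight: window.outerHeight,
    screenWidth: screen.width,
    screenHeight: screen.height,
})"""


def collect(headless: bool) -> Dict[str, object]:
    """Launch Chromium in the requested mode and return the probed values."""
    # Imported lazily so diff() below is usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            page = browser.new_page()
            page.goto("about:blank")
            return page.evaluate(PROBE_JS)
        finally:
            browser.close()


def diff(a: Dict[str, object], b: Dict[str, object]) -> Dict[str, Tuple[object, object]]:
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


if __name__ == "__main__":
    for key, (headless_val, headful_val) in diff(collect(True), collect(False)).items():
        print(f"{key}: headless={headless_val!r} headful={headful_val!r}")
```

On the older headless mode, the userAgent line typically shows HeadlessChrome in place of Chrome, which is one of the simplest tells a site can check.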

Instead of working around the shortfalls of headless browsers, I've been building with headfull browsers recently and couldn't be happier.

Containerized Headfull

Headfull browsers are simple to use when running them locally on your desktop. In Puppeteer or Playwright, you can activate one through a headless:false parameter and use the same control code as your normal headless logic. Chromium will launch a window that behaves almost 1:1 like Chrome.
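Because the switch is a single launch option, it's convenient to drive it from configuration so the same script runs visibly on your desktop and headless elsewhere. A minimal sketch with Playwright's Python binding; the HEADLESS environment variable and the example.com target are hypothetical choices for illustration.

```python
import os
from typing import Mapping


def headless_from_env(env: Mapping[str, str]) -> bool:
    """Read the hypothetical HEADLESS variable; default to headless when it is unset."""
    return env.get("HEADLESS", "1").strip().lower() not in {"0", "false", "no"}


def main() -> None:
    # Requires: pip install playwright && playwright install chromium
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # The only line that differs between modes; the control code below is shared.
        browser = p.chromium.launch(headless=headless_from_env(os.environ))
        try:
            page = browser.new_page()
            page.goto("https://example.com")
            print(page.title())
        finally:
            browser.close()


if __name__ == "__main__":
    main()
```

Running it with HEADLESS=false opens a visible Chromium window; leaving the variable unset keeps the usual headless behavior.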

When running in the cloud inside a container runtime like Docker, though, headfull browsers require more work and some finesse. The standard Ubuntu base image is intended for CLI execution, so it doesn't come with any of the display libraries needed for headfull mode.

If you're interested in a pre-packaged docker solution, skip to the bottom.

Base Image

One option for building a headfull container is to build on top of an existing base. After all, Playwright ships with a Docker image that makes it easy to get started in headless mode. Launching a headfull browser inside it, however, throws the following error:

> Looks like you launched a headed browser without having a XServer running.

> [pid=106][err] [106:106:0907/081624.714406:ERROR:ozone_platform_x11.cc(247)] Missing X server or $DISPLAY
> [pid=106][err] [106:106:0907/081624.714534:ERROR:env.cc(226)] The platform failed to initialize.  Exiting.

Installing an X server gets past this error but throws others in turn.
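That first error is worth catching before Chromium even starts. A small pre-flight check, sketched here in Python as an assumption about how your launcher is written, turns it into an explicit message. It relies on the standard X11 convention that a local X server for display :N listens on the Unix socket /tmp/.X11-unix/XN; the function names are hypothetical.

```python
import os
import re
from typing import Mapping


def x_socket_path(display: str) -> str:
    """Map a local DISPLAY value such as ':0' or ':99.0' to its X11 socket path.

    Only local displays are handled; remote values like 'host:10.0' raise ValueError.
    """
    match = re.fullmatch(r":(\d+)(?:\.\d+)?", display)
    if match is None:
        raise ValueError(f"unsupported DISPLAY value: {display!r}")
    return f"/tmp/.X11-unix/X{match.group(1)}"


def require_display(env: Mapping[str, str]) -> str:
    """Fail fast with a readable error if no X server is configured or listening."""
    display = env.get("DISPLAY", "")
    if not display:
        raise RuntimeError("DISPLAY is not set; start Xvfb before launching a headful browser")
    socket_path = x_socket_path(display)
    if not os.path.exists(socket_path):
        raise RuntimeError(f"DISPLAY={display} but no X server socket at {socket_path}")
    return display


if __name__ == "__main__":
    print("using display", require_display(os.environ))
```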

In theory we could build on this base and install the additional libraries iteratively. However, one drawback of Playwright's image is that it bundles all three browsers together: Chromium, WebKit, and Firefox. If you're only using one browser in your deployment, the others just waste bandwidth. A larger container payload means longer boot times in a serverless function or cluster deployment whenever containers have to be downloaded. Instead, you'll likely want to fork it and install only the bare minimum.

Font / Library Support

Headfull Chromium needs core libraries that the headless version doesn't. These provide some of the system functionality I referred to above, mostly around cursors, rendering, and font support. apt-get carries all of them as packages, so installation is long-winded but straightforward.

$ apt-get install -y gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 \
    libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 \
    libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libxcb1 libxcomposite1 \
    libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 \
    ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget \
    xfonts-100dpi xfonts-75dpi xfonts-scalable xfonts-cyrillic libgbm-dev

This clears up the dependency issues that would otherwise prevent Chrome from launching. Now we have to render the browser graphics somewhere.
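When a launch still fails with a missing shared object, it helps to ask the dynamic linker directly rather than guess which package is absent. The sketch below runs ldd against the browser binary and collects the libraries it reports as "not found". The binary path is yours to supply, since Playwright and Puppeteer each keep Chromium in their own cache directory, and check_libs.py is a hypothetical filename.

```python
import subprocess
import sys
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Extract library names from ldd lines like '\\tlibgbm.so.1 => not found'."""
    missing = []
    for line in ldd_output.splitlines():
        line = line.strip()
        if line.endswith("=> not found"):
            missing.append(line.split("=>", 1)[0].strip())
    return missing


def check_binary(path: str) -> List[str]:
    """Run ldd on a binary and return any shared libraries that cannot be resolved."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libraries(result.stdout)


if __name__ == "__main__":
    # Usage: python check_libs.py /path/to/chrome
    missing = check_binary(sys.argv[1])
    print("\n".join(missing) if missing else "all shared libraries resolved")
    sys.exit(1 if missing else 0)
```

Each name it prints can then be mapped back to a Debian package, for example with apt-file search, and added to the install line.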

X11 Support

Chromium needs to be launched in an environment where its graphics can actually be painted. After all, that's the key differentiating factor of headfull browsers. Containers have no window server, however, since out of the box they only provide a raw shell environment. We therefore virtualize a display with Xvfb, a virtual framebuffer X server that draws into memory and speaks the same X11 protocol that many Linux desktops are built on.

$ apt-get install -y xvfb x11-apps x11-xkb-utils libx11-6 libx11-xcb1

The xvfb-run script bundled with the xvfb package makes it easy to launch a virtual display and spawn an executable that draws into it.

$ xvfb-run --server-args="-screen 0 1524x768x24" npm run start
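If you'd rather start the display yourself, for example so other processes can attach to it, a minimal Python alternative to xvfb-run is to launch Xvfb on a fixed display and poll for its socket before continuing. The display number, timeout, and function names here are assumptions; the screen geometry matches the command above.

```python
import os
import subprocess
import time
from typing import List


def xvfb_command(display: str = ":0", width: int = 1524, height: int = 768, depth: int = 24) -> List[str]:
    """Build the Xvfb argv, e.g. ['Xvfb', ':0', '-screen', '0', '1524x768x24']."""
    return ["Xvfb", display, "-screen", "0", f"{width}x{height}x{depth}"]


def wait_for_socket(path: str, timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Poll until the X server's Unix socket appears; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    return False


def start_xvfb(display: str = ":0") -> subprocess.Popen:
    """Start Xvfb in the background and block until it accepts local connections."""
    proc = subprocess.Popen(xvfb_command(display))
    # Local X servers listen on /tmp/.X11-unix/X<N> for display :N.
    socket_path = f"/tmp/.X11-unix/X{display.lstrip(':').split('.')[0]}"
    if not wait_for_socket(socket_path):
        proc.terminate()
        proc.wait()
        raise RuntimeError(f"Xvfb did not come up on {display}")
    os.environ["DISPLAY"] = display  # child processes such as Chromium inherit this
    return proc
```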

VNC Support

While debugging, it's often helpful to view the browser's current state, inspecting the viewport and DOM. To see the X11 display from your host, bridging host->container, we'll need a VNC server running inside the container. VNC is a widely used screen-sharing protocol and plays nicely with X11.

$ apt-get install -y x11vnc

The server can then attach to the virtual display, in this case :0, and is reachable on port 5900 once you forward that port to your host.

$ x11vnc -display :0 -noxrecord -noxfixes -noxdamage -quiet -forever -passwd mypassword
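Before pointing a VNC client at the container, it can be useful to confirm that something is actually speaking VNC on the forwarded port. Under the RFB protocol that VNC uses (RFC 6143), the server opens every connection by sending a 12-byte version string such as "RFB 003.008\n". A minimal probe, assuming port 5900 is forwarded to localhost as above; the function names are hypothetical.

```python
import re
import socket

# RFC 6143 ProtocolVersion handshake: "RFB xxx.yyy\n", exactly 12 bytes.
RFB_BANNER = re.compile(rb"RFB \d{3}\.\d{3}\n")


def is_rfb_banner(data: bytes) -> bool:
    """True if data is exactly a ProtocolVersion handshake message."""
    return RFB_BANNER.fullmatch(data) is not None


def probe_vnc(host: str = "127.0.0.1", port: int = 5900, timeout: float = 3.0) -> bool:
    """Connect, read the server's version banner, and report whether it looks like VNC."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            data = b""
            while len(data) < 12:
                chunk = conn.recv(12 - len(data))
                if not chunk:  # server closed early
                    break
                data += chunk
    except OSError:  # refused, unreachable, or timed out
        return False
    return is_rfb_banner(data)


if __name__ == "__main__":
    print("VNC server reachable" if probe_vnc() else "no VNC server on 127.0.0.1:5900")
```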

All Together

There's a bit of nuance to stringing these dependencies together into an entrypoint. xvfb-run won't work out of the box because it launches the display, runs the application, and cleans up before returning, which leaves no opening to launch our VNC server against the same display. It also intercepts some SIGINT and SIGTERM signals that we would rather forward to our application code.
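One way around this is a small supervisor as the container entrypoint: start Xvfb, wait for its socket, start x11vnc against the same display, then run the application in the foreground and forward SIGINT and SIGTERM to it. The sketch below does that in Python. It is not the implementation inside the headfull-chromium image, and the npm run start command and VNC password are placeholders carried over from the commands above.

```python
import os
import signal
import subprocess
import sys
import time
from typing import List, Optional, Sequence, Tuple

# (argv, path that must exist before the next process starts, or None)
Step = Tuple[Sequence[str], Optional[str]]


def wait_for_path(path: str, timeout: float = 10.0) -> None:
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() > deadline:
            raise RuntimeError(f"timed out waiting for {path}")
        time.sleep(0.1)


def run_stack(helpers: Sequence[Step], app: Sequence[str]) -> int:
    """Start helpers in order, run the app in the foreground, then stop the helpers.

    SIGINT/SIGTERM sent to this process are forwarded to the app only, so it can
    close its browser cleanly; helpers are torn down once the app exits.
    """
    started: List[subprocess.Popen] = []
    previous = {}
    try:
        for argv, ready_path in helpers:
            started.append(subprocess.Popen(list(argv)))
            if ready_path is not None:
                wait_for_path(ready_path)

        app_proc = subprocess.Popen(list(app))

        def forward(signum, _frame):
            if app_proc.poll() is None:
                app_proc.send_signal(signum)

        for sig in (signal.SIGINT, signal.SIGTERM):
            previous[sig] = signal.signal(sig, forward)
        return app_proc.wait()  # retried automatically after a forwarded signal
    finally:
        for sig, handler in previous.items():
            signal.signal(sig, handler)
        for proc in reversed(started):
            if proc.poll() is None:
                proc.terminate()
                try:
                    proc.wait(timeout=5)
                except subprocess.TimeoutExpired:
                    proc.kill()
                    proc.wait()


if __name__ == "__main__":
    display = ":0"
    os.environ["DISPLAY"] = display  # inherited by x11vnc and the app
    sys.exit(run_stack(
        helpers=[
            (["Xvfb", display, "-screen", "0", "1524x768x24"], "/tmp/.X11-unix/X0"),
            (["x11vnc", "-display", display, "-noxrecord", "-noxfixes", "-noxdamage",
              "-quiet", "-forever", "-passwd", "mypassword"], None),
        ],
        app=["npm", "run", "start"],
    ))
```

Keeping the app as the only foreground child means its exit code becomes the container's exit code, while the display and VNC server are treated as disposable support processes.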

For a pre-packaged Docker image that contains all of the above, plus execution code and a more detailed getting-started guide, see my headfull-chromium image. It's Apache-licensed and free for personal and commercial use.

Hi, I'm Pierce
