You know the old saying, "you only have 15 minutes to impress someone"? On social media feeds it's more like 500 milliseconds. For my new social media product Saywhat, I set out to build a fully accurate post previewer - so you know exactly what your post is going to look like before you hit submit.

Languages with static typehints can make us lazy about adding types at the right time. We have all the context when we start a new project, but as complexity grows and our focus shifts elsewhere, that context wanes. Rewards compound from typing on day zero.

In addition to the core language, code LLMs also have to interplay with an ecosystem of constantly changing dependencies. Packages shift underfoot from version to version, with different features, functions, and syntax. What are some long-term approaches to making coding assistants more aware of the package ecosystem?

Mountaineer 0.5.0 introduced database migration support, so you can now upgrade production databases directly from the CLI. It generates the SQL for you automatically, so you don't have to write table migrations by hand, and it removes the need for third-party packages to provide the same functionality. Let's dive into the details of how we implemented the engine.

Today I'm really excited to open source a beta of Mountaineer, an integrated framework to quickly build webapps in Python and React. Its initial goals are quite humble: make it really pleasurable to design systems with these two languages.

LLMs are by definition probabilistic; for each new input, they sample from a new distribution. Even the best prompt or finetuning will reduce (but never eliminate) the chance that they give you output you don't expect. This is unlike a traditional application API, where the surface area is known and the fields have a guaranteed structure.
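To make that difference concrete, here's a minimal sketch (assuming pydantic v2, with a made-up schema and completion) of why a caller has to treat a completion as untrusted input rather than a guaranteed shape:

```python
from pydantic import BaseModel, ValidationError

# Hypothetical extraction schema - a traditional API would guarantee this shape,
# an LLM completion only approximates it.
class Extraction(BaseModel):
    title: str
    year: int

# A plausible-but-broken completion: "year" came back as prose instead of an int.
raw_completion = '{"title": "Dune", "year": "nineteen sixty-five"}'

try:
    parsed = Extraction.model_validate_json(raw_completion)
except ValidationError:
    # Even a well-prompted model can break the contract; callers need a
    # retry, repair, or fallback path that a typed API never requires.
    parsed = None
```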

§ Passthrough above all

February 15, 2024

In the Vision Pro, there's sometimes a conflict between the window's existence and your own passthrough reality. Try to place one in a room and then walk through a doorway, peeking back at the room from within the door frame. Practically speaking, it's better to preserve the reality of what people are actually seeing than to preserve the internal consistency of the augmented layer.

Most of the people I know in San Francisco have used a Waymo at least once. Many friends of mine swear by them. The fact that they're self-driving doesn't really enter into the equation: they just prefer the product they're being offered when they're picked up.

I was doing some OSS benchmarking over the weekend and was running into an odd issue. Some families of models would respond with near-gibberish, even with straightforward prompt inputs. This is a debugging session for LLM repetition.

Extensions are basically mini web applications these days, just with access to a `chrome` global variable that can interact with some browser-level functionality. Aside from that - it's all familiar. That extends to the debugging experience. Since extensions run in the regular V8 Chrome runtime, Chrome exposes the same debugging tools that you're used to on the web.

§ Speeding up Runpod

December 18, 2023

One issue I've occasionally observed on Runpod is varying runtime performance box-to-box. My working mental model of VMs is that you have full control of your allocation; if you've been granted 4 CPUs you get the ability to push 4 CPUs to the brink of capacity. Of course, the reality is a bit more murky depending on your underlying kernel and virtual machine manager, but usually this simple model works out fine.

I couldn’t write without footnotes. Or at least - I couldn't write enjoyably without them. They let you sneak in anecdotes, additional context, and maybe even a joke or two. They're the love of my writing life. For that reason, I wanted to get them closer to the content itself through inline footnotes.

In addition to forming the bulk of the foundation of modern language models, there's a ton of other data buried within Common Crawl. Incoming and external links to websites, referral codes, leaked data. If it's public on the Internet, there's a good chance CC has it somewhere within its index. Here we parse all of Common Crawl in a day, on the cheap.
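As a rough sketch of what pulling a single record looks like (the crawl ID, domain, and use of warcio here are my own assumptions, not details from the post), each capture can be located through the public CDX index and fetched as a byte range:

```python
import json
from io import BytesIO

import requests
from warcio.archiveiterator import ArchiveIterator

# Ask the CDX index where captures of a domain live inside the crawl's WARC files.
hits = requests.get(
    "https://index.commoncrawl.org/CC-MAIN-2023-06-index",
    params={"url": "example.com", "output": "json"},
).text.strip().splitlines()
entry = json.loads(hits[0])

# Fetch only that record's byte range from the public bucket and parse it.
start = int(entry["offset"])
end = start + int(entry["length"]) - 1
warc_bytes = requests.get(
    "https://data.commoncrawl.org/" + entry["filename"],
    headers={"Range": f"bytes={start}-{end}"},
).content
for record in ArchiveIterator(BytesIO(warc_bytes)):
    html = record.content_stream().read()
```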

§ The Next 10 Years

August 24, 2023

Personal notes for where we're headed over the next 10 years. While the future is never written in stone, I'm 90% sure of these outcomes. Past a decade, my confidence diminishes significantly.

flash-attention is a low-level implementation of exact attention. Unlike stock torch, which runs the attention matmuls and softmax as separate operations, `flash-attention` combines them into a fused kernel, which can speed up execution by 85%. And since attention is such a core primitive of most modern language models, it makes for much faster training and inference across the board. It now has an install time that's just as fast.
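A rough sketch of the difference (the `flash_attn_func` call and its (batch, seqlen, nheads, headdim) layout are taken from the flash-attention README; this assumes a CUDA device with fp16 tensors):

```python
import torch
from flash_attn import flash_attn_func

# (batch, seqlen, nheads, headdim) in fp16 on a CUDA device
q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

# Naive attention: separate matmuls and a softmax, materializing the full
# (seqlen x seqlen) score matrix in GPU memory along the way.
scores = torch.einsum("bqhd,bkhd->bhqk", q, k) / (64 ** 0.5)
naive_out = torch.einsum("bhqk,bkhd->bqhd", scores.softmax(dim=-1), v)

# Fused kernel: the same exact attention, computed tile-by-tile in SRAM.
fused_out = flash_attn_func(q, k, v, causal=False)
```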

The real breakthrough with large language models might not be exceeding human levels of performance in a discrete task. Perhaps it's enough that they can approach human-level performance across a variety of tasks. There might be more whitespace in intersectional disciplines than in chasing true expert status in any one.

One of my more memorable CV lectures in college opened with a declaration: representations in machine learning are everything. With a good enough representation, everything is basically a linear classifier. Focus on the representation, not on the network.
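A toy sketch of that idea, with a stand-in encoder and synthetic data (nothing here is from the lecture itself): freeze the representation and fit only a linear probe on top of it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def encode(x: np.ndarray) -> np.ndarray:
    """Placeholder for a frozen, pretrained feature extractor."""
    projection = np.random.RandomState(0).randn(x.shape[1], 64)
    return np.tanh(x @ projection)

rng = np.random.RandomState(1)
X, y = rng.randn(200, 32), rng.randint(0, 2, 200)

# If the representation is good, this linear probe is all the "network" you need.
probe = LogisticRegression(max_iter=1000).fit(encode(X), y)
print("probe accuracy:", probe.score(encode(X), y))
```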

Last weekend I spent some serious time with Siri for the first time in a couple years. A lot has changed since I last took a look. Since iOS 15, all NLU processing is done locally on device. There's a local speech-to-text model, a local natural-language-understanding module, and a local text-to-speech model. All logic appears hard-coded and baked into the current iOS version.

You can't iterate when you're building huge things. You also can't tolerate failure in the same way. You don't want a bridge constructed in a month only to fall down the year after. The bulk of the bureaucracy for infrastructure is making sure projects meet this bar of safety; safe to use, safe to be around, and safe for the environment. There's no such thing as MVP Public Infrastructure.

By virtue of their training objective, LLMs are optimized to model language and minimize the perplexity of examples. Memorization of input facts is an expected byproduct of this pipeline. General reasoning skills are the more unexpected emergent property.
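For reference, that objective is just next-token cross-entropy, whose exponential is the perplexity being minimized; a minimal sketch in torch, with random tensors standing in for a real model and dataset:

```python
import torch
import torch.nn.functional as F

vocab_size = 50_000
logits = torch.randn(1, 16, vocab_size)          # model outputs: (batch, seq, vocab)
targets = torch.randint(0, vocab_size, (1, 16))  # the next tokens actually observed

loss = F.cross_entropy(logits.view(-1, vocab_size), targets.view(-1))
perplexity = loss.exp()
# Driving this number down rewards assigning high probability to the training
# text itself - which is exactly where memorization of input facts comes from.
```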

If you're using SQLAlchemy as your database ORM, there's a good chance you're using Alembic to migrate across revisions. Alembic doesn't support enums out of the box, so here's how to keep enum values in code synced up with database values.
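As a minimal sketch of what that gap looks like on Postgres (the type and value names are hypothetical), a new enum member usually has to be added in a hand-written migration, since autogenerate won't detect it:

```python
from alembic import op

def upgrade() -> None:
    # Extend the existing Postgres enum to match the new member in the Python enum.
    # Note: on older Postgres versions this statement can't run inside a transaction.
    op.execute("ALTER TYPE order_status ADD VALUE IF NOT EXISTS 'refunded'")

def downgrade() -> None:
    # Postgres can't drop a single enum value; a real downgrade would have to
    # recreate the type and re-cast the column.
    pass
```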

GPT-4 represents the latest leap in LLM sequence length. Doubling down on long-term dependencies might be the advance we need for real business value and machines that operate closer to humans.

Common wisdom says children explore while adults exploit. At some point, we tend to transition from one to the other - perhaps because of risk intolerance, time limitations, or sheer laziness. I learned how to ski this year, which was the first new sport I've picked up in at least a decade. Some thoughts on learning new things and throwing yourself down mountains in the process.

Most of the gRPC docs use the dynamic approach - I assume for ease of getting started. The main pro of dynamic generation is faster prototyping if the underlying schema changes, since you can hot reload the server and client. But one key downside is that you can't typehint anything during development or compilation. For production use, compiling it down to static code is a must.
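A short sketch of the static flavor (the proto, package, and service names are placeholders): compile the stubs ahead of time, then import the generated modules that your editor and type checker can actually see.

```python
# Generated once at build time, not at runtime:
#   python -m grpc_tools.protoc -I protos \
#       --python_out=gen --grpc_python_out=gen protos/search.proto
import grpc

from gen import search_pb2, search_pb2_grpc

channel = grpc.insecure_channel("localhost:50051")
stub = search_pb2_grpc.SearchServiceStub(channel)

# The request/response classes exist as importable symbols, so IDEs and type
# checkers can inspect them - unlike stubs built dynamically from the .proto.
response = stub.Search(search_pb2.SearchRequest(query="hello"))
```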

§ Opportunity years

February 15, 2023

The last few months have been tough for a lot of people. Layoffs, down rounds, and bankruptcies jolt the expected progression of life. Decisions that were within grasp are now out of reach. Despite the environment, people in tech are more optimistic than the media might lead you to believe.

It's 2023 and once again, we are all in on AI. This is thanks in part to the cultural phenomenon that is ChatGPT. Many companies are racing to deploy AI models (generative where possible) just to put them on their slide decks. Like clockwork, three years later, we've reverted to AI. It sometimes feels like we're back in 2017.

There are a series of resolution layers governing DNS, IP, and port routing on OSX. Included are notes on the different routing utilities supported locally, specifically using /etc/hosts, ifconfig, pfctl, and /etc/resolver.

Copy and paste is ubiquitous. A topic that receives less attention, however, is the provenance of data that flows into and out of your clipboard. I often find myself going through documents that I've written or were written by colleagues. I almost inevitably have to wonder where in the world some of the data came from. A thought experiment for a copy and paste implementation that retains a history chain going back to the original source.

Practical notes for debugging more complicated training pipelines and architectures, informed by pure research and productionizing models in industry. This guide has a bias towards debugging large language models.

A revised comparison between GPU availability for AWS and GCP. Includes some internal strategies for GCP request allocation. Updated benchmarking numbers.

It's been a month since going full time on my own thing. In some ways I'm surprised by how natural the transition has been. This is a short progress update on the first month of going independent. Finished a first launch of GrooveProxy with some progress on Popdown.

Given the pandemic's isolation of friends and friend groups, I've been thinking a lot about relationships. Which ones fulfill, which ones entertain, and which ones are resilient to strain. Why do we spend so much time talking about the past or trying to predict the future?

There's a reason why dashboards have become increasingly common over the last decade. Hearing from people with more context can immediately dissolve fears. In that way, trains have a lot in common with status pages.

§ A new chapter

October 13, 2022

Last week I said goodbye to my colleagues at Globality after five years on their engineering team. It's hard to believe it's been so long. I still remember my first day perfectly - no laptop, no desk, not even a manager to greet me. I ended up writing my first PR on a personal computer in the kitchenette. I've been putting a lot of thought into what I want to focus on next. Here's my current list.

Libraries might be one of the greatest assets in modern America. They're free, have an extensive selection, provide technological support, dot cities and rural counties alike, and are often beautifully architected. Their physical spaces are also increasingly underutilized.

Cloud compute is backed by physical servers. And with the chip shortage affecting CPUs and GPUs, those resources are more limited than ever. After encountering some reliability issues with on-demand provisioning of GPU resources on Google Cloud, I put together a benchmarking harness to test AWS vs. GCP availability.

Twenty years ago a simple curl would open up the world. HTML markup was largely hand-designed, so id and name attributes were easily interpretable and parsable. Now most sites render dynamic content or use template-defined class tags to style the page. Building a headful browser container to more easily deploy and debug Chromium in a remote cluster.

§ Webcrawling tradeoffs

September 6, 2022

A couple of years ago I built our internal crawling platform at Globality, which needed to be capable of scaling to billions of pages each crawl. The two main types of crawlers deployed in the wild are raw and headless. We ended up implementing a hybrid architecture, which makes use of the strengths of both while trying to minimize their weaknesses.

Public transit is often framed as necessary philanthropy for cities. It cuts down on cars and pollution at the expense of convenience. If people can more efficiently get to their destination by other means, they will. This is the wrong way to look at things. For public transit to really work, it needs trust. The main KPI for a transit system has to be adherence to schedule.

I default to bare metal where I can. But recently I had to adopt a more complicated server management solution. And after a couple of months of building for Kubernetes, I must admit I'm falling for it more every day.

A constantly updating collection of content that I highly recommend to others. Movies, TV, and Books. Updated occasionally if something has staying power of more than 3 months.

Over the pandemic I've been able to work from a variety of places. I've vacationed in most of them before. In almost all cases, I've vastly preferred working there. It gives you the encouragement to do what locals do. You're way more likely to meet people who live there if you engage with them where they actually work and live.

There is going to be a new class of travel option: working by day and socializing by night. This model upends traditional tourist activities since it encourages participation in local cultural life, like the working professionals who live in that city full time.

What explains the differing pay between talented people in different careers? Something is clearly lost in our typical conversation about what a salary includes.

We rely on FastText in some of our NLP microservices. Since upgrading to an M1 MacBook, these dependencies have failed to build wheels.

§ Architecting a blog

January 4, 2022

An obligatory post on blog architecture. I started focusing more on writing this year and wanted to rethink my workflow to make it a bit more frictionless. I started with the writing experience that I wanted in my IDE and moved on to the markdown compilation tooling.

§ Write where you are

January 3, 2022

Publishing has always been my bottleneck. During stints on Wordpress or Medium, I was so focused on how articles looked that it often got in the way of what they said. This year I want to change that trend.

It's an underemphasized asset of successful engineering startups: they make developing enjoyable. More companies need to follow their lead and treat their internal teams like users. Give them a UX that they can enjoy.

Most confusion when building ML features comes at the beginning of a project. The goals are vague, the data isn’t in the expected format, or the metrics are ill-defined. This is a key place for product managers to articulate user needs in a way that machine learning researchers can translate into a well-defined research problem.

People label AI as anything and everything these days. You have search systems, you have process automation, you have spam filters. If motion-activated supermarket doors were invented today, I guarantee they'd be branded AI too.

NFTs have exploded into mainstream conversation over the last few weeks. Like with everything in crypto, you have strong bulls and equally strong bears on the investment thesis. Are the principles behind NFTs anything new? And what can collector culture tell us about the investment opportunities with these new tokens?