
# Passthrough above all

February 15, 2024

Being in VR on the Vision Pro is a composite of two different realities. One is the spatial background (either your actual surroundings or a simulated environment). The other is the window context that shows all your apps, the menu bar, and system functionality. Windows are linked to a particular location in space. If you put one in a room, walk out of the room, and walk back, it'll be exactly where you put it. I haven't tried it, but my hunch is that if you leave the country and come back, it'll still be in the same place. The mapping of physical space is incredibly impressive.
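The system places and persists its own windows, so apps never do this for them directly, but visionOS exposes the same underlying idea to apps through ARKit world anchors. Here's a minimal Swift sketch of pinning an app's own content to a spot in the room inside an immersive space; the function name `pinToRoom` and the surrounding setup are illustrative assumptions, not how the system windows are actually implemented.

```swift
import ARKit
import RealityKit

// Minimal sketch: pin a RealityKit entity to a fixed spot in the room so it
// stays put when you walk away and come back. The system persists WorldAnchors,
// which is roughly the mechanism behind windows staying where you left them.
// In a real app, the session and provider would live for the lifetime of the
// immersive space rather than inside a single function.
func pinToRoom(_ entity: Entity, at transform: simd_float4x4) async throws {
    let session = ARKitSession()
    let worldTracking = WorldTrackingProvider()
    try await session.run([worldTracking])

    // Register the anchor with the system so it can be persisted and restored.
    let anchor = WorldAnchor(originFromAnchorTransform: transform)
    try await worldTracking.addAnchor(anchor)

    // Position the visible content at the anchored transform (assuming the
    // entity's parent sits at the immersive space's world origin).
    entity.setTransformMatrix(transform, relativeTo: nil)
}
```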

Because you can position these windows however you like and view them from any angle, there's sometimes a conflict between the window's existence and your own passthrough reality. Place one in a room, then walk through a doorway and peek back at the room from behind the door frame. The window will typically remain unrendered, even if you can see part of the spot where it should be. Practically speaking, it's better to preserve the reality of what people are actually seeing¹ than the internal consistency of the augmented one.

I have a hunch there's an explicit tradeoff codified into visionOS: prioritize passthrough visuals, even at the cost of breaking the reality of the window frames.
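One way to picture that priority is as a per-pixel contest in which the real world wins whenever it sits in front. This is a toy Swift sketch, not visionOS's actual compositor; the `Pixel` struct, `resolve` function, and depth values are invented for illustration.

```swift
// Toy per-pixel compositing rule illustrating the hypothesized priority.
// `sceneDepth` stands in for the system's reconstruction of your real
// surroundings; `windowDepth` is where a virtual window would sit.
struct Pixel {
    var passthroughColor: SIMD3<Float>
    var windowColor: SIMD3<Float>?   // nil if no window covers this pixel
    var sceneDepth: Float            // distance to the nearest real surface, meters
    var windowDepth: Float?          // distance to the virtual window, meters
}

func resolve(_ p: Pixel) -> SIMD3<Float> {
    // If a real surface (a wall, a door frame) sits in front of the window,
    // the window loses: passthrough wins the conflict.
    guard let windowColor = p.windowColor,
          let windowDepth = p.windowDepth,
          windowDepth < p.sceneDepth else {
        return p.passthroughColor
    }
    return windowColor
}
```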

This is even more acute in hand tracking. The Vision Pro overlays a cut-out of your hands on top of the screens you have displayed. This is a harder task than windows against passthrough, since now there's a foreground alongside the mid-ground and background. The OS tries to dynamically segment just your hands and extract them to a higher layer. This works well the majority of the time, but sometimes you see flickers of the background scene around the edges of your hands as they move. It looks like a ghost hand at the edges, with a fuzzy fade-out of reality. As far as I've seen, it always captures more hand rather than less.²
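The threshold bias speculated about in the footnote can be sketched as a compositing rule that counts borderline pixels as hand. Again, a toy Swift illustration with invented names and numbers (`CompositeInput`, `handThreshold`, the confidence score), not how visionOS actually does it.

```swift
// Toy sketch of the layering described above: passthrough at the back, app
// windows in the middle, and a hand cutout extracted to the top. The mask
// threshold is the interesting knob: a lower threshold keeps more borderline
// pixels as "hand."
struct CompositeInput {
    var passthrough: SIMD3<Float>
    var window: SIMD3<Float>?
    var handConfidence: Float   // hypothetical 0...1 score: how hand-like this pixel is
}

// Deliberately below 0.5: when the segmenter is unsure, treat the pixel as
// hand, breaking the window a little rather than taking a chunk out of the hand.
let handThreshold: Float = 0.3

func composite(_ input: CompositeInput) -> SIMD3<Float> {
    if input.handConfidence >= handThreshold {
        return input.passthrough          // hand layer: show the real world
    }
    return input.window ?? input.passthrough
}
```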

It's an interesting case of technology interacting with an environment it can't completely control. Through a screen, the OS can manage everything for a cohesive user experience. When you're rendering in spaces you haven't seen before, you need a clear hierarchy of priorities. In the case of visionOS, that seems to be maintaining reality. I think that's the right tradeoff.


  1. Or think that they're seeing, I suppose. 

  2. I'm assuming that when it's unsure of the right bounding box to draw, it lowers its decision threshold. Once again, better to break the reality of the screens a bit than to take a chunk out of your hands.
