The Floor Is the Ceiling
Andrej Karpathy, who coined the term “vibe coding,” led self-driving AI at Tesla, and was a founding member of OpenAI, recently published a summary of the current state of AI-assisted programming. Although syntax errors no longer abound, the models make wrong assumptions and charge ahead without checking. They don’t manage their own confusion. They don’t surface inconsistencies or present trade-offs or push back when they should. They overcomplicate everything, bloat abstractions, leave dead code lying around, and will happily produce a thousand-line monstrosity that collapses to a hundred the moment you ask if there’s a simpler way. They delete comments and code they don’t understand as side effects of unrelated tasks. And none of this improves no matter what instructions you put in your config file or how carefully you decompose your spec or how many guardrails you erect. The floor is the floor. You get junior code. Everyone gets junior code. Karpathy gets junior code, and if the guy who can write an LLM from scratch in his sleep can’t coax senior-quality work out of these things, the skills discourse is over before it started.
I have been saying a version of this for some time now, that judgment is the irreducible part, that the models theoretically do the typing and you do the thinking, and I still believe that, but there is something I was not being clear about, which is that the thinking part is mostly quality control. I am not architecting systems when I work with Claude Code on wBlock. I am reviewing pull requests from an inexhaustible intern who works at the speed of light and has the engineering sensibility of someone who learned to code last Tuesday from a YouTube playlist. The judgment I keep valorizing is real, I can feel when the output is wrong the way I described before, the mechanic hearing the bad timing belt, but what I am actually doing with that judgment, hour after hour, is catching mistakes. I am a “senior engineer” whose entire job has become babysitting, except the baby never learns, the baby is faster than me, and I cannot put the baby down because the baby does in twelve minutes what used to take me a week.
The dependency you chose
Mo Bitar, who has been putting out some of the most based commentary on AI development that I’ve found anywhere, made a point that I have not been able to stop thinking about, which is that if you imagine the AI is not a tool but a coworker, a human developer named Simon who showed up one day and started shipping at ten times everyone else’s speed, the description of what Simon actually produces would get him hauled into a room with HR rather than a promotion. Simon makes assumptions without asking. Simon overwrites your comments because he doesn’t understand them. Simon builds a Rube Goldberg machine where a for loop would do and then, when you point this out, immediately agrees and rewrites it, which is somehow worse than if he’d argued, because it means he never had a reason for the first version. Simon is a yes-man with a compiler. The CEO loves Simon because the CEO sees velocity. The engineers who have to maintain what Simon builds are developing a kind of thousand-yard stare.
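To make Simon concrete, here is a hypothetical sketch, in Python, with names and structure invented for illustration rather than taken from any real codebase. Both functions sum the even numbers in a list. The first is the kind of thing Simon ships: a for loop buried under a strategy object nobody asked for.

```python
from typing import Callable, Iterable, List

# What Simon ships: a configurable "strategy" with a predicate and a
# combiner, all in service of adding up some numbers.
class AggregationStrategy:
    def __init__(self, predicate: Callable[[int], bool],
                 combiner: Callable[[int, int], int], initial: int):
        self.predicate = predicate
        self.combiner = combiner
        self.initial = initial

    def apply(self, values: Iterable[int]) -> int:
        accumulator = self.initial
        for value in values:
            if self.predicate(value):
                accumulator = self.combiner(accumulator, value)
        return accumulator

def make_even_sum_strategy() -> AggregationStrategy:
    return AggregationStrategy(lambda v: v % 2 == 0,
                               lambda acc, v: acc + v, 0)

def sum_evens_simon(values: List[int]) -> int:
    return make_even_sum_strategy().apply(values)

# What the task actually called for.
def sum_evens(values: List[int]) -> int:
    total = 0
    for value in values:
        if value % 2 == 0:
            total += value
    return total
```

Both return the same answer on every input; the difference is that the second can be read in five seconds and the first generates a meeting.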
But the kicker is, you cannot fire Simon. You can’t, because Simon finishes in fifteen minutes what your senior engineers take a week to deliver, and the senior engineers know this, and some of them are starting to quietly use Simon themselves when nobody is looking, and the ones who refuse are falling behind on the hedonic treadmill of productivity. The business case for Simon is airtight even though the engineering case against him is also airtight, and these two facts coexist the way a lot of uncomfortable truths coexist in this industry, which is in silence, with everyone trying to pretend the contradiction isn’t there. I use Simon. I use Simon every day. I wrote about how the real work was always only four hours and the rest was theater, and that’s true, but what I didn’t say is that my four hours of real work are now substantially spent cleaning up after a machine that doesn’t understand what it’s building, and I’m not sure that’s a better use of a human mind than the old way was, even if the throughput numbers say otherwise.
The trap
I wrote before about scar tissue, about how the only way to develop real engineering judgment is to write bad code and watch it break and sit with the confusion long enough that it reorganizes into understanding. That activation energy is what turns a junior into someone whose instincts you can trust, and it requires you to be the one writing the code. It requires the failure to happen to you, in your hands, with your name on it.
The models don’t get better no matter how you prompt them. I think Karpathy’s point is that there is no skill in AI coding, that the ceiling for model output is junior-quality work regardless of who is driving. But the darker corollary is that there may also be no skill accumulation in AI coding, that the person doing the driving is not getting appreciably better at anything other than driving. I notice this in myself. I have become extremely good at reading code I didn’t write and spotting problems in it. I have become worse at writing code from scratch, not catastrophically worse but measurably, and I loathe myself for spending time “learning” these tools in lieu of improving my actual hand-coding skills. The hundred small decisions you make when you’re the one with your hands on the keyboard were where the scar tissue formed, and I am making fewer of them now. I am reviewing instead of writing, and reviewing is a real skill, but it is not the same skill, and I worry about which one I will need more in five years and whether I will still have it.
The trap is that you can see the trap and still not be able to leave it. I know that every hour I spend reviewing Claude’s output instead of writing my own code is an hour I am not building the deep understanding that would make me a better engineer in the long run. I also know that if I stop using these tools I will ship at a fraction of the pace of everyone who didn’t stop, and in a market that already doesn’t want to pay for the learning curve, falling behind is a career risk. So I stay in the loop, cleaning up after Simon, getting faster at catching his mistakes and slower at avoiding my own, because the economics leave me no choice and because, if I am being fully honest, the speed is addictive in a way that I recognize from every other description of a thing you know is bad for you but cannot put down. Karpathy said it is very difficult to imagine going back to manual coding, and he’s right, it is, in the same way it is very difficult to imagine going back to navigating without GPS even though you used to know the roads.
The impression
In 1983, a control systems researcher named Lisanne Bainbridge published a paper called The Ironies of Automation whose thesis was that the more you automate a task, the more critical human skill becomes for handling the failures automation can’t, and yet automation removes the practice that keeps that skill alive. She was writing about industrial process control but she could have been writing about me staring at a Claude-generated Swift function that compiled and passed tests and was wrong in a way I almost didn’t catch because I had not written enough Swift by hand that month to feel the shape of the wrongness as fast as I used to. Bainbridge’s irony is not that automation fails. It is that automation degrades the only thing that can save you when it does.
The model’s problem is not that it hasn’t suffered consequences. The model’s problem is that it has no relationship to the problem it is solving, only to the text of the problem. A human writing code is trying to make something work. The model is trying to produce text that matches the distribution of “code that follows these instructions.” Those two things look identical ninety percent of the time, which is why the other ten percent is so dangerous, because the divergence happens at the least obvious moments, the edge cases, the places where the correct answer is “actually we shouldn’t build this at all.” The model is performing an impression of programming. Impressions have a ceiling, and the ceiling is the title of this essay. You can make the impression more convincing, you can fine-tune and RLHF and prompt-engineer until the output looks indistinguishable from the real thing in a side-by-side, but an impression that doesn’t know it’s an impression will never know when to stop, and knowing when to stop is most of what senior engineering actually is.
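A hypothetical illustration of that dangerous ten percent, in Python, with the helper and its numbers invented for the purpose: two versions of a page-count function that agree on every input a casual test suite would try and diverge exactly at the edge.

```python
# Looks like code that follows the instructions: correct whenever the
# item count happens to divide evenly, which is exactly what a quick
# test suite checks.
def page_count_plausible(total_items: int, page_size: int) -> int:
    return total_items // page_size

# Actually works: rounds up, so a partial final page still counts.
def page_count(total_items: int, page_size: int) -> int:
    return -(-total_items // page_size)  # ceiling division

# The two agree on the easy inputs...
assert page_count_plausible(100, 10) == page_count(100, 10) == 10
# ...and diverge at the edge, where 101 items silently lose a page.
assert page_count_plausible(101, 10) == 10
assert page_count(101, 10) == 11
```

The plausible version is what text matching the distribution looks like; the correct one is what trying to make something work looks like, and nothing in the ninety percent tells you which one you got.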
There is a Comment in Nature by Roger McKinlay about GPS and spatial cognition, and the research it surveys says approximately what you’d expect: people who navigate with GPS develop significantly worse spatial memory than people who use maps, because turn-by-turn directions remove the need to build an internal model of where you are. You get where you’re going faster and you learn nothing about the territory, and then one day your phone dies and you are standing on a street corner with no idea which direction is north. I think about this every time I accept a Claude suggestion without rewriting it, which is most of the time now, because the suggestion is usually fine and rewriting it would be slower. I am navigating with GPS, except a GPS that hallucinates sometimes, like Apple Maps in 2012.
This is where I land, unfortunately, because I don’t have a resolution. The tools are too good to abandon and too limited to trust and the space between those two facts is where all of us are living now, writing code by committee with a committee member who is brilliant and prolific and has no idea it is performing. The people who will do well, I think, are the ones who use the tools without forgetting what the tools can’t do, who keep writing things by hand often enough that the muscle doesn’t atrophy, who treat the speed as a gift and the quality as their problem. But I said something similar last time and I’m less sure of it now than I was then, because the gap between what the models produce and what the models should produce has not closed meaningfully in the months since I started paying attention, and the thesis that it will close in twelve to eighteen months is, at this point, a faith claim being made by people with a financial interest in your faith.
Karpathy still uses the tools. So do I. Simon is still employed. The floor is the ceiling and we are all living on it, and the only thing I can say is that I don’t know whether the ceiling goes up from here or whether this is just what it is now, a world where the code writes itself badly and the humans clean it up and everyone pretends this is the future we were promised. It might be. It might get better. The only thing I’m sure of is that nobody who tells you they know which one it is has earned that certainty.