Judgment Is Not a Workflow

People keep asking me how I use AI to build wBlock, especially since I wrote about the bug that got me thinking about all of this, and I keep not wanting to answer, not because the answer is proprietary but because the answer is boring and also slightly humiliating once you realize that most of what you thought of as your craft was a procedure a jippity can or soon will be able to pick up. Here is the answer. For the small stuff, bug fixes, filter list updates, the sort of implementation paraphernalia that used to take up the majority of my development time, I barely write code unassisted anymore. Claude Code, paired with a workflow framework called GSD that I happened across recently, handles most of it. I have also tried OpenAI Codex with similar results. I describe a problem in English, and something that is not me writes the fix, runs the tests, writes more tests as needed, and commits it. These days I write code for maybe thirty percent of the time I spend developing wBlock. The other seventy percent is steering: catching the model when it hallucinates an API that doesn’t exist, or when it misreads what Safari’s content blocker extensions can actually do in their sandboxed environment, or when it solves the problem it thinks I described instead of the problem I actually have. I am becoming less a programmer and more a shepherd of machine intent, and the sheep are fast but periodically suicidal, and the pasture is an Xcode project with a few hundred thousand users depending on the fences staying up. This is not the answer people want. What people want is a numbered list, a replicable method, and what I have instead is the unmarketable advice that the only system is knowing enough about what you’re building to feel when the machine is confidently bullshitting you. Someone asked me last week if I had a specific prompting technique for getting better code out of Claude and I didn’t know what to say, because the honest answer is something like “I spent years understanding how Safari content blocking extensions work and now I can tell when something is off the way a mechanic can hear a bad timing belt,” which is true but not helpful. The gap between someone who can use these tools well and someone who is still figuring it out is not a gap in prompt engineering or IDE configuration but a gap in judgment, and judgment is not a workflow.

The other seven hours

C. Northcote Parkinson published a satirical essay in The Economist in 1955 whose thesis was, and I quote, “Work expands so as to fill the time available for its completion.” He was making fun of the British civil service, specifically the observation that the number of Admiralty officials kept increasing even as the number of ships in the Royal Navy declined, and the whole thing was meant to be a joke. Seventy years later it couldn’t ring more true. I keep coming back to it because AI has inadvertently become the control group in an experiment nobody designed: if you hand a machine the same task that took a human eight hours, and the machine finishes in twelve minutes, what exactly were the other seven hours and forty-eight minutes? Part of it is that our brains run at arguably slower inference speeds, but the truer answer is that they were the time it took a human being to sit down, open a laptop, check Slack, attend three meetings that could have been one email, lose the thread of what they were doing, check Reddit, feel guilty about checking Reddit, attend a standup that existed so a manager could feel like managing was happening, context-switch twice, stare at a function for twenty minutes because their brain had simply refused to engage, and then finally, in the last ninety minutes before a deadline, do the actual cognitive work that the task required all along. We built an entire civilization of eight-hour workdays around the implicit assumption that knowledge work requires eight hours, and what AI is quietly showing us is that most of it never did. The work expanded to fill the time because the time was there and the human brain is pathologically incapable of leaving slack unfilled, and now the time is collapsing, and we are left blinking in the daylight trying to figure out what we were actually doing in those buildings all day. I find this more funny than depressing, though I realize that is easy to say from a dorm room.

I’m not saying this to be smug. I waste those hours too; all of us do. In fact, why am I writing this soliloquy right now instead of completing problem sets? I could be getting ahead on next week’s work. But the research on how much real cognitive work a person can actually do is not ambiguous. Anders Ericsson’s work on deliberate practice, the same research that Gladwell flattened into the ten-thousand-hour rule, shows that elite performers in cognitively demanding fields top out at about four hours of actual locked-in work per day. Everything after that is overhead. Four hours. That is what we were doing in those buildings. And if we were only ever doing four hours of real thinking in an eight-hour day, then what AI is taking from us is not the thinking but the other four hours of guilt and theater that surrounded it, and I’m not sure that’s a loss worth mourning. The eight-hour workday did not come from any empirical study of human cognitive capacity. Ford standardized it in 1914 as a factory policy because it kept his assembly lines running smoothly, and it stuck around because it was measurable, and what is measurable is what gets managed, and what gets managed is what gets enforced, and nobody along that chain ever stopped to ask whether a programmer debugging a race condition and a line worker bolting on fenders had the same optimal cadence. The unspoken backbone of employment is min-maxing, the least real effort that still earns the pay, and everyone involved knows this yet nobody says it, because saying it would require redesigning the way we organize human effort, and apparently we would rather keep muddling along the flawed way we do.

Doctors of philosophy

If the real work was always those four hours of actual cognition and everything else was theater, then the question that starts to matter is what kind of cognition those four hours actually consist of. The Association of American Medical Colleges publishes numbers every year on which undergraduate majors produce the most successful medical school applicants, and most people assume it’s biology or biochemistry or some other major whose course catalog overlaps visibly with the MCAT, and, to my own admitted shock, it isn’t. According to the AAMC’s own data, humanities majors are accepted at a rate of about fifty-two percent, biological sciences at forty-three, specialized health sciences at forty, with minor year-over-year fluctuations. Philosophy falls under the humanities umbrella there, and has historically landed at or near the top within it, which seems strange until you think about what philosophy actually is, not a body of knowledge but the practice of thinking about thinking, of asking whether the question itself is any good before trying to answer it. These were essentially useless skills for most of the last century, and frankly for most of the history before it, because if your job is to run a known process on a known input then the ability to question whether the process is any good just slows you down. But we are not in that world anymore, or rather, we are watching it leave, because machines can run known procedures faster than any human ever could, and what’s left is the stuff that was never procedural to begin with, something closer to judgment, or taste, or whatever you want to call the feeling that something is wrong before you can say why, the same tacit knowing that keeps human traders employed despite every incentive to replace them.

I think about this whenever I’m steering Claude through a wBlock fix and it produces something that is technically correct and somehow still wrong, where the syntax compiles and the tests pass and I look at it and know, in a way I could not explain to the model and can barely explain to myself, that it doesn’t fit. That recognition is closer to what a good editor does when they read a sentence that is grammatically perfect and structurally sound and still cut it, because it doesn’t belong, because the paragraph breathes better without it, and there’s a logic to the whole that the parts don’t individually contain. That kind of knowledge, the knowledge of wholes and contexts and why rather than how, is what philosophy actually trains and what AI cannot replicate, because AI, at least the LLMs that constitute the majority of VC investment, is a next-token predictor, and the whole point of judgment is that the next token is sometimes wrong even when it’s the most probable one. The pre-med kids who majored in philosophy didn’t outperform the biology majors because they knew more but because they’d spent four years learning to doubt whatever framework they were standing inside of, which happens to be what the MCAT’s critical analysis section tests and exactly what working alongside AI demands. The more I think about it, the more I believe the people who do well in the next decade will be the ones who can look at what the machine produced and tell you whether it should exist, regardless of how fast they type or how well they prompt, and that has always been a philosophical skill; we just never needed it when the expensive part was getting things built, and getting things built is not the expensive part anymore. I find this somewhat frightening but overwhelmingly comforting, actually. Strip the procedure away and what remains is just thinking, which we never valued because it didn’t look like work, and now it’s the only part left.
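To put that last point in the machine’s own terms, here is a toy sketch of greedy next-token selection, written in Swift only because wBlock happens to be a Swift project. None of it is real: the probabilities are invented, the identifiers belong to no actual codebase or API, and no real decoder is this simple. It is only meant to show that taking the most probable continuation is a mechanical act, and that “most probable” and “correct” are different axes.

    // Toy sketch, not a real model or a real codebase: every number and name below is invented.
    // Greedy decoding takes whichever next token has the highest probability.
    let nextTokenProbabilities: [String: Double] = [
        "blockList": 0.46,       // plausible-sounding, and wrong for this hypothetical project
        "contentBlocker": 0.31,
        "ruleList": 0.23         // the name the hypothetical project actually uses
    ]

    // The model's move is the argmax, nothing more.
    if let greedy = nextTokenProbabilities.max(by: { $0.value < $1.value }) {
        print("model picks:", greedy.key)   // prints "blockList": most probable, still wrong
    }

The machine’s whole job is to take the top row; the job left to me is noticing that the bottom one is the only name that belongs, and nothing inside the distribution can tell it that.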

The hoo, the howdy

It is a snowy Saturday afternoon in mid-February 2026 and I am probably not qualified to be making claims about the future of human cognition, but I am doing it anyway because I’ve spent many hours steering a machine through work I used to do completely by myself and feeling a way about it that I can’t quite name. The strange part is that I’m more productive than I’ve ever been, which should feel good, and instead what I mostly feel is confused about what I’m actually contributing, because when the machine writes the code and runs the tests and makes the commit, what’s left is the part where I sit there and decide whether it got the right idea, and I don’t know what to call that job or how to explain to anyone why it’s hard, or whether it even is still hard… Nobody taught me how to do this. Maybe it’s not even teachable, not as a course or a method, just as the thing that happens when you think about something long enough that you start to notice when it’s wrong before you can say why, which is what the philosophy people were practicing all along, not because they had the most practical diploma but because they were doing the right kind of thinking. I don’t know how to transfer that to anyone. It is just the downstream effect of having cared about something for a long time, and it turns out that what we thought was downstream, the judgment, the sense that something doesn’t belong, was actually the source all along, and everything we mistook for the real work, the typing and the debugging and the eight hours in the chair, was just the water finding its way there. Philosophy departments have somehow been producing the best thinkers in the world for decades and nobody cared because thinking didn’t look like work. And here we are, in 2026, discovering that the filler is automatable and the thinking is not, and none of us saw it coming, which is itself a pretty good indication that we were not, in fact, doing nearly as much thinking as we thought we were.