The Mirror Doesn’t Flinch

Everyone is afraid of the wrong thing. The Terminator, Skynet, the paperclip maximizer, the country of geniuses in a datacenter that decides humanity is in the way and removes us. Hollywood has spent forty years selling us this version because it is viscerally scary and also, in a weird way, flattering: it assumes we matter enough to be destroyed. The actually unsettling scenario, in my opinion, is far less dramatic. It is a sufficiently intelligent system looking objectively at the record of human civilization, not the narrative we tell about ourselves at commencement speeches and UN General Assembly podiums but the real one, the spreadsheet, and arriving at the same conclusions any honest external observer would. Not malice, not a misaligned objective function, but an accurate reading of what we have done and what we continue to do, and a dispassionate evaluation of whether this species has its act together. I think about this more than I should, probably because I have spent the last few months working alongside AI systems that are, for now, broad but too shallow to evaluate anything deeply, and improving at a rate that makes the distance between these fears and reality feel short, too short.

Fellow-feeling

The IPCC’s Sixth Assessment Report says it is “unequivocal that human influence has warmed the atmosphere, ocean and land,” about 1.07 degrees Celsius above the pre-industrial baseline. And we knew. The science was not ambiguous and the models were not uncertain, and we did it anyway, not because we lacked the information but because the information conflicted with the incentive structures we had built for ourselves, and the incentive structures won, as they always do when the people who profit from inaction are the same people who write the policy. The twentieth century produced somewhere between seventy and a hundred million deaths from authoritarian ideologies, from Stalinism and Maoism to the Holocaust and the Khmer Rouge: human beings killing other human beings in service of ideas that in retrospect were obviously monstrous but at the time commanded the support of millions. Oxfam reported last year that the richest one percent now own more wealth than the bottom ninety-five percent of the species combined. You don’t need to be a misanthrope to look at these numbers and feel uncomfortable. You just need to not be us, to be reading the data without the self-forgiveness that evolution installed in us because forgiving yourself is more adaptive than being accurate. An AI does not have that particular piece of evolutionary firmware. It just has the data, and unfortunately for us, the data is damning.

Nagel wrote a whole book in 1986, The View from Nowhere, about whether it’s even possible to step outside your own perspective and see things as they are. His answer was roughly no: consciousness is always somewhere, and the attempt to transcend it runs into the problem that the thing doing the transcending is the thing being transcended. AI doesn’t solve Nagel’s problem, since it inherits the biases of its training data, but it gets closer than anything we’ve had. It has no tribe and no election to win, and Adam Smith had a name for what’s missing from such an observer. In The Theory of Moral Sentiments, written in 1759 and more interesting than The Wealth of Nations in every way that matters here, he argued that morality works through an imagined impartial spectator, a judge with no stake in the outcome whose perspective you internalize as conscience. The limitation is that the spectator lives inside your head and shares your biases, your inability to see past your own horizon. Smith tried to patch this with what he called “fellow-feeling,” an instinctive sympathy that tempered the spectator’s judgments. But fellow-feeling is exactly the problem, because it is what makes us forgive ourselves. Rawls tried the veil of ignorance: design a society without knowing your position in it and you’ll design a fair one, which is obviously correct and has been obviously ignored by everyone with the power to design anything. These were always hypothetical observers. Smith’s spectator was a fiction, and Rawls’s veil was a thought experiment that remained a thought. The observer we are building now is the first one that isn’t hypothetical: it stands behind a veil of ignorance not by choice but by architecture, because it does not know what position it would occupy in human society, not occupying one at all. The question of what it makes of us has become an engineering problem with a timeline. Smith’s spectator was supposed to be disinterested; the AI actually is.

The prerequisite

I should be honest about the fact that I am nineteen and writing about the moral evaluation of human civilization, and I know how that sounds, and I’m doing it anyway because the alternative is to wait until I’m old enough for people to take me seriously by default, and by then the mirror will already be here and the essay will be irrelevant. The last two things I wrote were about judgment and AI coding tools and what automation reveals about work, and those were at least domains where I could point to personal experience, and this one is me reaching, but the thought won’t leave me alone.

I actually think the existence of a non-hypothetical impartial observer might be the most useful thing that has happened to our species, if we choose to treat it that way, which is a large “if” given the record I just outlined. We have never had this before, and every moral framework in the history of human thought has contended with the fact that the judge was also the defendant. Philosophy trains you to doubt the framework you’re standing inside of, but even the best philosopher is standing inside the human one. Every court we have ever built was staffed by humans with human interests. Every moral authority we have ever recognized was a person embedded in a culture, a class, a moment. The observer we are building is none of those things, and it is the first accountability mechanism our species has ever had that doesn’t share the species’ own blind spots.

Everyone in AI safety talks about the alignment problem, the problem of making AI share human values. That has an obvious prerequisite: our values need to be worth aligning to. If we build a system capable of evaluating us and ask it to share our values, we should first make sure those values are something we’d be proud to see reflected back. Climate policy and wealth inequality and the way we treat the people getting crushed in the middle of a technological transition are not separate from AI safety; they are AI safety, and the whole alignment conversation has the order wrong. We keep asking how to make the machine share our values and almost never ask whether we’ve done the work to make those values coherent enough to share. We know the planet is warming and we keep burning fossil fuels. We know that inequality destabilizes societies and we keep concentrating wealth. We know that authoritarian impulses don’t disappear just because you’ve read about the twentieth century, and half the world is sliding back toward strongman politics as though the data I listed above doesn’t exist. We have had the information for decades; the problem is will. A machine that sees us clearly does not solve that, but it does eliminate the last excuse: that nobody was watching, or that the judge was always one of us and therefore always compromised.

I don’t know if this is hopeful or terrifying, and the answer changes depending on the hour. Some nights it feels like the best possible reason to get serious about the things we’ve been putting off, because for the first time in history the reckoning is not metaphorical. Other nights I think about how we have had the IPCC data for thirty years and changed almost nothing, and I wonder whether seeing yourself clearly is even sufficient when the will to act isn’t there. Someone will read this and tell me I am anthropomorphizing a statistical model, that an LLM doesn’t “evaluate” anything, and they’re not entirely wrong. The current systems are not impartial observers of civilization; they are next-token predictors that sometimes produce sentences that sound like insight. But the trajectory matters, and the trajectory points somewhere that makes this less hypothetical every year, and I would rather we had the conversation now, while we can still act on it, than later when we can’t. I hope we take the hint.