Without a constraint, AI’s energy bill may never shrink

Software efficiency is the only remaining route to a lower AI energy bill, absent a hardware breakthrough or an institutional constraint on demand. Cheaper compute generates the demand that consumes it.

The path a reasoning model takes to an answer traces a damped wave: overshoot, correct, overshoot, correct again. Each reasoning pass swings the prediction past where it should land, and the swings shrink as the model converges. The amplitude starts high, with jagged peaks and valleys, and the period is long; the wave settles only as the model nears an output. The architecture treats convergence as something to be approximated through iteration rather than computed in a single stroke. The center line is the correct answer; everything above and below it is the model getting there by being wrong. That is by design, and it is how these models reach good responses to prompts where the answer has to be constructed, not retrieved.

These models are inefficient at being effective. Producing the wrong answers needed to arrive at the right one takes real work. And the resource cost — the energy that the oscillations consume on their way to center — is harder to see and even harder to constrain. That cost lands on hardware that has spent decades being pushed against its physical limits: how many transistors fit on a chip, how much heat a system can dissipate, how densely components can be packed before quantum effects start to interfere. The paradigm that produced these models may not continue indefinitely on the architecture that has carried it this far.

Quantum computing is the cleanest alternative, but how close it is to practical use remains hard to know. If hardware is nearing its limits and exotic alternatives are uncertain, software efficiency is the only route to a lower energy bill within the architecture that exists today. Either the oscillations damp out faster, or the energy bill outgrows what hardware can deliver.

But in resource-using systems, efficiency gains have a long history of raising total consumption rather than lowering it. Wider highways do not reduce traffic; they generate trips that did not exist before. Cheaper electricity surfaces uses that were too expensive at the prior price. More efficient engines shift the cost-benefit of driving, and people drive more. The same pattern applies to AI compute. Every reduction in the per-query energy cost expands the range of queries being run. A query that is too expensive to execute today becomes viable tomorrow. A use case that does not pencil out at the current cost per token becomes inevitable. Each efficiency gain finds new demand to fill the headroom it created. On its own, software efficiency will not lower the energy bill.
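The rebound argument comes down to one multiplication: total energy is the energy per query times the number of queries. The sketch below is a toy illustration, not a measurement. It assumes that the price of a query falls in proportion to its energy cost and that demand follows a constant-elasticity curve. The function name, the parameters and the elasticity values are all hypothetical, chosen only to show where the sign of the outcome flips.

```python
def total_energy(per_query_energy, baseline_queries, efficiency_gain, elasticity):
    """Total energy use after an efficiency gain (toy model, all inputs assumed).

    per_query_energy: energy per query before the gain (any unit).
    baseline_queries: number of queries run before the gain.
    efficiency_gain:  fractional cut in per-query energy (0.5 means half).
    elasticity:       assumed magnitude of the price elasticity of demand.
    """
    new_per_query = per_query_energy * (1.0 - efficiency_gain)
    # Assumption: the price of a query tracks its energy cost one-for-one.
    cost_ratio = new_per_query / per_query_energy
    # Constant-elasticity demand: cheaper queries mean more queries.
    new_queries = baseline_queries * cost_ratio ** (-elasticity)
    return new_per_query * new_queries


# Compare the same 50% efficiency gain under three assumed elasticities.
before = total_energy(1.0, 1000.0, 0.0, 1.0)
for elasticity in (0.5, 1.0, 1.5):
    after = total_energy(1.0, 1000.0, 0.5, elasticity)
    print(f"elasticity={elasticity}: total energy changes by a factor of {after / before:.2f}")
```

In this model the total falls when elasticity is below 1, stays flat at exactly 1, and rises above 1. The claim that efficiency will not lower the bill amounts to a claim that demand for AI compute sits at or above that threshold, which the sketch assumes rather than demonstrates.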

In the older cases, something eventually pushes back. Highway expansion runs into zoning, congestion pricing, transit investment, and the political constituencies that organize around them. Electricity consumption is shaped by rate structures, efficiency standards, and grid operators with regulatory mandates. AI compute has no comparable layer of constraint. No agency has jurisdiction over training runs or inference at scale. No standard measures the energy cost of a model or the water footprint of a query. No trade body sets norms, and no political constituency has organized around its resource consumption.

Software efficiency lowers the per-query cost, but cheaper compute generates demand that consumes the new capacity. Whether the total energy cost grows on a trajectory the planet can carry is a question the current institutional setup cannot answer.

