The Silicon Ledger: Why AI Unit Economics Are Decoupling from Moore’s Law

The era of “AI magic” has officially met the reality of the balance sheet. For the past two years, boardrooms have been captivated by the raw capability of large language models. But as we move deeper into 2026, a colder, more analytical metric is taking center stage: the Silicon Ledger.

For CEOs and Boards, the initial promise was simple: Moore’s Law would drive the cost of intelligence to zero, and productivity would skyrocket. However, the data suggests a more complex structural shift. While the price of a single “token” of AI output has collapsed by nearly 500x since the launch of GPT-4, enterprise AI spending is not falling. It is escalating.

We are witnessing a profound decoupling. The unit cost of compute is plummeting, but the total cost of a business outcome is often rising. Understanding why this decoupling is happening is now the primary mandate for any leadership team moving beyond pilot paralysis.

The Jevons Paradox of Artificial Intelligence

The primary driver of this counterintuitive spending is a classic economic phenomenon: the Jevons Paradox. In the 19th century, economist William Stanley Jevons observed that as steam engines became more coal-efficient, total coal consumption didn’t drop; it exploded. Greater efficiency made coal-powered applications more viable, leading to massive, widespread adoption.

AI is currently undergoing its own “Jevons moment.” Inference costs for frontier models have dropped from $30 per million tokens in 2023 to roughly $0.06 today. So why aren’t organizations’ AI costs going down? Because they stopped using simple chatbots and started building multi-step agentic loops.

In a typical 2026 enterprise workflow, a single user request no longer triggers a single model call. It triggers a “reasoning chain”: a master agent delegates tasks to specialized sub-agents, which search the web, audit internal documents, cross-reference legal templates, and verify their own work before presenting a result.
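A back-of-the-envelope sketch shows how a 500x drop in token price can coexist with a rising bill. Only the two per-million-token prices come from the text; the call counts and tokens per call are hypothetical illustrations, not measurements.

```python
def request_cost(calls: int, tokens_per_call: int, usd_per_m_tokens: float) -> float:
    """Cost of one user request: every model call in the chain is billed for its tokens."""
    return calls * tokens_per_call * usd_per_m_tokens / 1_000_000

# Hypothetical 2023 chatbot: one call of ~1,000 tokens at $30 per million tokens.
chatbot_2023 = request_cost(calls=1, tokens_per_call=1_000, usd_per_m_tokens=30.0)

# Hypothetical 2026 agentic chain: a master agent plus sub-agents make 50 calls,
# each carrying ~20,000 tokens of context, reasoning, and tool output, at $0.06 per million.
agent_2026 = request_cost(calls=50, tokens_per_call=20_000, usd_per_m_tokens=0.06)

print(f"2023 chatbot request: ${chatbot_2023:.4f}")
print(f"2026 agent request:   ${agent_2026:.4f}")
print(f"Token price fell {30.0 / 0.06:.0f}x; cost per request changed {agent_2026 / chatbot_2023:.1f}x")
```

Under these assumed volumes, the agentic request costs about twice as much as the old chatbot request even though each token is 500x cheaper: the multiplier on calls and context outruns the price decline.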

A task that once looked cheap on a token basis can become expensive once the model starts looping through retries, tool calls, and long reasoning traces. The useful unit is not the token; it is the successful outcome. In practice, leaders should think in terms of cost-of-pass: the dollars required to reach one correct answer that can actually be used. That shifts the question from price per token to dollars per correct result. The Board-level question is no longer “What does this model charge per token?” but “What does it cost this system to produce a successful outcome?”
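One common way to make cost-of-pass precise, assuming each attempt succeeds independently with the same probability p, is to divide the expected cost of a single attempt by that success rate. The symbols C (cost of one attempt) and N (number of attempts until the first correct answer) are introduced here for illustration only.

```latex
\mathbb{E}[N] = \sum_{n=1}^{\infty} n\,(1-p)^{n-1}\,p = \frac{1}{p}
\qquad\Longrightarrow\qquad
\text{cost-of-pass} = \mathbb{E}[C]\cdot\mathbb{E}[N] = \frac{\mathbb{E}[C]}{p}
```

Because the number of attempts until the first success is geometric with mean 1/p, a model that is half as expensive per attempt but succeeds half as often has exactly the same cost-of-pass.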

The Inference Audit: Distinguishing Price from Efficiency

The Silicon Ledger requires a more disciplined lens than a simple price-per-token comparison. What matters is inference efficiency: how much useful work a model completes per action, per turn, and per token consumed. That is where the economics start to diverge sharply.

The first problem is the cost of correction, which many teams discover too late as the rework tax. A low posted token price can be economically misleading if the model needs repeated prompting, longer chains of reasoning, or multiple tool-mediated retries before it reaches a correct answer. Research now points to a clear price-reversal phenomenon. In production settings, models marketed as cheap, including lightweight variants such as Gemini Flash, can end up costing 28x more than frontier models because they require up to 900% more thinking tokens and as many as 10x more interaction turns to achieve the same correct result. The raw token is cheaper. The successful outcome is not.
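A minimal sketch of the rework tax: cost per attempt (price × tokens per turn × turns) divided by the share of attempts that end in a correct answer. Every price, token count, and success rate below is hypothetical, chosen only so the "cheap" model needs about 10x the tokens per turn and 10x the turns; these are not the figures from the research cited.

```python
from dataclasses import dataclass

@dataclass
class ModelProfile:
    name: str
    usd_per_m_tokens: float   # blended price per million tokens (hypothetical)
    tokens_per_turn: int      # prompt + thinking + output tokens per interaction turn
    turns_per_attempt: int    # turns needed before the model returns an answer
    success_rate: float       # share of attempts that end in a correct, usable answer

    def cost_per_attempt(self) -> float:
        return self.usd_per_m_tokens * self.tokens_per_turn * self.turns_per_attempt / 1_000_000

    def cost_of_pass(self) -> float:
        # Expected attempts until the first success is 1 / success_rate (independent attempts assumed).
        return self.cost_per_attempt() / self.success_rate

frontier = ModelProfile("frontier", usd_per_m_tokens=10.0, tokens_per_turn=2_000,
                        turns_per_attempt=2, success_rate=0.9)
cheap = ModelProfile("cheap", usd_per_m_tokens=0.5, tokens_per_turn=20_000,
                     turns_per_attempt=20, success_rate=0.6)

for m in (frontier, cheap):
    print(f"{m.name:>8}: ${m.usd_per_m_tokens:.2f}/M tokens, ${m.cost_of_pass():.3f} per correct answer")
```

In this made-up example, the model with a 20x lower token price costs 7.5x more per correct answer. The exact ratio depends entirely on the assumed inputs; the point is that price per token and cost-of-pass can move in opposite directions.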

Finance teams should therefore measure cost-of-pass, the total dollars required to reach one correct, usable answer, rather than price per token. It is the right basis for comparing models on tasks that require retries, tool use, and orchestration. The model with the lowest API price may still have the highest cost-of-pass if it produces too much rework.

The second problem is that model efficiency is now partly geopolitical. The current U.S. lead is not just a function of larger capital budgets or more GPUs. It is increasingly a total factor productivity (TFP) advantage. Recent analysis points to a 63x training and inference efficiency gap between top U.S. systems and rivals. A large part of that gap comes from architectural choices such as Mixture-of-Experts (MoE), which activate only a subset of parameters per token instead of running the full model every time. That lowers the compute burden per inference step while preserving capability. Put differently, some systems are becoming materially smarter per unit of action.
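To make the Mixture-of-Experts point concrete, here is a toy top-k router in plain Python. A gating score is computed for every expert, but only the k highest-scoring experts actually run for a given token, and their outputs are mixed by softmax weights over those k scores. The dimensions, router weights, and "experts" (simple scalings) are hypothetical; production MoE layers use learned feed-forward experts, load-balancing losses, and batched tensor kernels that this sketch omits.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]

def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token: Vector, router: List[Vector], experts: List[Expert],
              k: int = 2) -> Tuple[Vector, List[int]]:
    """Route one token to its top-k experts; the remaining experts do no work."""
    scores = [sum(w * x for w, x in zip(row, token)) for row in router]
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])
    out = [0.0] * len(token)
    for gate, i in zip(gates, top):
        y = experts[i](token)  # only these k expert calls are executed
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top

# Hypothetical setup: 8 experts, each simply scales its input by a different factor.
experts: List[Expert] = [lambda v, s=s: [s * x for x in v] for s in range(1, 9)]
router = [[float(i), float(-i)] for i in range(8)]  # toy router weights for 2-dim tokens

out, used = moe_layer([1.0, 0.5], router, experts, k=2)
print("experts used:", used, "of", len(experts))
print("active share of expert compute:", len(used) / len(experts))
```

With k=2 of 8 experts, each token touches a quarter of the expert parameters, which is the mechanism behind the lower compute burden per inference step described above.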

For management teams, this changes the audit. You must understand which model reaches the right answer with the least rework. At LBZ Advisory, we advise leaders to route work based on outcome economics, not headline price. Before a human reviews anything, many systems have already accumulated hidden cost through retries, prompt expansion, reasoning loops, and tool chatter. The right model is the one that minimizes the cost-of-pass.
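A minimal sketch of outcome-based routing, assuming a team keeps an evaluation log of total spend and number of correct, usable answers per model and task type. The model names, task types, and figures are hypothetical. The router picks the lowest measured cost-of-pass rather than the lowest token price, and falls back to a default model when there is no evidence or nothing has passed yet.

```python
import math
from typing import Dict, Tuple

# Hypothetical eval log: task type -> model -> (total USD spent, number of correct, usable answers).
EvalLog = Dict[str, Dict[str, Tuple[float, int]]]

def cost_of_pass(spend_usd: float, passes: int) -> float:
    """Dollars spent per correct answer; infinite if the model never produced one."""
    return spend_usd / passes if passes else math.inf

def route(task_type: str, log: EvalLog, default: str) -> str:
    """Pick the model with the lowest measured cost-of-pass for this task type."""
    candidates = log.get(task_type, {})
    if not candidates:
        return default  # no evidence yet: use the team's default model
    best = min(candidates, key=lambda m: cost_of_pass(*candidates[m]))
    if math.isinf(cost_of_pass(*candidates[best])):
        return default  # every candidate failed every attempt
    return best

log: EvalLog = {
    # The cheaper-per-token model spends less overall on contract review but passes far less often.
    "contract_review": {"cheap-model": (30.0, 20), "frontier-model": (60.0, 90)},
    # On simple triage both pass almost always, so the lower spend wins.
    "ticket_triage": {"cheap-model": (2.0, 95), "frontier-model": (25.0, 98)},
}

for task in ("contract_review", "ticket_triage", "new_task_type"):
    print(task, "->", route(task, log, default="frontier-model"))
```

The same model can be the right choice for one workflow and the wrong one for another, which is why the routing decision is made per task type rather than per vendor.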

The Verification Layer: From Reviewer to Auditor

Even if tokens were free, the final cost floor would remain. In high-stakes work such as legal, medical, financial, and engineering work, a human must still carry the accountability.

Someone signs the brief, approves the diagnosis, releases the filing, or certifies the analysis. In the MIT framing, this is cH, the cost to verify. It is the final liability floor. More concretely, it is the non-negotiable human labor cost that sits underneath any high-stakes decision because the machine cannot own the legal risk.

A mortgage model can assemble the file, flag inconsistencies, and summarize borrower history in seconds. It still cannot sign a $500,000 loan. The underwriter does that, because the institution, the regulator, and the legal system still attach responsibility to a human role, not a probabilistic system. When enterprises miss this, they confuse cheap generation with cheap outcomes.

The mistake is to treat verification as line-by-line review. That simply recreates the original labor model on top of an AI system. The more scalable approach is high-leverage auditing: the human reviews exceptions, inspects failure points, and validates the small set of claims or decisions that actually carry risk. That cuts the time people spend acting as human middleware for a system that still cannot own the consequences of being wrong.

There are concrete levers here. Use confidence-based routing so only low-confidence or high-risk outputs are escalated for human review. Use provenance trails so the reviewer can see, quickly and visually, where the model sourced its facts, clauses, or calculations. Add semantic caching so previously verified outputs, patterns, and source-grounded answers do not trigger the same human verification work again and again. These controls do not eliminate the liability floor. They lower the amount of costly human attention required to operate above it.
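Confidence-based routing and semantic caching can be sketched together as a triage step that runs before any human looks at an output. This is a hypothetical illustration: the thresholds, the risk flag, and the embedding vectors are assumptions (in practice the embeddings would come from whatever embedding model a team already uses), and provenance display, being largely a UI concern, is left out.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

@dataclass
class VerifiedAnswerCache:
    """Semantic cache of outputs a human has already verified (hypothetical design)."""
    min_similarity: float = 0.95
    entries: List[Tuple[List[float], str]] = field(default_factory=list)

    def add(self, embedding: List[float], verified_answer: str) -> None:
        self.entries.append((embedding, verified_answer))

    def lookup(self, embedding: List[float]) -> Optional[str]:
        scored = [(cosine(e, embedding), ans) for e, ans in self.entries]
        if not scored:
            return None
        sim, ans = max(scored)
        return ans if sim >= self.min_similarity else None

def triage(confidence: float, high_risk: bool, embedding: List[float],
           cache: VerifiedAnswerCache, min_confidence: float = 0.85) -> str:
    """Decide whether an AI output needs fresh human attention."""
    if cache.lookup(embedding) is not None:
        return "reuse_verified"   # already checked once; don't pay for the same review again
    if high_risk or confidence < min_confidence:
        return "human_review"     # escalate only the risky or low-confidence tail
    return "auto_release"

cache = VerifiedAnswerCache()
cache.add([0.9, 0.1, 0.0], "Clause 7 matches the approved template.")
print(triage(0.70, False, [0.9, 0.12, 0.0], cache))
print(triage(0.70, False, [0.0, 0.2, 0.9], cache))
print(triage(0.97, False, [0.0, 0.2, 0.9], cache))
```

Whether a cache hit should bypass review for high-risk items is a policy choice; a stricter variant checks `high_risk` before consulting the cache.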

Case Studies: The Winners in 2026

The companies separating from the pack are using AI to change the throughput math of the business while respecting the liability floor.

Klarna is one example. The company has reported a shift from roughly $369,000 to $1.24 million in revenue per employee, driven in part by using AI to absorb Tier-1 customer support and improve resolution speed by roughly 10x. The important detail is where the boundary sits. Klarna did not pretend every customer interaction had become low-risk. It pushed repetitive, high-volume requests into automation and kept humans on the escalations where judgment, exceptions, and customer trust still matter. That is what good implementation looks like in the real world. You automate the commodity lane and preserve scarce human attention for the moments where the business is still carrying actual downside.

TD Bank shows the same pattern in a more regulated setting. The reported reduction in mortgage review time from 15 hours to 3 minutes sounds, on first pass, like a labor elimination story. It is not. Mortgage approval still sits on top of the liability floor. A bank still needs underwriters because a machine cannot own the legal and regulatory exposure of a bad lending decision. The payoff comes through capacity recovery. If the review stack becomes dramatically faster, the institution can process more loans, reduce backlog, improve customer experience, and capture more revenue with the same specialized workforce. That is a much more serious economic story than simple headcount reduction.

EY adds a third pattern: multi-agent orchestration where the workflow itself is full of handoffs. In Finance Ops, EY has reported a 37% reduction in operational costs through multi-agent orchestration. That figure points to a structural source of waste inside large enterprises: coordination, not just labor. Many finance organizations still run on human glue, with people reconciling systems, moving information between teams, checking whether the previous step happened correctly, and cleaning up edge cases after the fact. Multi-agent systems, when designed well, can absorb some of that process janitor work.

The Rise of Efficient AI and the New KPIs

Historically, companies competed on raw model power. That is becoming a weaker moat. The more durable advantage is outcome efficiency: the ability to produce a correct, verified result with less compute, less orchestration overhead, and less expert review time.

That requires a tighter operating system and better board-level metrics. The first is Cost per Successful Outcome (CPSO): the total cost of tokens plus human time required to reach a verified result.

The second is Effective Token Usage (ETU): accuracy divided by raw tokens used. This gives leadership a way to distinguish real reasoning efficiency from expensive verbosity.

The third is the V:E Ratio (Verification-to-Execution): the number of human minutes required per AI-generated output.

The fourth is OOI (Orchestration Overhead Indicator): the tax created by retries, reasoning loops, tool chatter, and agent-to-agent handoffs inside complex systems.

These are not academic metrics. A system with low token prices but poor ETU and high OOI is not efficient. A system with strong CPSO and a falling V:E ratio is improving in ways that matter operationally.
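The four metrics can be computed from per-run logs. This sketch makes two assumptions that the definitions leave open: ETU divides accuracy by mean tokens per run (in thousands) so systems with different volumes stay comparable, and OOI is taken as the share of tokens spent on retries, loops, tool chatter, and handoffs. The `Run` fields, the sample runs, and the reviewer hourly rate are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Run:
    tokens: int             # all tokens the run consumed: prompts, thinking, outputs, tool calls
    overhead_tokens: int    # subset spent on retries, loops, tool chatter, agent handoffs
    token_cost_usd: float   # API spend for the run
    human_minutes: float    # reviewer time spent on this output
    verified: bool          # True if the run ended in a correct, verified result

def board_kpis(runs: List[Run], reviewer_usd_per_hour: float) -> Dict[str, float]:
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs)
    successes = sum(r.verified for r in runs)
    total_tokens = sum(r.tokens for r in runs)
    human_minutes = sum(r.human_minutes for r in runs)
    total_cost = sum(r.token_cost_usd for r in runs) + human_minutes / 60 * reviewer_usd_per_hour
    accuracy = successes / n
    mean_ktokens = total_tokens / n / 1_000
    return {
        "CPSO": total_cost / successes if successes else float("inf"),  # $ per verified result
        "ETU": accuracy / mean_ktokens,                  # accuracy per 1k tokens per run (assumed)
        "V:E": human_minutes / n,                        # human minutes per AI output
        "OOI": sum(r.overhead_tokens for r in runs) / total_tokens,  # overhead share (assumed)
    }

runs = [
    Run(tokens=4_000, overhead_tokens=1_000, token_cost_usd=0.04, human_minutes=2.0, verified=True),
    Run(tokens=6_000, overhead_tokens=3_000, token_cost_usd=0.06, human_minutes=4.0, verified=False),
]
print(board_kpis(runs, reviewer_usd_per_hour=60.0))
```

On this toy data, $6.00 of the $6.10 CPSO is reviewer time, which is the liability floor showing up directly in the numbers: cutting V:E moves CPSO far more than cutting token spend.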

Leaders also need operating metrics that connect AI activity to business performance.

Revenue per Employee is now the gold-standard productivity measure because it captures whether automation is actually expanding output, not just generating internal excitement. Klarna’s move from $369,000 to $1.24 million per employee is the kind of benchmark boards pay attention to because it ties AI deployment to revenue density.

Time to Resolution (TTR) matters wherever service speed drives satisfaction, churn, or labor load; Klarna’s 10x improvement in support resolution is a concrete example of how companies track whether AI is removing friction or just adding another interface.

Capacity Recovery is especially useful in high-stakes workflows where the liability floor remains in place. TD Bank’s mortgage workflow is a good case: the point is not fewer underwriters on paper, but how much more throughput the same team can handle once review time falls from 15 hours to 3 minutes.
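Capacity recovery can be estimated with simple arithmetic, and the liability floor shows up directly in it. This sketch assumes the 15-hour and 3-minute figures are hands-on review time per file, and adds a hypothetical 30 minutes of underwriter sign-off per file that automation does not remove, over an assumed 8-hour reviewer day.

```python
def files_per_reviewer_day(review_min: float, signoff_min: float, day_min: float = 480.0) -> float:
    """Files one underwriter can clear per day when each file needs review plus sign-off."""
    return day_min / (review_min + signoff_min)

SIGNOFF_MIN = 30.0  # hypothetical liability-floor time per file; automation does not remove it

before = files_per_reviewer_day(review_min=15 * 60, signoff_min=SIGNOFF_MIN)
after = files_per_reviewer_day(review_min=3, signoff_min=SIGNOFF_MIN)
review_speedup = (15 * 60) / 3

print(f"Review step speedup: {review_speedup:.0f}x")
print(f"Files per underwriter-day: {before:.2f} -> {after:.2f} ({after / before:.1f}x)")
```

Under these assumptions the review step gets 300x faster, but throughput per underwriter rises about 28x, because the fixed sign-off time now dominates each file. The floor caps the gain; it does not erase it.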

And in healthcare revenue-cycle work, Denial/Recovery Rates are often the sharpest signal of whether AI is improving economic outcomes. Northeast Georgia Health System, for example, has used this lens to recover $3 million, which is a far more grounded measure than generic productivity claims.

Strategic FAQ

How should a Board evaluate AI “Pilot Fatigue” in light of these economics?
Pilot fatigue often stems from a failure to account for the “Verification Tax.” Boards should stop asking whether an AI pilot “works” (most do) and start asking for the “Verification-to-Execution Ratio.” If a pilot requires a human to check every single output for high-stakes accuracy, it is likely unscalable. True ROI comes from moving tasks into the “Commodity Lane” (low-stakes workflows where the token price is the only relevant cost) or from restructuring high-stakes workflows so a single human can verify ten times the volume of AI-generated work.

What is the “Marginal Token Allocator” and why do we need one?
In the same way enterprises have cloud cost management teams (FinOps), they now need “Inference-Ops.” The Marginal Token Allocator is a role (or a function) that audits the total cost of an outcome. It determines whether a workflow should use a recursive, reasoning-heavy agent or a simple, one-shot prompt. It manages the “Silicon Salary” of the organization, ensuring that the explosion in token usage is actually translating into a competitive advantage rather than just filling the pockets of GPU providers.

Is the declining price of tokens a permanent structural shift?
Not necessarily. Current token prices are heavily subsidized by unprecedented capital expenditure and a fierce market share war among providers like OpenAI, Google, and Anthropic. Research suggests that while algorithmic efficiency is improving, the cost of training the next generation of models is doubling every eight months. If capital markets begin demanding immediate profitability from inference providers, we could see “Token Inflation” or the emergence of “Priority Pricing” for high-reasoning models. The smart move is to build a model-agnostic architecture now to hedge against future pricing volatility.
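One way to read "model-agnostic architecture" in code: every provider sits behind the same small interface, and routing depends on economics data that can be updated when vendors reprice, rather than on hard-coded vendor calls. The provider classes, names, prices, and token counts below are hypothetical stand-ins, not any vendor's SDK. Routing uses estimated cost per correct answer (price × tokens spent per correct answer) so it stays consistent with the cost-of-pass lens rather than headline price.

```python
from dataclasses import dataclass
from typing import Dict, Protocol

class Provider(Protocol):
    name: str
    def complete(self, prompt: str) -> str: ...

@dataclass
class StubProvider:
    """Stand-in for a real vendor client; a real adapter would wrap that vendor's SDK."""
    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] answer to: {prompt}"

class ModelGateway:
    """Routes requests to whichever provider is currently cheapest per correct answer."""

    def __init__(self, providers: Dict[str, Provider]) -> None:
        self.providers = providers
        self.usd_per_m_tokens: Dict[str, float] = {}
        self.tokens_per_pass: Dict[str, float] = {}  # measured tokens spent per correct answer

    def set_economics(self, name: str, usd_per_m_tokens: float, tokens_per_pass: float) -> None:
        self.usd_per_m_tokens[name] = usd_per_m_tokens
        self.tokens_per_pass[name] = tokens_per_pass

    def cost_of_pass(self, name: str) -> float:
        return self.usd_per_m_tokens[name] * self.tokens_per_pass[name] / 1_000_000

    def pick(self) -> str:
        return min(self.usd_per_m_tokens, key=self.cost_of_pass)

    def complete(self, prompt: str) -> str:
        return self.providers[self.pick()].complete(prompt)

gateway = ModelGateway({"vendor_a": StubProvider("vendor_a"), "vendor_b": StubProvider("vendor_b")})
gateway.set_economics("vendor_a", usd_per_m_tokens=1.0, tokens_per_pass=50_000)
gateway.set_economics("vendor_b", usd_per_m_tokens=10.0, tokens_per_pass=2_000)
print(gateway.pick())  # vendor_b: $0.02 vs $0.05 per correct answer

# Simulated "priority pricing": vendor_b quadruples its price; routing shifts with no code change.
gateway.set_economics("vendor_b", usd_per_m_tokens=40.0, tokens_per_pass=2_000)
print(gateway.pick())  # vendor_a: $0.05 vs $0.08 per correct answer
```

The design choice is that application code only ever calls `gateway.complete`, so a price shock becomes a data update rather than a migration project.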

The Take-Aways

The economics of AI now require understanding how much it costs to produce a correct result that a business can actually stand behind. That is why the center of gravity is shifting away from raw model power and toward outcome efficiency.

For boards and leadership teams, the implication is practical. Audit the cost-of-pass before approving scale. Measure the rework tax before assuming a cheaper model will save money. Treat cH, the human verification layer, as a permanent design constraint in any regulated or high-stakes workflow. And instrument the system with the right KPIs: CPSO, ETU, V:E, and OOI.

The next moat will belong to companies that can orchestrate AI with less waste. They will know which models are smarter per action, where retries are destroying margin, and how to move humans from line reviewers to high-leverage auditors. That is the operating standard for 2026: efficient AI that produces verified outcomes at a lower total cost.

LBZ Advisory helps leadership teams navigate these shifts. If your organization is struggling to translate AI potential into measurable business outcomes, get in touch.

May 30, 2026

Liat Ben-Zur

Oy Gevalt! is a blog dedicated to my grandmother, Guta Gantz. An Auschwitz and Buchenwald survivor, she is not only the strongest woman I've ever known, she also invented "Leaning In". As in, leaning into her grandkids to get married already! She said Oy Gevalt! a lot. For the non-Yiddish speakers among you, 'Oy Gevalt' is an expression of utmost anxiety, frustration or shock, similar to how we might use "Good Grief!" or "OMG!" Often used while kvetching, it's a very poignant expression for any working mother of two and/or woman in tech. I am both. www.linkedin.com/in/lbenzur www.twitter.com/lbenzur