
AI Is Sprinting, The Rest of the World Is Still Tying Its Shoes




Each year, Stanford University’s Institute for Human-Centered Artificial Intelligence publishes the AI Index, a report that cuts through the promotional noise of product launches and earnings calls to ask what is actually happening with this technology. The 2026 edition, released this week, delivers a finding that is at once remarkable and unsettling. AI is not plateauing. It is not entering a consolidation phase. It is accelerating, and the institutions designed to understand and govern it are being left behind.

Photo: Wiz AI

The numbers are striking enough to warrant extended attention. On Humanity’s Last Exam — a benchmark constructed from questions contributed by subject-matter experts, designed to represent the most difficult problems across scientific disciplines — the top-scoring model answered just 8.8 per cent of questions correctly in 2025. By April 2026, the leading models from Anthropic and Google are surpassing 50 per cent. On SWE-bench Verified, a software engineering evaluation, top scores leapt from around 60 per cent in 2024 to near-perfect performance in 2025. As Yolanda Gil, a computer scientist at the University of Southern California who co-authored the report, put it with admirable candour: “I am stunned that this technology continues to improve, and it’s just not plateauing in any way.”

The Race at the Top

Perhaps the most geopolitically consequential finding in this year’s Index is the near-elimination of the performance gap between American and Chinese AI models. In early 2023, OpenAI held a commanding lead with ChatGPT. That lead eroded through 2024 as Google and Anthropic released competitive systems. By February 2025, China’s DeepSeek R1 briefly matched the top American model. As of March 2026, Anthropic leads the Arena rankings, a community-driven platform that measures real-world model performance, but it is trailed closely by xAI, Google, and OpenAI. Chinese models from DeepSeek and Alibaba lag only modestly. With frontier models now separated by margins thin enough to dissolve in a single training run, the competition has shifted to cost, reliability, and practical usefulness rather than raw capability.

The two powers, however, retain different structural advantages. The United States hosts an estimated 5,427 AI data centres — more than ten times the count of any other country — and commands a decisive lead in capital deployment and model performance. China leads in AI research publications, patent filings, and robotics. Neither set of advantages is permanent, and the pace at which both sides are investing suggests that both understand the stakes.

What is troubling about this competitive dynamic is not the competition itself but the opacity that surrounds it. As the Index notes, leading companies, including OpenAI, Anthropic, and Google, have ceased disclosing their training code, parameter counts, and dataset sizes. The independent researchers best positioned to study AI safety and model behaviour are operating without the information required to do so rigorously. The race is being run in the dark.

The Cost Nobody Is Pricing In

The environmental mathematics of the current AI buildout deserves more scrutiny than it typically receives in the business press. Stanford’s Index reports that AI data centres worldwide can now draw 29.6 gigawatts of power, enough to run the entire state of New York at peak demand. Annual water consumption from operating OpenAI’s GPT-4 alone may exceed the drinking needs of 1.2 million people. These are not distant projections; they are present-tense operational figures.

Photo: Getty Images

The energy profile of AI is, in other words, no longer an abstract sustainability concern. It is a supply constraint, a grid management problem, and increasingly a political liability. Across the United States, local governments have begun rejecting data centre permits. At least sixteen major projects worth a combined $64 billion have been blocked or delayed. The communities hosting the infrastructure of the AI economy are beginning to calculate what they are receiving in return — and finding the balance unfavourable.

When Benchmarks Stop Answering the Real Questions

The deeper epistemological problem identified by Stanford’s report may be the most underappreciated finding in its 400-plus pages. The benchmarks used to measure AI capability are being outpaced by the models themselves, while remaining poorly matched to the practical contexts in which the technology is actually being deployed.

As one of the report’s co-authors observes, knowing that a model achieves 75 per cent accuracy on a legal reasoning benchmark says very little about how it would perform in an actual law practice. The same logic applies across medicine, finance, and engineering. The evaluations we have were built for a prior era of AI; the AI we are deploying has moved far beyond them.

This is not merely a technical inconvenience. It is a governance gap with material consequences. Regulators are being asked to assess risks they cannot measure, using frameworks designed for systems that no longer represent the frontier. Companies are deploying agents — AI systems that execute multi-step tasks autonomously across software environments — without established standards for when they fail and why.

Stanford’s Index does not offer easy remedies for these gaps. What it does offer is clarity about the scale of the challenge. AI, it turns out, is not a problem that any single policy cycle, benchmark suite, or regulatory framework will resolve. It is a permanent condition of accelerating change — one that demands institutions capable of keeping pace. We do not yet have those institutions. Building them, before the next generation of models makes the task still harder, is now among the most consequential challenges in technology policy.


Faraz Khan is a freelance journalist and lecturer with a Master’s in Political Science, offering expert analysis on international affairs through his columns and blog. His insightful content provides valuable perspectives to a global audience.