Automated LLM reasoning cuts token costs by 70 percent

By Ben Dickson · May 29, 2026

Researchers from Meta, Google, and universities have developed AutoTTS, a framework that automatically discovers optimal test-time scaling strategies for large language models. Rather than relying on manually crafted heuristics, AutoTTS uses an explorer LLM to algorithmically search for resource-allocation policies. In trials, the approach reduced token consumption by up to 69.5% without sacrificing accuracy, offering enterprises a way to lower inference costs.

TL;DR

AutoTTS automates the design of test-time scaling strategies, replacing manual human-crafted heuristics with algorithmic search
The framework cut token consumption by up to 69.5% in experimental trials while maintaining model accuracy
An explorer LLM iteratively proposes and refines computational budget allocation policies within a defined control space
The approach shifts engineer focus from strategy design to defining the discovery environment, boundaries, and optimization objectives
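
The propose-and-refine loop described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the AutoTTS implementation: the explorer LLM is replaced by a random-perturbation stub, the names `Policy`, `propose`, `toy_accuracy`, and `search` are hypothetical, the accuracy curve is invented for the example, and reading "width" as the number of parallel samples and "depth" as the token cap per sample is an assumption.

```python
import random
from dataclasses import dataclass

# Hypothetical bounds of the control space (set by the engineer, per the article).
MAX_WIDTH = 16
MAX_DEPTH = 2048


@dataclass(frozen=True)
class Policy:
    width: int  # assumed meaning: number of parallel reasoning samples
    depth: int  # assumed meaning: token cap per sample


def tokens_used(p: Policy) -> int:
    # Worst-case token spend for one query under this policy.
    return p.width * p.depth


def toy_accuracy(p: Policy) -> float:
    # Invented saturating curve, NOT a measured result.
    width_gain = 1 - 0.5 ** p.width
    depth_gain = min(1.0, p.depth / 512)
    return 0.9 * width_gain * depth_gain


def propose(best: Policy, rng: random.Random) -> Policy:
    # Stand-in for the explorer LLM: randomly perturb the current best policy
    # and clamp the result to the control-space bounds.
    width = best.width + rng.choice([-1, 0, 1])
    depth = best.depth + rng.choice([-128, -64, 0, 64])
    return Policy(min(max(width, 1), MAX_WIDTH), min(max(depth, 64), MAX_DEPTH))


def search(start: Policy, target_accuracy: float, rounds: int, seed: int = 0) -> Policy:
    # Keep a candidate only if it meets the accuracy floor with fewer tokens.
    rng = random.Random(seed)
    best = start
    for _ in range(rounds):
        cand = propose(best, rng)
        if toy_accuracy(cand) >= target_accuracy and tokens_used(cand) < tokens_used(best):
            best = cand
    return best


start = Policy(width=8, depth=1024)
best = search(start, target_accuracy=0.85, rounds=200)
print(best, tokens_used(best), round(toy_accuracy(best), 3))
```

In this sketch the engineer's job is limited to what the article describes: setting the bounds (`MAX_WIDTH`, `MAX_DEPTH`), the objective (fewer tokens), and the constraint (the accuracy floor). The proposer is the swappable part.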

Why It Matters

Test-time scaling improves LLM performance by allocating extra compute at inference time, but current strategies are designed by hand and are often suboptimal. AutoTTS removes this bottleneck by automating strategy discovery, potentially unlocking efficiency gains in regions of the width-depth control space that human intuition has left unexplored. This matters because inference cost is a major operational constraint on deploying advanced reasoning models at scale.

Business Impact

For enterprises running LLMs in production, inference costs directly affect margins and deployment viability. A reduction in token usage of up to 69.5% lowers operating expenses roughly in proportion, without requiring manual tuning of heuristics. This automation enables dynamic optimization of compute allocation across different workloads and models without ongoing human engineering effort.
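
As a back-of-the-envelope illustration of that cost effect, the arithmetic can be sketched as below. The monthly token volume and the price per million tokens are hypothetical inputs, and the sketch assumes cost scales linearly with tokens and that the best-case 69.5% reduction applies to every billed token, which real workloads may not achieve.

```python
def monthly_cost(tokens_per_month: float, usd_per_million_tokens: float) -> float:
    # Linear pricing assumption: cost is proportional to tokens consumed.
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


def cost_after_reduction(baseline_cost: float, token_reduction: float) -> float:
    # token_reduction is a fraction, e.g. 0.695 for the reported best case.
    return baseline_cost * (1 - token_reduction)


# Hypothetical workload: 2 billion tokens per month at 10 USD per million tokens.
baseline = monthly_cost(2_000_000_000, 10.0)
reduced = cost_after_reduction(baseline, 0.695)
print(f"baseline: {baseline:.2f} USD, after reduction: {reduced:.2f} USD")
```

With these made-up inputs, a bill of 20,000 USD per month would fall to about 6,100 USD.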

Key Implications

Manual strategy design for test-time scaling may become obsolete as automated discovery proves more effective and scalable
Organizations can achieve significant cost reductions in LLM inference without sacrificing accuracy, improving the business case for reasoning-heavy applications
The shift from human-crafted rules to algorithmic search opens a much larger strategy space, potentially yielding further optimization gains beyond current methods

What to Watch

Monitor whether AutoTTS generalizes across different model architectures, domains, and inference budgets in production environments. Watch for adoption by major cloud providers and for competing frameworks with similar automation capabilities. Track whether the reported token reduction of up to 69.5% holds up at scale and whether the approach becomes standard practice in LLM deployment pipelines.


