VFF - The signal in the noise

RecursiveMAS cuts multi-agent costs by 75% with latent-space communication


Researchers at the University of Illinois Urbana-Champaign and Stanford University have developed RecursiveMAS, a framework that enables multi-agent systems to communicate through embedding space rather than text sequences. The approach achieves 2.4x faster inference, a 75% reduction in token usage, and improved accuracy across code generation, medical reasoning, and search tasks, while being significantly cheaper to train than standard fine-tuning methods. By treating agents as layers in a recursive system that pass latent representations rather than text, RecursiveMAS eliminates sequential bottlenecks and allows the entire system to be optimized as a unified whole.

  • RecursiveMAS enables agents to communicate via latent embeddings instead of text, eliminating sequential generation bottlenecks
  • Framework achieves 2.4x speedup in inference and 75% reduction in token usage while improving accuracy across multiple domains
  • Training costs are significantly lower than standard fine-tuning or LoRA approaches, making custom multi-agent systems more scalable
  • The system passes continuous latent representations through agents in recursive loops, with only the final output produced as text
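To make the mechanism concrete, here is a toy sketch of the communication pattern described above. It is not the RecursiveMAS code: the `LatentAgent` class, `run_recursive`, `decode`, the 4-dimensional state, and the random weights are all hypothetical stand-ins for trained models. What it shows is the structural difference: agents hand each other vectors in a recursive loop, and text appears only once, when the final state is decoded.

```python
import math
import random

# Hypothetical toy model, NOT the RecursiveMAS implementation:
# each "agent" is a small nonlinear map over a shared latent vector.
DIM = 4

class LatentAgent:
    """Hypothetical agent that transforms a latent vector instead of emitting text."""

    def __init__(self, seed: int, dim: int = DIM):
        rng = random.Random(seed)
        # Random weights stand in for a trained model's parameters.
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]

    def forward(self, h: list[float]) -> list[float]:
        # Matrix-vector product followed by tanh, like one layer of a network.
        return [math.tanh(sum(w * x for w, x in zip(row, h))) for row in self.weights]

def run_recursive(agents, h0, rounds):
    """Pass the latent state through every agent, `rounds` times, with no text in between."""
    h = h0
    handoffs = 0
    for _ in range(rounds):
        for agent in agents:
            h = agent.forward(h)
            handoffs += 1
    return h, handoffs

def decode(h, vocab):
    """Only the final state becomes text: pick the token with the largest activation."""
    best = max(range(len(h)), key=lambda i: h[i])
    return vocab[best]

if __name__ == "__main__":
    agents = [LatentAgent(seed=s) for s in (1, 2, 3)]  # e.g. planner, solver, critic
    final_state, handoffs = run_recursive(agents, [0.5, -0.2, 0.1, 0.9], rounds=2)
    answer = decode(final_state, ["yes", "no", "maybe", "unsure"])
    print(handoffs, answer)  # prints 6 followed by one decoded token
```

In a text-based system, each of those six hand-offs would require one agent to generate a message token by token and the next agent to read it back in; in the sketch, each hand-off is a single vector computation.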

Multi-agent systems face a fundamental efficiency problem: text-based communication between agents creates latency, inflates token costs, and makes training the entire system as a cohesive unit computationally prohibitive. RecursiveMAS addresses this by shifting communication to latent space, which is a meaningful step toward making multi-agent systems practical for real-world applications where cost and speed matter. This work demonstrates that architectural changes to how agents interact can yield substantial efficiency gains without sacrificing performance.

For teams building custom multi-agent systems, RecursiveMAS offers a path to lower training costs and faster inference, both critical factors in production deployment. The 75% reduction in token usage directly translates to operational cost savings, while the 2.4x speedup improves user experience and reduces infrastructure requirements. This makes sophisticated multi-agent reasoning more accessible to organizations that previously found the computational overhead prohibitive.
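The reported ratios can be applied to a team's own numbers with simple arithmetic. The sketch below assumes spend scales linearly with token count (it ignores fixed infrastructure costs and input/output price differences), and the function name and baseline figures in the example are hypothetical.

```python
def projected_savings(baseline_tokens: int, price_per_million: float,
                      baseline_latency_s: float,
                      token_reduction: float = 0.75, speedup: float = 2.4) -> dict:
    """Apply the reported ratios to hypothetical baseline numbers.

    Assumes cost is linear in tokens; fixed costs are ignored.
    """
    new_tokens = baseline_tokens * (1.0 - token_reduction)
    baseline_cost = baseline_tokens / 1_000_000 * price_per_million
    new_cost = new_tokens / 1_000_000 * price_per_million
    # A 2.4x speedup means each run takes 1/2.4 (about 41.7%) of the original time.
    new_latency = baseline_latency_s / speedup
    return {
        "baseline_cost": baseline_cost,
        "new_cost": new_cost,
        "new_latency_s": new_latency,
        "latency_reduction": 1.0 - 1.0 / speedup,
    }

if __name__ == "__main__":
    # Hypothetical workload: 10M tokens at $2 per million, 12 s per request.
    print(projected_savings(10_000_000, 2.0, 12.0))
```

With those hypothetical inputs, spend drops from $20 to $5 and latency from 12 seconds to about 5. Note that a 2.4x speedup means each request takes about 42% of its original time, a latency reduction of roughly 58%.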

  • Text-based agent communication may become a legacy pattern as latent-space interaction proves more efficient, potentially reshaping how multi-agent architectures are designed
  • Training entire multi-agent systems as unified wholes becomes more feasible, enabling better co-optimization and emergent behaviors across agents
  • Cost barriers to deploying multi-agent systems lower significantly, potentially accelerating adoption in enterprise and specialized domains like medical reasoning and code generation

Monitor whether RecursiveMAS gains adoption in production systems and whether other research groups extend or improve upon the latent-space communication approach. Watch for benchmarks comparing RecursiveMAS to other multi-agent frameworks on real-world tasks, and track whether the training cost advantages hold at scale. Also observe whether this pattern influences how commercial multi-agent platforms are architected going forward.



Related stories

Anthropic Moves Into Drug Development With Claude Science

Anthropic launched Claude Science, an AI workbench designed to consolidate scientific tools and datasets for researchers, at its 'The Briefing: AI for Science' event this week. The company framed the product around accelerating scientific discovery and healthcare development, citing existing biotech and pharma customers. Anthropic also announced it would develop drugs itself, expanding beyond its current role as an AI tool provider.

by Robert Hart · The Verge AI

Alibaba cuts agent token use 99% with smarter tool routing

Alibaba researchers developed SkillWeaver, a framework that reduces token consumption by over 99% when routing AI agents to the correct tools from large libraries. The system uses a three-stage process (decompose, retrieve, compose) combined with Skill-Aware Decomposition to iteratively fetch and evaluate relevant tools rather than exposing agents to entire tool catalogs. This addresses a core challenge in enterprise AI systems where agents must orchestrate multiple tools to complete complex, multi-step workflows.

by Ben Dickson · VentureBeat AI

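The decompose, retrieve, compose idea can be sketched in a few lines. This is a hypothetical illustration, not Alibaba's SkillWeaver: the tool catalog, the naive split on " then " standing in for Skill-Aware Decomposition, and the keyword-overlap retriever are all assumptions, and the iterative evaluation step is omitted.

```python
# Hypothetical sketch of a decompose -> retrieve -> compose loop, NOT SkillWeaver itself.
# Decomposition is a naive split on " then "; retrieval is keyword overlap.

TOOLS = {
    "search_flights": "find flights between cities on a date",
    "book_hotel": "reserve a hotel room in a city",
    "convert_currency": "convert an amount between currencies",
    "send_email": "send an email message to a recipient",
}

def decompose(task: str) -> list[str]:
    """Split a task into sub-tasks (stand-in for Skill-Aware Decomposition)."""
    return [part.strip() for part in task.split(" then ") if part.strip()]

def retrieve(subtask: str, tools: dict[str, str], k: int = 1) -> list[str]:
    """Return the k tool names whose descriptions share the most words with the sub-task."""
    words = set(subtask.lower().split())
    # sorted() is stable, so ties keep catalog order.
    ranked = sorted(tools, key=lambda name: len(words & set(tools[name].split())), reverse=True)
    return ranked[:k]

def compose(task: str, tools: dict[str, str]) -> list[tuple[str, str]]:
    """Pair each sub-task with its retrieved tool; only these tools are shown to the agent."""
    return [(sub, retrieve(sub, tools)[0]) for sub in decompose(task)]
```

For the task "find flights to Tokyo then reserve a hotel room then email the itinerary to a recipient", `compose` pairs the three sub-tasks with `search_flights`, `book_hotel`, and `send_email`, so only those three tool descriptions would need to reach the agent instead of the full catalog.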
AI X-ray Scientist Autonomously Aligns Crystals at Synchrotron

Chen et al. have developed an AI X-ray scientist that autonomously aligns single crystals at a real synchrotron beamline, demonstrating how large language models can enable adaptive closed-loop experimentation at large-scale scientific facilities. The system operates without human intervention, representing a shift toward autonomous scientific discovery at major research facilities.

by Zhantao Chen · Nature Machine Intelligence

Why Every LLM Gives You the Same Answer

Large language models exhibit severe homogeneity in their responses to open-ended questions, converging on predictable answers across different providers. Australian startup Springboards has developed Flint, an LLM trained to generate more diverse outputs by embracing what traditional models treat as hallucinations. A November research paper that documented this phenomenon across 25 models, finding that most responses to creative prompts cluster around identical phrases, won a best paper award at NeurIPS.

by Will Douglas Heaven · MIT Technology Review