
Alibaba cuts agent token use 99% with smarter tool routing

Ben Dickson (bendee983@gmail.com) · Jul 3, 2026 · about 2 hours ago

Alibaba researchers developed SkillWeaver, a framework that reduces token consumption by over 99% when routing AI agents to the correct tools from large libraries. The system uses a three-stage process (decompose, retrieve, compose) combined with Skill-Aware Decomposition to iteratively fetch and evaluate relevant tools rather than exposing agents to entire tool catalogs. This addresses a core challenge in enterprise AI systems where agents must orchestrate multiple tools to complete complex, multi-step workflows.

TL;DR

SkillWeaver breaks complex user queries into sub-tasks, retrieves candidate tools via embedding comparison, and composes them into executable plans as directed acyclic graphs
Skill-Aware Decomposition uses a feedback loop to iteratively fetch and vet tool candidates rather than selecting tools in a single pass
Token consumption drops by over 99% compared to exposing agents to entire tool libraries, while accuracy increases
The framework addresses compositional skill routing, where real-world business requests require sequencing multiple tools rather than selecting one
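
The three stages listed above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not SkillWeaver's implementation: the skill catalog, the `embed` bag-of-words function (a stand-in for a real embedding model), the split-on-"then" decomposition (a real system would likely use an LLM), and the strictly linear dependency chain are all hypothetical.

```python
import math
from graphlib import TopologicalSorter

# Hypothetical skill catalog: skill name -> short description.
SKILLS = {
    "load_sales_data": "load sales records from the data warehouse",
    "aggregate_by_region": "aggregate sales totals by region",
    "render_pdf_report": "render a pdf report from a summary table",
    "send_email": "send an email with an attachment",
}

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def decompose(query):
    """Assumed decomposition: split the query on ' then '."""
    return [part.strip() for part in query.split(" then ")]

def retrieve(sub_task, k=1):
    """Return the k skills whose descriptions are most similar to the sub-task."""
    q = embed(sub_task)
    ranked = sorted(SKILLS, key=lambda name: cosine(q, embed(SKILLS[name])), reverse=True)
    return ranked[:k]

def compose(sub_tasks):
    """Chain the top skill per sub-task into a DAG and return an execution order."""
    chosen = [retrieve(task)[0] for task in sub_tasks]
    graph = {skill: set() for skill in chosen}
    for prev, nxt in zip(chosen, chosen[1:]):
        graph[nxt].add(prev)  # nxt depends on prev
    return list(TopologicalSorter(graph).static_order())

query = "load sales records then aggregate totals by region then render a pdf report"
plan = compose(decompose(query))
print(plan)
```

Only the handful of retrieved skills would need to be shown to the agent, which is where the token savings come from. Real plans may branch rather than form a single chain; the topological sort handles either case.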

Why It Matters

Enterprise AI agents increasingly need to coordinate across hundreds of tools and skills to complete multi-step workflows. Exposing agents to entire tool libraries is inefficient: it consumes hundreds of thousands of tokens and overwhelms context limits. SkillWeaver's iterative tool retrieval and composition targets this scaling bottleneck directly.
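
To make the scale concrete, here is a back-of-the-envelope estimate. The catalog size, per-tool description length, and number of retrieved tools are illustrative assumptions, not figures from the SkillWeaver research.

```python
def prompt_tokens(num_tools, tokens_per_tool):
    """Tokens spent describing tools in the agent's prompt."""
    return num_tools * tokens_per_tool

# Assumed numbers, chosen only for illustration.
CATALOG_SIZE = 2000      # tools in the full library
TOKENS_PER_TOOL = 150    # average length of one tool description
RETRIEVED = 10           # tools surfaced by the router

full_catalog = prompt_tokens(CATALOG_SIZE, TOKENS_PER_TOOL)
routed = prompt_tokens(RETRIEVED, TOKENS_PER_TOOL)
reduction = 1 - routed / full_catalog

print(full_catalog, routed, f"{reduction:.1%}")
```

Under these assumptions the full catalog costs 300,000 tokens per request against 1,500 for the routed subset, a 99.5% reduction. Actual savings depend on the catalog and on how many candidates the router retrieves.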

Business Impact

For organizations deploying AI agents in production, token efficiency directly affects operational costs and latency. A reduction of over 99% in token consumption, alongside improved accuracy, makes multi-tool orchestration economically viable at scale. This enables agents to autonomously handle complex business operations like data pipeline management and report generation without manual intervention.

Key Implications

Task decomposition granularity emerges as the primary bottleneck in tool routing accuracy, shifting focus from single-tool selection to compositional planning
Iterative retrieval and vetting of tool candidates outperforms one-shot tool selection approaches, suggesting future frameworks should incorporate feedback loops
Compatibility checking between tools becomes critical as agents sequence multiple skills, requiring systems to validate inter-skill data flow
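
The last two implications, iterative vetting and compatibility checking, can be illustrated together. The sketch below is a hypothetical simplification: the `Skill` type, the candidate lists, and the rule that a skill's `output_type` must equal the next skill's `input_type` are assumptions made for the example, not the framework's actual checks.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Skill:
    name: str
    input_type: str
    output_type: str

# Hypothetical ranked candidates per sub-task, as a retriever might return them.
CANDIDATES = {
    "fetch": [Skill("fetch_rows", "query", "table")],
    "summarize": [
        Skill("summarize_text", "text", "summary"),    # ranked first, wrong input type
        Skill("summarize_table", "table", "summary"),
    ],
    "report": [Skill("build_report", "summary", "pdf")],
}

def is_compatible(upstream: Optional[Skill], downstream: Skill) -> bool:
    """Assumed rule: upstream output must match downstream input."""
    return upstream is None or upstream.output_type == downstream.input_type

def route(sub_tasks, max_rounds=3):
    """Walk the ranked candidates for each sub-task until one passes vetting."""
    plan = []
    for task in sub_tasks:
        previous = plan[-1] if plan else None
        chosen = None
        for candidate in CANDIDATES[task][:max_rounds]:
            if is_compatible(previous, candidate):
                chosen = candidate
                break  # stop fetching once a candidate is accepted
        if chosen is None:
            raise ValueError(f"no compatible skill for sub-task {task!r}")
        plan.append(chosen)
    return [skill.name for skill in plan]

names = route(["fetch", "summarize", "report"])
print(names)
```

A one-shot selector would have taken `summarize_text`, the top-ranked candidate, and produced a plan whose data flow breaks at the second step. The loop rejects it and falls through to `summarize_table`.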

What to Watch

Monitor adoption of SkillWeaver and similar compositional routing frameworks in enterprise AI deployments. Watch for how organizations implement task decomposition strategies and whether iterative tool retrieval becomes standard practice. Track whether token efficiency gains translate to measurable cost reductions and performance improvements in production AI agent systems.


Related stories

AI X-ray Scientist Autonomously Aligns Crystals at Synchrotron

Chen et al. have developed an AI X-ray scientist that autonomously aligns single crystals at a real synchrotron beamline, demonstrating how large language models can enable adaptive closed-loop experimentation at large-scale scientific facilities. The system operates without human intervention, representing a shift toward autonomous scientific discovery at major research infrastructure.

by Zhantao Chen · about 22 hours ago · Nature Machine Intelligence


Why Every LLM Gives You the Same Answer

Large language models exhibit severe homogeneity in their responses to open-ended questions, converging on predictable answers across different providers. A November research paper won best paper at NeurIPS for documenting this phenomenon across 25 different models, finding that most responses to creative prompts cluster around identical phrases. Australian startup Springboards has developed Flint, an LLM trained to generate more diverse outputs by embracing what traditional models treat as hallucinations.

by Will Douglas Heaven · 2 days ago · MIT Technology Review


NVIDIA BioNeMo Integrates with Claude Science for Accelerated Life Sciences Research

Anthropic announced Claude Science, an AI workbench for scientific research that integrates with NVIDIA's BioNeMo Agent Toolkit to enable researchers to run computational workflows through natural language commands. The toolkit packages NVIDIA-accelerated capabilities as callable skills, allowing Claude Science agents to select appropriate tools, prepare inputs, and execute life sciences workflows while connecting to NVIDIA compute resources. Eighteen of the top 20 pharmaceutical companies currently use NVIDIA BioNeMo across drug discovery, genomics, and protein engineering applications.

by Anthony Costa · 2 days ago · NVIDIA Blog (AI)


OpenAI Launches GeneBench-Pro for AI Genomics Testing

OpenAI has introduced GeneBench-Pro, a new benchmark designed to measure AI performance on genomics, biology, and scientific research tasks using complex, real-world datasets. The benchmark provides a standardized testing framework for evaluating how well AI systems handle domain-specific scientific challenges. This represents an effort to establish measurable standards for AI capability assessment in life sciences applications.

2 days ago · OpenAI
