VFF - The signal in the noise

Alibaba trains agents without agent training, improves performance across seven benchmarks

Alibaba's Qwen team released Qwen-AgentWorld, two models trained to predict environment states rather than to select agent actions. The models cover seven domains, including search, terminal, web, and Android. The approach addresses a fundamental constraint in agent training: production environments cannot reliably surface edge cases. Agents trained in the resulting simulator outperformed those trained only on real environments, and warm-up training on world models improved performance across seven benchmarks, including three unseen during training.

  • Alibaba released Qwen-AgentWorld, models trained to predict what environments return rather than what agents should do next
  • Covers seven domains (MCP, Search, Terminal, Software Engineering, Android, Web, OS) under a single architecture
  • Agents trained in controlled simulation outperformed those trained in real environments; on MCPMark, for example, the score rose from 24.6 to 33.8 (a gain of 9.2 points)
  • World model warm-up before agentic fine-tuning improved performance across seven benchmarks, including three never seen during training
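The difference between the two training objectives can be made concrete with a toy sketch. It is illustrative only: the `Step` record, the text layout of the context, and both function names are assumptions made for this example, not Qwen-AgentWorld's actual data format or training code. It shows how the same recorded trajectory yields different (input, target) pairs depending on whether the model learns what the agent should do next or what the environment returns.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    action: str       # what the agent did, e.g. a shell command
    observation: str  # what the environment returned


def policy_examples(task: str, steps: List[Step]) -> List[Tuple[str, str]]:
    """Conventional agent objective: given the history, predict the next action."""
    examples = []
    context = f"TASK: {task}"
    for step in steps:
        examples.append((context, step.action))
        context += f"\nACTION: {step.action}\nOBSERVATION: {step.observation}"
    return examples


def world_model_examples(task: str, steps: List[Step]) -> List[Tuple[str, str]]:
    """World-model objective: given the history and an action, predict the observation."""
    examples = []
    context = f"TASK: {task}"
    for step in steps:
        context += f"\nACTION: {step.action}"
        examples.append((context, step.observation))
        context += f"\nOBSERVATION: {step.observation}"
    return examples


# A hypothetical two-step terminal trajectory.
trajectory = [
    Step(action="ls", observation="a.txt"),
    Step(action="cat a.txt", observation="hello"),
]

policy_pairs = policy_examples("list files", trajectory)
world_pairs = world_model_examples("list files", trajectory)
```

In this sketch the policy target is always an action and the world-model target is always an environment response. A model trained on the second kind of pair can stand in for the environment, which is what would allow rare failure conditions to be injected on demand.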

Agent training has hit a practical ceiling: real production environments cannot inject controlled edge cases or rare failure conditions on demand. Alibaba's approach inverts the training objective to build environment simulators that expose agents to conditions they would rarely encounter naturally. This addresses a structural gap in how autonomous agents learn to handle unexpected situations.

Teams building autonomous agents at scale face diminishing returns from training on production systems alone. World model pretraining offers a path to better agent performance without requiring changes to live infrastructure. The 35B model is open-source under Apache 2.0, making the approach accessible to organizations building agent systems.

  • World modeling may become a standard pretraining stage for agent systems, shifting how teams approach autonomous agent development
  • Simulator-based training can outperform real-world training for agents, potentially reducing reliance on production data for capability development
  • Single-architecture models spanning multiple domains suggest consolidation toward unified agent foundations rather than domain-specific models

Monitor whether other labs adopt world model pretraining as a standard practice for agent training. Track whether the open-source 35B model sees adoption in production agent systems and what performance gains practitioners report. Watch for extensions of this approach to additional domains beyond the current seven.
