VFF - The signal in the noise

Thinking Machines Previews Full-Duplex AI for Real-Time Conversation

Thinking Machines, the AI startup founded by former OpenAI CTO Mira Murati and researcher John Schulman, has unveiled a research preview of 'interaction models' designed for near-real-time, simultaneous voice and video conversation with AI. In today's turn-based systems, users submit input and then wait for output. These models instead use a full-duplex architecture that processes input and output concurrently in 200ms chunks, so the AI can listen, speak, and see at the same time. The company demonstrated the approach with TML-Interaction-Small, a 276-billion-parameter Mixture-of-Experts model, paired with a background reasoning system for complex tasks. A limited research preview will launch in the coming months, with broader availability expected later in 2026.
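
To make the full-duplex idea concrete, here is a minimal sketch of a loop that consumes one input chunk and emits one output chunk on every 200ms tick. Only the 200ms figure comes from the announcement; `ToyDuplexModel`, `run_full_duplex`, and the "backchannel on a pause" policy are hypothetical stand-ins, not Thinking Machines' implementation.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List

CHUNK_MS = 200  # chunk duration reported in the article


@dataclass
class ToyDuplexModel:
    """Hypothetical stand-in for an interaction model (assumption, not real code)."""
    heard: List[str] = field(default_factory=list)

    def step(self, input_chunk: str) -> str:
        # Consume one input chunk and produce one output chunk in the same tick.
        self.heard.append(input_chunk)
        # Toy policy: stay silent while the user talks, backchannel on a pause.
        return "mm-hm" if input_chunk == "<pause>" else "<silence>"


def run_full_duplex(model: ToyDuplexModel, input_chunks: Iterable[str]) -> Iterator[tuple]:
    """Yield (time_ms, input_chunk, output_chunk), one tuple per 200 ms tick."""
    for tick, chunk in enumerate(input_chunks):
        yield tick * CHUNK_MS, chunk, model.step(chunk)


timeline = list(run_full_duplex(ToyDuplexModel(), ["so", "I", "<pause>", "think"]))
for row in timeline:
    print(row)
```

The point of the sketch is that the model produces an output on every tick, including "silence", instead of waiting for the user's turn to end. A turn-based system would not call `step` until the last input chunk had arrived.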

  • Thinking Machines previewed 'interaction models' that enable simultaneous input/output processing instead of turn-based chat, using a full-duplex architecture processing 200ms chunks concurrently
  • The system uses encoder-free early fusion to ingest raw audio and image patches directly, co-training all components within the transformer rather than relying on separate encoders
  • A dual-model architecture separates real-time interaction handling from background reasoning tasks, allowing the AI to respond immediately while delegating complex work asynchronously
  • TML-Interaction-Small is a 276-billion-parameter MoE model with 12 billion active parameters, which the company says performs competitively on third-party benchmarks at lower latency
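
As a quick back-of-the-envelope check on the figures above, the sketch below computes what share of the model is active and how many chunks the system handles per second. It assumes "active parameters" carries the usual Mixture-of-Experts meaning (parameters actually used per token); the article does not spell this out.

```python
TOTAL_PARAMS = 276e9   # total parameters, from the article
ACTIVE_PARAMS = 12e9   # active parameters, from the article
CHUNK_MS = 200         # chunk duration, from the article

# Share of the model's weights used at once (assumes the standard MoE meaning).
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Number of input/output chunks processed in each second of conversation.
chunks_per_second = 1000 // CHUNK_MS

print(f"active fraction: {active_fraction:.1%}")   # active fraction: 4.3%
print(f"chunks per second: {chunks_per_second}")   # chunks per second: 5
```

On these numbers, roughly one parameter in 23 is active at a time, which is one plausible reason a model of this total size could still hit a 200ms cadence.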

The shift from turn-based to simultaneous input/output processing addresses a fundamental constraint in current AI interaction: users must wait for the model to finish responding before they can continue, which adds friction to natural conversation. If Thinking Machines can deliver this architecture at scale, it could reshape how AI handles real-time collaboration, live translation, and dynamic visual understanding. The full-duplex design is a meaningful departure from how frontier models currently process information, and it may influence how competitors design their next-generation systems.

For operators and founders, this signals that the next competitive frontier in AI may be interaction latency and naturalness rather than raw capability alone. Companies building customer-facing AI products, live collaboration tools, or real-time assistance systems could gain significant UX advantages if they adopt similar full-duplex architectures. The dual-model approach also offers a practical template for balancing immediate responsiveness with deep reasoning, a tradeoff that affects product design and infrastructure costs.

  • Turn-based interaction may become a legacy constraint as full-duplex systems mature, forcing product teams to rethink UI/UX patterns built around waiting for model responses
  • The encoder-free early fusion approach could reduce model complexity and latency by eliminating separate audio/vision encoders, potentially lowering inference costs and enabling deployment on edge devices
  • Dual-model architectures separating real-time interaction from background reasoning may become a standard pattern, allowing teams to optimize for different latency and compute requirements within a single system
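
For teams curious what the dual-model pattern might look like in practice, here is a minimal sketch using Python's `asyncio`: a fast loop answers every turn immediately and hands anything marked as complex to a slower background task. The function names, the `complex:` prefix, and the simulated delay are all hypothetical; the announcement does not describe how Thinking Machines routes work between its two systems.

```python
import asyncio


async def background_reasoner(task: str) -> str:
    """Hypothetical slow model; the delay is simulated with a short sleep."""
    await asyncio.sleep(0.05)
    return f"result for {task!r}"


async def interaction_loop(user_turns):
    """Hypothetical fast model: replies right away and delegates hard work."""
    transcript = []
    pending = []
    for turn in user_turns:
        if turn.startswith("complex:"):
            # Delegate without blocking the conversation.
            pending.append(asyncio.create_task(background_reasoner(turn)))
            transcript.append("On it, I'll report back shortly.")
        else:
            transcript.append(f"Quick reply to {turn!r}")
        await asyncio.sleep(0)  # yield control so background tasks can progress
    # Surface background results once they are ready.
    for result in await asyncio.gather(*pending):
        transcript.append(result)
    return transcript


transcript = asyncio.run(interaction_loop(["hi", "complex: plan a trip", "thanks"]))
for line in transcript:
    print(line)
```

The design choice worth noting is that the conversation never waits on the slow path: the user gets an acknowledgement in the same turn, and the background result arrives later. The two paths can also be sized and scaled separately, which is where the infrastructure-cost tradeoff comes in.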

Monitor whether Thinking Machines' research preview demonstrates meaningful latency improvements and naturalness gains in real-world use cases when it opens to limited testers. Watch for competitive responses from OpenAI, Anthropic, and Google, which may accelerate their own work on simultaneous input/output processing. Track adoption patterns once broader availability launches, particularly in customer support, live translation, and collaborative coding tools where real-time interaction is most valuable.
