vff — the signal in the noise

Thinking Machines Previews Full-Duplex AI for Real-Time Conversation

Carl Franzen (carl.franzen@venturebeat.com)

Thinking Machines, the AI startup founded by former OpenAI CTO Mira Murati and researcher John Schulman, has unveiled a research preview of 'interaction models' designed to enable near-real-time, simultaneous voice and video conversation with AI. Instead of today's turn-based pattern, in which users submit input and then wait for output, these systems use a full-duplex architecture that processes 200ms chunks of input and output concurrently, allowing the AI to listen, speak, and see at the same time. The company demonstrated the approach with TML-Interaction-Small, a 276-billion-parameter Mixture-of-Experts model, paired with a background reasoning system for complex tasks. A limited research preview will launch in the coming months, with broader availability expected later in 2026.
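To make the full-duplex idea concrete, here is a minimal sketch of a chunked loop in which every 200ms tick both consumes one input chunk and may emit one output chunk. It is an illustration under stated assumptions, not Thinking Machines' implementation: the names `Tick`, `full_duplex_loop`, and `toy_step` are hypothetical, and the toy policy (a backchannel every third chunk, a reply on silence) is invented purely to show that output does not have to wait for the user's turn to end.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, Tuple

CHUNK_MS = 200  # chunk length reported in the article

@dataclass
class Tick:
    start_ms: int           # when this chunk window begins
    heard: Optional[str]    # input chunk consumed in this window (None = silence)
    spoken: Optional[str]   # output chunk emitted in the same window (None = stay quiet)

# Hypothetical step signature: (state, input chunk) -> (new state, output chunk)
Step = Callable[[Optional[int], Optional[str]], Tuple[int, Optional[str]]]

def full_duplex_loop(inputs: Iterable[Optional[str]], step: Step) -> Iterator[Tick]:
    """Process input and output in the same fixed-size windows.

    Unlike a turn-based loop, the model is asked for an output chunk on
    every tick, even while the user is still speaking.
    """
    state: Optional[int] = None
    for i, heard in enumerate(inputs):
        state, spoken = step(state, heard)
        yield Tick(start_ms=i * CHUNK_MS, heard=heard, spoken=spoken)

def toy_step(state: Optional[int], heard: Optional[str]) -> Tuple[int, Optional[str]]:
    """Toy policy (an assumption for illustration only)."""
    count = state or 0
    if heard is None:
        # The user paused: answer right away and reset the counter.
        return 0, "reply"
    count += 1
    # Every third chunk of user speech, emit a short backchannel.
    return count, ("mm-hm" if count % 3 == 0 else None)

if __name__ == "__main__":
    for tick in full_duplex_loop(["a", "b", "c", None, "d"], toy_step):
        print(tick)
```

In this toy run the "mm-hm" is emitted at the 400ms tick while input is still arriving, which is the behavior a strictly turn-based loop cannot express.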

TL;DR

  • Thinking Machines previewed 'interaction models' that process input and output simultaneously instead of in turn-based chat, using a full-duplex architecture that handles 200ms chunks of both streams concurrently
  • The system uses encoder-free early fusion to ingest raw audio and image patches directly, co-training all components within the transformer rather than relying on separate encoders
  • A dual-model architecture separates real-time interaction handling from background reasoning tasks, allowing the AI to respond immediately while delegating complex work asynchronously
  • TML-Interaction-Small is a 276-billion-parameter MoE model with 12 billion active parameters, achieving competitive performance on third-party benchmarks with reduced latency
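The figures quoted above imply two simple ratios worth spelling out: the share of parameters active per token, and how many chunk windows fit in one second. The snippet below is only arithmetic on the numbers in this summary; the function names are hypothetical, and the active share says nothing by itself about real-world latency or cost.

```python
TOTAL_PARAMS_B = 276   # total parameters, in billions (from the article)
ACTIVE_PARAMS_B = 12   # active parameters, in billions (from the article)
CHUNK_MS = 200         # chunk length in milliseconds (from the article)

def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters that are active at once."""
    return active_b / total_b

def chunks_per_second(chunk_ms: float) -> float:
    """Number of fixed-size chunk windows in one second of audio or video."""
    return 1000 / chunk_ms

if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    print(f"active share: {share:.1%}")                               # active share: 4.3%
    print(f"chunks per second: {chunks_per_second(CHUNK_MS):.0f}")    # chunks per second: 5
```

In other words, roughly one parameter in 23 is active at a time, and the model has to make a listen/speak decision about five times per second.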

Why it matters

The shift from turn-based to simultaneous input/output processing addresses a fundamental constraint in current AI interaction: users must wait for the model to finish responding before they can continue, which creates friction in natural conversation. If Thinking Machines can deliver on this architecture at scale, it could reshape how AI handles real-time collaboration, live translation, and dynamic visual understanding. The approach is a meaningful architectural departure from how frontier models currently process information, and it could influence how competitors design their next-generation systems.

Business relevance

For operators and founders, this signals that the next competitive frontier in AI may be interaction latency and naturalness rather than raw capability alone. Companies building customer-facing AI products, live collaboration tools, or real-time assistance systems could gain significant UX advantages if they adopt similar full-duplex architectures. The dual-model approach also offers a practical template for balancing immediate responsiveness with deep reasoning, a tradeoff that affects product design and infrastructure costs.

Key implications

  • Turn-based interaction may become a legacy constraint as full-duplex systems mature, forcing product teams to rethink UI/UX patterns built around waiting for model responses
  • The encoder-free early fusion approach could reduce model complexity and latency by eliminating separate audio/vision encoders, potentially lowering inference costs and enabling deployment on edge devices
  • Dual-model architectures separating real-time interaction from background reasoning may become a standard pattern, allowing teams to optimize for different latency and compute requirements within a single system
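The dual-model pattern described above can be sketched with Python's `asyncio`: a fast interaction loop acknowledges every chunk immediately and hands heavy requests to a slower background worker, surfacing the result on a later tick. This is a rough sketch under assumptions, not the company's design: `background_reasoner`, `interaction_loop`, the `task:` prefix convention, and the sleep durations are all hypothetical stand-ins.

```python
import asyncio
from typing import List, Optional

TASK_PREFIX = "task:"  # hypothetical marker for requests that need deep reasoning

async def background_reasoner(task: str) -> str:
    """Stand-in for a slower reasoning model; the sleep simulates heavy work."""
    await asyncio.sleep(0.05)
    return f"result for {task!r}"

async def interaction_loop(chunks: List[Optional[str]], tick_s: float = 0.01) -> List[str]:
    """Fast loop: respond on every tick, delegate heavy work asynchronously."""
    transcript: List[str] = []
    pending: List[asyncio.Task] = []
    for chunk in chunks:
        if chunk is not None and chunk.startswith(TASK_PREFIX):
            # Delegate without blocking the conversation.
            job = chunk[len(TASK_PREFIX):]
            pending.append(asyncio.create_task(background_reasoner(job)))
            transcript.append("on it")
        elif chunk is not None:
            transcript.append(f"ack {chunk}")
        # Surface any background results that finished since the last tick.
        finished = [t for t in pending if t.done()]
        for t in finished:
            transcript.append(t.result())
        pending = [t for t in pending if not t.done()]
        await asyncio.sleep(tick_s)  # wait for the next chunk window
    # Input has ended: wait for whatever is still running.
    for t in pending:
        transcript.append(await t)
    return transcript

if __name__ == "__main__":
    out = asyncio.run(interaction_loop(["hello", "task:plan trip", "still here", None]))
    print(out)
```

The point of the split is that the foreground loop never awaits the reasoner directly, so "still here" is acknowledged while the delegated task is still running.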

What to watch

Monitor whether Thinking Machines' research preview demonstrates meaningful latency improvements and naturalness gains in real-world use cases when it opens to limited testers. Watch for competitive responses from OpenAI, Anthropic, and Google, which may accelerate their own work on simultaneous input/output processing. Track adoption patterns once broader availability launches, particularly in customer support, live translation, and collaborative coding tools where real-time interaction is most valuable.
