
Loka Cuts Voice AI Latency with Amazon Nova 2 Sonic

Bojan Jakimovski · Jun 24, 2026

Loka built a voice AI agent on Amazon Nova 2 Sonic that processes audio end to end rather than converting speech to text and back, reducing response latency from 3-5 seconds to near-real-time while lowering costs. Nova 2 Sonic scored 87.0 on the Big Bench Audio speech reasoning benchmark, ahead of OpenAI's GPT Realtime (83.0) and Google's Gemini 2.5 Flash Native Audio (71.0). The solution addresses a core frustration with traditional voice assistants: slow, robotic responses that damage customer experience and increase support costs.

TL;DR

Loka deployed Amazon Nova 2 Sonic for native speech-to-speech processing, eliminating the traditional three-step pipeline (speech-to-text, LLM, text-to-speech) that introduces 3-5 second delays
Amazon Nova 2 Sonic scored 87.0 on the Big Bench Audio speech reasoning benchmark, outperforming GPT Realtime (83.0) and Gemini 2.5 Flash Native Audio (71.0)
Native audio processing preserves tone, emotion, and subtle cues lost in text conversion, improving handling of complex requests like negation and scheduling constraints
End-to-end audio approach reduces costs at scale while enabling faster, more natural conversational experiences for customer-facing applications like automotive dealership support

Why It Matters

Voice AI has struggled with latency and cost at scale, making it impractical for many customer service applications. Native speech-to-speech models sidestep the compounding delays of traditional pipelines by processing audio directly, capturing nuance that text-based systems lose. This represents a fundamental shift in how conversational AI can be deployed for real-time customer interactions.
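To make the compounding-delay point concrete, here is a minimal sketch of how per-stage delays add up in a cascaded pipeline. The `Stage` class, the `cascaded_latency` helper, and the per-stage numbers are hypothetical placeholders chosen for illustration; they are not measurements from Loka or AWS. The only figure taken from the article is the 3-5 second range for the full pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of a cascaded voice pipeline (hypothetical model)."""
    name: str
    latency_s: float  # assumed delay in seconds, not a measured value


def cascaded_latency(stages):
    """Stages run one after another, so their delays simply add."""
    return sum(stage.latency_s for stage in stages)


# Illustrative placeholder values only; real per-stage delays vary widely.
cascade = [
    Stage("speech-to-text", 1.0),
    Stage("LLM", 2.0),
    Stage("text-to-speech", 1.0),
]

total = cascaded_latency(cascade)
print(f"cascaded pipeline: {total:.1f} s")
```

With these assumed values the total falls inside the 3-5 second range the article reports. A native speech-to-speech model replaces the three sequential stages with a single one, which is why its response time is not bounded below by their sum.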

Business Impact

Slow voice assistants drive customers to hang up, damaging brand reputation and increasing support costs. Loka's approach delivers faster response times and lower operational costs, making voice AI economically viable for businesses serving thousands of locations. The performance advantage on benchmarks suggests native audio models can handle complex customer requests more accurately than traditional systems.

Key Implications

Native speech-to-speech models may become the standard for customer-facing voice applications, displacing traditional multi-step pipelines that introduce latency and information loss
Cost efficiency at scale could accelerate voice AI adoption across industries like automotive, retail, and customer support where real-time responsiveness is critical
Benchmark performance differences between models (Amazon Nova 2 Sonic at 87.0 vs competitors at 71-83) will likely influence enterprise purchasing decisions for voice AI infrastructure

What to Watch

Monitor adoption rates of native speech-to-speech models across customer service platforms and whether latency improvements translate to measurable business outcomes like reduced call abandonment. Track whether other cloud providers release competing native audio models and how pricing evolves as the technology matures. Watch for real-world accuracy and cost data from Loka and other early adopters to validate benchmark performance claims.


