{"author":{"name":"Gleb Geinke","slug":"gleb-geinke","article_count":1,"latest_published_at":"2026-04-23T14:40:54.589+00:00","profile_url":"https://vff.ai/authors/gleb-geinke","api_url":"https://vff.ai/api/authors/gleb-geinke"},"articles":[{"slug":"cost-effective-multilingual-audio-transcription-at-scale-with-parakeet-tdt-and-a","title":"Open-source speech recognition cuts transcription costs to fractions of a cent","url":"https://vff.ai/article/2026/04/23/cost-effective-multilingual-audio-transcription-at-scale-with-parakeet-tdt-and-a","content_type":"aggregated_news","summary":"AWS and NVIDIA have published a guide for cost-effective multilingual audio transcription using the open-source Parakeet-TDT-0.6B-v3 model deployed on AWS Batch with GPU acceleration. The approach achieves transcription costs of fractions of a cent per hour of audio by leveraging the model's Token-and-Duration Transducer architecture, which predicts text tokens and their duration to skip silence and redundant processing, enabling inference speeds orders of magnitude faster than real-time. The solution supports 25 European languages with automatic language detection and integrates with S3, EventBridge, and EC2 Spot Instances to create a fully automated, event-driven pipeline that scales to zero when idle.","published_at":"2026-04-23T14:40:54.589+00:00","updated_at":"2026-04-23T14:40:52.785887+00:00","source":{"url":"https://aws.amazon.com/blogs/machine-learning/cost-effective-multilingual-audio-transcription-at-scale-with-parakeet-tdt-and-aws-batch/","name":"AWS Machine Learning Blog"},"featured_image":{"url":"https://ossels.ai/wp-content/uploads/2025/08/NVIDIA-Canary-1B-and-Parakeet-TDT-06B-728x485.jpg","alt":null},"categories":[{"name":"Voice & Video AI","slug":"voice-video-ai"},{"name":"AI for Business","slug":"ai-for-business"},{"name":"Infrastructure","slug":"infrastructure"},{"name":"Open Source","slug":"open-source"}]}]}