VFF - The signal in the noise
News

Google DeepMind's Gemma 4 Now Available on AWS Bedrock

Read original
Share
Google DeepMind's Gemma 4 Now Available on AWS Bedrock

Google DeepMind's Gemma 4 model family is now available on Amazon Bedrock, offering three instruction-tuned variants ranging from 2.3B to 30.7B parameters. The models support reasoning, function calling, and multimodal input while running on AWS infrastructure with data protection guarantees. Organizations can access open-weight models through a managed service without hosting infrastructure themselves.

  • Gemma 4 family includes three variants: Gemma 4 31B (dense), Gemma 4 26B-A4B (mixture-of-experts), and Gemma 4 E2B (dense with PLE architecture)
  • Gemma 4 31B achieves an Intelligence Index of 39 on Artificial Analysis benchmarks, significantly above the 15 median for the 4B-40B open-weights class
  • All variants support built-in reasoning, native function calling, multimodal text and image input, and pre-training across 140+ languages
  • Models are available on Amazon Bedrock as a fully managed service with no third-party data sharing and no use of prompts or completions for model training

Open-weight models have traditionally forced organizations to choose between accessing leading models and maintaining data control. Gemma 4 on Bedrock removes this trade-off by offering competitive open-weight models through AWS infrastructure with built-in security and privacy controls, enabling broader adoption of capable models without operational overhead.

Organizations can now deploy capable open-weight models without provisioning infrastructure, hosting model weights, or managing inference stacks. The range of parameter sizes allows cost and latency optimization for different use cases, from lightweight applications to complex multimodal agents and document understanding pipelines.

  • AWS strengthens its position in the managed open-weight model market by offering Google DeepMind's latest models with full infrastructure abstraction
  • The availability of mixture-of-experts variants enables efficient inference by activating only a fraction of parameters per request, reducing computational costs
  • Organizations can now build production applications with open-weight models while maintaining data sovereignty and regulatory compliance through AWS infrastructure

Monitor adoption patterns across different model variants to understand whether organizations prioritize dense models for simplicity or mixture-of-experts for efficiency. Track how Gemma 4's performance on real-world workloads compares to proprietary alternatives, and observe whether the open-weight availability accelerates migration away from closed-source model APIs.

Share

Subscribe to the newsletter

The latest stories and analysis, delivered to your inbox.

Free. No spam. Unsubscribe any time.

Related stories

PixelRAG bypasses text parsing, cuts RAG costs 10x

PixelRAG bypasses text parsing, cuts RAG costs 10x

Researchers from UC Berkeley, Princeton, EPFL, and Databricks introduced PixelRAG, a retrieval system that bypasses traditional text parsing by rendering web pages as screenshots and indexing them directly for vision-language models. Tested on 30 million Wikipedia screenshot tiles, PixelRAG improved accuracy by up to 18.1% over text-based RAG systems and reduced token costs by 10x. The approach addresses fundamental information loss in conventional HTML-to-text conversion pipelines.

· VentureBeat AI
Google DeepMind Releases Gemma 4 12B for Laptop-Based AI
TrendingNews

Google DeepMind Releases Gemma 4 12B for Laptop-Based AI

Google DeepMind introduced Gemma 4 12B, a multimodal AI model designed to run on consumer laptops with 16GB of RAM. The model uses an encoder-free architecture that processes vision and audio inputs directly into the language model backbone, reducing latency and memory overhead. Performance approaches the larger 26B model while maintaining a smaller footprint, and it is released under an Apache 2.0 license.

· Google Deepmind
Google Launches Near Real-Time Voice Translation in Gemini 3.5
TrendingNews

Google Launches Near Real-Time Voice Translation in Gemini 3.5

Google has launched Gemini 3.5 Live Translate, a near real-time speech translation feature now available in Google AI Studio, Google Translate, and Google Meet. The system delivers natural-sounding voice translation with minimal latency. The rollout represents a significant step toward breaking down language barriers in professional and consumer communication.

· Google Deepmind
Google's Gemma 4 12B Brings Multimodal AI to Offline Laptops
TrendingNews

Google's Gemma 4 12B Brings Multimodal AI to Offline Laptops

Google released Gemma 4 12B, an 11.95-billion-parameter open-source model that runs entirely on a standard 16GB enterprise laptop without requiring cloud connectivity. The model uses an encoder-free architecture that processes audio and video directly without secondary processing modules, reducing latency and memory overhead. It includes a 256K token context window, native tool-use capabilities, and step-by-step reasoning mode, making it suitable for enterprises with strict data privacy requirements.

by carl.franzen@venturebeat.com (Carl Franzen)· VentureBeat AI