VFF - The signal in the noise
News

Cohere Open-Sources 218B Sparse Model with Lossless 4-Bit Quantization


Cohere released Command A+, a 218-billion-parameter sparse mixture-of-experts language model, under an Apache 2.0 open-source license, marking the company's first fully open-weight model release. Its 4-bit quantized variant is described as near-lossless, preserving reasoning performance while fitting on a single NVIDIA Blackwell B200 GPU or two H100s. The release reflects Cohere's strategic bet on sovereign AI: enterprises and governments can run frontier-grade models within their own secure environments without relying on proprietary cloud services.

  • Command A+ is a 218B-parameter sparse MoE model with only 25B active parameters per inference step, delivering efficiency comparable to much smaller models while retaining reasoning capabilities
  • The model ships in multiple precision formats (BF16, FP8, W4A4); the 4-bit W4A4 variant reaches 375 tokens per second with a 113 ms time to first token, a 63% speed increase over the previous Command A Reasoning model
  • Apache 2.0 licensing makes this Cohere's first fully open-weight model release, enabling on-premises deployment and customization without vendor lock-in
  • The tokenizer supports 48 languages with improved efficiency for non-European languages, addressing a key gap in enterprise global deployment
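
The single-GPU claim above can be sanity-checked with back-of-envelope arithmetic. The sketch below estimates weight storage only for each format and compares it with GPU memory. The parameter counts come from the article; the per-GPU capacities (192 GB for a B200, 80 GB per H100) are assumptions not stated in the article, and the estimate ignores KV cache, activations, and quantization scale factors, so real headroom is smaller.

```python
# Back-of-envelope weight-memory estimate (weights only; KV cache,
# activations, and quantization scale factors are ignored).

TOTAL_PARAMS = 218e9   # total parameters, from the article
ACTIVE_PARAMS = 25e9   # active parameters per inference step, from the article

# Bits per weight for each format named in the article.
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "W4A4": 4}

# ASSUMPTION: per-GPU memory in GB; these figures are not from the article
# and vary by SKU.
GPU_MEMORY_GB = {"1x B200": 192, "2x H100": 2 * 80}


def weight_memory_gb(params: float, bits: int) -> float:
    """Return weight storage in decimal gigabytes."""
    return params * bits / 8 / 1e9


def configs_that_fit(size_gb: float) -> list[str]:
    """Return the assumed GPU configurations whose memory exceeds size_gb."""
    return [name for name, cap in GPU_MEMORY_GB.items() if size_gb < cap]


if __name__ == "__main__":
    for fmt, bits in BITS_PER_WEIGHT.items():
        size = weight_memory_gb(TOTAL_PARAMS, bits)
        fits = configs_that_fit(size) or ["neither"]
        print(f"{fmt}: {size:.0f} GB of weights, fits on: {', '.join(fits)}")
    print(f"Active fraction per step: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Under these assumptions the weights take roughly 436 GB in BF16, 218 GB in FP8, and 109 GB in W4A4, so only the 4-bit variant fits on either configuration, which is consistent with the deployment claim. The active fraction works out to about 11.5% of the total parameters.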

Command A+ represents a significant shift in the open-source LLM landscape by demonstrating that sparse architectures combined with careful quantization can deliver frontier-class reasoning performance at substantially lower computational cost. This challenges the assumption that only trillion-parameter dense models can handle complex reasoning tasks, potentially reshaping infrastructure requirements and cost structures across the industry. The Apache 2.0 release also signals growing enterprise demand for models that can run entirely on-premises without cloud dependencies.

For operators and founders, Command A+ lowers the infrastructure barrier to deploying large language models in production. Running a 218B-parameter model on one or two GPUs, rather than on a large distributed cluster, cuts deployment cost, latency, and operational complexity. The open-source license lets companies avoid vendor lock-in, customize the model for domain-specific tasks, and maintain data sovereignty, which matters most for regulated industries and enterprises with strict data residency requirements.

  • Sparse MoE architectures with selective quantization may become the standard for open-source reasoning models, shifting focus from parameter count to active parameter efficiency and inference speed
  • On-premises deployment of frontier-grade models becomes economically viable for mid-market and enterprise organizations, reducing reliance on cloud API providers and their associated costs and latency
  • Cohere's open-source strategy positions it as a credible alternative to proprietary model providers for enterprises prioritizing sovereignty and customization, potentially accelerating adoption in regulated sectors like finance and healthcare
  • The improved multilingual tokenizer efficiency could enable better performance for non-English use cases, expanding the addressable market for open models in non-Western regions

Monitor whether Command A+ achieves comparable performance to proprietary reasoning models like GPT-5.5 and Claude Opus 4.7 in independent benchmarks, as this will validate the sparse MoE approach for complex reasoning. Watch for enterprise adoption patterns and whether the on-premises deployment model gains traction in regulated industries. Also track whether other labs follow Cohere's lead in releasing open-weight models under permissive licenses, which could signal a broader industry shift toward sovereign AI.

