
Cohere Open-Sources 218B Sparse Model with Near-Lossless 4-Bit Quantization

Carl Franzen (carl.franzen@venturebeat.com) · May 21, 2026

Cohere released Command A+, a 218-billion-parameter sparse mixture-of-experts language model, under an Apache 2.0 open-source license, marking the company's first fully open-weight model release. Its 4-bit quantized variant is near-lossless, preserving reasoning performance while allowing deployment on a single NVIDIA Blackwell B200 GPU or two H100s. The release reflects Cohere's strategic bet on sovereign AI: enterprises and governments can run frontier-grade models within their own secure environments without relying on proprietary cloud services.

TL;DR

Command A+ is a 218B-parameter sparse MoE model with only 25B active parameters per inference step, delivering efficiency comparable to much smaller models while retaining reasoning capabilities
The model ships in multiple precision formats (BF16, FP8, W4A4), with the W4A4 variant (4-bit weights and 4-bit activations) achieving 375 tokens per second and 113 ms time-to-first-token latency, a 63% speed increase over the previous Command A Reasoning model
Apache 2.0 licensing makes this Cohere's first fully open-weight model release, enabling on-premises deployment and customization without vendor lock-in
The tokenizer supports 48 languages with improved efficiency for non-European languages, addressing a key gap in global enterprise deployment

Why It Matters

Command A+ represents a significant shift in the open-source LLM landscape by demonstrating that sparse architectures combined with careful quantization can deliver frontier-class reasoning performance at substantially lower computational cost. This challenges the assumption that only trillion-parameter dense models can handle complex reasoning tasks, potentially reshaping infrastructure requirements and cost structures across the industry. The Apache 2.0 release also signals growing enterprise demand for models that can run entirely on-premises without cloud dependencies.
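To put the sparse-architecture claim in numbers, here is a rough sketch of the share of the model that is active per inference step. The parameter counts come from the article. The figure of about 2 FLOPs per active parameter per token is a common rule of thumb for a forward pass, not something attributed to Cohere, and it ignores attention and expert-routing overhead. The function names are illustrative.

```python
# Rough sketch: how much of Command A+ is active per inference step.
TOTAL_PARAMS = 218e9   # total parameters (from the article)
ACTIVE_PARAMS = 25e9   # active parameters per step (from the article)


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used per inference step."""
    return active / total


def approx_forward_flops(active: float) -> float:
    """Assumption: ~2 FLOPs per active parameter per token (rule of thumb)."""
    return 2 * active


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"Active fraction: {frac:.1%}")
    print(f"Approx. forward FLOPs per token: {approx_forward_flops(ACTIVE_PARAMS):.2e}")
    # Hypothetical comparison: a dense model of the same total size.
    print(f"Dense-equivalent compute ratio: {TOTAL_PARAMS / ACTIVE_PARAMS:.1f}x")
```

By this estimate roughly 11.5% of the parameters are used per step, which is why per-token compute looks closer to that of a 25B dense model than a 218B one. Memory is a separate matter, since all 218B weights still have to be stored.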

Business Impact

For operators and founders, Command A+ reduces the infrastructure barrier to deploying large language models in production. Running a 218B-parameter model on one or two GPUs instead of requiring massive distributed clusters cuts deployment costs, latency, and operational complexity significantly. The open-source licensing enables companies to avoid vendor lock-in, customize models for domain-specific tasks, and maintain data sovereignty, addressing critical concerns for regulated industries and enterprises with strict data residency requirements.
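The one-or-two-GPU claim can be sanity-checked with simple arithmetic. The sketch below estimates weight storage for the three formats named in the TL;DR. It assumes weights dominate memory use and ignores the KV cache, activations, and quantization scale overhead, so real requirements will be higher. The GPU capacities (192 GB for a B200, 80 GB per H100) are assumed values, not figures from the article, and should be checked against vendor datasheets.

```python
# Back-of-the-envelope weight memory for a 218B-parameter model.
# Assumption: only weight storage is counted; KV cache, activations,
# and quantization scales are ignored.

TOTAL_PARAMS = 218e9  # from the article

# Bits per weight for each format named in the article.
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "W4A4": 4}

# Assumed usable memory in GB per deployment option (not from the article).
GPU_MEMORY_GB = {"1x B200": 192, "2x H100": 2 * 80}


def weight_memory_gb(params: float, bits: int) -> float:
    """Weight storage in GB, taking 1 GB = 1e9 bytes."""
    return params * bits / 8 / 1e9


def fits(params: float, bits: int, capacity_gb: float) -> bool:
    """True if the weights alone fit within the given capacity."""
    return weight_memory_gb(params, bits) <= capacity_gb


if __name__ == "__main__":
    for fmt, bits in BITS_PER_WEIGHT.items():
        size = weight_memory_gb(TOTAL_PARAMS, bits)
        verdicts = ", ".join(
            f"{gpu}: {'fits' if fits(TOTAL_PARAMS, bits, cap) else 'too large'}"
            for gpu, cap in GPU_MEMORY_GB.items()
        )
        print(f"{fmt}: {size:.0f} GB of weights ({verdicts})")
```

Under these assumptions the 4-bit weights come to about 109 GB, below both assumed capacities, while the FP8 (218 GB) and BF16 (436 GB) weights exceed them. That is consistent with the article tying single-GPU deployment to the 4-bit variant.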

Key Implications

Sparse MoE architectures with selective quantization may become the standard for open-source reasoning models, shifting focus from parameter count to active parameter efficiency and inference speed
On-premises deployment of frontier-grade models becomes economically viable for mid-market and enterprise organizations, reducing reliance on cloud API providers and their associated costs and latency
Cohere's open-source strategy positions it as a credible alternative to proprietary model providers for enterprises prioritizing sovereignty and customization, potentially accelerating adoption in regulated sectors like finance and healthcare
The improved multilingual tokenizer efficiency could enable better performance for non-English use cases, expanding the addressable market for open models in non-Western regions

What to Watch

Monitor whether Command A+ achieves comparable performance to proprietary reasoning models like GPT-5.5 and Claude Opus 4.7 in independent benchmarks, as this will validate the sparse MoE approach for complex reasoning. Watch for enterprise adoption patterns and whether the on-premises deployment model gains traction in regulated industries. Also track whether other labs follow Cohere's lead in releasing open-weight models under permissive licenses, which could signal a broader industry shift toward sovereign AI.



