Cohere Open-Sources 218B Sparse Model with Near-Lossless 4-Bit Quantization

Cohere released Command A+, a 218-billion-parameter sparse mixture-of-experts (MoE) language model, under the Apache 2.0 license, the company's first fully open-weight model release. Its 4-bit quantized variant is near-lossless: it preserves reasoning performance while shrinking the model enough to run on a single NVIDIA Blackwell B200 GPU or two H100s. The release reflects Cohere's strategic bet on sovereign AI, letting enterprises and governments run frontier-grade models inside their own secure environments instead of relying on proprietary cloud services.
TL;DR
- Command A+ is a 218B-parameter sparse MoE model that activates only 25B parameters per token, so its per-token compute is comparable to a much smaller dense model while it retains reasoning capabilities
- The model supports BF16, FP8, and W4A4 (4-bit weights and 4-bit activations) formats; the W4A4 variant reaches 375 tokens per second with a 113 ms time-to-first-token (TTFT), 63% faster than the previous Command A Reasoning model
- Apache 2.0 licensing makes this Cohere's first fully open-weight model release, enabling on-premises deployment and customization without vendor lock-in
- The tokenizer supports 48 languages with improved efficiency for non-European languages, addressing a key gap in enterprise global deployment
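The W4A4 figures above translate into response times with a simple model: TTFT covers the first token, and each later token arrives at the decode rate. This is a minimal sketch under two assumptions not stated in the release: the rate holds constant for the whole response, and 375 tokens per second is a per-request figure rather than aggregate throughput across concurrent requests.

```python
# Rough end-to-end latency model for one streamed response.
# ASSUMPTIONS (not from Cohere): constant decode speed for the whole
# response, and the quoted rate is per request, not aggregate throughput.

TTFT_S = 0.113            # quoted time-to-first-token, in seconds
DECODE_TOK_PER_S = 375.0  # quoted W4A4 generation rate


def estimated_latency_s(output_tokens: int,
                        ttft_s: float = TTFT_S,
                        tok_per_s: float = DECODE_TOK_PER_S) -> float:
    """Return TTFT (first token) plus decode time for the remaining tokens."""
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    return ttft_s + (output_tokens - 1) / tok_per_s


if __name__ == "__main__":
    for n in (100, 500, 2000):
        print(f"{n:>5} tokens -> ~{estimated_latency_s(n):.2f} s")
```

For long answers the decode rate dominates and TTFT becomes a rounding error; for short, chat-style replies TTFT is a larger share of what the user perceives.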
Why It Matters
Command A+ marks a notable shift in the open-source LLM landscape: it suggests that sparse architectures combined with careful quantization can deliver frontier-class reasoning at substantially lower computational cost. That challenges the assumption that complex reasoning requires trillion-parameter dense models, and it could reshape infrastructure requirements and cost structures across the industry. The Apache 2.0 release also signals growing enterprise demand for models that run entirely on-premises, without cloud dependencies.
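In a sparse MoE layer, a learned router sends each token to a small subset of experts, so only those experts' weights are used for that token. The toy layer below is a minimal sketch of generic top-k routing in plain Python; the expert count, k, and dimensions are hypothetical, and nothing here reflects Command A+'s actual routing design. Its 25% active fraction is an artifact of the toy sizes; the release's ratio is 25B of 218B, roughly 11.5%.

```python
import math
import random

# Toy top-k mixture-of-experts layer. HYPOTHETICAL sizes for illustration
# only; they are not Command A+'s real configuration.
D_MODEL = 4      # token vector width
NUM_EXPERTS = 8  # experts in the layer
TOP_K = 2        # experts evaluated per token

rng = random.Random(0)  # fixed seed so runs are reproducible


def rand_matrix(rows: int, cols: int) -> list[list[float]]:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


# Router yields one score per expert; each expert is a D_MODEL x D_MODEL map.
router = rand_matrix(NUM_EXPERTS, D_MODEL)
experts = [rand_matrix(D_MODEL, D_MODEL) for _ in range(NUM_EXPERTS)]


def moe_forward(x: list[float]):
    """Send one token vector to its TOP_K highest-scoring experts and mix outputs."""
    scores = matvec(router, x)
    chosen = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    # Softmax over the chosen experts only, so the gate weights sum to 1.
    peak = max(scores[i] for i in chosen)
    weights = [math.exp(scores[i] - peak) for i in chosen]
    norm = sum(weights)
    gates = [w / norm for w in weights]
    out = [0.0] * D_MODEL
    for g, i in zip(gates, chosen):
        y = matvec(experts[i], x)  # unchosen experts are never computed
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, chosen, gates


# Expert parameters only (router weights ignored for simplicity).
total_params = NUM_EXPERTS * D_MODEL * D_MODEL
active_params = TOP_K * D_MODEL * D_MODEL

if __name__ == "__main__":
    out, chosen, gates = moe_forward([1.0, 0.5, -0.5, 2.0])
    print("experts used:", chosen, "gates:", [round(g, 3) for g in gates])
    print(f"active fraction: {active_params / total_params:.0%}")
```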
Business Impact
For operators and founders, Command A+ lowers the infrastructure barrier to running large language models in production. Serving a 218B-parameter model on one or two GPUs, rather than on a large distributed cluster, significantly cuts deployment cost, latency, and operational complexity. The open-source license lets companies avoid vendor lock-in, customize models for domain-specific tasks, and maintain data sovereignty, all critical concerns for regulated industries and for enterprises with strict data residency requirements.
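The one-or-two-GPU claim follows from simple arithmetic on weight storage. The sketch below multiplies the parameter count by the bits per weight for each format in the release. The per-GPU memory figures (192 GB for a B200, 80 GB for an H100) are assumptions based on commonly published specs, and the estimate ignores KV cache, activations, and quantization scale overhead, so real headroom is smaller.

```python
# Back-of-envelope weight-memory estimate for a 218B-parameter model.
# ASSUMPTIONS: decimal gigabytes (1 GB = 1e9 bytes); weights only, ignoring
# KV cache, activations, and per-group quantization scales.

TOTAL_PARAMS = 218e9

# Bits per weight for each format named in the release.
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "W4A4": 4}

# Assumed HBM capacity per configuration, in GB.
GPU_BUDGETS_GB = {"1x B200": 192, "2x H100": 2 * 80}


def weight_gb(params: float, bits: int) -> float:
    """Return raw weight storage in decimal gigabytes."""
    return params * bits / 8 / 1e9


if __name__ == "__main__":
    for fmt, bits in BITS_PER_WEIGHT.items():
        size = weight_gb(TOTAL_PARAMS, bits)
        fits = [name for name, cap in GPU_BUDGETS_GB.items() if size < cap]
        print(f"{fmt:>5}: {size:6.1f} GB of weights; fits in: {fits or 'neither'}")
```

Under these assumptions only the W4A4 variant (about 109 GB of weights) fits either configuration with room left for KV cache; FP8 (about 218 GB) and BF16 (about 436 GB) would need more hardware.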
Key Implications
- Sparse MoE architectures with selective quantization may become the standard for open-source reasoning models, shifting focus from parameter count to active parameter efficiency and inference speed
- On-premises deployment of frontier-grade models becomes economically viable for mid-market and enterprise organizations, reducing reliance on cloud API providers and their associated costs and latency
- Cohere's open-source strategy positions it as a credible alternative to proprietary model providers for enterprises prioritizing sovereignty and customization, potentially accelerating adoption in regulated sectors like finance and healthcare
- The improved multilingual tokenizer efficiency could enable better performance for non-English use cases, expanding the addressable market for open models in non-Western regions
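The shift toward active-parameter efficiency can be quantified with a common rule of thumb: a forward pass costs roughly 2 FLOPs (one multiply, one add) per parameter actually used for a token. The sketch below applies it to the 25B-active and 218B-total figures; the "dense 218B" comparison is a hypothetical reference point, and the estimate ignores attention over the context, which grows with sequence length.

```python
# Rule-of-thumb compute per generated token: ~2 FLOPs per parameter used.
# Ignores attention cost over the context window.

TOTAL_PARAMS = 218e9   # all experts combined
ACTIVE_PARAMS = 25e9   # parameters used per token, per the release


def flops_per_token(params_used: float) -> float:
    """Approximate forward-pass FLOPs for one token."""
    return 2.0 * params_used


sparse = flops_per_token(ACTIVE_PARAMS)
dense_equivalent = flops_per_token(TOTAL_PARAMS)  # hypothetical dense 218B model

if __name__ == "__main__":
    print(f"sparse: {sparse:.2e} FLOPs/token")
    print(f"dense:  {dense_equivalent:.2e} FLOPs/token")
    print(f"ratio:  {dense_equivalent / sparse:.2f}x")
```

The saving applies to compute only: all 218B parameters still have to sit in GPU memory because different tokens route to different experts, which is why the quantization format matters as much as the sparse design.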
What to Watch
Monitor whether Command A+ matches proprietary reasoning models such as GPT-5.5 and Claude Opus 4.7 on independent benchmarks, as this will validate the sparse MoE approach for complex reasoning. Watch for enterprise adoption patterns and whether the on-premises deployment model gains traction in regulated industries. Also track whether other labs follow Cohere's lead in releasing open-weight models under permissive licenses, which could signal a broader industry shift toward sovereign AI.