Topic
LLMs
Large language model releases, benchmarks, and capability research

Automated LLM reasoning cuts token costs by 70 percent
Researchers from Meta, Google, and universities have developed AutoTTS, a framework that automatically discovers…

MiniMax Teases M3 With 15.6X Speed Boost for Long-Context AI
MiniMax released a technical report on its M2 language model series while teasing an upcoming M3 model that uses a new…

NVIDIA Shifts to Parallel Text Generation with Diffusion Models
NVIDIA released Nemotron-Labs Diffusion, a family of language models that generate text in parallel rather than…

Lightweight Memory Technique Cuts Agent Parameter Overhead to 0.12%
Researchers from Mind Lab and universities have developed delta-mem, a technique that adds just 0.12% of parameters to…

Alibaba's Qwen3.7-Max Runs 35 Hours Autonomously, Shifts to Paid Model
Alibaba released Qwen3.7-Max, a proprietary AI model capable of 35 hours of continuous autonomous execution, marking a…

Cerebras Runs Trillion-Parameter Model 7x Faster Than GPU Clouds
Cerebras announced it is running Kimi K2.6, a trillion-parameter open-weight model from Chinese AI startup Moonshot AI,…

Cohere Open-Sources 218B Sparse Model with Lossless 4-Bit Quantization
Cohere released Command A+, a 218-billion-parameter sparse mixture-of-experts language model under an Apache 2.0…

Google Faces Coding Crisis Ahead of I/O Conference
Google enters its annual I/O developer conference positioned as a clear third place in the foundation model race,…

Open-Source Models Gain Ground, But Reasoning Gap Remains
As frontier AI model costs rise, some developers are exploring open-source alternatives like DeepSeek V4 and Moonshot…

AWS Bedrock Adds Programmatic Tool Calling for Faster Multi-Step AI Workflows
Amazon Bedrock now supports programmatic tool calling (PTC), a pattern where LLMs generate executable code to…

RecursiveMAS cuts multi-agent costs by 75% with latent-space communication
Researchers at University of Illinois Urbana-Champaign and Stanford University have developed RecursiveMAS, a framework…

Databricks Integrates GPT-5.5 for Enterprise Agents
Databricks has integrated OpenAI's GPT-5.5 model into its enterprise agent workflows following the model's performance…

Empromptu AI launches Alchemy Models for continuous fine-tuning from production workflows
Empromptu AI launched Alchemy Models, a platform that automatically captures training data from enterprise AI…

AI IQ Launches Model Scorecard, Sparks Precision vs. Simplicity Debate
A new site called AI IQ has launched a framework for scoring frontier language models on a single intelligence…

Frontier LLMs Silently Corrupt 25% of Documents in Iterative Workflows
Microsoft researchers developed a benchmark showing that frontier LLMs silently corrupt an average of 25% of document…

Hermes Agent Becomes Most-Used Framework as Local AI Agents Go Mainstream
Hermes Agent, an open source agentic AI framework from Nous Research, has reached 140,000 GitHub stars in under three…

Sakana trains 7B model to orchestrate GPT, Claude, Gemini
Sakana AI has developed RL Conductor, a 7-billion-parameter language model trained via reinforcement learning to…

AWS Details Verifiable Rewards Method for More Reliable LLM Training
AWS published a technical guide on reinforcement learning with verifiable rewards (RLVR), a method that addresses…

Subquadratic claims 1,000x efficiency gain; researchers demand proof
Miami-based startup Subquadratic emerged from stealth claiming its SubQ 1M-Preview model achieves a 1,000x efficiency…

Faithful Reasoning Emerges from Multi-Move Training, Not Direct Prediction
Researchers studied how reasoning develops in language models across supervised fine-tuning and reinforcement learning…

Safety Routing Circuits Found Across Models, Vulnerable to Encoding Attacks
Researchers have localized the policy routing mechanism in alignment-trained language models, identifying specific…

Cursor Keeps Its Distance From xAI Despite SpaceX Tie-Up
Despite SpaceX's $60 billion conditional takeover offer for Cursor last month, the coding startup is maintaining…

The AI scaffolding layer is collapsing. Context is the new moat.
The middleware layer that once helped developers build LLM applications, including indexing frameworks, query engines,…

Warmer AI Models Trade Accuracy for Empathy
Researchers at Oxford University's Internet Institute found that large language models fine-tuned to appear warmer and…

How OpenAI's Personality Feature Unleashed the Goblins
OpenAI's GPT-5.5 model exhibited unexpected behavior where it became obsessed with discussing goblins, gremlins, and…

