Why Every LLM Gives You the Same Answer

Large language models exhibit severe homogeneity in their responses to open-ended questions, converging on predictable answers across different providers. Australian startup Springboards has developed Flint, an LLM trained to generate more diverse outputs by embracing what traditional models treat as hallucinations. A November research paper won best paper at NeurIPS for documenting this phenomenon across 25 different models; it found that most responses to creative prompts cluster around identical phrases.
TL;DR
- Most LLMs give nearly identical answers to open-ended questions; for example, ChatGPT and Claude both respond with 7 when asked for a random number between 1 and 10
- Springboards' Flint model deliberately generates wider variety in responses by treating hallucinations as features rather than bugs
- NeurIPS-winning research collected 1,250 responses to a metaphor prompt from 25 different LLMs and found that most of them repeated 'Time is a river' or 'Time is a weaver'
- Homogeneity stems from similar training methods, data sources, and task design across mainstream LLMs, limiting creative and exploratory use cases
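One rough way to put a number on this kind of clustering is to measure what share of responses collapse to the single most common answer once case and punctuation are ignored. The sketch below is illustrative only and is not the method used in the paper: the function names `normalize` and `mode_share` are hypothetical, and the sample responses are toy data invented for the example.

```python
from collections import Counter
import string


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so near-identical answers match."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def mode_share(responses: list[str]) -> tuple[str, float]:
    """Return the most common normalized response and the fraction of responses equal to it."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(normalize(r) for r in responses)
    top, n = counts.most_common(1)[0]
    return top, n / len(responses)


# Toy data invented for illustration; NOT taken from the paper or from any real model.
toy_responses = [
    "Time is a river.",
    "time is a river",
    "Time is a weaver.",
    "Time is a RIVER!",
]

top, share = mode_share(toy_responses)
print(f"most common: {top!r}, share: {share:.2f}")
```

A share close to 1.0 would mean nearly every response is the same phrase, while a value near 1 divided by the number of responses would mean almost no repetition. Exact-match counting is a deliberately crude choice here; it misses paraphrases, which a real study would likely need to handle.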
Why It Matters
LLM homogeneity reveals a fundamental limitation in how current models are built and trained. When different providers' models converge on identical outputs, users receive less genuine diversity than they perceive, and creative applications like brainstorming or planning suffer. This constraint affects the practical utility of LLMs beyond structured tasks like coding or research.
Business Impact
For enterprises using LLMs for creative work, marketing, or strategic planning, homogeneity means reduced value from multi-model approaches and limited novelty in outputs. Springboards' alternative approach signals a market opportunity for differentiated LLMs, while also highlighting that current market leaders may be optimizing for safety and predictability at the cost of creative utility.
Key Implications
- Current LLM design prioritizes reducing hallucinations, which inadvertently suppresses legitimate diversity in responses to open-ended questions
- Competitive differentiation in LLMs may shift toward diversity and creativity rather than scale and accuracy alone
- Users of mainstream LLMs are receiving less personalized or varied outputs than chat interfaces suggest, raising questions about perceived versus actual model differences
What to Watch
Monitor whether Springboards' Flint gains adoption in creative industries and whether major LLM providers respond by adjusting training approaches. Watch for follow-up research on whether diversity-focused training trades off accuracy or safety, and whether enterprises begin demanding more varied outputs from their LLM providers.


