Aggregating Zero-Shot LLMs Beats Single Models for Financial Disclosure Analysis

A new paper shows that a lightweight supervised aggregator can combine the outputs of multiple zero-shot LLMs to improve corporate disclosure classification and stock return prediction. The researchers ran three fixed zero-shot classifiers, each reading a financial disclosure from a different perspective, and then trained a logistic meta-classifier on their outputs. On 9,860 U.S. corporate disclosures from January 2025 to March 2026, the trained aggregator achieved 60.6% balanced accuracy, compared with 56.6% for the best single classifier. The largest gains came in mixed-signal cases where the classifiers disagreed.
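To make the aggregation step concrete, here is a minimal stacking sketch in plain Python. It is not the authors' code. It assumes each zero-shot classifier emits a probability that a disclosure is "positive", that the target is binary, and that the meta-classifier is an ordinary logistic regression fitted by batch gradient descent. The function names `fit_meta_classifier` and `predict`, the learning rate, the epoch count and the small L2 penalty are all hypothetical choices for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_classifier(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
    l2: float = 1e-3,
) -> Tuple[List[float], float]:
    """Fit a logistic meta-classifier (hypothetical setup) by batch gradient descent.

    X: one row per disclosure, one column per zero-shot classifier's score.
    y: binary labels, 1 = positive outcome, 0 = negative (assumed target).
    Returns (weights, bias); only the weights are L2-penalized.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Error of the current aggregated probability on this disclosure
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step on the mean log-loss plus a small L2 penalty on the weights
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, X: Sequence[Sequence[float]]) -> List[int]:
    """Label a disclosure 1 when the aggregated probability is at least 0.5."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


if __name__ == "__main__":
    # Made-up scores from three hypothetical classifiers (columns) on six disclosures
    X_train = [
        [0.9, 0.8, 0.85], [0.8, 0.9, 0.7], [0.85, 0.75, 0.9],
        [0.1, 0.2, 0.15], [0.2, 0.1, 0.3], [0.15, 0.25, 0.1],
    ]
    y_train = [1, 1, 1, 0, 0, 0]
    weights, bias = fit_meta_classifier(X_train, y_train)
    print("per-classifier weights:", weights, "bias:", bias)
    print("predictions:", predict(weights, bias, X_train))
```

The learned weights play the role of a data-driven vote: a classifier whose scores track the outcome gets a larger weight, which is what separates supervised aggregation from fixed voting rules.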
TL;DR
- Supervised aggregation of zero-shot LLM outputs outperforms single classifiers, majority voting, and confidence-weighted voting for financial disclosure analysis
- Balanced accuracy improved from 56.6% to 60.6% when combining three diverse zero-shot LLM perspectives through a trained meta-classifier
- The approach works best on disclosures with mixed signals where individual classifiers disagree, suggesting complementary financial insights across models
- Evaluation used disclosures published after the base LLMs' release (Jan 2025 to Mar 2026) to avoid contamination from the models' training data
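The two baselines the aggregator is compared against, and the balanced-accuracy metric itself, can be sketched in a few lines. This assumes binary labels, exactly three classifiers (so a majority vote never ties), and a per-classifier confidence in [0, 1]; the function names and the tie-breaking rule in the weighted vote (exact ties go to label 0) are hypothetical. Balanced accuracy is computed as the mean of per-class recall, which keeps a model from scoring well just by predicting the majority class.

```python
from typing import List, Sequence


def majority_vote(labels: Sequence[int]) -> int:
    """Binary majority vote; with an odd number of classifiers there are no ties."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def confidence_weighted_vote(labels: Sequence[int], confidences: Sequence[float]) -> int:
    """Each classifier adds its confidence to the label it chose; exact ties go to 0."""
    score = sum(c if lab == 1 else -c for lab, c in zip(labels, confidences))
    return 1 if score > 0 else 0


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    recalls: List[float] = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Made-up labels and confidences from three hypothetical classifiers
    base_labels = [[1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 1]]
    base_conf = [[0.6, 0.7, 0.9], [0.8, 0.6, 0.5], [0.95, 0.3, 0.4], [0.5, 0.6, 0.55]]
    y_true = [1, 0, 1, 0]

    maj = [majority_vote(l) for l in base_labels]
    wtd = [confidence_weighted_vote(l, c) for l, c in zip(base_labels, base_conf)]
    print("majority:", maj, balanced_accuracy(y_true, maj))
    print("weighted:", wtd, balanced_accuracy(y_true, wtd))
```

Both rules are fixed in advance, so neither can learn that one classifier is more reliable than another; that is the gap a trained meta-classifier is meant to close.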
Why it matters
This work addresses a practical challenge in deploying LLMs for financial analysis: zero-shot models produce variable outputs, but simple voting schemes leave performance on the table. The paper shows that lightweight supervised aggregation can extract complementary signals from diverse model perspectives without expensive fine-tuning, offering a scalable approach for financial institutions seeking to leverage multiple LLM outputs.
Business relevance
For fintech and asset management firms, this demonstrates a cost-effective way to improve prediction accuracy on corporate disclosures without retraining large models. The approach is particularly valuable for capturing nuanced financial signals in ambiguous disclosures where different analytical perspectives yield different conclusions, potentially improving trading signals and risk assessment.
Key implications
- Ensemble methods combining zero-shot LLM outputs can outperform individual models and traditional voting schemes, suggesting that model diversity itself carries signal value
- Supervised aggregation of LLM outputs requires minimal additional training and infrastructure compared to fine-tuning, making it accessible to organizations with limited ML resources
- The largest performance gains occur on ambiguous or mixed-signal inputs, indicating that aggregation is most valuable where uncertainty is highest rather than on straightforward cases
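One way to check where aggregation helps is to split the evaluation set by whether the base classifiers agree and score each subset separately. The sketch below assumes "mixed signal" means the three classifiers' labels are not unanimous; the paper's exact definition may differ. The helper names and the toy predictions are hypothetical and do not reproduce the paper's numbers.

```python
from typing import List, Sequence, Tuple


def split_by_agreement(base_labels: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Return (unanimous, mixed) row indices; 'mixed' means the classifiers disagree."""
    unanimous: List[int] = []
    mixed: List[int] = []
    for i, labels in enumerate(base_labels):
        (unanimous if len(set(labels)) == 1 else mixed).append(i)
    return unanimous, mixed


def balanced_accuracy_on(
    y_true: Sequence[int], y_pred: Sequence[int], idx: Sequence[int]
) -> float:
    """Mean per-class recall restricted to the rows in idx (idx must be non-empty)."""
    recalls: List[float] = []
    for cls in sorted({y_true[i] for i in idx}):
        rows = [i for i in idx if y_true[i] == cls]
        recalls.append(sum(1 for i in rows if y_pred[i] == cls) / len(rows))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Made-up base-classifier labels, outcomes, and meta-classifier predictions
    base = [[1, 1, 1], [0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 0]]
    y_true = [1, 0, 0, 1, 1]
    meta_pred = [1, 0, 0, 1, 0]
    majority = [1 if 2 * sum(l) > len(l) else 0 for l in base]

    unanimous, mixed = split_by_agreement(base)
    for name, pred in (("meta", meta_pred), ("majority", majority)):
        print(name,
              "unanimous:", balanced_accuracy_on(y_true, pred, unanimous),
              "mixed:", balanced_accuracy_on(y_true, pred, mixed))
```

On unanimous rows every aggregation rule returns the same label, so any difference between methods has to come from the mixed rows; reporting the two subsets separately makes that visible.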
What to watch
Monitor whether this aggregation pattern generalizes to other financial tasks beyond disclosure classification, such as earnings call analysis or regulatory filing interpretation. Also track whether similar lightweight aggregation approaches prove effective in other domains where zero-shot LLMs produce variable outputs, and whether financial institutions adopt this method in production systems.



