VFF - The signal in the noise

Aggregating Zero-Shot LLMs Beats Single Models for Financial Disclosure Analysis

A new paper demonstrates that a lightweight supervised aggregator can effectively combine outputs from multiple zero-shot LLMs to improve corporate disclosure classification and stock return prediction. Researchers tested three fixed zero-shot classifiers reading financial disclosures from different perspectives, then trained a logistic meta-classifier to aggregate their outputs. Using 9,860 U.S. corporate disclosures from January 2025 to March 2026, the trained aggregator achieved 60.6% balanced accuracy compared to 56.6% for the best single classifier, with the largest gains appearing in mixed-signal cases where classifiers disagreed.

  • Supervised aggregation of zero-shot LLM outputs outperforms single classifiers, majority voting, and confidence-weighted voting for financial disclosure analysis
  • Balanced accuracy improved from 56.6% for the best single classifier to 60.6% (a gain of 4.0 percentage points) when combining three diverse zero-shot LLM perspectives through a trained meta-classifier
  • The approach works best on disclosures with mixed signals where individual classifiers disagree, suggesting complementary financial insights across models
  • Evaluation used post-release data (Jan 2025-Mar 2026), that is, disclosures dated after the base LLMs were released, to avoid contamination from the models' training data
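
The contrast between majority voting and a trained logistic aggregator can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the paper's implementation: the task is assumed to be binary, each zero-shot classifier is assumed to emit a +1/-1 vote, and the eight-row dataset is invented so that one hypothetical reader is more reliable than the other two. All function names are hypothetical, and the aggregator is scored on its own training rows for brevity, whereas a real evaluation would need a held-out split.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def majority_vote(votes):
    """Return 1 if most of the +1/-1 votes are +1, else 0."""
    return 1 if sum(votes) > 0 else 0

def fit_logistic(X, y, lr=0.5, steps=500):
    """Fit weights and a bias by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Threshold the aggregated probability at 0.5."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Invented toy data: columns are votes from three hypothetical readers.
# The first reader happens to match the label; the other two are noise.
X = [(+1, -1, -1), (+1, +1, -1), (+1, -1, +1), (+1, +1, +1),
     (-1, +1, +1), (-1, -1, +1), (-1, +1, -1), (-1, -1, -1)]
y = [1, 1, 1, 1, 0, 0, 0, 0]

vote_preds = [majority_vote(x) for x in X]
w, b = fit_logistic(X, y)
stack_preds = [predict(w, b, x) for x in X]

print("majority vote:", balanced_accuracy(y, vote_preds))
print("trained aggregator:", balanced_accuracy(y, stack_preds))
print("learned weights:", [round(wj, 3) for wj in w])
```

On this toy data the aggregator learns to weight the reliable reader heavily, which majority voting cannot do because it treats every vote equally. The numbers it prints describe the invented rows only and say nothing about the paper's results.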

This work addresses a practical challenge in deploying LLMs for financial analysis: zero-shot models produce variable outputs, but simple voting schemes leave performance on the table. The paper shows that lightweight supervised aggregation can extract complementary signals from diverse model perspectives without expensive fine-tuning, offering a scalable approach for financial institutions seeking to leverage multiple LLM outputs.

For fintech and asset management firms, this demonstrates a cost-effective way to improve prediction accuracy on corporate disclosures without retraining large models. The approach is particularly valuable for capturing nuanced financial signals in ambiguous disclosures where different analytical perspectives yield different conclusions, potentially improving trading signals and risk assessment.

  • Ensemble methods combining zero-shot LLM outputs can outperform individual models and traditional voting schemes, suggesting that the diversity of perspectives itself carries predictive signal
  • Supervised aggregation of LLM outputs requires minimal additional training and infrastructure compared to fine-tuning, making it accessible to organizations with limited ML resources
  • The largest performance gains occur on ambiguous or mixed-signal inputs, indicating that aggregation is most valuable where uncertainty is highest rather than on straightforward cases
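
One way to check where aggregation helps is to score predictions separately on items where the classifiers agree and on items where they split. The sketch below assumes binary 0/1 votes from three classifiers. The votes, labels, and aggregator predictions are invented for illustration, and the helper names are hypothetical.

```python
def is_split(votes):
    """True when the classifiers do not all agree."""
    return len(set(votes)) > 1

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy_by_agreement(votes, y_true, y_pred):
    """Return accuracy on unanimous items and on split-vote items."""
    out = {}
    for name, want_split in (("unanimous", False), ("split", True)):
        idx = [i for i, v in enumerate(votes) if is_split(v) == want_split]
        if idx:
            out[name] = accuracy([y_true[i] for i in idx],
                                 [y_pred[i] for i in idx])
        else:
            out[name] = None  # no items fall in this subset
    return out

# Invented example: three unanimous items followed by five split items.
votes = [(1, 1, 1), (0, 0, 0), (1, 1, 1), (1, 0, 0),
         (0, 1, 1), (1, 1, 0), (0, 0, 1), (1, 0, 1)]
y_true = [1, 0, 1, 1, 0, 1, 0, 1]

majority_preds = [1 if sum(v) >= 2 else 0 for v in votes]
# Hypothetical aggregator output, hard-coded purely for illustration.
aggregator_preds = [1, 0, 1, 1, 0, 1, 0, 0]

print("majority:", accuracy_by_agreement(votes, y_true, majority_preds))
print("aggregator:", accuracy_by_agreement(votes, y_true, aggregator_preds))
```

Reporting the two subsets side by side shows whether a gain is concentrated in the split-vote cases, which is the pattern the article describes.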

Monitor whether this aggregation pattern generalizes to other financial tasks beyond disclosure classification, such as earnings call analysis or regulatory filing interpretation. Also track whether similar lightweight aggregation approaches prove effective in other domains where zero-shot LLMs produce variable outputs, and whether financial institutions adopt this method in production systems.

