Financial reports and earnings calls contain critical information that investors, regulators, and analysts rely on to understand a company's health and future prospects. However, these documents are notoriously complex, filled with specialized jargon and nuanced language that make manual analysis time-consuming and expensive. A new study shows that artificial intelligence can now accurately detect financial stances in these documents, potentially transforming how financial information is processed.
The researchers discovered that large language models (LLMs) can effectively identify whether sentences in financial documents express positive, negative, or neutral stances toward key financial targets like earnings per share (EPS), debt, and sales. Using a carefully constructed dataset of SEC Form 10-K annual reports and earnings call transcripts from MATIV Holdings Inc. and Co., the team found that GPT-4.1-Mini achieved the highest accuracy at 87.79%, outperforming other models like Llama 3.3 (83.02%), Gemma3-27B (81.21%), and Mistral Small (68.6%).
The methodology involved creating a sentence-level financial stance detection corpus where sentences explicitly referencing the three financial targets were extracted and labeled using ChatGPT-o3-pro with rigorous human validation. Human annotators showed 97% agreement with the AI's annotations, establishing high confidence in the labeling quality. The researchers then systematically evaluated multiple LLMs using different prompting strategies: zero-shot (no examples), few-shot (with examples), and chain-of-thought prompting (where models generate intermediate reasoning steps).
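The three prompting strategies differ only in what surrounds the sentence being classified. The sketch below illustrates how such prompts are typically constructed; the wording, label set, and function names are illustrative assumptions, not the authors' exact templates.

```python
# Illustrative sketch of the three prompting strategies compared in the
# study. Prompt wording and labels are assumptions for demonstration,
# not the researchers' actual templates.

LABELS = ["positive", "negative", "neutral"]

def zero_shot_prompt(sentence: str, target: str) -> str:
    """Ask for a stance label with no examples."""
    return (
        f"Classify the stance of the following sentence toward {target} "
        f"as one of {LABELS}.\nSentence: {sentence}\nStance:"
    )

def few_shot_prompt(sentence: str, target: str,
                    examples: list[tuple[str, str]]) -> str:
    """Prepend labeled (sentence, stance) demonstrations before the query."""
    demos = "\n".join(f"Sentence: {s}\nStance: {lab}" for s, lab in examples)
    return (
        f"Classify the stance toward {target} as one of {LABELS}.\n"
        f"{demos}\nSentence: {sentence}\nStance:"
    )

def chain_of_thought_prompt(sentence: str, target: str) -> str:
    """Ask the model to reason step by step before committing to a label."""
    return (
        f"Classify the stance of the following sentence toward {target} "
        f"as one of {LABELS}. Think step by step about how the sentence "
        f"frames {target}, then give the final label.\n"
        f"Sentence: {sentence}\nReasoning:"
    )
```

In practice the returned string is sent to the model's completion API, and the label is parsed from the response; chain-of-thought simply gives the model room to articulate its reasoning before answering.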
Results showed that chain-of-thought prompting consistently improved performance across all models, with an average 4.23% accuracy improvement compared to standard few-shot prompting. The study also revealed that providing contextual information—either the full Management's Discussion and Analysis section from SEC filings or summarized versions—significantly boosted performance for most models. Interestingly, selecting the most semantically similar examples for few-shot learning outperformed random example selection by an average of 2% across datasets.
For practical applications, these results suggest financial institutions could automate the analysis of complex financial documents without requiring extensive, expensive training data for each specific financial target. The research demonstrates that modern AI can handle the nuanced language of financial reporting, where the same event—like an increase in debt—might be framed as either a strategic opportunity or a major risk depending on context.
The study did identify limitations. The dataset focused exclusively on two companies, which may limit generalizability to broader financial contexts. Additionally, while human validation showed strong agreement with AI annotations, relying on ChatGPT-o3-pro for labeling could introduce model-specific biases. The researchers also noted that earnings call transcripts proved easier for AI analysis than SEC filings, with accuracy gaps attributed to the more conversational nature of transcripts versus the formal, numerically dense structure of SEC documents.
What remains unknown is how well these models would perform across different industries, company sizes, or international financial reporting standards. The research opens the door to more automated financial analysis but also highlights the need for continued development of AI's quantitative reasoning capabilities, particularly for complex financial calculations embedded in formal documents.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.