Question / Claim
Do knowledge graphs meaningfully improve LLM numerical reasoning over financial documents?
Key Assumptions
- LLMs struggle with numerical and multi-hop reasoning when the relevant information exists only as unstructured text. (high confidence)
- Financial documents contain implicit structure (tables, periods, units) that is lost when they are flattened to plain text. (high confidence)
- A domain-specific schema can capture most of the financial facts needed for reasoning. (medium confidence)
- Most observed 'math errors' in document QA come from selecting or mixing up the wrong numbers, not from arithmetic mistakes. (high confidence)
- Flattening tables and semi-structured data into text destroys constraints that LLMs do not reliably reconstruct; the sketch after this list makes this concrete. (high confidence)
- Even larger models will continue to make grounding and attribution errors without an explicit structured representation. (medium confidence)
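To make the structure-loss assumption concrete, here is a minimal Python sketch contrasting a table flattened to text with the same facts as typed records. The company, figures, and field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical example: the same income-statement lines represented two ways.
# Flattened to text, the period and unit constraints live only in word order,
# and the model must re-infer them at answer time.
flattened = "Revenue 12,480 11,020 Cost of revenue 7,310 6,450 (in millions) 2023 2022"

# Typed records keep the constraints explicit: every number carries its
# metric, period, and unit, so "which 12,480?" is never ambiguous.
@dataclass(frozen=True)
class FinancialFact:
    metric: str
    period: str
    value: float
    unit: str

facts = [
    FinancialFact("revenue", "FY2023", 12_480, "USD millions"),
    FinancialFact("revenue", "FY2022", 11_020, "USD millions"),
    FinancialFact("cost_of_revenue", "FY2023", 7_310, "USD millions"),
    FinancialFact("cost_of_revenue", "FY2022", 6_450, "USD millions"),
]

# Selecting "revenue for FY2023" is now a lookup with one correct answer,
# not a guess about which of four nearby numbers the sentence meant.
fy2023_revenue = next(f for f in facts if f.metric == "revenue" and f.period == "FY2023")
print(fy2023_revenue.value)  # 12480
```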
Evidence & Observations
- The arXiv paper 'Structure First, Reason Next' reports a ~12% relative improvement on FinQA from a KG-enhanced pipeline, suggesting that structure and grounding, not raw computation, are the bottleneck (see the sketch after this list). (citation)
- The FinQA benchmark paper shows that models most often fail at number selection and multi-step reasoning over tables and text, not at the arithmetic itself. (citation)
- Chain-of-Thought prompting improves arithmetic but still suffers from grounding and retrieval errors in long documents. (citation)
- Tabular reasoning work (TaPas) shows that structure-aware representations significantly outperform text-only models on table QA. (citation)
- RAG and tool-using agents reduce hallucination by grounding models in structured sources, supporting the idea that representation, not computation, is the bottleneck. (citation)
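As a rough illustration of the 'structure first' idea (not the cited paper's actual pipeline), the sketch below stores extracted facts as subject-predicate-object triples and answers lookups against them, failing loudly when a fact is missing rather than guessing. Entity names, predicates, and numbers are hypothetical.

```python
from collections import defaultdict

# Facts extracted from the document as (subject, predicate, object) triples,
# the simplest knowledge-graph representation. All values are hypothetical.
triples = [
    ("AcmeCo", "revenue_FY2023_usd_m", 12_480),
    ("AcmeCo", "revenue_FY2022_usd_m", 11_020),
    ("AcmeCo", "cost_of_revenue_FY2023_usd_m", 7_310),
    ("AcmeCo", "cost_of_revenue_FY2022_usd_m", 6_450),
]

graph: dict[str, dict[str, float]] = defaultdict(dict)
for subj, pred, obj in triples:
    graph[subj][pred] = obj

def lookup(entity: str, predicate: str) -> float:
    """Grounded retrieval: return the stated fact or fail loudly, instead of
    letting the model guess a plausible-looking number from raw text."""
    if predicate not in graph[entity]:
        raise KeyError(f"No stored fact for {entity}/{predicate}")
    return graph[entity][predicate]

print(lookup("AcmeCo", "revenue_FY2023_usd_m"))  # 12480
```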
Open Uncertainties
- How well does this approach generalize beyond FinQA or beyond financial documents?
- Is the cost and complexity of building the knowledge graph worth it compared to just using larger or better LLMs?
- How robust is the pipeline to extraction errors when building the KG?
- To what extent can better table-aware or tool-using models close this gap without a full knowledge graph?
- What is the minimal structure needed to get most of the benefit (KG vs simpler schemas)?
Current Position
LLMs are not inherently bad at math; failures mostly come from poor grounding, structure loss, and faulty information selection in long, messy documents. Providing a structured world model (e.g., a knowledge graph) before reasoning materially improves reliability for multi-step numerical tasks.
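One way to picture 'structure first, reason next' end to end: once facts are grounded, the remaining reasoning can be a short deterministic program, so the model's only job is choosing which facts and which operations to use. The sketch below is self-contained and reuses the same hypothetical figures as above.

```python
# Hypothetical, self-contained sketch: with facts already grounded (a plain dict
# stands in for the knowledge graph here), multi-step numerical reasoning
# reduces to a short deterministic program instead of free-form text arithmetic.
facts = {
    ("AcmeCo", "revenue", "FY2023"): 12_480,          # USD millions, hypothetical
    ("AcmeCo", "revenue", "FY2022"): 11_020,
    ("AcmeCo", "cost_of_revenue", "FY2023"): 7_310,
    ("AcmeCo", "cost_of_revenue", "FY2022"): 6_450,
}

def gross_margin(entity: str, year: str) -> float:
    revenue = facts[(entity, "revenue", year)]
    cost = facts[(entity, "cost_of_revenue", year)]
    return (revenue - cost) / revenue

# "How did gross margin change from FY2022 to FY2023?" becomes two lookups and
# one subtraction, with every input traceable back to a stored fact.
change_pp = (gross_margin("AcmeCo", "FY2023") - gross_margin("AcmeCo", "FY2022")) * 100
print(f"Gross margin change: {change_pp:+.2f} percentage points")
```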
This is work-in-progress thinking, not a final conclusion.
References (5)
- 1. "Structure First, Reason Next: Enhancing a Large Language Model Using Knowledge Graph for Numerical Reasoning in Financial Documents" (arxiv.org). Paper proposing KG-augmented reasoning for financial numerical QA.
- 2. "FinQA: A Dataset of Numerical Reasoning over Financial Data" (arxiv.org). Shows multi-step numerical reasoning failures often come from retrieval and grounding issues.
- 3. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (arxiv.org). Shows LLMs can do arithmetic but still depend on correct intermediate facts.
- 4. "TaPas: Weakly Supervised Table Parsing via Pre-training" (arxiv.org). Demonstrates the importance of structure-aware models for table reasoning.
- 5. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (arxiv.org). Classic paper on grounding LLMs in external knowledge to improve factuality.