KGs for Financial Reasoning

parag · 1/13/2026

Question / Claim

Do knowledge graphs meaningfully improve LLM numerical reasoning over financial documents?

Key Assumptions

  • LLMs struggle with numerical and multi-hop reasoning when information is only in unstructured text. (high confidence)
  • Financial documents contain implicit structure (tables, periods, units) that is lost when flattened to text. (high confidence)
  • A domain-specific schema can capture most relevant financial facts needed for reasoning. (medium confidence)
  • Most observed 'math errors' in document QA come from selecting or mixing the wrong numbers, not from arithmetic mistakes. (high confidence)
  • Flattening tables and semi-structured data into text destroys constraints that LLMs do not reliably reconstruct; a sketch of this follows the list. (high confidence)
  • Even larger models will continue to make grounding and attribution errors without an explicit structured representation. (medium confidence)
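A minimal Python sketch (with a hypothetical schema, not taken from any of the cited papers) of what flattening discards: once metric, period, unit, and scale are explicit fields, number selection becomes a keyed lookup instead of a guess over prose.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class FinancialFact:
        metric: str   # e.g. "revenue"
        period: str   # e.g. "FY2022"
        value: float  # numeric value at the stated scale
        unit: str     # e.g. "USD"
        scale: int    # e.g. 1_000_000 for "in millions"

    facts = [
        FinancialFact("revenue", "FY2021", 1200.0, "USD", 1_000_000),
        FinancialFact("revenue", "FY2022", 1500.0, "USD", 1_000_000),
    ]

    # Flattened form: the model must re-infer period, unit, and scale from prose.
    flattened = "Revenue was 1,200 in 2021 and 1,500 in 2022 (in millions of USD)."

    # Structured form: selection is a lookup keyed on (metric, period), so the
    # "wrong number" failure mode is ruled out by construction.
    by_key = {(f.metric, f.period): f for f in facts}
    growth = by_key[("revenue", "FY2022")].value / by_key[("revenue", "FY2021")].value - 1
    print(f"Revenue growth FY2021 -> FY2022: {growth:.1%}")  # 25.0%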

Evidence & Observations

  • The arXiv paper 'Structure First, Reason Next' reports a ~12% relative improvement on FinQA from a KG-enhanced pipeline, suggesting that structure and grounding, not raw computation, are the bottleneck; see the pipeline sketch after this list. (citation)
  • FinQA benchmark paper shows models often fail due to wrong number selection and multi-step reasoning over tables and text, not pure arithmetic. (citation)
  • Chain-of-Thought prompting improves arithmetic but still suffers from grounding and retrieval errors in long documents. (citation)
  • Tabular reasoning work (TaPas) shows that structure-aware representations significantly outperform text-only models on table QA. (citation)
  • RAG and tool-using agents reduce hallucination by grounding models in structured sources, supporting the idea that representation, not computation, is the bottleneck. (citation)
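To make the 'structure first, reason next' idea concrete, here is an illustrative two-stage sketch. It is not the paper's actual pipeline: the extraction stage is stubbed where an LLM or table parser would sit, and the fact keys and values are invented for the example.

    from typing import NamedTuple

    class Triple(NamedTuple):
        subject: str
        predicate: str
        obj: float

    def extract_triples(document: str) -> list[Triple]:
        # Stage 1 (stubbed): an LLM or table parser would emit these triples
        # from the filing's tables and text; here they are hard-coded.
        return [
            Triple("net_income/FY2022", "value_in_musd", 300.0),
            Triple("revenue/FY2022", "value_in_musd", 1500.0),
        ]

    def net_margin(triples: list[Triple], period: str) -> float:
        # Stage 2: deterministic reasoning over the extracted structure.
        # Operands are selected by key, not by fuzzy matching over prose.
        kg = {t.subject: t.obj for t in triples if t.predicate == "value_in_musd"}
        return kg[f"net_income/{period}"] / kg[f"revenue/{period}"]

    triples = extract_triples("...10-K text and tables...")
    print(f"Net margin FY2022: {net_margin(triples, 'FY2022'):.1%}")  # 20.0%

The split matters: once stage 1 has committed to explicit facts, stage 2's arithmetic is trivially checkable, which matches the observation that the failures are in selection and grounding rather than computation.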

Open Uncertainties

  • How well does this approach generalize beyond FinQA or beyond financial documents?
  • Is the cost and complexity of building the knowledge graph worth it compared to just using larger or better LLMs?
  • How robust is the pipeline to extraction errors when building the KG?
  • To what extent can better table-aware or tool-using models close this gap without a full knowledge graph?
  • What is the minimal structure needed to get most of the benefit (KG vs simpler schemas)?

Current Position

LLMs are not inherently bad at math; failures mostly come from poor grounding, structure loss, and information selection in long, messy documents. Providing a structured world model (e.g., a knowledge graph) before reasoning materially improves reliability for multi-step numerical tasks.
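One concrete way structure improves reliability, sketched here under an assumed fact shape (value, unit, scale fields, invented for illustration): a structured representation makes unit and scale consistency checkable before any arithmetic runs, an invariant a text-only pipeline has no way to enforce.

    def safe_ratio(a: dict, b: dict) -> float:
        # Refuse to divide quantities whose unit or scale disagree; a text-only
        # pipeline has no comparable invariant to check before doing arithmetic.
        if a["unit"] != b["unit"] or a["scale"] != b["scale"]:
            raise ValueError(f"unit/scale mismatch: {a} vs {b}")
        return a["value"] / b["value"]

    net_income = {"value": 300.0, "unit": "USD", "scale": 1_000_000}
    revenue = {"value": 1500.0, "unit": "USD", "scale": 1_000_000}
    print(f"Net margin: {safe_ratio(net_income, revenue):.1%}")  # 20.0%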

This is work-in-progress thinking, not a final conclusion.

References (5)

  1. "Structure First, Reason Next: Enhancing a Large Language Model Using Knowledge Graph for Numerical Reasoning in Financial Documents" (arxiv.org). Paper proposing KG-augmented reasoning for financial numerical QA.
  2. "FinQA: A Dataset of Numerical Reasoning over Financial Data" (arxiv.org). Shows multi-step numerical reasoning failures often come from retrieval and grounding issues.
  3. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (arxiv.org). Shows LLMs can do arithmetic but still depend on correct intermediate facts.
  4. "TaPas: Weakly Supervised Table Parsing via Pre-training" (arxiv.org). Demonstrates the importance of structure-aware models for table reasoning.
  5. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (arxiv.org). Classic paper on grounding LLMs in external structured knowledge to improve factuality.