SakanaAI/EDINET-Bench

EDINET-Bench

📚 Paper | 📝 Blog | 📁 Dataset | 🧑‍💻 Code

This repository contains code for evaluating LLMs on EDINET-Bench, a Japanese financial benchmark of challenging tasks: accounting fraud detection, earnings forecasting, and industry prediction. The dataset is built from EDINET, a platform operated by the Financial Services Agency (FSA) of Japan that provides access to disclosure documents such as securities reports.

Overview of EDINET-Bench.

For the dataset construction code, please visit https://github.com/SakanaAI/edinet2dataset.

Install

Install the dependencies using uv.

uv sync

You also need to add an API key for each LLM provider you plan to evaluate to the .env file.
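A quick way to catch missing keys before a long evaluation run is to check the .env file up front. The sketch below uses only the standard library. The key names ANTHROPIC_API_KEY and OPENAI_API_KEY are assumptions: they are the variables the official Anthropic and OpenAI SDKs read by default, but the repository may expect different or additional names, so adjust REQUIRED_KEYS to match its code.

```python
import os
from pathlib import Path

# Assumed key names; edit this list to match the providers and variables you use.
REQUIRED_KEYS = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"]


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return required keys that are empty or absent in both the file and the process environment."""
    return [k for k in required if not (env.get(k) or os.environ.get(k))]


if __name__ == "__main__":
    missing = missing_keys(parse_env_file(Path(".env")), REQUIRED_KEYS)
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All required API keys are set.")
```

This parser deliberately handles only plain KEY=VALUE lines; multi-line values and variable interpolation are out of scope for a sanity check.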

Evaluation

Accounting Fraud Detection and Earnings Forecast

Use Claude 3.5 Sonnet to predict whether a report is fraudulent based on the Balance Sheet (BS), Cash Flow statement (CF), Profit and Loss statement (PL), and summary items from its annual report.

$ python src/edinet_bench/predict.py --task fraud_detection --model claude-3-5-sonnet-20241022 --sheets bs cf pl summary
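The --sheets flag appears to select which parts of the annual report the model sees. To make that concrete, here is a hypothetical sketch of how the selected sheets could be assembled into a single prompt. The report dictionary, the heading format, and the question wording are illustrative assumptions, not the prompt template used by predict.py.

```python
# Hypothetical mapping from --sheets values to human-readable section titles.
SHEET_TITLES = {
    "bs": "Balance Sheet",
    "cf": "Cash Flow Statement",
    "pl": "Profit and Loss Statement",
    "summary": "Summary",
}


def build_prompt(report: dict[str, str], sheets: list[str], question: str) -> str:
    """Concatenate the requested sheets in the given order, then append the question."""
    parts = []
    for sheet in sheets:
        if sheet not in SHEET_TITLES:
            raise ValueError(f"unknown sheet: {sheet!r}")
        # Skip sheets that are missing or empty in this particular report.
        body = report.get(sheet)
        if body:
            parts.append(f"## {SHEET_TITLES[sheet]}\n{body}")
    parts.append(question)
    return "\n\n".join(parts)


# Toy report with no cash flow data; values are made up.
report = {"bs": "Total assets: 1,000", "pl": "Net income: 50"}
prompt = build_prompt(report, ["bs", "cf", "pl"], "Is this report fraudulent? Answer 0 or 1.")
print(prompt)
```

Silently skipping absent sheets is one design choice; an alternative is to include an explicit "not available" marker so the model knows the data is missing rather than omitted.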

Run a logistic regression model as a baseline.

$ python src/edinet_bench/logistic.py --task earnings_forecast
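For intuition about what a logistic baseline does, here is a minimal pure-Python logistic regression trained by batch gradient descent on the mean log loss. It is a sketch of the general technique, not the contents of src/edinet_bench/logistic.py, whose features and training details may differ; the one-feature toy data set is made up for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Probability of the positive class for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X: list[list[float]], y: list[int], lr: float = 0.1, epochs: int = 2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the logit is (p - y).
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data: one hypothetical feature, positive values labeled 1.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print(predict_proba(w, b, [1.5]), predict_proba(w, b, [-1.5]))
```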

Create a leaderboard comparing the evaluated models.

$ python src/edinet_bench/make_leaderboard.py --task fraud_detection
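Because fraud detection is a binary task, a natural way to rank models is by ROC-AUC over their predicted scores. The sketch below computes ROC-AUC directly from its pairwise definition and sorts models by it. ROC-AUC is an assumed metric here, and the model names and scores are made up; check make_leaderboard.py for the metrics it actually reports.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def leaderboard(results: dict[str, tuple[list[int], list[float]]]) -> list[tuple[str, float]]:
    """Map model name -> (labels, scores) to [(model, auc)] sorted best first."""
    rows = [(model, roc_auc(labels, scores)) for model, (labels, scores) in results.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical results for two models on four reports (1 = fraudulent).
results = {
    "model-a": ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    "model-b": ([1, 0, 1, 0], [0.3, 0.7, 0.8, 0.1]),
}
for model, auc in leaderboard(results):
    print(f"{model}\t{auc:.3f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based computation scales better on large test sets.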

Industry Prediction

Predict a company's industry type (e.g., Banking) based on its current annual report.

$ python src/edinet_bench/industry_prediction/predict.py --model claude-3-5-sonnet-20241022 --sheets bs cf pl summary
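LLMs answer in free text, so an industry prediction has to be mapped onto a fixed label set before it can be scored. The sketch below shows one simple approach: pick the label that appears earliest in the response, case-insensitively. The INDUSTRIES list is hypothetical and should be replaced with the categories used by the dataset; this is not necessarily how industry_prediction/predict.py parses responses.

```python
from typing import Optional

# Hypothetical label set; replace with the dataset's industry categories.
INDUSTRIES = ["Banking", "Insurance", "Retail", "Information & Communication"]


def parse_industry(response: str, labels: list[str]) -> Optional[str]:
    """Return the label that occurs earliest in the response, or None if no label occurs."""
    lowered = response.lower()
    best: Optional[str] = None
    best_pos = len(response) + 1
    for label in labels:
        pos = lowered.find(label.lower())
        # Strict < means that, on a tie, the label listed first wins.
        if pos != -1 and pos < best_pos:
            best, best_pos = label, pos
    return best


print(parse_industry("The company is most likely in Retail, not Banking.", INDUSTRIES))
print(parse_industry("I cannot tell.", INDUSTRIES))
```

Substring matching is fragile when one label contains another; asking the model to answer in a fixed format (for example, a single label on the last line) makes parsing more reliable.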

Create a leaderboard comparing the evaluated models.

$ python src/edinet_bench/industry_prediction/make_leaderboard.py 
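Industry prediction is a multi-class task, so a simple leaderboard can rank models by accuracy. In this sketch, predictions that could not be mapped to a label are represented as None and counted as wrong. Accuracy is an assumed metric, and the gold labels, model names, and predictions are made up; the repository's make_leaderboard.py may report other metrics.

```python
from typing import Optional


def accuracy(gold: list[str], predicted: list[Optional[str]]) -> float:
    """Fraction of exact matches; unparseable predictions (None) count as wrong."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("empty evaluation set")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def accuracy_leaderboard(
    gold: list[str], predictions_by_model: dict[str, list[Optional[str]]]
) -> list[tuple[str, float]]:
    """Return [(model, accuracy)] sorted best first."""
    rows = [(m, accuracy(gold, preds)) for m, preds in predictions_by_model.items()]
    return sorted(rows, key=lambda r: r[1], reverse=True)


gold = ["Banking", "Retail", "Banking", "Insurance"]
preds = {
    "model-a": ["Banking", "Retail", None, "Insurance"],
    "model-b": ["Banking", "Banking", "Banking", "Retail"],
}
for model, acc in accuracy_leaderboard(gold, preds):
    print(f"{model}\t{acc:.2%}")
```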

Citation

@misc{sugiura2025edinet,
  author={Issa Sugiura and Takashi Ishida and Taro Makino and Chieko Tazuke and Takanori Nakagawa and Kosuke Nakago and David Ha},
  title={{EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements}},
  year={2025},
  eprint={2506.08762},
  archivePrefix={arXiv},
  primaryClass={q-fin.ST},
  url={https://arxiv.org/abs/2506.08762}, 
}

About

Evaluating the performance of LLMs on challenging Japanese financial tasks.
