📚 Paper | 📝 Blog | 📁 Dataset | 🧑‍💻 Code
This repository contains code for evaluating LLMs on EDINET-Bench, a Japanese financial benchmark covering challenging tasks such as accounting fraud detection, earnings forecasting, and industry prediction. The dataset is built from EDINET, a platform managed by the Financial Services Agency (FSA) of Japan that provides access to disclosure documents such as securities reports.
Overview of EDINET-Bench.
For the dataset construction code, please visit https://github.com/SakanaAI/edinet2dataset.
Install the dependencies using uv.
uv sync
You also need to configure the API keys for each LLM provider in the .env file.
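Before launching a long evaluation run, it can help to confirm that the keys are actually present. The following is a minimal sketch, not part of the repository: it parses a `.env`-style file and reports which keys are missing. The variable names in `ASSUMED_KEYS` are the conventional names read by the Anthropic and OpenAI SDKs and are an assumption here; the repository may expect different names.

```python
from pathlib import Path

# Assumption: conventional SDK variable names. Check the repository for the
# names it actually reads and edit this tuple accordingly.
ASSUMED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required=ASSUMED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    for name in missing_keys(parse_env(text)):
        print(f"missing or empty: {name}")
```

Only the providers you intend to evaluate need a key, so trim `ASSUMED_KEYS` to match the models you plan to run.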
Use Claude 3.5 Sonnet to predict whether a report is fraudulent based on the Balance Sheet (BS), Cash Flow (CF), Profit and Loss (PL), and summary items from annual reports.
$ python src/edinet_bench/predict.py --task fraud_detection --model claude-3-5-sonnet-20241022 --sheets bs cf pl summary
Use a logistic regression model as a baseline.
$ python src/edinet_bench/logistic.py --task earnings_forecast
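To show what such a baseline does in principle, here is a minimal sketch of logistic regression trained by batch gradient descent. It is not the repository's `logistic.py`: the function names, the two features, and the toy data are all made up for illustration, and the actual script may use different features and a library implementation.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Return 0/1 labels by thresholding the predicted probability."""
    return [
        int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
        for xi in X
    ]


# Toy, made-up data: two hypothetical standardized features per company,
# with label 1 meaning "earnings increase next year".
X_toy = [[-1.5, -0.5], [-1.0, -1.0], [1.0, 0.5], [1.5, 1.0]]
y_toy = [0, 0, 1, 1]
w, b = fit_logistic(X_toy, y_toy)
print(predict(X_toy, w, b))
```

A baseline like this is useful because it shows how much of an LLM's score can already be reached from a handful of numeric items alone.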
Create a leaderboard comparing the evaluated models.
$ python src/edinet_bench/make_leaderboard.py --task fraud_detection
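Conceptually, a leaderboard scores each model's predictions against the same labels and ranks the results. The sketch below illustrates that idea only; it is not the repository's `make_leaderboard.py`. The function names, the model names, and the choice of accuracy as the metric are assumptions, and the actual script may read saved prediction files and report other metrics.

```python
def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("labels and predictions must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def make_leaderboard(y_true, predictions):
    """Return (model, score) rows sorted from best to worst."""
    rows = [(model, accuracy(y_true, y_pred)) for model, y_pred in predictions.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def format_leaderboard(rows) -> str:
    """Render the rows as a small pipe-delimited table."""
    lines = ["| Model | Accuracy |", "| --- | --- |"]
    lines += [f"| {model} | {score:.3f} |" for model, score in rows]
    return "\n".join(lines)


# Made-up labels and predictions for two hypothetical models.
labels = [1, 0, 1, 0]
preds = {"model-a": [1, 0, 0, 0], "model-b": [1, 0, 1, 0]}
board = make_leaderboard(labels, preds)
print(format_leaderboard(board))
```

For an imbalanced task such as fraud detection, accuracy alone can be misleading, so a ranking metric or a correlation-based metric is often a better sort key; swapping the `accuracy` function is the only change needed.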
Predict a company's industry type (e.g., Banking) based on its current annual report.
$ python src/edinet_bench/industry_prediction/predict.py --model claude-3-5-sonnet-20241022 --sheets bs cf pl summary
Create a leaderboard comparing the evaluated models.
$ python src/edinet_bench/industry_prediction/make_leaderboard.py
@misc{sugiura2025edinet,
author={Issa Sugiura and Takashi Ishida and Taro Makino and Chieko Tazuke and Takanori Nakagawa and Kosuke Nakago and David Ha},
title={{EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements}},
year={2025},
eprint={2506.08762},
archivePrefix={arXiv},
primaryClass={q-fin.ST},
url={https://arxiv.org/abs/2506.08762},
}