EDINET-Bench

📚 Paper | 📝 Blog | 📁 Dataset | 🧑‍💻 Code

This repository contains code for evaluating LLMs on EDINET-Bench, a Japanese financial benchmark built around challenging financial tasks: accounting fraud detection, earnings forecasting, and industry prediction. The dataset is built from EDINET, a platform managed by the Financial Services Agency (FSA) of Japan that provides access to disclosure documents such as securities reports.

Overview of EDINET-Bench.

For the dataset construction code, please visit https://github.com/SakanaAI/edinet2dataset.

Install

Install the dependencies using uv.

uv sync

You also need to set the API key for each LLM provider you plan to use in the .env file; the repository includes a .env.sample file.
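As a rough illustration of the configuration step, the sketch below parses a .env-style file and reports which keys are still missing or empty. The key names ANTHROPIC_API_KEY and OPENAI_API_KEY are assumptions chosen for illustration, and the helper functions are hypothetical; check .env.sample for the names the evaluation scripts actually expect.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [k for k in required if not env.get(k)]


if __name__ == "__main__":
    # Assumed key names for illustration only; see .env.sample.
    required = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"]
    env_path = Path(".env")
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    print("Missing keys:", missing_keys(env, required))
```

Running a check like this before a long evaluation can catch an empty key early, before any requests are sent to a provider.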

Evaluation

Accounting Fraud Detection and Earnings Forecast

Use Claude 3.5 Sonnet to predict whether a report is fraudulent based on the Balance Sheet (BS), Cash Flow (CF), Profit and Loss (PL), and summary items from annual reports.

$ python src/edinet_bench/predict.py --task fraud_detection --model claude-3-5-sonnet-20241022 --sheets bs cf pl summary

Use a logistic model as a baseline.

$ python src/edinet_bench/logistic.py --task earnings_forecast
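For readers unfamiliar with this kind of baseline, here is a minimal sketch of a logistic model trained by batch gradient descent on numeric features. It is not the code in logistic.py: the function names, feature values, and labels are made up for illustration, and a real baseline would more likely rely on a library such as scikit-learn.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one example.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Return the predicted probability of the positive class per row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Toy data (made up): two standardized numeric features per company,
# label 1 = positive class, 0 = negative class.
X = [[-1.2, 0.3], [-0.8, -0.5], [-0.3, -1.0], [0.4, 0.9], [0.9, 0.2], [1.5, 1.1]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
probs = predict_proba(X, w, b)
preds = [int(p >= 0.5) for p in probs]
print("weights:", w, "bias:", b)
print("predictions:", preds)
```

A simple model like this gives a reference point: if an LLM cannot beat it on the same inputs, the task is probably limited by the information in the features, not by model capacity.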

Create a leaderboard that summarizes the results of each model.

$ python src/edinet_bench/make_leaderboard.py --task fraud_detection
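Conceptually, the leaderboard step scores each model's predictions against the ground-truth labels and ranks the models. The sketch below shows that idea with accuracy as the metric. The model names, predictions, and the choice of accuracy are placeholders, and make_leaderboard.py may read different inputs and report different metrics.

```python
def accuracy(labels, preds):
    """Fraction of predictions that match the labels."""
    return sum(int(l == p) for l, p in zip(labels, preds)) / len(labels)


def make_leaderboard(labels, preds_by_model):
    """Return (model, accuracy) rows sorted from best to worst."""
    rows = [(name, accuracy(labels, preds)) for name, preds in preds_by_model.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Placeholder data: 1 = fraudulent, 0 = non-fraudulent.
labels = [1, 0, 0, 1, 0, 1]
preds_by_model = {
    "model-a": [1, 0, 1, 1, 0, 0],
    "model-b": [1, 0, 0, 1, 0, 0],
    "logistic": [0, 0, 0, 1, 1, 0],
}

leaderboard = make_leaderboard(labels, preds_by_model)
for rank, (name, acc) in enumerate(leaderboard, start=1):
    print(f"{rank}. {name:<10} accuracy={acc:.3f}")
```

For an imbalanced task such as fraud detection, accuracy alone can be misleading, so a threshold-independent metric may be a better ranking key in practice.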

Industry Prediction

Predict a company's industry type (e.g., Banking) based on its current annual report.

$ python src/edinet_bench/industry_prediction/predict.py --model claude-3-5-sonnet-20241022 --sheets bs cf pl summary

Create a leaderboard that summarizes the results of each model.

$ python src/edinet_bench/industry_prediction/make_leaderboard.py

Citation

@misc{sugiura2025edinet,
  author={Issa Sugiura and Takashi Ishida and Taro Makino and Chieko Tazuke and Takanori Nakagawa and Kosuke Nakago and David Ha},
  title={{EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements}},
  year={2025},
  eprint={2506.08762},
  archivePrefix={arXiv},
  primaryClass={q-fin.ST},
  url={https://arxiv.org/abs/2506.08762}, 
}
