This repository provides an implementation for solving the ARC-AGI-2 public evaluation set with Multi-LLM AB-MCTS, leveraging frontier LLMs. It is powered by TreeQuest. See our blog post and our paper for details.
1. Clone the repository with its submodules:

   ```shell
   git clone --recurse-submodules https://github.com/SakanaAI/ab-mcts-arc2.git
   cd ab-mcts-arc2
   ```
2. Install `parallel`, `graphviz`, and `graphviz-dev`:

   For Mac users:

   ```shell
   brew install parallel graphviz
   ```

   For Linux users, use your distribution's package manager, e.g.:

   ```shell
   sudo apt install parallel graphviz graphviz-dev
   ```
3. Install dependencies using `uv`:

   ```shell
   uv sync
   ```

   If you are a Mac user and encounter an error during `pygraphviz` installation, set the environment variables accordingly:

   ```shell
   CFLAGS="-I $(brew --prefix graphviz)/include" LDFLAGS="-L $(brew --prefix graphviz)/lib" uv sync
   ```
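Before running experiments, you may want to sanity-check that the required command-line tools are on your `PATH`. This is a hypothetical check, not part of the repository's scripts; `dot` is the Graphviz executable:

```shell
# Report whether each required tool is installed (informational only).
for tool in parallel dot uv; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```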
The experiments can be run using the `run_experiments.sh` script, which executes the ARC-AGI-2 problems in parallel:

```shell
./scripts/run_experiments.sh
```

Key parameters (you can modify these directly in `run_experiments.sh`):
- `EXP_ID`: Experiment name you can configure to distinguish several experiments
- `MAX_NUM_NODES`: Maximum number of nodes to expand in the search tree
- `ALGO_CLASS_NAME`: Algorithm class to use (default: `ABMCTSA`)
- `DIST_TYPE`: Distribution type for the algorithm (default: `beta`)
- `N_JOBS`: Number of parallel jobs to run
- `INDICES_FILE`: Path to a text file listing the task IDs to be solved in this experiment
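For example, to run only a subset of tasks, you could create a task list and point `INDICES_FILE` at it. This is a hypothetical sketch: the path and task IDs below are placeholders, not real ARC-AGI-2 task IDs, and it assumes the file format is plain text with one ID per line:

```shell
# Create a hypothetical task list (one placeholder task ID per line).
mkdir -p task_lists
printf '%s\n' 0a1b2c3d 4e5f6a7b > task_lists/subset.txt
cat task_lists/subset.txt
```

You would then set `INDICES_FILE=task_lists/subset.txt` in `run_experiments.sh`.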
Each experiment attempts to solve a problem from the ARC-AGI-2 benchmark. It uses LLMs to generate solutions, which are later evaluated to produce the final results. You can see the logs in the `outputs` directory.
After running the experiments, you can evaluate the results using:
```shell
./scripts/run_eval.sh
```
This script:
- Processes the experimental results using `eval/proc_results.py`
- Generates visualization plots using `eval/visualize.py`
The final result plots are generated in the `outputs/plots` directory.
The LLM configuration file is located at `experiments/arc2/configs/config.yaml`. Multi-LLM AB-MCTS uses the LLMs listed in this file, applying the specified temperature setting for each.
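The real schema is defined by the repository; as a purely illustrative sketch, such a file might pair model names with temperatures along these lines (all field names and model names below are assumptions, not the actual contents of `config.yaml`):

```yaml
# Hypothetical sketch only - see experiments/arc2/configs/config.yaml
# for the real schema and the models actually used.
llms:
  - model: gpt-4o          # illustrative model name
    temperature: 0.7
  - model: gemini-1.5-pro  # illustrative model name
    temperature: 0.7
```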
- `experiments/arc2`
  - `run.py` - The main script that leverages TreeQuest to generate answers for ARC-AGI-2 problems.
  - `prompt.py` - Contains the prompts used to instruct LLMs to solve ARC-AGI-2 problems or refine existing answers based on feedback.
- `eval`
  - `proc_results.py` - An evaluation script that processes the generated answers to produce the final results.
  - `visualize.py` - A visualization script for generating result plots.
If you use this code in your research, please cite our paper:
```bibtex
@article{inoue2025wider,
  title={Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search},
  author={Inoue, Yuichi and Misaki, Kou and Imajuku, Yuki and Kuroki, So and Nakamura, Taishi and Akiba, Takuya},
  journal={arXiv preprint arXiv:2503.04412},
  year={2025}
}
```