Multi-LLM AB-MCTS for ARC-AGI-2

This repository provides an implementation for solving the ARC-AGI-2 public evaluation set with Multi-LLM AB-MCTS, leveraging frontier LLMs. It is powered by TreeQuest. See our blog post and our paper for details.
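As a rough intuition for the algorithm (this is NOT the repository's implementation, which lives in experiments/arc2/run.py and TreeQuest), AB-MCTS adaptively decides at each step whether to branch wider (generate a fresh candidate) or go deeper (refine an existing one). The toy sketch below mimics that decision with Thompson sampling over Beta posteriors and a stand-in scoring function; all names, scores, and numbers are illustrative:

```python
import random

# Toy sketch of adaptive branching tree search (NOT the repo's implementation):
# each existing node exposes a "refine me" arm, and one extra "widen" arm
# generates a brand-new candidate; the arms compete via Thompson sampling
# on Beta posteriors shaped by each node's score.

class Node:
    def __init__(self, answer, score, parent=None):
        self.answer, self.score, self.parent = answer, score, parent
        self.children = []

def llm_propose(parent=None):
    """Stand-in for an LLM call: propose or refine a number; the (hidden)
    target is 42 and the score is closeness to it, clipped to [0, 1]."""
    base = parent.answer if parent is not None else random.uniform(0.0, 100.0)
    answer = random.gauss(base, 3.0 if parent is not None else 30.0)
    return answer, max(0.0, 1.0 - abs(answer - 42.0) / 100.0)

def search(budget=60, seed=0):
    random.seed(seed)
    root = Node(*llm_propose())
    nodes = [root]
    for _ in range(budget - 1):
        # One Beta arm per existing node ("go deeper") plus a flat-prior
        # arm that widens the tree with a fresh, unconditioned candidate.
        arms = [(random.betavariate(1.0 + n.score, 2.0 - n.score), n)
                for n in nodes]
        arms.append((random.betavariate(1.0, 1.0), None))
        _, parent = max(arms, key=lambda a: a[0])
        child = Node(*llm_propose(parent), parent=parent)
        if parent is not None:
            parent.children.append(child)
        nodes.append(child)
    return max(nodes, key=lambda n: n.score)
```

With more budget, high-scoring candidates attract more refinement while the widen arm keeps exploration alive; the real system replaces llm_propose with frontier-LLM calls and relies on TreeQuest's AB-MCTS implementation.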

Installation

  1. Clone the repository with its submodules:

    git clone --recurse-submodules https://github.com/SakanaAI/ab-mcts-arc2.git
    cd ab-mcts-arc2
  2. Install parallel, graphviz and graphviz-dev:

    For Mac users,

    brew install parallel graphviz

    For Linux users, use your distribution's package manager, e.g.,

    sudo apt install parallel graphviz graphviz-dev
  3. Install dependencies using uv:

    uv sync

    If you are a Mac user and encounter an error while installing pygraphviz, set the environment variables accordingly:

    CFLAGS="-I $(brew --prefix graphviz)/include" LDFLAGS="-L $(brew --prefix graphviz)/lib" uv sync

Running Experiments

The experiments can be run using the run_experiments.sh script, which executes the ARC-AGI-2 problems in parallel.

./scripts/run_experiments.sh

Key parameters (you can modify these directly in run_experiments.sh):

  • EXP_ID: Experiment name you can configure to distinguish between experiments
  • MAX_NUM_NODES: Maximum number of nodes to expand in the search tree
  • ALGO_CLASS_NAME: Algorithm class to use (default: ABMCTSA)
  • DIST_TYPE: Distribution type for the algorithm (default: beta)
  • N_JOBS: Number of parallel jobs to run
  • INDICES_FILE: Path to a text file listing the task IDs to be solved in this experiment
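For example, the variables in run_experiments.sh might be set as follows (the parameter names are from the list above; the values and the indices-file path shown here are illustrative, not the repository's defaults):

```shell
# Illustrative values only -- adjust to your setup.
EXP_ID="abmcts_trial01"            # label used to distinguish experiments
MAX_NUM_NODES=128                  # search-tree expansion budget per task
ALGO_CLASS_NAME="ABMCTSA"          # algorithm class (default)
DIST_TYPE="beta"                   # distribution type (default)
N_JOBS=4                           # number of tasks run in parallel
INDICES_FILE="indices/subset.txt"  # hypothetical path; one task id per line
echo "Running ${EXP_ID} with ${N_JOBS} jobs"
```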

Each experiment attempts to solve a problem from the ARC-AGI-2 benchmark, using LLMs to generate candidate solutions that are later evaluated to produce the final results. Logs are written to the outputs directory.

Running Evaluation

After running the experiments, you can evaluate the results using:

./scripts/run_eval.sh

This script:

  1. Processes the experimental results using eval/proc_results.py
  2. Generates visualization plots using eval/visualize.py

The final result plots are generated in the outputs/plots directory.

LLM Configs

The LLM configuration file is located at experiments/arc2/configs/config.yaml. Multi-LLM AB-MCTS uses the LLMs listed in this file, applying the specified temperature setting for each.
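As a rough illustration only (the actual schema of config.yaml may differ; the field names and model identifiers below are assumptions, not taken from the repository), such a file typically pairs each LLM with its sampling temperature:

```yaml
# Hypothetical sketch -- check experiments/arc2/configs/config.yaml
# for the real field names and supported models.
llms:
  - name: model-a        # placeholder model identifier
    temperature: 0.7
  - name: model-b
    temperature: 1.0
```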

Folder Layout

  • experiments/arc2
    • run.py - The main script that leverages TreeQuest to generate answers for ARC-AGI-2 problems.
    • prompt.py - Contains the prompts used to instruct LLMs to solve ARC-AGI-2 problems or refine existing answers based on feedback.
  • eval
    • proc_results.py - An evaluation script that processes the generated answers to produce the final results.
    • visualize.py - A visualization script for generating result plots.

Citation

If you use this code in your research, please cite our paper:

@article{inoue2025wider,
  title={Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search},
  author={Inoue, Yuichi and Misaki, Kou and Imajuku, Yuki and Kuroki, So and Nakamura, Taishi and Akiba, Takuya},
  journal={arXiv preprint arXiv:2503.04412},
  year={2025}
}

License

Apache License 2.0
