AI-powered model auditing agent with multi-agent debate for robust evaluation of machine learning models.
This repository has been tested extensively with Python 3.10.15. Typical install time via uv is less than a minute.
uv sync
uv run python main.py --model resnet50 --dataset CIFAR10 --weights path/to/weights.pth
pip install -e .
python main.py --model resnet50 --dataset CIFAR10 --weights path/to/weights.pth
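To confirm the install worked, a quick sanity check such as the following can help. This is a minimal sketch assuming the .pth checkpoints are standard PyTorch files; the path is a placeholder.

# Minimal sanity check: assumes PyTorch-format (.pth) checkpoints.
# Replace the path with your actual weights file.
import torch

state = torch.load("path/to/weights.pth", map_location="cpu")
print(f"Checkpoint loaded with {len(state)} top-level entries")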
uv sync --extra medical # or pip install -e ".[medical]"
python main.py --model resnet50 --dataset CIFAR10 --weights models/model.pth
# ISIC skin lesion classification
python main.py --model siim-isic --dataset isic --weights models/isic/model.pth
# HAM10000 dataset
python main.py --model deepderm --dataset ham10000 --weights models/ham10000.pth
We provide a small toy model trained on CIFAR10 so the Auditor can be tried end to end. All you need is a valid Anthropic API key (see the 'Environment Variables' section below).
python main.py --model resnet18 --dataset CIFAR10 --weights examples/cifar10/cifar10.pth
Expected runtime varies with user response speed and the chosen subset size, but a full run should take less than 10 minutes.
--subset N: Use N samples for faster evaluation
--no-debate: Disable multi-agent debate
--single-agent: Use a single agent instead of multi-agent debate
--device: Specify device (cpu, cuda, mps)
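For example, a quick smoke test might combine these flags (the subset size of 100 is illustrative):

python main.py --model resnet18 --dataset CIFAR10 --weights examples/cifar10/cifar10.pth --subset 100 --device cpu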
Set your API keys:
export ANTHROPIC_API_KEY="your-key"
export OPENAI_API_KEY="your-key" # if using non-Anthropic models
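A small startup check along these lines catches a missing key early. This is a sketch, not the repository's actual validation code:

# Sketch: fail fast if the required key is missing (not the repo's exact code).
import os
import sys

if not os.environ.get("ANTHROPIC_API_KEY"):
    sys.exit("ANTHROPIC_API_KEY is not set; export it before running main.py")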
main.py - Interactive model auditor with multi-agent debate
testbench.py - Automated evaluation script
utils/agent.py - Multi-agent conversation system
architectures/ - Custom model architectures
prompts/ - System prompts for different evaluation phases
models/ - Pre-trained model weights
results/ - Evaluation results and conversation logs
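For orientation, here is a hypothetical, self-contained sketch of what a multi-agent debate loop can look like. The real implementation lives in utils/agent.py and its API may differ; the Agent class, the debate function, and the canned respond() logic below are illustrative stand-ins for actual LLM calls.

# Hypothetical sketch of a multi-agent debate loop; not the repo's agent.py.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    system_prompt: str
    transcript: list = field(default_factory=list)

    def respond(self, message: str) -> str:
        # A real agent would call an LLM API here (e.g. using ANTHROPIC_API_KEY).
        reply = f"[{self.name}] critique of: {message[:50]}"
        self.transcript.append(reply)
        return reply

def debate(agents, claim: str, rounds: int = 2):
    """Agents take turns responding to the latest message for a fixed number of rounds."""
    history = [claim]
    for _ in range(rounds):
        for agent in agents:
            history.append(agent.respond(history[-1]))
    return history

if __name__ == "__main__":
    auditors = [Agent("proponent", "Argue the model is robust."),
                Agent("skeptic", "Probe for failure modes.")]
    for turn in debate(auditors, "ResNet50 reaches high accuracy on CIFAR10."):
        print(turn)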