This repo contains cookbooks that demonstrate how to evaluate AI agents with the judgeval
package developed by Judgment Labs.
Before running these examples, make sure you have:
- Installed the latest version of the Judgeval package:

      pip install judgeval

- Set up your Judgeval API key and organization ID as environment variables:

      export JUDGMENT_API_KEY="your_api_key"
      export JUDGMENT_ORG_ID="your_org_id"
To get your API key and Organization ID, make an account on the Judgment Labs platform.
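
Before opening a notebook, it can help to confirm that both variables are visible to Python. The snippet below is a minimal sketch that uses only the standard library and the two variable names from the export commands above. The helper name `missing_judgment_vars` is made up for this example and is not part of judgeval.

```python
import os

# Variable names taken from the export commands in the setup steps.
REQUIRED_VARS = ("JUDGMENT_API_KEY", "JUDGMENT_ORG_ID")


def missing_judgment_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ
    and accepts any mapping so the check is easy to try out.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Report what is missing in the current process environment.
missing = missing_judgment_vars()
print("Missing variables:", ", ".join(missing) if missing else "none")
```

If a name is reported as missing, re-run the corresponding `export` command in the same shell session that launches the notebook, since exported variables only reach processes started from that shell.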
Try Out | Notebook | Description |
---|---|---|
RL | Wikipedia Racer | Train agents with reinforcement learning |
Online Monitoring | Research Agent | Monitor agent behavior in production |
Custom Scorers | HumanEval | Build custom evaluators for your agents |
Offline Testing | [Get Started For Free] | Compare how different prompts, models, or agent configs affect performance across ANY metric |
You can find a list of video tutorials for Judgeval use cases.