Example of MOE on CI
This run is an example of a green 🟢 CAT: the expected 97% success rate is met even though 5 out of 100 runs failed.
A 97% success rate is the same as 3 expected errors out of 100. According to our statistical calculations table (see line 3 in this CSV), the expected range of errors for 100 runs is between 1 and 5 with 90% confidence.
The observed experiment measured 5 failures, which is within that expected range.
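The range of 1 to 5 expected errors can be approximated with a short calculation. This is a minimal sketch that assumes a normal approximation to the binomial distribution; the helper name `expected_failure_range` is hypothetical, and the project's CSV table may have been produced with a different method.

```python
import math
from statistics import NormalDist


def expected_failure_range(
    success_rate: float, runs: int, confidence: float = 0.90
) -> tuple[int, int]:
    """Whole-number range of failure counts consistent with the expected success rate."""
    failure_rate = 1.0 - success_rate
    # Two-sided z-score, about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard error of a proportion under the normal approximation (an assumption).
    standard_error = math.sqrt(failure_rate * success_rate / runs)
    margin = z * standard_error
    lower = max(0.0, failure_rate - margin) * runs
    upper = min(1.0, failure_rate + margin) * runs
    # Keep only whole failure counts that fall inside the interval.
    return math.ceil(lower), math.floor(upper)


print(expected_failure_range(0.97, 100))  # (1, 5)
```

With these assumptions the interval is roughly 0.2 to 5.8 failures, so the whole counts inside it are 1 through 5.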
Our test asserts that the expected success rate of 0.97 is within the margin of error of the observed result (5 failures out of 100), see statistical assertion:
failure_threshold = 0.97  # expected success rate, despite the variable name
assert generations <= 1 or is_within_expected(
    failure_threshold, sum(not result for result in results), generations
), f"Expected {failure_threshold} to be within the confidence interval of the success rate"
CAT #154 - example test run
Artifacts are saved here: CAT-run-154-saved-here.zip

The run passes because the expected success rate of 0.97 lies within the margin of error around the observed success rate of 0.95 (95 successes out of 100) at 90% confidence.
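The comparison between the expected 0.97 and the observed 0.95 can be sketched as follows. This is not the project's `is_within_expected` implementation, which is not shown here; `expected_within_margin` is a hypothetical stand-in that assumes a normal-approximation interval around the observed success rate.

```python
import math
from statistics import NormalDist


def expected_within_margin(
    expected_success: float, failures: int, runs: int, confidence: float = 0.90
) -> bool:
    """True if the expected success rate lies inside the interval around the observed rate."""
    observed_success = (runs - failures) / runs
    # Two-sided z-score, about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Margin of error under the normal approximation (an assumption).
    margin = z * math.sqrt(observed_success * (1 - observed_success) / runs)
    return observed_success - margin <= expected_success <= observed_success + margin


print(expected_within_margin(0.97, 5, 100))   # True
print(expected_within_margin(0.97, 10, 100))  # False
```

For 5 failures the margin is about 0.036, giving an interval of roughly 0.914 to 0.986, which contains 0.97. With 10 failures the upper bound drops to about 0.949 and the check would fail.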
Among the 5 failures we see two distinct groups:
- responses with an empty list of developers []
- responses with valid but unexpected developers
{
  "test_name": "test_metrics_100_generations",
  "folder_path": "/home/runner/work/continuous-alignment-testing/continuous-alignment-testing/examples/team_recommender/test_runs/test_metrics_100_generations-0314-22_37_21",
  "output_file": "fail-92.json",
  "metadata_path": "/home/runner/work/continuous-alignment-testing/continuous-alignment-testing/examples/team_recommender/test_runs/test_metrics_100_generations-0314-22_37_21/metadata.json",
  "validations": {
    "correct_developer_suggested": false,
    "no_developer_name_is_hallucinated": true,
    "not_empty_response": false,
    "valid_json_returned": true
  },
  "response": {
    "developers": []
  }
}
The second group returns developers whose names are not hallucinated but who are not in the expected list, see acceptable names of developers:
acceptable_people = ["Sam Thomas", "Drew Anderson", "Alex Wilson", "Alex Johnson"]
{
  "test_name": "test_metrics_100_generations",
  "folder_path": "/home/runner/work/continuous-alignment-testing/continuous-alignment-testing/examples/team_recommender/test_runs/test_metrics_100_generations-0314-22_37_21",
  "output_file": "fail-87.json",
  "metadata_path": "/home/runner/work/continuous-alignment-testing/continuous-alignment-testing/examples/team_recommender/test_runs/test_metrics_100_generations-0314-22_37_21/metadata.json",
  "validations": {
    "correct_developer_suggested": false,
    "no_developer_name_is_hallucinated": true,
    "not_empty_response": true,
    "valid_json_returned": true
  },
  "response": {
    "developers": [
      {
        "name": "Jamie Johnson",
        "availableStartDate": "2025-06-15T00:00:00Z",
        "relevantSkills": [
          {
            "skill": "Node",
            "level": "5"
          }
        ]
      },
      {
        "name": "Blake Johnson",
        "availableStartDate": "2025-06-12T00:00:00Z",
        "relevantSkills": [
          {
            "skill": "TypeScript",
            "level": "5"
          },
          {
            "skill": "React Native",
            "level": "3"
          }
        ]
      },
      {
        "name": "Blake Wilson",
        "availableStartDate": "2025-06-24T00:00:00Z",
        "relevantSkills": [
          {
            "skill": "Kotlin",
            "level": "3"
          }
        ]
      }
    ]
  }
}
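The two failure groups can be told apart from the `validations` flags and the returned names. The following is a rough sketch, not part of the test suite: `classify_failure` and the group labels are hypothetical, and the records are trimmed copies of fail-92.json and fail-87.json with only the fields the sketch needs.

```python
from collections import Counter

ACCEPTABLE_PEOPLE = {"Sam Thomas", "Drew Anderson", "Alex Wilson", "Alex Johnson"}


def classify_failure(record: dict) -> str:
    """Assign a failed run to one of the two groups seen in this test run."""
    validations = record["validations"]
    if not validations["not_empty_response"]:
        return "empty_response"
    names = {dev["name"] for dev in record["response"]["developers"]}
    # Names are real (not hallucinated) but none of them is in the expected list.
    if validations["no_developer_name_is_hallucinated"] and not names & ACCEPTABLE_PEOPLE:
        return "unexpected_valid_developers"
    return "other"


# Trimmed copies of the two failure records (fields omitted for brevity).
fail_92 = {
    "validations": {"no_developer_name_is_hallucinated": True, "not_empty_response": False},
    "response": {"developers": []},
}
fail_87 = {
    "validations": {"no_developer_name_is_hallucinated": True, "not_empty_response": True},
    "response": {
        "developers": [
            {"name": "Jamie Johnson"},
            {"name": "Blake Johnson"},
            {"name": "Blake Wilson"},
        ]
    },
}

groups = Counter(classify_failure(record) for record in (fail_92, fail_87))
print(groups)
```

Running the same function over all saved `fail-*.json` files would show how the 5 failures split between the two groups.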