feat(ai): add evaluation API reporting #91
base: main
packages/ai/src/evals/eval.ts
Outdated
});

afterAll(async (suite) => {
  console.log('afterAll');
console.log('afterAll');
packages/ai/src/evals/eval.ts
Outdated
successCases,
erroredCases,
durationMs,
scorers: scorerNames,
Scorer names could be collected when the evaluation is initialized; I don't think they need to be part of the PATCH request.
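One way to read this suggestion, as a rough sketch only: send the scorer names once in the create payload and keep the final update limited to values that change during the run. The payload interfaces, the `NamedScorer` shape, and the rule for deriving the final status are all assumptions made for illustration, not the PR's actual types.

```typescript
// Hypothetical shapes; field names mirror the diff above where possible.
type EvaluationStatus = 'running' | 'completed' | 'errored' | 'cancelled';

interface NamedScorer {
  name: string; // assumption: each scorer exposes a name
}

interface CreatePayload {
  name: string;
  status: EvaluationStatus;
  scorers: string[];
}

interface UpdatePayload {
  status: EvaluationStatus;
  successCases: number;
  erroredCases: number;
  durationMs: number;
}

// Scorer names are known before any case runs, so they go in the create call.
function buildCreatePayload(name: string, scorers: NamedScorer[]): CreatePayload {
  return { name, status: 'running', scorers: scorers.map((s) => s.name) };
}

// The final update carries only what changed during the run.
// Assumption: any errored case marks the whole evaluation as 'errored'.
function buildUpdatePayload(
  successCases: number,
  erroredCases: number,
  durationMs: number,
): UpdatePayload {
  return {
    status: erroredCases > 0 ? 'errored' : 'completed',
    successCases,
    erroredCases,
    durationMs,
  };
}
```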
packages/ai/src/evals/eval.ts
Outdated
type EvaluationStatus = 'running' | 'completed' | 'errored' | 'cancelled';

const postCreateEvaluation = async (payload: CreateEvaluationPayload): Promise<Response | null> => {
We could extract the Axiom API calls into a separate service file. There is already eval.service.ts, which we can use.
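As a minimal sketch of that extraction, the service could own the HTTP details and expose one function per call. Everything named here is hypothetical: `createEvalService`, `EvalServiceConfig`, the injected `HttpClient` type, and the `/evaluations` path are placeholders, and the real contents of eval.service.ts are not shown in this thread. Returning `null` on failure is an assumption based on the `Promise<Response | null>` signature in the diff.

```typescript
// All names and paths below are hypothetical placeholders.
type HttpResult = { ok: boolean; status: number };
type HttpClient = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<HttpResult>;

interface EvalServiceConfig {
  baseUrl: string;
  token: string;
  http: HttpClient; // injected so eval.ts never builds requests itself
}

function createEvalService(config: EvalServiceConfig) {
  const send = async (
    method: string,
    path: string,
    payload: unknown,
  ): Promise<HttpResult | null> => {
    try {
      return await config.http(`${config.baseUrl}${path}`, {
        method,
        headers: {
          'Content-Type': 'application/json',
          Authorization: `Bearer ${config.token}`,
        },
        body: JSON.stringify(payload),
      });
    } catch {
      // Assumption: reporting is best-effort, so a failed call yields null
      // instead of failing the eval run.
      return null;
    }
  };

  return {
    createEvaluation: (payload: Record<string, unknown>) =>
      send('POST', '/evaluations', payload),
    updateEvaluation: (id: string, payload: Record<string, unknown>) =>
      send('PATCH', `/evaluations/${encodeURIComponent(id)}`, payload),
  };
}
```

With this shape, eval.ts would only call `createEvaluation` and `updateEvaluation`, and tests could pass a fake `http` function instead of hitting the network.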
…e evaluation creation logic
// aggregate success and scores
successCases++;
for (const s of scoreList) {
  const value = Number((s as unknown as { score: number }).score);
let evalId = ''; // get traceId
let anyCaseFailed = false;
const suiteStart = performance.now();
let successCases = 0;
In afterAll() we have access to the suite along with its children. I would say it's safer to loop over the suite's tasks and check the state of each one instead of counting them this way.
Another question: are these numbers going to be used in the UI?
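A rough sketch of the task-walking approach suggested above, assuming a simplified task shape. `TaskLike`, `CaseCounts`, and `countCases` are hypothetical names, and the runner's real suite and task types may carry different fields or state values.

```typescript
// Hypothetical, simplified task shape; the real runner's types may differ.
type TaskState = 'pass' | 'fail' | 'skip' | 'run' | undefined;

interface TaskLike {
  name: string;
  result?: { state?: TaskState };
  tasks?: TaskLike[]; // present when the task is itself a nested suite
}

interface CaseCounts {
  successCases: number;
  erroredCases: number;
}

// Walk the suite tree and count leaf tasks by their final state,
// instead of incrementing counters while the cases run.
function countCases(suite: TaskLike): CaseCounts {
  const counts: CaseCounts = { successCases: 0, erroredCases: 0 };

  const visit = (task: TaskLike): void => {
    if (task.tasks) {
      // Nested suite: recurse into its children.
      task.tasks.forEach(visit);
      return;
    }
    if (task.result?.state === 'pass') counts.successCases++;
    else if (task.result?.state === 'fail') counts.erroredCases++;
    // Skipped or unfinished tasks are counted as neither.
  };

  (suite.tasks ?? []).forEach(visit);
  return counts;
}
```

Inside `afterAll(async (suite) => { ... })` this would be called as `countCases(suite)`, so the totals come from the states recorded on the tasks and not from counters mutated during the run.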
No description provided.