feat(ai): add evaluation API reporting #91

gabrielelpidio · 2025-09-30T03:19:58Z

No description provided.

pkg-pr-new · 2025-09-30T03:21:36Z

npm i https://pkg.pr.new/axiomhq/ai/axiom@91

commit: e78c9a4

thesollyz · 2025-09-30T10:07:14Z

packages/ai/src/evals/eval.ts

      });

      afterAll(async (suite) => {
+        console.log('afterAll');


Suggested change

console.log('afterAll');

thesollyz · 2025-09-30T10:08:50Z

packages/ai/src/evals/eval.ts

+          successCases,
+          erroredCases,
+          durationMs,
+          scorers: scorerNames,


Scorer names could be collected when the evaluation is initialized; I don't think they need to be part of the PATCH request.
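
A minimal sketch of that split, assuming the scorers expose a `name` and that the evaluation is created once and patched once. The `NamedScorer` type, both builder functions, the `name` field, and the rule for deriving `status` are hypothetical and not taken from the PR; only `successCases`, `erroredCases`, `durationMs`, `scorers`, and `EvaluationStatus` appear in the diff.

```typescript
// Hypothetical payload shapes; anything not in the diff is an assumption.
type EvaluationStatus = 'running' | 'completed' | 'errored' | 'cancelled';

interface NamedScorer {
  name: string; // assumed: each scorer exposes a name
}

interface CreateEvaluationPayload {
  name: string;
  status: EvaluationStatus;
  scorers: string[]; // known before any case runs
}

interface UpdateEvaluationPayload {
  status: EvaluationStatus;
  successCases: number;
  erroredCases: number;
  durationMs: number;
}

// Scorer names are sent once, when the evaluation is created.
function buildCreatePayload(name: string, scorers: NamedScorer[]): CreateEvaluationPayload {
  return { name, status: 'running', scorers: scorers.map((s) => s.name) };
}

// The update carries only values that are known after the run.
function buildUpdatePayload(
  successCases: number,
  erroredCases: number,
  durationMs: number,
): UpdateEvaluationPayload {
  return {
    // Assumed rule: any errored case marks the whole evaluation as errored.
    status: erroredCases > 0 ? 'errored' : 'completed',
    successCases,
    erroredCases,
    durationMs,
  };
}
```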

thesollyz · 2025-09-30T14:32:18Z

packages/ai/src/evals/eval.ts

+
+type EvaluationStatus = 'running' | 'completed' | 'errored' | 'cancelled';
+
+const postCreateEvaluation = async (payload: CreateEvaluationPayload): Promise<Response | null> => {


We could extract the Axiom API calls into a separate service file. There is already an eval.service.ts that we can use.
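
A rough sketch of what such a service could look like, under stated assumptions: the `/evaluations` path, the `createEvalService` factory, the `FetchLike` type, and the payload fields are all hypothetical. The contents of the existing eval.service.ts are not shown in this thread. The `null` return on failure mirrors the `Promise<Response | null>` signature in the diff.

```typescript
type EvaluationStatus = 'running' | 'completed' | 'errored' | 'cancelled';

// Hypothetical fields; the real CreateEvaluationPayload is not shown in the diff.
interface CreateEvaluationPayload {
  id: string;
  name: string;
  status: EvaluationStatus;
}

interface RequestInitLike {
  method: string;
  headers: Record<string, string>;
  body: string;
}

interface ResponseLike {
  ok: boolean;
  status: number;
}

// Injectable fetch so the service can be exercised without network access.
type FetchLike = (url: string, init: RequestInitLike) => Promise<ResponseLike>;

function createEvalService(baseUrl: string, token: string, fetchImpl: FetchLike) {
  const send = async (
    method: 'POST' | 'PATCH',
    path: string,
    body: unknown,
  ): Promise<ResponseLike | null> => {
    try {
      const res = await fetchImpl(`${baseUrl}${path}`, {
        method,
        headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
        body: JSON.stringify(body),
      });
      // Reporting is best effort: a non-2xx response is surfaced as null.
      return res.ok ? res : null;
    } catch {
      // Network failures should not fail the eval run itself.
      return null;
    }
  };

  return {
    createEvaluation: (payload: CreateEvaluationPayload) => send('POST', '/evaluations', payload),
    updateEvaluation: (id: string, patch: { status: EvaluationStatus }) =>
      send('PATCH', `/evaluations/${encodeURIComponent(id)}`, patch),
  };
}
```

With this shape, eval.ts would only call `createEvaluation` and `updateEvaluation`, and the HTTP details would live in one place.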


c-ehrlich · 2025-10-01T03:44:04Z

packages/ai/src/evals/eval.ts

+            // aggregate success and scores
+            successCases++;
+            for (const s of scoreList) {
+              const value = Number((s as unknown as { score: number }).score);


https://github.com/axiomhq/app/blob/fix-ci-4334534/frontend/lib/dash/util/objects.ts#L13-L23
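
The linked helper is not reproduced in this thread, so the following is not a copy of it. It is only one possible way to avoid the `as unknown as { score: number }` double cast: a small type guard. The names `hasNumericScore` and `sumScores` are hypothetical.

```typescript
// Narrow an unknown value to an object carrying a finite numeric `score`.
function hasNumericScore(value: unknown): value is { score: number } {
  if (typeof value !== 'object' || value === null) return false;
  const score = (value as { score?: unknown }).score;
  return typeof score === 'number' && Number.isFinite(score);
}

// Sum only the entries that pass the guard; malformed entries are skipped.
function sumScores(scoreList: unknown[]): number {
  let total = 0;
  for (const s of scoreList) {
    if (hasNumericScore(s)) {
      total += s.score;
    }
  }
  return total;
}
```

Unlike `Number(...)`, the guard does not coerce strings or `null` into numbers; whether skipping malformed entries is the desired behavior here is a separate question for the PR author.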

thesollyz · 2025-10-01T11:29:42Z

packages/ai/src/evals/eval.ts

      let evalId = ''; // get traceId
+      let anyCaseFailed = false;
+      const suiteStart = performance.now();
+      let successCases = 0;


In afterAll() we have access to the suite along with its children. I would say it's safer to loop over the suite's tasks and check the state of each one instead of counting them this way.

Another question: are these numbers going to be used in the UI?
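
The idea of deriving the counts from the suite's tasks could be sketched roughly as follows. The `TaskLike` shape is a local stand-in loosely modeled on a test runner's task tree, not the runner's actual types, and `countCases` is a hypothetical helper; the real `suite` object passed to `afterAll` may differ.

```typescript
// Local stand-ins for the runner's task shapes (assumed, not imported).
type TaskState = 'pass' | 'fail' | 'skip' | 'todo' | 'run';

interface TaskLike {
  type: 'test' | 'suite';
  name: string;
  result?: { state?: TaskState };
  tasks?: TaskLike[]; // only present on suites
}

interface CaseCounts {
  successCases: number;
  erroredCases: number;
}

// Walk the suite tree and count leaf tests by their final state.
function countCases(suite: TaskLike): CaseCounts {
  const counts: CaseCounts = { successCases: 0, erroredCases: 0 };
  const stack: TaskLike[] = [...(suite.tasks ?? [])];
  while (stack.length > 0) {
    const task = stack.pop()!;
    if (task.type === 'suite') {
      // Nested suites contribute their own children.
      stack.push(...(task.tasks ?? []));
    } else if (task.result?.state === 'pass') {
      counts.successCases++;
    } else if (task.result?.state === 'fail') {
      counts.erroredCases++;
    }
    // Skipped, todo, or unfinished tests are counted in neither bucket.
  }
  return counts;
}
```

Because the counts are read from the final task states, they cannot drift from what the runner itself reports, which is the safety argument made above.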

feat(ai): add evaluation API reporting

1390f33

gabrielelpidio marked this pull request as draft September 30, 2025 03:20

thesollyz reviewed Sep 30, 2025


gabrielelpidio added 2 commits September 30, 2025 17:02

feat(ai): fetcher for evals API

14a1604

feat(ai): add optional baselineId to evaluation API payload and updat…

e78c9a4

…e evaluation creation logic

c-ehrlich reviewed Oct 1, 2025


thesollyz reviewed Oct 1, 2025

