Replies: 3 comments
-
Yes, I agree.
-
@usama-openai could we get an eval run, along with a log containing the accuracy for each eval, at each model release? And could the log files be put in a repo? I would prefer not to burn my own cash to see where eval performance currently stands for each model and version.
-
Also, which evals are currently failing? Maybe we could get a dashboard, or something similar, that gets updated once a week?