
@mbertrand (Member) commented Sep 26, 2025

What are the relevant tickets?

Related to https://github.com/mitodl/hq/issues/7605

Description (What does it do?)

  • Adds a few options to the rag_evaluation management command:
    • --data-file: a data file to read test cases from
    • --output-file: an output file to write the results to
    • --timeout: a timeout after which an individual test case is aborted (test cases sometimes hung and prevented the evaluation from completing)
    • --max-concurrent: the number of test cases to run in parallel
  • Adds error and async configs so that the evaluation continues when an individual test case raises an exception, and so that test cases can run concurrently
  • Fixes incorrect summary statistics for metrics with an inverted scale (e.g. hallucination, where 0 = best and 1 = worst, rather than the usual 1 = best)
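The inverted-scale fix can be illustrated with a minimal sketch. It is not the code from this PR: it assumes each metric reports a score in [0, 1], and the set LOWER_IS_BETTER, the function summarize_metric and the default pass threshold of 0.5 are all hypothetical. The point is that "best", "worst" and pass/fail must flip direction for metrics like hallucination, otherwise the summary reports the worst case as the best.

```python
from statistics import mean

# Hypothetical: metrics whose scale is inverted (0 = best, 1 = worst).
LOWER_IS_BETTER = {"hallucination"}


def summarize_metric(
    name: str, scores: list[float], threshold: float = 0.5
) -> dict[str, float]:
    """Summarize one metric's scores, respecting the direction of its scale.

    `threshold` is a hypothetical pass/fail cutoff, not a value from the PR.
    """
    lower_is_better = name in LOWER_IS_BETTER

    def passed(score: float) -> bool:
        # An inverted metric passes when it stays at or below the threshold.
        return score <= threshold if lower_is_better else score >= threshold

    return {
        "mean": mean(scores),
        # For an inverted metric the best score is the smallest one.
        "best": min(scores) if lower_is_better else max(scores),
        "worst": max(scores) if lower_is_better else min(scores),
        "pass_rate": sum(passed(s) for s in scores) / len(scores),
    }
```

With the same raw scores [0.0, 0.2, 0.9], hallucination gets best = 0.0 and a pass rate of 2/3, while a normal higher-is-better metric gets best = 0.9 and a pass rate of 1/3. Keeping the raw scale (instead of flipping scores to 1 - score) leaves the reported numbers comparable with the metric's own documentation.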

How can this be tested?

Run the management command and try out some of the new options, for example:

./manage.py rag_evaluation --models openai/gpt-5-mini,openai/gpt-4o-mini  \
  --eval_model gpt-4o-mini \
  --bots syllabus \
  --data-file test_json/rag_evaluation.json \
  --output-file rag_eval_results.md  \
  --timeout 120 \
  --max-concurrent 4
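The --timeout and --max-concurrent options in the command above combine two ideas: cap how many test cases run at once, and abandon any single case that runs too long without failing the whole run. The PR relies on the evaluation framework's own async and error configs for this; the following is only a framework-agnostic sketch in plain asyncio, where run_cases and the per-case coroutine run_case are hypothetical names.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any


async def run_cases(
    cases: list[Any],
    run_case: Callable[[Any], Awaitable[Any]],
    timeout: float,
    max_concurrent: int,
) -> list[Any]:
    """Run at most `max_concurrent` cases at once, each limited to `timeout` seconds.

    A failing or timed-out case yields its exception in place of a result,
    so one bad case cannot stop the rest of the evaluation.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(case: Any) -> Any:
        async with semaphore:  # wait for a free concurrency slot
            try:
                # wait_for cancels the case if it exceeds the timeout.
                return await asyncio.wait_for(run_case(case), timeout=timeout)
            except Exception as exc:  # includes asyncio.TimeoutError
                return exc

    # gather preserves input order, so results line up with `cases`.
    return await asyncio.gather(*(guarded(case) for case in cases))
```

Returning exceptions as values (rather than re-raising) mirrors the "ignore errors" behavior described in the PR: the caller can count or report failed cases after everything else has finished.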

@shanbady self-requested a review September 29, 2025 14:16
@mbertrand force-pushed the mb/eval_enhancements branch 3 times, most recently from 82c0082 to 2d6c16a, on September 29, 2025 17:24
@shanbady (Contributor) left a comment

Changes look good. It might be worth updating the README with the new options.

Also, I saw this error a few times when running the eval for the tutorbot (although the evaluation seemed to complete). I wasn't sure if it's expected:

[2025-09-29 18:12:07] ERROR 103 [ai_chatbots.chatbots] chatbots.py:642 - [fc0e0f7bf0a2] - Error running AI agent
Traceback (most recent call last):
  File "/src/ai_chatbots/chatbots.py", line 636, in get_completion
    await create_tutorbot_output_and_checkpoints(
        self.thread_id, json_output, self.edx_module_id
    )
  File "/opt/venv/lib/python3.13/site-packages/asgiref/sync.py", line 468, in __call__
    ret = await asyncio.shield(exec_coro)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.13/site-packages/asgiref/current_thread_executor.py", line 40, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/venv/lib/python3.13/site-packages/channels/db.py", line 13, in thread_handler
    return super().thread_handler(loop, *args, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.13/site-packages/asgiref/sync.py", line 522, in thread_handler
    return func(*args, **kwargs)
  File "/src/ai_chatbots/api.py", line 617, in create_tutorbot_output_and_checkpoints
    session=UserChatSession.objects.get(thread_id=thread_id),
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.13/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.13/site-packages/django/db/models/query.py", line 637, in get
    raise self.model.DoesNotExist(
        "%s matching query does not exist." % self.model._meta.object_name
    )
ai_chatbots.models.UserChatSession.DoesNotExist: UserChatSession matching query does not exist.

@mbertrand (Member, Author) commented Sep 29, 2025

Good catch on the exception; I hadn't tested the tutorbot evaluation. It was caused by bypassing the TutorbotConsumer class, which creates a DjangoCheckpointer that in turn typically creates the UserChatSession. I updated the chatbot function to try creating a TutorBotSession only if the bot's checkpointer is not None.
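The guard described above can be sketched as follows. This is a hypothetical stand-in, not the real ai_chatbots code: TutorBotSketch, save_output and finish_completion are illustrative names, and persistence is simulated with a list. It only shows the pattern of skipping the session-dependent save when no checkpointer exists, which is the situation when an evaluation instantiates the bot directly instead of going through the consumer.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TutorBotSketch:
    """Hypothetical stand-in for the tutor bot; not the real ai_chatbots class."""

    thread_id: str
    edx_module_id: str
    # Set up by the websocket consumer in normal use; None when an
    # evaluation script creates the bot directly.
    checkpointer: Optional[Any] = None
    saved: list = field(default_factory=list)

    async def save_output(self, json_output: dict) -> None:
        """Hypothetical persistence step that requires an existing chat session."""
        self.saved.append((self.thread_id, self.edx_module_id, json_output))

    async def finish_completion(self, json_output: dict) -> bool:
        """Persist output only when a checkpointer (and thus a session) exists."""
        if self.checkpointer is None:
            # No consumer ran, so no session row was created: skip the save
            # instead of letting a DoesNotExist lookup error surface.
            return False
        await self.save_output(json_output)
        return True
```

Checking the checkpointer up front keeps the normal chat path unchanged while letting offline evaluations run without a database session.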

Also updated the README.

@shanbady (Contributor) left a comment

👍 LGTM

@mbertrand merged commit d4417e7 into main Sep 30, 2025
7 checks passed
@odlbot mentioned this pull request Oct 6, 2025