feat(eval): add run locomo eval script #28

Duguce · 2025-07-08T12:22:39Z

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

The previous PR omitted the script that runs the locomo evaluation; this PR adds it as a follow-up. @Ki-Seki

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Please delete options that are not relevant.

Unit Test
Test Script (please provide)

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have checked my code and corrected any misspellings

Maintainer Checklist

closes #xxxx (Replace xxxx with the GitHub issue number)
Made sure Checks passed

Copilot

Pull Request Overview

This PR introduces a new shell script to sequentially run all steps of the Locomo evaluation pipeline.

Adds run_locomo_eval.sh to orchestrate ingestion, search, response generation, evaluation, and metric calculation.
Sets default parameters (LIB, VERSION, WORKERS, TOPK) and checks exit codes for each step.
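
For readers unfamiliar with overridable defaults in bash, the following is a minimal sketch of one common way to set them. The variable names come from the overview above; the default values are hypothetical placeholders, since the actual values in run_locomo_eval.sh are not shown in this thread, and the script may assign them differently.

```shell
#!/bin/bash
# Sketch only: the values below are hypothetical, not the script's real defaults.
# ${VAR:-fallback} keeps a value supplied by the caller's environment and
# falls back to the default when VAR is unset or empty.
LIB="${LIB:-example-lib}"
VERSION="${VERSION:-v0}"
WORKERS="${WORKERS:-4}"
TOPK="${TOPK:-10}"

echo "lib=$LIB version=$VERSION workers=$WORKERS top_k=$TOPK"
```

With this pattern, a caller could override a single parameter for one run, for example `WORKERS=8 bash run_locomo_eval.sh`, without editing the script.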

Copilot · 2025-07-08T12:23:36Z

evaluation/scripts/run_locomo_eval.sh

@@ -0,0 +1,44 @@
+#!/bin/bash


Consider adding set -euo pipefail after the shebang to ensure the script exits on any error and detects undefined variables.
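
As a rough illustration of what this suggestion changes, the sketch below runs a few one-line scripts in child bash processes and records their exit statuses. The variable names (`lenient_status`, `UNDEFINED_DEMO_VAR`, and so on) are made up for the demo and do not appear in the PR.

```shell
#!/bin/bash
# Sketch: how `set -euo pipefail` changes failure handling.
# Each case runs in a child bash so the demo itself keeps going.

# Without strict mode, a failing command does not stop the script;
# the exit status is that of the last command (echo), which succeeds.
lenient_status=0
bash -c 'false; echo "still running" >/dev/null' || lenient_status=$?

# With -e, the first failing command aborts the script.
strict_status=0
bash -c 'set -euo pipefail; false; echo "never reached"' || strict_status=$?

# With -u, expanding an unset variable is treated as an error.
unset_status=0
bash -c 'set -euo pipefail; echo "$UNDEFINED_DEMO_VAR"' 2>/dev/null || unset_status=$?

# With pipefail, a pipeline fails if any stage fails, not only the last one.
pipe_status=0
bash -c 'set -euo pipefail; false | cat' || pipe_status=$?

echo "lenient=$lenient_status strict=$strict_status unset=$unset_status pipe=$pipe_status"
```

One consequence worth noting: once `set -e` is active, a step that fails ends the script immediately, so an explicit `if [ $? -ne 0 ]` check placed after that step is never reached with a non-zero status.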

Copilot · 2025-07-08T12:23:36Z

evaluation/scripts/run_locomo_eval.sh

+echo "Running locomo_ingestion.py..."
+CUDA_VISIBLE_DEVICES=0 python scripts/locomo/locomo_ingestion.py --lib $LIB --version $VERSION --workers $WORKERS
+if [ $? -ne 0 ]; then
+    echo "Error running locomo_ingestion.py"
+    exit 1
+fi
+
+echo "Running locomo_search.py..."
+CUDA_VISIBLE_DEVICES=0 python scripts/locomo/locomo_search.py --lib $LIB --version $VERSION --top_k $TOPK --workers $WORKERS
+if [ $? -ne 0 ]; then
+    echo "Error running locomo_search.py"
+    exit 1
+fi
+
+echo "Running locomo_responses.py..."
+python scripts/locomo/locomo_responses.py --lib $LIB --version $VERSION
+if [ $? -ne 0 ]; then
+    echo "Error running locomo_responses.py."
+    exit 1
+fi
+
+echo "Running locomo_eval.py..."
+python scripts/locomo/locomo_eval.py --lib $LIB --version $VERSION --workers $WORKERS --num_runs 3
+if [ $? -ne 0 ]; then
+    echo "Error running locomo_eval.py"
+    exit 1
+fi
+
+echo "Running locomo_metric.py..."
+python scripts/locomo/locomo_metric.py --lib $LIB --version $VERSION
+if [ $? -ne 0 ]; then
+    echo "Error running locomo_metric.py"
+    exit 1
+fi


[nitpick] The error-checking and execution pattern is repeated for each script; consider extracting a helper function (e.g., run_step) to reduce duplication and improve readability.

Suggested change

# Helper function to run a command and exit on failure
run_step() {
    local script=$1
    shift
    echo "Running $script..."
    # "$@" runs the remaining arguments as a command. A leading VAR=value
    # word would be taken as a command name there, so callers that need an
    # environment variable pass it through env.
    if ! "$@"; then
        echo "Error running $script"
        exit 1
    fi
}

run_step "locomo_ingestion.py" env CUDA_VISIBLE_DEVICES=0 python scripts/locomo/locomo_ingestion.py --lib "$LIB" --version "$VERSION" --workers "$WORKERS"
run_step "locomo_search.py" env CUDA_VISIBLE_DEVICES=0 python scripts/locomo/locomo_search.py --lib "$LIB" --version "$VERSION" --top_k "$TOPK" --workers "$WORKERS"
run_step "locomo_responses.py" python scripts/locomo/locomo_responses.py --lib "$LIB" --version "$VERSION"
run_step "locomo_eval.py" python scripts/locomo/locomo_eval.py --lib "$LIB" --version "$VERSION" --workers "$WORKERS" --num_runs 3
run_step "locomo_metric.py" python scripts/locomo/locomo_metric.py --lib "$LIB" --version "$VERSION"
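
One caveat with a helper that forwards its command through "$@": bash only treats a `VAR=value` word as an assignment when it appears literally at the start of a command it parses. Passed as an argument, it becomes the command name and the step fails. The sketch below shows the difference using a hypothetical `DEMO_DEVICE` variable and a variant of the helper that returns rather than exits, so the demo can continue past the failing call.

```shell
#!/bin/bash
# Demo variant of run_step: print a label, run the remaining arguments as a
# command, and return non-zero if that command fails. (A pipeline script
# would more likely `exit 1` here; `return` keeps this demo running.)
run_step() {
    local label=$1
    shift
    echo "Running $label..."
    if ! "$@"; then
        echo "Error running $label" >&2
        return 1
    fi
}

# A VAR=value word passed as an argument is not an assignment: the shell
# tries to execute "DEMO_DEVICE=0" as a command name, which fails.
naive_status=0
run_step "naive" DEMO_DEVICE=0 true 2>/dev/null || naive_status=$?

# Routing the assignment through env sets it for the child process only.
env_output=$(run_step "with-env" env DEMO_DEVICE=0 sh -c 'echo "device=$DEMO_DEVICE"')

echo "naive_status=$naive_status"
echo "$env_output"
```

Applied to the pipeline steps, this would mean writing `env CUDA_VISIBLE_DEVICES=0 python ...` as the command handed to the helper, or exporting the variable once before the GPU steps.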

Duguce and others added 7 commits July 7, 2025 22:43

feat(eval): add eval dependencies

4ef7418

feat(eval): add configs example

ddad8c1

docs(eval): update README.md

d623791

Merge branch 'MemTensor:dev' into dev

3365de4

feat(eval): remove the dependency (pydantic)

ed68e36

Merge branch 'MemTensor:dev' into dev

4e99031

feat(eval): add run locomo eval script

41368b9

Copilot AI review requested due to automatic review settings July 8, 2025 12:22

Copilot AI reviewed Jul 8, 2025

Ki-Seki merged commit 6148813 into MemTensor:dev Jul 8, 2025
4 checks passed
