
Add BenchmarkEvaluator with basic precision/recall computation #1870

Open
wants to merge 4 commits into develop

Conversation

Muhammedswalihu

Summary

This PR introduces a utility class BenchmarkEvaluator in supervision/metrics/benchmark.py to support benchmarking object detection results across different datasets or models.

Features

  • Computes basic precision and recall
  • Accepts Detections objects for ground truth and prediction
  • Optional support for class mapping and IoU thresholding (future extensions)
  • Includes a unit test at tests/metrics/test_benchmark.py
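
To make the intended interface concrete, here is a minimal usage sketch. The constructor arguments and the evaluate() return shape shown below are assumptions for illustration and not necessarily the final API.

import numpy as np
import supervision as sv

from supervision.metrics.benchmark import BenchmarkEvaluator

# Small hand-built example: two ground-truth boxes, one prediction.
ground_truth = sv.Detections(
    xyxy=np.array([[10, 10, 50, 50], [60, 60, 100, 100]], dtype=float),
    class_id=np.array([0, 1]),
)
predictions = sv.Detections(
    xyxy=np.array([[12, 11, 48, 52]], dtype=float),
    class_id=np.array([0]),
)

# Assumed interface: keyword arguments and a dict-returning evaluate().
evaluator = BenchmarkEvaluator(ground_truth=ground_truth, predictions=predictions)
metrics = evaluator.evaluate()  # e.g. {"precision": ..., "recall": ...}
print(metrics)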

Motivation

Addresses Issue #1778: Improving object detection benchmarking process for unrelated datasets.

Let me know if you'd like me to extend this in future PRs with:

  • mAP, F1, or per-class metrics
  • Confusion matrix visualization
  • Colab notebook example

Thanks for the opportunity to contribute!

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Muhammed Swalihu seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@Muhammedswalihu
Author

Hi @SkalskiP @onuralpszr — I've submitted this PR for the BenchmarkEvaluator (Issue #1778). Let me know if you'd like me to fix the pre-commit error or extend this further. Thanks for reviewing!

@soumik12345
Contributor

Hi @Muhammedswalihu, this seems like a really valuable feature!
Can you please replace the placeholder logic with a working implementation and provide a working example and test cases? Then we can review the PR.

@Muhammedswalihu
Author

Hi @soumik12345, thanks for the review!

I'll go ahead and:

  • Replace the placeholder logic in BenchmarkEvaluator with full precision/recall/mAP computation
  • Add a working demo example (maybe in a Colab notebook for clarity)
  • Improve the test coverage with more edge cases and per-class evaluation

Let me know if there’s anything specific you’d like to see included. Appreciate the opportunity — excited to take this further!


@Muhammedswalihu
Author

Hi @soumik12345, I've added a Colab-style demo notebook BenchmarkEvaluator_Demo.ipynb!

It includes:

  • How to import and use the BenchmarkEvaluator

  • Per-class precision and recall visualization (a sketch of this kind of chart follows below)

  • A visual example comparing predicted and ground truth bounding boxes

This should help users understand and adopt the module more easily.

Let me know if you'd like me to polish or extend this notebook further!
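
For reference, the per-class chart in the notebook can be as simple as a grouped bar plot. The snippet below is an illustrative sketch only: the class names and precision/recall values are made-up placeholders, not results produced by the notebook or the evaluator.

import matplotlib.pyplot as plt
import numpy as np

# Placeholder per-class scores for illustration; in the demo notebook these
# would come from the evaluator's per-class metrics.
classes = ["person", "car", "dog"]
precision = [0.91, 0.78, 0.64]
recall = [0.85, 0.80, 0.58]

x = np.arange(len(classes))
width = 0.35

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(x - width / 2, precision, width, label="precision")
ax.bar(x + width / 2, recall, width, label="recall")
ax.set_xticks(x)
ax.set_xticklabels(classes)
ax.set_ylim(0, 1)
ax.set_ylabel("score")
ax.set_title("Per-class precision and recall")
ax.legend()
plt.tight_layout()
plt.show()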

@soumik12345
Contributor

Hi @Muhammedswalihu, thanks for providing the PoC!
Please feel free to proceed with the actual implementation.
Also, there's no need to commit the notebook to supervision; you can simply attach a Colab notebook in a comment once the PR is ready for review with the complete logic.

Comment on lines +26 to +29
# TODO: Add class alignment, matching using IoU
tp = len(self.predictions.xyxy) # Placeholder
fp = 0
fn = len(self.ground_truth.xyxy) - tp

The logic here is incomplete; please add the correct logic to compute precision and recall.
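
For illustration only (this is not code from the PR), one common way to replace the placeholder is a greedy IoU-matching pass over same-class boxes. The helper below is a self-contained sketch, and the iou_threshold default of 0.5 is an assumption.

import numpy as np


def box_iou(a: np.ndarray, b: np.ndarray) -> float:
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(pred_xyxy, pred_cls, gt_xyxy, gt_cls, iou_threshold=0.5):
    # Greedy matching: each prediction claims at most one unmatched
    # ground-truth box of the same class with IoU >= iou_threshold.
    matched, tp = set(), 0
    for p_box, p_cls in zip(pred_xyxy, pred_cls):
        for j, (g_box, g_cls) in enumerate(zip(gt_xyxy, gt_cls)):
            if j in matched or p_cls != g_cls:
                continue
            if box_iou(p_box, g_box) >= iou_threshold:
                matched.add(j)
                tp += 1
                break
    fp = len(pred_xyxy) - tp
    fn = len(gt_xyxy) - tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

A production version would usually also sort predictions by confidence before matching, which matters once confidence-dependent metrics such as mAP are added.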

from supervision.metrics.benchmark import BenchmarkEvaluator


def test_basic_precision_recall():

This too seems like a placeholder test; please proceed with the implementation and add comprehensive unit tests.
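
As an example of a more meaningful test (a sketch only; the BenchmarkEvaluator keyword arguments and the evaluate() return value are assumed), exact precision and recall values can be asserted on a small hand-built case:

import numpy as np
import pytest
import supervision as sv

from supervision.metrics.benchmark import BenchmarkEvaluator


def test_precision_recall_partial_match():
    # Two ground-truth boxes; one prediction matches perfectly, one misses entirely.
    ground_truth = sv.Detections(
        xyxy=np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float),
        class_id=np.array([0, 0]),
    )
    predictions = sv.Detections(
        xyxy=np.array([[0, 0, 10, 10], [100, 100, 110, 110]], dtype=float),
        class_id=np.array([0, 0]),
    )

    evaluator = BenchmarkEvaluator(ground_truth=ground_truth, predictions=predictions)
    metrics = evaluator.evaluate()  # assumed to return a dict with "precision" and "recall"

    # 1 TP, 1 FP, 1 FN -> precision = 0.5, recall = 0.5
    assert metrics["precision"] == pytest.approx(0.5)
    assert metrics["recall"] == pytest.approx(0.5)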
