
Add BenchmarkEvaluator with basic precision/recall computation #1870

Open
wants to merge 4 commits into develop

Conversation

Muhammedswalihu

Summary

This PR introduces a utility class BenchmarkEvaluator in supervision/metrics/benchmark.py to support benchmarking object detection results across different datasets or models.

Features

  • Computes basic precision and recall
  • Accepts Detections objects for ground truth and prediction
  • Optional support for class mapping and IoU thresholding (future extensions)
  • Includes a unit test at tests/metrics/test_benchmark.py
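
To make the intended interface concrete, here is a minimal usage sketch. The constructor arguments and the evaluate() return shape shown below are assumptions for illustration and not necessarily the final API.

import numpy as np
import supervision as sv

from supervision.metrics.benchmark import BenchmarkEvaluator

# Small hand-built example: two ground-truth boxes, one prediction.
ground_truth = sv.Detections(
    xyxy=np.array([[10, 10, 50, 50], [60, 60, 100, 100]], dtype=float),
    class_id=np.array([0, 1]),
)
predictions = sv.Detections(
    xyxy=np.array([[12, 11, 48, 52]], dtype=float),
    class_id=np.array([0]),
)

# Assumed interface: keyword arguments and a dict-returning evaluate().
evaluator = BenchmarkEvaluator(ground_truth=ground_truth, predictions=predictions)
metrics = evaluator.evaluate()  # e.g. {"precision": ..., "recall": ...}
print(metrics)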

Motivation

Addresses Issue #1778: Improving object detection benchmarking process for unrelated datasets.

Let me know if you'd like me to extend this in future PRs with:

  • mAP, F1, or per-class metrics
  • Confusion matrix visualization
  • Colab notebook example

Thanks for the opportunity to contribute!

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Muhammed Swalihu seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@Muhammedswalihu
Author

Hi @SkalskiP @onuralpszr — I've submitted this PR for the BenchmarkEvaluator (Issue #1778). Let me know if you'd like me to fix the pre-commit error or extend this further. Thanks for reviewing!

@soumik12345
Contributor

Hi @Muhammedswalihu, this seems like a really valuable feature!
Can you please replace the placeholder logic with a working implementation and provide a working example and test cases? Then we can review the PR.

@Muhammedswalihu
Author

Hi @soumik12345, thanks for the review!

I'll go ahead and:

  • Replace the placeholder logic in BenchmarkEvaluator with full precision/recall/mAP computation
  • Add a working demo example (maybe in a Colab notebook for clarity)
  • Improve the test coverage with more edge cases and per-class evaluation

Let me know if there’s anything specific you’d like to see included. Appreciate the opportunity — excited to take this further!


@Muhammedswalihu
Author

Hi @soumik12345, I've added a Colab-style demo notebook BenchmarkEvaluator_Demo.ipynb!

It includes:

  • How to import and use the BenchmarkEvaluator

  • Per-class precision and recall visualization (a sketch of this kind of chart follows below)

  • A visual example comparing predicted and ground truth bounding boxes

This should help users understand and adopt the module more easily.

Let me know if you'd like me to polish or extend this notebook further!
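
For reference, the per-class chart in the notebook can be as simple as a grouped bar plot. The snippet below is an illustrative sketch only: the class names and precision/recall values are made-up placeholders, not results produced by the notebook or the evaluator.

import matplotlib.pyplot as plt
import numpy as np

# Placeholder per-class scores for illustration; in the demo notebook these
# would come from the evaluator's per-class metrics.
classes = ["person", "car", "dog"]
precision = [0.91, 0.78, 0.64]
recall = [0.85, 0.80, 0.58]

x = np.arange(len(classes))
width = 0.35

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(x - width / 2, precision, width, label="precision")
ax.bar(x + width / 2, recall, width, label="recall")
ax.set_xticks(x)
ax.set_xticklabels(classes)
ax.set_ylim(0, 1)
ax.set_ylabel("score")
ax.set_title("Per-class precision and recall")
ax.legend()
plt.tight_layout()
plt.show()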

@soumik12345
Contributor

Hi @Muhammedswalihu, thanks for providing the PoC!
Please feel free to proceed with the actual implementation.
Also, there's no need to commit the notebook to supervision; you can simply attach a Colab notebook in a comment once the PR is ready for review with the complete logic.

Comment on lines +26 to +29
# TODO: Add class alignment, matching using IoU
tp = len(self.predictions.xyxy) # Placeholder
fp = 0
fn = len(self.ground_truth.xyxy) - tp

The logic here is incomplete; please add the correct logic to compute precision and recall.
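
For illustration only (this is not code from the PR), one common way to replace the placeholder is a greedy IoU-matching pass over same-class boxes. The helper below is a self-contained sketch, and the iou_threshold default of 0.5 is an assumption.

import numpy as np


def box_iou(a: np.ndarray, b: np.ndarray) -> float:
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(pred_xyxy, pred_cls, gt_xyxy, gt_cls, iou_threshold=0.5):
    # Greedy matching: each prediction claims at most one unmatched
    # ground-truth box of the same class with IoU >= iou_threshold.
    matched, tp = set(), 0
    for p_box, p_cls in zip(pred_xyxy, pred_cls):
        for j, (g_box, g_cls) in enumerate(zip(gt_xyxy, gt_cls)):
            if j in matched or p_cls != g_cls:
                continue
            if box_iou(p_box, g_box) >= iou_threshold:
                matched.add(j)
                tp += 1
                break
    fp = len(pred_xyxy) - tp
    fn = len(gt_xyxy) - tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

A production version would usually also sort predictions by confidence before matching, which matters once confidence-dependent metrics such as mAP are added.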

from supervision.metrics.benchmark import BenchmarkEvaluator


def test_basic_precision_recall():

This too seems like a placeholder test; please proceed with the implementation and add comprehensive unit tests.
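
As an example of a more meaningful test (a sketch only; the BenchmarkEvaluator keyword arguments and the evaluate() return value are assumed), exact precision and recall values can be asserted on a small hand-built case:

import numpy as np
import pytest
import supervision as sv

from supervision.metrics.benchmark import BenchmarkEvaluator


def test_precision_recall_partial_match():
    # Two ground-truth boxes; one prediction matches perfectly, one misses entirely.
    ground_truth = sv.Detections(
        xyxy=np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float),
        class_id=np.array([0, 0]),
    )
    predictions = sv.Detections(
        xyxy=np.array([[0, 0, 10, 10], [100, 100, 110, 110]], dtype=float),
        class_id=np.array([0, 0]),
    )

    evaluator = BenchmarkEvaluator(ground_truth=ground_truth, predictions=predictions)
    metrics = evaluator.evaluate()  # assumed to return a dict with "precision" and "recall"

    # 1 TP, 1 FP, 1 FN -> precision = 0.5, recall = 0.5
    assert metrics["precision"] == pytest.approx(0.5)
    assert metrics["recall"] == pytest.approx(0.5)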
