[fix][test] clear cuda cache before unittests automatically #5121

omera-nv · 2025-06-11T07:42:22Z

Clear torch CUDA cache before unittests

We've encountered a case in which a test failed with OOM errors, which were resolved by calling torch.cuda.empty_cache() at the start of the test. This PR applies the same call before every unittest, so each one starts with an empty torch CUDA cache and can make full use of the available device memory.
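For readers who want to see the general shape of such a change, here is a minimal sketch of an autouse pytest fixture in a `conftest.py`. It is not the actual diff from this PR: the names `clear_cuda_cache` and `cuda_cache_cleared` are hypothetical, and the guarded `torch` import is an assumption added only so the sketch also loads on machines without PyTorch. Note that `torch.cuda.empty_cache()` releases only cached blocks that are currently unused; memory held by live tensors is not freed.

```python
import pytest

try:
    import torch
except ImportError:  # assumption: keep the sketch importable without PyTorch
    torch = None


def clear_cuda_cache() -> bool:
    """Release unused cached CUDA memory held by PyTorch's allocator.

    Returns True if the cache was cleared, False if no CUDA device is usable.
    (Hypothetical helper, not taken from the PR.)
    """
    if torch is None or not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()
    return True


@pytest.fixture(autouse=True)
def cuda_cache_cleared():
    # autouse=True makes pytest run this fixture before every test collected
    # under the directory containing this conftest.py, with no per-test opt-in.
    clear_cuda_cache()
    yield
```

Placing the fixture in `conftest.py` rather than in individual test modules means new tests pick up the behavior automatically.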

omera-nv · 2025-06-11T07:43:35Z

/bot run

tests/unittest/conftest.py

tensorrt-cicd · 2025-06-11T07:49:07Z

PR_Github #8436 [ run ] triggered by Bot

omera-nv · 2025-06-11T07:51:40Z

/bot kill

omera-nv · 2025-06-11T07:52:03Z

/bot run

tensorrt-cicd · 2025-06-11T07:56:46Z

PR_Github #8441 [ kill ] triggered by Bot

tensorrt-cicd · 2025-06-11T07:56:47Z

PR_Github #8436 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-06-11T07:57:18Z

PR_Github #8441 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit b031afa

tensorrt-cicd · 2025-06-11T07:57:57Z

PR_Github #8445 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-11T13:49:09Z

PR_Github #8445 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6116 completed with status: 'FAILURE'

omera-nv · 2025-06-11T14:13:43Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-06-11T14:19:15Z

PR_Github #8501 [ run ] triggered by Bot

omera-nv · 2025-06-11T19:40:07Z

/bot run

tensorrt-cicd · 2025-06-11T19:48:11Z

PR_Github #8537 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-12T02:43:17Z

PR_Github #8537 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6190 completed with status: 'SUCCESS'

omera-nv · 2025-06-17T02:29:03Z

/bot run

tensorrt-cicd · 2025-06-17T02:35:44Z

PR_Github #9095 [ run ] triggered by Bot

omera-nv · 2025-06-17T02:44:09Z

/bot kill

tensorrt-cicd · 2025-06-17T02:49:43Z

PR_Github #9104 [ kill ] triggered by Bot

tensorrt-cicd · 2025-06-17T02:49:46Z

PR_Github #9095 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-06-17T02:50:15Z

PR_Github #9104 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit 647f992

omera-nv · 2025-06-17T10:46:10Z

/bot run

tensorrt-cicd · 2025-06-17T10:51:31Z

PR_Github #9193 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-17T11:04:15Z

PR_Github #9193 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6737 completed with status: 'FAILURE'

omera-nv · 2025-06-17T11:54:45Z

/bot run

tensorrt-cicd · 2025-06-17T12:00:54Z

PR_Github #9202 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-17T12:13:49Z

PR_Github #9202 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6744 completed with status: 'FAILURE'

omera-nv · 2025-06-17T19:23:25Z

/bot run

tensorrt-cicd · 2025-06-17T19:28:34Z

PR_Github #9244 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-17T19:41:58Z

PR_Github #9244 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6780 completed with status: 'FAILURE'

omera-nv · 2025-06-17T23:57:27Z

/bot run

tensorrt-cicd · 2025-06-18T00:02:48Z

PR_Github #9250 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-18T00:20:04Z

PR_Github #9250 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6786 completed with status: 'FAILURE'

omera-nv · 2025-06-18T04:01:22Z

/bot run

tensorrt-cicd · 2025-06-18T04:06:57Z

PR_Github #9306 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-18T04:24:17Z

PR_Github #9306 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #6829 completed with status: 'FAILURE'

omera-nv · 2025-06-18T18:01:20Z

/bot run

tensorrt-cicd · 2025-06-18T18:07:04Z

PR_Github #9410 [ run ] triggered by Bot

omera-nv · 2025-06-18T18:15:50Z

/bot run

tensorrt-cicd · 2025-06-18T18:22:02Z

PR_Github #9411 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-18T18:22:04Z

PR_Github #9410 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-06-18T21:02:19Z

PR_Github #9411 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6904 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

omera-nv requested review from kaiyux and tomeras91 June 11, 2025 07:42

tomeras91 reviewed Jun 11, 2025

tests/unittest/conftest.py (outdated, resolved)

omera-nv changed the title ~~[fix] clear cuda cache before unittests automatically~~ [fix][test] clear cuda cache before unittests automatically Jun 11, 2025

tomeras91 approved these changes Jun 11, 2025

omera-nv force-pushed the fix/clear_cuda_cache_before_unittests branch from c676235 to b031afa Compare June 11, 2025 07:51

omera-nv force-pushed the fix/clear_cuda_cache_before_unittests branch from b031afa to 8102c47 Compare June 11, 2025 19:38

omera-nv force-pushed the fix/clear_cuda_cache_before_unittests branch from 8102c47 to 647f992 Compare June 16, 2025 20:51

omera-nv force-pushed the fix/clear_cuda_cache_before_unittests branch from 647f992 to 85e6939 Compare June 17, 2025 10:44

omera-nv force-pushed the fix/clear_cuda_cache_before_unittests branch from 5efca1d to 1abca40 Compare June 17, 2025 19:23

omera-nv force-pushed the fix/clear_cuda_cache_before_unittests branch from 1abca40 to 38eab7d Compare June 17, 2025 23:51

clear cuda cache before unittests automatically

09929bd

Signed-off-by: Omer Ullman Argov <[email protected]>

omera-nv force-pushed the fix/clear_cuda_cache_before_unittests branch from ed2b4ab to 09929bd Compare June 18, 2025 18:15

tburt-nv approved these changes Jun 18, 2025

omera-nv merged commit 0b6d005 into NVIDIA:main Jun 18, 2025
3 checks passed
