Vision_maskrcnn RuntimeError: roi_align_backward_kernel_xpu does not have a deterministic implementation #1264

@mengfei25

🐛 Describe the bug

python benchmarks/dynamo/torchbench.py --accuracy --float32 -d xpu -n10 --training --only vision_maskrcnn --backend=inductor

xpu  train vision_maskrcnn                    
Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 2751, in validate_model
    self.model_iter_fn(model, example_inputs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/torchbench.py", line 462, in forward_and_backward_pass
    self.grad_scaler.scale(loss).backward()
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/_tensor.py", line 648, in backward
    torch.autograd.backward(
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/autograd/__init__.py", line 347, in backward
    _engine_run_backward(
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/autograd/graph.py", line 823, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: roi_align_backward_kernel_xpu does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation, or you can use the 'warn_only=True' option, if that's acceptable for your application. You can also file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize adding deterministic support for this operation.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 4886, in run
    ) = runner.load_model(
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/torchbench.py", line 372, in load_model
    self.validate_model(model, example_inputs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 2753, in validate_model
    raise RuntimeError("Eager run failed") from e
RuntimeError: Eager run failed

eager_fail_to_run
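
Until a deterministic roi_align_backward_kernel_xpu exists, the error message itself points at two stopgaps: turn off determinism just around the failing operation, or switch to warn_only=True. The sketch below illustrates both using only public torch APIs. The helper name allow_nondeterministic is hypothetical (it is not part of PyTorch or of the benchmark harness), and whether relaxing determinism is acceptable for an accuracy run is an assumption to check, not something this report confirms.

```python
import contextlib

import torch


@contextlib.contextmanager
def allow_nondeterministic():
    """Temporarily disable deterministic-algorithm enforcement.

    Hypothetical helper for illustration: it saves the current global
    settings, turns enforcement off, and restores the saved settings on exit.
    """
    prev_mode = torch.are_deterministic_algorithms_enabled()
    prev_warn_only = torch.is_deterministic_algorithms_warn_only_enabled()
    torch.use_deterministic_algorithms(False)
    try:
        yield
    finally:
        # Restore whatever the caller had configured before.
        torch.use_deterministic_algorithms(prev_mode, warn_only=prev_warn_only)


# Option 1: scope the relaxation to the step that hits the missing kernel,
# e.g. the backward pass of the failing model.
torch.use_deterministic_algorithms(True)
with allow_nondeterministic():
    # loss.backward() would go here; enforcement is off inside this block.
    inside = torch.are_deterministic_algorithms_enabled()
after = torch.are_deterministic_algorithms_enabled()

# Option 2: keep deterministic mode on globally, but downgrade the
# RuntimeError to a warning for ops without a deterministic implementation.
torch.use_deterministic_algorithms(True, warn_only=True)
warn_only = torch.is_deterministic_algorithms_warn_only_enabled()

# Reset the global state so later code is unaffected by this sketch.
torch.use_deterministic_algorithms(False)
```

Either option only hides the symptom. The gradients from the XPU roi_align backward may still differ from run to run, so accuracy comparisons against a reference could need a looser tolerance.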

Versions

Environments:
Device: PVC 1100
torch-xpu-ops: 18bcd9a
python: 3.10
TRITON_COMMIT_ID: e98b6fcb8df5b44eb0d0addb6767c573d37ba024
TORCH_COMMIT_ID: b9fbd65dfd5e703bacbc6c25258d1215108b4faf
TORCHBENCH_COMMIT_ID: 766a5e3a189384659fd35a68c3b17b88c761aaac
TORCHVISION_COMMIT_ID: d23a6e1664d20707c11781299611436e1f0c104f
TORCHAUDIO_COMMIT_ID: b6d4675c7aedc53ba04f3f55786aac1de32be6b4
DRIVER_VERSION: 1.23.10.49.231129.50 (803.61)
KERNEL_VERSION: 5.15.0-73-generic #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023
BUNDLE_VERSION: 2025.0.1.20241113 (DL-Essential 2025.0.1)
OS_PRETTY_NAME: Ubuntu 22.04.2 LTS
GCC_VERSION: 11
