
Llava BF16 and FP16 inference accuracy tests run out of memory #1277

@mengfei25


🐛 Describe the bug

This test previously passed, in July 2024.

python benchmarks/dynamo/torchbench.py --accuracy --bfloat16 -d xpu -n10 --inference --only llava --backend=inductor

xpu  eval  llava                              
Traceback (most recent call last):
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 4886, in run
    ) = runner.load_model(
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/torchbench.py", line 372, in load_model
    self.validate_model(model, example_inputs)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 2747, in validate_model
    model = self.deepcopy_model(model)
  File "/home/sdp/actions-runner/_work/torch-xpu-ops/pytorch/benchmarks/dynamo/common.py", line 2707, in deepcopy_model
    return copy.deepcopy(model)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/copy.py", line 153, in deepcopy
    y = copier(memo)
  File "/home/sdp/miniforge3/envs/e2e_ci/lib/python3.10/site-packages/torch/nn/parameter.py", line 68, in __deepcopy__
    self.data.clone(memory_format=torch.preserve_format), self.requires_grad
torch.OutOfMemoryError: XPU out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacity of 48.00 GiB. Of the allocated memory 47.92 GiB is allocated by PyTorch, and 12.48 MiB is reserved by PyTorch but unallocated. Please use `empty_cache` to release all unoccupied cached memory.

eager_fail_to_run
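
The numbers in the final line of the traceback can be checked with a little arithmetic. The sketch below is only an illustration, not part of the benchmark harness: it parses the sizes out of the error message and compares the failed request with the memory left on the device. The helper names (`sizes_in_mib`, `summarize_oom`) are hypothetical, and the reading that "reserved but unallocated" memory is held on top of the allocated amount is an assumption about how the message should be interpreted.

```python
import re

# Final line of the traceback above, shortened to the part that carries numbers.
OOM_MESSAGE = (
    "XPU out of memory. Tried to allocate 172.00 MiB. "
    "GPU 0 has a total capacity of 48.00 GiB. "
    "Of the allocated memory 47.92 GiB is allocated by PyTorch, "
    "and 12.48 MiB is reserved by PyTorch but unallocated."
)

UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def sizes_in_mib(message):
    """Return every '<number> <unit>' quantity in the message, converted to MiB."""
    pairs = re.findall(r"(\d+(?:\.\d+)?) (KiB|MiB|GiB)", message)
    return [float(value) * UNIT_TO_MIB[unit] for value, unit in pairs]


def summarize_oom(message):
    """Compare the failed request with the memory still available (hypothetical helper)."""
    # Order follows the message: request, device capacity, allocated, cached.
    requested, capacity, allocated, cached = sizes_in_mib(message)
    # Assumption: cached ("reserved but unallocated") memory sits on top of the
    # allocated amount, so releasing the cache frees at most capacity - allocated.
    free_if_cache_released = capacity - allocated
    return {
        "requested_mib": requested,
        "free_now_mib": free_if_cache_released - cached,
        "free_if_cache_released_mib": free_if_cache_released,
        "fits": requested <= free_if_cache_released,
    }


if __name__ == "__main__":
    for key, value in summarize_oom(OOM_MESSAGE).items():
        print(f"{key}: {value}")
```

Under these assumptions the 172 MiB request does not fit even if the cache is released, which is consistent with the traceback: the failure occurs in `copy.deepcopy(model)` called from `validate_model`, where each parameter is cloned onto a device that is already almost fully allocated.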


Versions

Environment:
Device: PVC 1100
torch-xpu-ops: 18bcd9a
python: 3.10
TRITON_COMMIT_ID: e98b6fcb8df5b44eb0d0addb6767c573d37ba024
TORCH_COMMIT_ID: b9fbd65dfd5e703bacbc6c25258d1215108b4faf
TORCHBENCH_COMMIT_ID: 766a5e3a189384659fd35a68c3b17b88c761aaac
TORCHVISION_COMMIT_ID: d23a6e1664d20707c11781299611436e1f0c104f
TORCHAUDIO_COMMIT_ID: b6d4675c7aedc53ba04f3f55786aac1de32be6b4
DRIVER_VERSION: 1.23.10.49.231129.50 (803.61)
KERNEL_VERSION: 5.15.0-73-generic #80 SMP Mon May 15 15:18:26 UTC 2023
BUNDLE_VERSION: 2025.0.1.20241113 (DL-Essential 2025.0.1)
OS_PRETTY_NAME: Ubuntu 22.04.2 LTS
GCC_VERSION: 11
