OOM (host) when running AWQ #1369

@zjnyly

Description

Describe the bug
Running out of host memory when calibrating with AWQModifier.
The model I use is llama-3-instruct; the host has 126 GB of RAM.
Is this a known problem, i.e. that calibration consumes too much memory? Thanks!

(py310_new) zjnyly@ric-MS-7D25:~/LLMs$ python awq_one_shot.py
INFO 04-22 23:14:37 __init__.py:190] Automatically detected platform cuda.
2025-04-22:23:14:38,762 INFO     [modeling.py:957] We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.85it/s]
Repo card metadata block was not found. Setting CardData to empty.
2025-04-22:23:14:42,634 WARNING  [repocard.py:108] Repo card metadata block was not found. Setting CardData to empty.
2025-04-22T23:14:44.900211+0800 | reset | INFO - Compression lifecycle reset
Logging all LLM Compressor modifier-level logs to sparse_logs/22-04-2025_23.14.44.log
2025-04-22:23:14:44,900 INFO     [logger.py:391] Logging all LLM Compressor modifier-level logs to sparse_logs/22-04-2025_23.14.44.log
2025-04-22T23:14:44.900749+0800 | from_modifiers | INFO - Creating recipe from modifiers
2025-04-22T23:14:46.201364+0800 | _calibrate | INFO - Running AWQModifier calibration with 256 samples...
 56%|█████████████████████████████████████████████▏                                   | 143/256 [00:53<00:43,  2.62it/s] 
Killed
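The `Killed` line with no Python traceback is typical of the Linux OOM killer terminating the process. As a hedged diagnostic sketch (this helper is hypothetical and not part of llm-compressor), one way to confirm host-side memory growth during calibration is to log `MemAvailable` from `/proc/meminfo` around each calibration batch:

```python
# Hypothetical standalone helper: read host memory headroom from
# /proc/meminfo so its decline can be logged during AWQ calibration.
def host_mem_available_kib(path="/proc/meminfo"):
    """Return the kernel's MemAvailable estimate in KiB (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                # Line looks like: "MemAvailable:   12345678 kB"
                return int(line.split()[1])
    raise RuntimeError("MemAvailable not found in " + path)


if __name__ == "__main__":
    # Call this between calibration batches to see whether host
    # memory shrinks steadily (a leak) or drops suddenly (a spike).
    print(f"MemAvailable: {host_mem_available_kib() / 1024:.0f} MiB")
```

If the logged value decreases roughly linearly with the sample counter, reducing the number of calibration samples (256 in the run above) would be a plausible workaround while the underlying accumulation is investigated.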

Environment
Include all relevant environment information:

  1. OS: Ubuntu 18.04
  2. Python version: 3.10.16
  3. LLM Compressor version: 0.5.1.dev26+g76af7af2
  4. compressed-tensors version: 0.9.4a20250414

Labels

bug (Something isn't working)
