Closed
Labels: bug (Something isn't working)
Description
Describe the bug
Running out of host memory when calibrating with AWQModifier.
The model is llama-3-instruct; the host has 126 GB of RAM.
Is this a known issue that calibration consumes too much memory? Thanks!
```
(py310_new) zjnyly@ric-MS-7D25:~/LLMs$ python awq_one_shot.py
INFO 04-22 23:14:37 __init__.py:190] Automatically detected platform cuda.
2025-04-22:23:14:38,762 INFO [modeling.py:957] We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 4/4 [00:02<00:00, 1.85it/s]
Repo card metadata block was not found. Setting CardData to empty.
2025-04-22:23:14:42,634 WARNING [repocard.py:108] Repo card metadata block was not found. Setting CardData to empty.
2025-04-22T23:14:44.900211+0800 | reset | INFO - Compression lifecycle reset
Logging all LLM Compressor modifier-level logs to sparse_logs/22-04-2025_23.14.44.log
2025-04-22:23:14:44,900 INFO [logger.py:391] Logging all LLM Compressor modifier-level logs to sparse_logs/22-04-2025_23.14.44.log
2025-04-22T23:14:44.900749+0800 | from_modifiers | INFO - Creating recipe from modifiers
2025-04-22T23:14:46.201364+0800 | _calibrate | INFO - Running AWQModifier calibration with 256 samples...
 56%|█████████████████████████████████████████████▏ | 143/256 [00:53<00:43, 2.62it/s]
Killed
```
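For reference, here is a hypothetical sketch of what an `awq_one_shot.py` calibration script might look like (this is not the reporter's actual script). The model ID, dataset choice, and constants are assumptions; the `oneshot`/`AWQModifier` API follows the public llm-compressor examples and may differ between versions. Lowering `NUM_CALIBRATION_SAMPLES` or `MAX_SEQUENCE_LENGTH` is a common way to reduce peak host memory during calibration.

```python
# Hypothetical reproduction sketch -- NOT the reporter's script.
# Model ID, dataset, and constants below are assumptions for illustration.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed model
NUM_CALIBRATION_SAMPLES = 256  # fewer samples (e.g. 128) lowers host-memory pressure
MAX_SEQUENCE_LENGTH = 512      # shorter sequences shrink cached calibration activations

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Any representative text dataset works for calibration; this choice is illustrative.
ds = load_dataset(
    "HuggingFaceH4/ultrachat_200k",
    split=f"train_sft[:{NUM_CALIBRATION_SAMPLES}]",
)

# AWQ recipe: quantize Linear layers, skip the output head.
recipe = [AWQModifier(ignore=["lm_head"], scheme="W4A16_ASYM", targets=["Linear"])]

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)
```

Running this requires a CUDA GPU, network access to the Hugging Face Hub, and enough RAM to hold the model, so it is a sketch for discussion rather than a self-running test.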
Environment
Include all relevant environment information:
- OS: Ubuntu 18.04
- Python version: 3.10.16
- LLM Compressor version: 0.5.1.dev26+g76af7af2
- compressed-tensors: 0.9.4a20250414