Description
🐛 Describe the bug
This issue is a parking lot for edge-cases related to shutdown and logging which will require additional changes in order to be handled correctly by vLLM v1, even after #11737 lands. The goal is that when vLLM shuts down - whether intentionally or due to an internal failure - the cause of shutdown should be logged with a useful level of detail, and the server's resources (especially GPU memory) should be freed.
- Process monitor for the engine core process. [V1][Frontend] Improve Shutdown And Logs #11737 adds this for the TP workers, but currently I don't think things will shut down cleanly if you kill the engine core proc without warning.
- A bug not addressed by [V1][Frontend] Improve Shutdown And Logs #11737: when an `LLM` instance is created with multiprocessing disabled, deleting the `LLM` instance using `del` does not free the engine's weight memory on the GPU, resulting in OOM errors for subsequent tests. This appears to happen because the in-process engine core client does not free weight memory as part of shutdown. It may also be the case that the worker does not have any logic for explicitly deleting the PyTorch model layers. In contrast, with multiprocessing enabled, GPU weight memory is freed when the worker process(es) get killed.
- While [V1][Frontend] Improve Shutdown And Logs #11737 mostly addresses clean shutdown of AsyncLLM when it is garbage collected, it's not yet completely robust. Removing the explicit calls to shutdown in `test_async_llm.py` now works most of the time but occasionally doesn't (which causes subsequent tests to fail with OOM).
- Not technically a bug, but some of the exception stack traces associated with shutdown scenarios are extremely verbose and redundant and could be suppressed without reducing usefulness to the user (this is especially true as of [V1][Frontend] Improve Shutdown And Logs #11737, which adds more error handling logic around shutdown scenarios).
- Edge-cases which are not unit-tested as of [V1][Frontend] Improve Shutdown And Logs #11737, but should be:
  - Add shutdown unit tests for handling hard-kill of worker processes and engine core proc (latter in TP and non-TP cases)
  - End-to-end shutdown tests against API endpoint
  - Shutdown unit tests for data-parallel (DP) scenario
  - Tentatively: error during utility call, error during abort, handle errors in IPC mechanisms
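To make the process-monitor and hard-kill items above concrete, here is a minimal sketch using only the Python standard library. It is not vLLM code: `start_monitor`, `demo_hard_kill`, and the sleeping child that stands in for the engine core process are all hypothetical names chosen for illustration. The idea is a daemon thread that blocks on the child and invokes a callback with the exit code as soon as the child dies, including when it is killed without warning.

```python
import subprocess
import sys
import threading


def start_monitor(proc, on_death):
    """Start a daemon thread that calls on_death(returncode) once proc exits.

    Hypothetical helper for illustration only; not part of vLLM.
    """

    def _watch():
        # Blocks until the child exits for any reason, including a hard kill.
        returncode = proc.wait()
        on_death(returncode)

    thread = threading.Thread(target=_watch, name="proc-monitor", daemon=True)
    thread.start()
    return thread


def demo_hard_kill(timeout=10.0):
    """Hard-kill a stand-in child process and return the exit code the monitor saw."""
    # Stand-in for the engine core process: a child that only sleeps.
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    seen = []
    died = threading.Event()

    def on_death(returncode):
        # A real callback would log the cause and start tearing down resources.
        seen.append(returncode)
        died.set()

    start_monitor(proc, on_death)
    proc.kill()  # SIGKILL on POSIX, TerminateProcess on Windows
    if not died.wait(timeout):
        raise TimeoutError("monitor did not observe the child's death")
    return seen[0]


if __name__ == "__main__":
    print("stand-in process exited with code", demo_hard_kill())
```

A unit test for the hard-kill case could follow the same shape: start the process, kill it, and assert that the monitor's callback fires within a timeout and reports a non-zero exit code.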
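For the items above about cleanup on `del` and on garbage collection, one general Python pattern is `weakref.finalize`, sketched below. This is an assumption-laden illustration, not a description of how vLLM implements shutdown and not a diagnosis of the intermittent failure: `FakeResources` and `InProcClient` are hypothetical stand-ins, and `release` represents whatever would actually free weight memory. The sketch shows cleanup that runs exactly once, whether triggered by an explicit `shutdown()` call or by the owner being collected.

```python
import gc
import weakref


class FakeResources:
    """Hypothetical stand-in for the object that owns GPU memory."""

    def __init__(self):
        self.release_calls = 0

    def release(self):
        # A real implementation would free model weights here.
        self.release_calls += 1


class InProcClient:
    """Hypothetical client whose cleanup runs on shutdown() or on collection."""

    def __init__(self, resources):
        self._resources = resources
        # The callback must not reference `self`; otherwise the finalizer
        # would keep the client alive and it would never be collected.
        self._finalizer = weakref.finalize(self, resources.release)

    def shutdown(self):
        # Calling a finalizer runs its callback at most once; later calls
        # (and the eventual garbage collection) are no-ops.
        self._finalizer()


if __name__ == "__main__":
    res = FakeResources()
    client = InProcClient(res)
    del client
    gc.collect()
    print("release calls after del:", res.release_calls)
```

Note that if the owning object is part of a reference cycle, `del` alone does not trigger the finalizer; cleanup then waits for the cyclic garbage collector, which is one reason tests often keep an explicit `shutdown()` call.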
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.