Fix(llm): Abort orphaned requests when llm.chat() batch fails (#27420)
Purpose
Fixes #26081
This PR addresses a bug where `llm.chat()` pollutes the scheduler queue if a batch of requests fails validation mid-way.

The Bug:
When a batch (e.g., 4 requests) is sent to `llm.chat()` and one request (#3) is invalid (too long), the engine enqueues the valid requests (#1, #2) and then throws an exception for #3. These enqueued requests (#1, #2) become "orphaned" and are never cleaned up.

The Consequence:
The next call to `llm.chat()` (e.g., in recursive retry logic) picks up these orphaned requests, leading to more outputs than inputs and breaking the API contract. A sketch of such a retry pattern follows.
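For illustration, a minimal sketch of a retry wrapper that trips over this bug. It is loosely modeled on the `infer()` helper from issue #26081 but is not its actual code, and `fits_in_context()` is an invented placeholder predicate:

```python
# Hypothetical retry wrapper in the spirit of the infer() helper from
# issue #26081 (not its actual code); fits_in_context() is an invented
# placeholder predicate, not a vLLM API.
def infer_with_retry(llm, conversations):
    try:
        return llm.chat(conversations)
    except ValueError:
        # Drop conversations that fail validation and retry the rest.
        valid = [c for c in conversations if fits_in_context(llm, c)]
        outputs = llm.chat(valid)
        # llm.chat() promises one output per input conversation. Before
        # this PR, orphans from the earlier failed call were drained here
        # too, making len(outputs) > len(valid).
        assert len(outputs) == len(valid)
        return outputs
```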
The Fix:
This PR makes the batch addition in `_validate_and_add_requests` transactional, as sketched below:
- The loop that enqueues requests is wrapped in a `try...except` block.
- The `request_id` of each successfully enqueued request from the current batch is tracked.
- On failure, the `except` block iterates through the tracked IDs and calls `self.llm_engine.abort_request()` on each "orphan" before re-raising the exception.
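A minimal sketch of this pattern (simplified, not the exact diff: the real method also threads sampling params, LoRA requests, etc., and `_add_request` here stands in for the actual enqueue helper):

```python
# Sketch of the transactional pattern described above. _add_request is a
# stand-in for the actual enqueue helper and is assumed to return the
# request_id it enqueued; abort_request matches the PR description.
def _validate_and_add_requests(self, prompts, params) -> None:
    added_request_ids: list[str] = []
    try:
        for prompt in prompts:
            # May raise, e.g. when the prompt exceeds the model's max length.
            request_id = self._add_request(prompt, params)
            added_request_ids.append(request_id)
    except Exception:
        # Roll back: abort every request this batch already enqueued, so
        # the next llm.chat() call starts from a clean scheduler queue.
        for request_id in added_request_ids:
            self.llm_engine.abort_request(request_id)
        raise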
Test Plan
- Manual reproduction: ran the `infer()` function from [Bug]: .chat() does not clean up in case of validation failure #26081 to demonstrate the bug (orphaned requests causing incorrect output counts) and to verify the fix (correct output counts). See results below.
- Automated regression test: added `test_chat_batch_failure_cleanup` to `tests/entrypoints/llm/test_chat.py`. This test (a) triggers a batch failure with an invalid prompt, (b) immediately sends a new, valid batch, and (c) asserts that the output count for the new batch is correct, confirming the queue was cleaned. A sketch of its shape follows.
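A hedged sketch of the shape of that test (the model choice, `max_model_len`, message construction, and the exact exception type are assumptions for this sketch, not the committed test code):

```python
import pytest
from vllm import LLM

def test_chat_batch_failure_cleanup():
    # Model and max_model_len are illustrative choices for this sketch.
    llm = LLM(model="facebook/opt-125m", max_model_len=256)
    good = [{"role": "user", "content": "Hello!"}]
    too_long = [{"role": "user", "content": "word " * 10_000}]

    # (a) A batch containing an over-length prompt fails validation.
    with pytest.raises(Exception):
        llm.chat([good, too_long, good])

    # (b) Immediately send a new, valid batch ...
    outputs = llm.chat([good, good])
    # (c) ... and assert one output per input: no orphans were drained.
    assert len(outputs) == 2
```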
Test Result
The fix is confirmed by both the manual reproduction script and the new automated test.
1. Manual Reproduction (Before vs. After)
Before Fix (on `main` branch): Bug Reproduced
The recursive call processed 5 real outputs instead of the expected 3, due to 2 orphaned requests.
After Fix (on this branch): Fix Verified
The recursive call now correctly processes 3 real outputs and 1 error, as the orphaned requests were successfully aborted.
2. Automated Regression Test (Pytest)
The new `pytest` case passes, ensuring this bug does not regress.

Command:
pytest tests/entrypoints/llm/test_chat.py -k test_chat_batch_failure_cleanup