
Conversation


@Flink-ddd commented Oct 23, 2025

Purpose

Fixes #26081

This PR addresses a bug where llm.chat() pollutes the scheduler queue if a batch of requests fails validation mid-way.

The Bug:
When a batch (e.g., 4 requests) is sent to llm.chat() and one request (#3) fails validation (e.g., its prompt is too long), the engine has already enqueued the valid requests (#1, #2) before it throws the exception for #3. Those enqueued requests become "orphaned" and are never cleaned up.

The Consequence:
The next call to llm.chat() (e.g., in recursive retry logic) picks up these orphaned requests, yielding more outputs than inputs and breaking the API contract.

The Fix:
This PR makes the batch addition in _validate_and_add_requests transactional.

  1. It wraps the request-adding loop in a try...except block.
  2. It tracks the request_id of each successfully enqueued request from the current batch.
  3. If an exception occurs, the except block iterates through the tracked IDs, calls self.llm_engine.abort_request() on each orphan, and then re-raises the exception.
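A minimal, self-contained sketch of this transactional pattern (the engine class here is a hypothetical stand-in for illustration only; the real change lives in LLM._validate_and_add_requests and uses the engine's actual add_request/abort_request methods):

```python
class MockEngine:
    """Hypothetical stand-in for the real llm_engine, for illustration only."""

    MAX_PROMPT_LEN = 10

    def __init__(self):
        self.queue = []

    def add_request(self, request_id, prompt):
        # Simulate a mid-batch validation failure on an over-long prompt.
        if len(prompt) > self.MAX_PROMPT_LEN:
            raise ValueError(f"prompt too long for request {request_id}")
        self.queue.append(request_id)

    def abort_request(self, request_id):
        self.queue.remove(request_id)


def validate_and_add_requests(engine, prompts):
    """Add a batch transactionally: on any failure, abort this batch's
    already-enqueued requests before re-raising."""
    added_ids = []  # request_ids enqueued from THIS batch only
    try:
        for i, prompt in enumerate(prompts):
            request_id = str(i)
            engine.add_request(request_id, prompt)
            added_ids.append(request_id)
    except Exception:
        for request_id in added_ids:  # roll back the orphans
            engine.abort_request(request_id)
        raise
```

With this rollback in place, a failed batch leaves the queue empty instead of holding orphaned requests for the next call to pick up.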

Test Plan

  1. Manual Reproduction (OP's Example): Used the recursive infer() function from issue #26081 ("[Bug]: .chat() does not clean up in case of validation failure") to demonstrate the bug (orphaned requests causing incorrect output counts) and to verify the fix (correct output counts). See results below.
  2. Automated Regression Test: Added test_chat_batch_failure_cleanup to tests/entrypoints/llm/test_chat.py. This test (a) triggers a batch failure with an invalid prompt, (b) immediately sends a new, valid batch, and (c) asserts that the output count for the new batch is correct, confirming the queue was cleaned.
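Steps (a)-(c) above can be sketched as a compact pytest case. This is an illustrative mock (an in-memory engine and batch-add helper in place of vLLM's real objects; all names here are assumptions), not the actual test in tests/entrypoints/llm/test_chat.py, which drives llm.chat() against a loaded model:

```python
import pytest


class FakeEngine:
    """Illustrative stand-in for the engine and its scheduler queue."""

    def __init__(self):
        self.queue = []

    def add(self, request_id, prompt):
        if len(prompt) > 8:  # simulated validation limit
            raise ValueError("prompt exceeds max length")
        self.queue.append(request_id)

    def abort(self, request_id):
        self.queue.remove(request_id)


def add_batch(engine, prompts):
    """Transactional batch add, mirroring the fixed behavior."""
    added = []
    try:
        for i, prompt in enumerate(prompts):
            engine.add(f"req-{i}", prompt)
            added.append(f"req-{i}")
    except Exception:
        for request_id in added:
            engine.abort(request_id)
        raise


def test_chat_batch_failure_cleanup():
    engine = FakeEngine()
    # (a) trigger a batch failure with one invalid prompt
    with pytest.raises(ValueError):
        add_batch(engine, ["hi", "x" * 20, "yo"])
    # no orphans may survive the failed batch
    assert engine.queue == []
    # (b) immediately send a new, valid batch
    add_batch(engine, ["a", "b", "c"])
    # (c) the output count for the new batch matches its input count
    assert len(engine.queue) == 3
```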

Test Result

The fix is confirmed by both the manual reproduction script and the new automated test.

1. Manual Reproduction (Before vs. After)

Before Fix (on main branch): Bug Reproduced
The recursive call processed 5 real outputs instead of the expected 3, due to 2 orphaned requests.

Screenshot 2025-10-23 at 22 39 38

After Fix (on this branch): Fix Verified
The recursive call now correctly processes 3 real outputs and 1 error, as the orphaned requests were successfully aborted.

Screenshot 2025-10-23 at 22 41 21

2. Automated Regression Test (Pytest)

The new pytest case passes, ensuring this bug does not regress.

Command:
pytest tests/entrypoints/llm/test_chat.py -k test_chat_batch_failure_cleanup

Screenshot 2025-10-23 at 22 29 14

@mergify bot added the frontend label Oct 23, 2025
@Flink-ddd force-pushed the fix/26081-chat-batch-cleanup branch 3 times, most recently from 4a69509 to 31a6f39 on October 23, 2025 15:35
@Flink-ddd force-pushed the fix/26081-chat-batch-cleanup branch from 31a6f39 to bd01420 on October 23, 2025 15:43
