feat: Refactor the fetching request logic #5786
Conversation
Force-pushed e727517 to 8df54fb
Force-pushed 8df54fb to ce88c8f
Force-pushed ce88c8f to 38708ce
    | /bot run --disable-fail-fast | 
| PR_Github #11779 [ run ] triggered by Bot | 
Force-pushed 38708ce to b498216
    | /bot run --disable-fail-fast | 
| PR_Github #11794 [ run ] triggered by Bot | 
| PR_Github #11779 [ run ] completed with state  | 
Force-pushed b498216 to 0d3f197
    | /bot run --disable-fail-fast | 
| PR_Github #11796 [ run ] triggered by Bot | 
| PR_Github #11794 [ run ] completed with state  | 
| PR_Github #11796 [ run ] completed with state  | 
Force-pushed 0d3f197 to 8cb76d1
    | /bot run --disable-fail-fast | 
| PR_Github #11824 [ run ] triggered by Bot | 
| PR_Github #11824 [ run ] completed with state  | 
| PR_Github #12303 [ run ] completed with state  | 
| /bot run --disable-fail-fast | 
| PR_Github #12397 [ run ] triggered by Bot | 
| PR_Github #12397 [ run ] completed with state  | 
LGTM
Signed-off-by: Shunkang <[email protected]>
Force-pushed 9469ede to f9060a9
    | /bot run --disable-fail-fast | 
Actionable comments posted: 0
♻️ Duplicate comments (4)
tests/unittest/_torch/test_executor_request_queue.py (1)
103-117: Test documents a critical bug in the implementation
This test correctly verifies that when a query parameter is passed to enqueue_request, it should be stored in the query field of the RequestQueueItem. However, this test will currently fail due to a bug in the implementation where the query parameter is incorrectly passed as a positional argument.
tensorrt_llm/_torch/pyexecutor/executor_request_queue.py (3)
144-162: Fix critical bug in enqueue_request method
The query parameter is incorrectly passed as a positional argument, which makes it the is_canceled_request parameter instead of query.
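The positional-argument bug described above can be sketched as follows. This is a minimal, hypothetical reconstruction: the real RequestQueueItem lives in tensorrt_llm/_torch/pyexecutor/executor_request_queue.py and its exact fields may differ, but the failure mode (a value silently landing in the wrong dataclass slot) is the one the review flags.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RequestQueueItem:
    id: int
    request: Optional[Any] = None
    is_canceled_request: bool = False  # third positional slot
    query: Optional[Any] = None        # sits after is_canceled_request


def make_item_buggy(req_id, request, query=None):
    # BUG: query is passed positionally, so it fills the
    # is_canceled_request slot instead of the query field.
    return RequestQueueItem(req_id, request, query)


def make_item_fixed(req_id, request, query=None):
    # FIX: pass query by keyword so it reaches the intended field.
    return RequestQueueItem(req_id, request, query=query)
```

Passing the argument by keyword is the whole fix; the test at lines 103-117 would then pass.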
164-168: Use try/finally pattern for lock handling
The current implementation could leave the lock held if an exception occurs between acquire and release.
490-493: Replace print statement with proper logging and fix line length
Use the logging framework instead of print statements for consistency and better control over output.
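A sketch of the suggested change, using the stdlib logging module for illustration (TensorRT-LLM ships its own logger wrapper, so the actual fix would go through that instead; the function name and message below are made up):

```python
import logging

logger = logging.getLogger(__name__)


def report_queue_state(num_fetched, num_waiting):
    # Before: print(f"fetched {num_fetched}, waiting {num_waiting}")
    # After: leveled, filterable, and routed through logging handlers.
    logger.debug("fetched %d new requests, %d still waiting",
                 num_fetched, num_waiting)
```

Unlike print, this output can be silenced or redirected per module, and the %-style arguments are only formatted when the record is actually emitted.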
🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/executor_request_queue.py (1)
522-522: Fix line length violation
The assertion line exceeds the 120-character limit. Consider breaking it into multiple lines for better readability.
- assert anchor_block_size <= block_size, f'cp_anchor_size {anchor_block_size} should be smaller than block_size {block_size}'
+ assert anchor_block_size <= block_size, (
+     f'cp_anchor_size {anchor_block_size} should be smaller than '
+     f'block_size {block_size}')
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- tensorrt_llm/_torch/pyexecutor/executor_request_queue.py (1 hunks)
- tensorrt_llm/_torch/pyexecutor/py_executor.py (16 hunks)
- tests/unittest/_torch/test_executor_request_queue.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
tests/unittest/_torch/test_executor_request_queue.py (1)
Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.374Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
tensorrt_llm/_torch/pyexecutor/executor_request_queue.py (1)
Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.374Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.374Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/pyexecutor/executor_request_queue.py
491-491: Line too long (133 > 120)
(E501)
522-522: Line too long (132 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (35)
tests/unittest/_torch/test_executor_request_queue.py (4)
1-11: LGTM: Well-structured test imports and setup
The imports are appropriate and the module correctly imports the target classes for testing.
14-31: LGTM: Comprehensive mock setup for distributed testing
The mock_dist fixture properly sets up all necessary distributed attributes for isolated testing.
319-347: Excellent thread safety test coverage
This test properly validates concurrent enqueue operations and ensures unique request ID generation without race conditions.
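A test in the spirit of the one reviewed here can be sketched as below. SimpleQueue is a stand-in for the real ExecutorRequestQueue, which is not reproduced in this review; the point is the shape of the assertion: many threads enqueue concurrently, and every request must receive a unique ID with none lost.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SimpleQueue:
    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = 0
        self.items = []

    def enqueue(self, request):
        with self._lock:  # ID generation and append must be atomic
            req_id = self._next_id
            self._next_id += 1
            self.items.append((req_id, request))
            return req_id


def test_concurrent_enqueue():
    q = SimpleQueue()
    with ThreadPoolExecutor(max_workers=8) as pool:
        ids = list(pool.map(q.enqueue, range(200)))
    # Every request got a unique ID and nothing was lost.
    assert len(set(ids)) == 200
    assert len(q.items) == 200
```

Without the lock around both the increment and the append, duplicate IDs or dropped items would appear intermittently under contention.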
389-418: Comprehensive integration test workflow
The test effectively simulates a complete workflow from enqueueing requests through cancellation and validation, providing good end-to-end coverage.
tensorrt_llm/_torch/pyexecutor/executor_request_queue.py (8)
22-36: LGTM: Well-designed RequestQueueItem dataclass
The dataclass design is clean with appropriate default values and useful property methods for request type checking.
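The pattern praised here (a dataclass with defaults plus read-only properties that classify the item) can be illustrated as follows. The field names and the sentinel ID are assumptions for illustration, not the project's actual values.

```python
from dataclasses import dataclass
from typing import Any, Optional

SHUTDOWN_REQUEST_ID = -1  # hypothetical sentinel value


@dataclass
class QueueItem:
    id: int
    request: Optional[Any] = None

    @property
    def is_shutdown_request(self) -> bool:
        # A sentinel ID marks the poison-pill shutdown item.
        return self.id == SHUTDOWN_REQUEST_ID

    @property
    def is_normal_request(self) -> bool:
        return not self.is_shutdown_request
```

Keeping the classification logic on the item itself means callers test `item.is_shutdown_request` instead of comparing raw IDs at every call site.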
38-67: LGTM: Comprehensive initialization with proper state management
The constructor properly initializes all necessary attributes for queue management, distributed coordination, and performance tracking.
113-126: LGTM: Proper lock handling in enqueue_requests
The method correctly uses try/finally to ensure the lock is always released, and properly tracks start times for performance metrics.
128-142: LGTM: Clean implementation of cancel and shutdown requests
Both methods properly handle locking and create appropriate RequestQueueItem instances for their respective operations.
208-215: LGTM: Clean routing logic for attention DP
The method properly delegates to the appropriate fetch method based on the attention DP configuration.
293-315: LGTM: Robust request validation and filtering
The method properly handles shutdown signals, canceled requests, and beam width validation with clear error messages.
550-569: LGTM: Well-documented scheduler overlap logic
The method includes excellent documentation explaining the complex interaction between overlap scheduling and different sampler types.
571-601: LGTM: Comprehensive performance metrics interface
The getter methods provide clean access to internal queue state and performance metrics with appropriate return types.
tensorrt_llm/_torch/pyexecutor/py_executor.py (23)
23-23: LGTM! Proper import addition for SpecDecodingStats.
The import of SpecDecodingStats is correctly added to support the refactored functionality.
31-31: LGTM! ExecutorRequestQueue import is properly added.
The import statement correctly brings in the new ExecutorRequestQueue and RequestQueueItem classes for the refactored queue management.
35-35: LGTM! Import statement is properly formatted.
The import for LlmResponse is correctly maintained.
37-37: LGTM! Sampler imports are properly maintained.
The import statement correctly includes all necessary sampler-related classes.
220-232: LGTM! ExecutorRequestQueue initialization is comprehensive and well-structured.
The initialization properly delegates all queue management to the new ExecutorRequestQueue class with appropriate configuration parameters. The call to set_exclude_last_generation_logits correctly configures the queue based on scheduler and sampler settings.
295-295: LGTM! Proper delegation to ExecutorRequestQueue.
The enqueue_requests method correctly delegates to the new queue implementation.
328-328: LGTM! Cancel request properly delegated.
The cancel request functionality is correctly delegated to ExecutorRequestQueue.
334-334: LGTM! Shutdown request properly delegated.
The shutdown functionality is correctly delegated to ExecutorRequestQueue.
349-349: LGTM! Can enqueue check properly delegated.
The can_enqueue_requests method correctly delegates to the queue implementation.
387-388: LGTM! Single request enqueue properly delegated.
The enqueue_request method correctly delegates to ExecutorRequestQueue and maintains the same interface.
396-397: LGTM! Should stop processing logic properly updated.
The property correctly uses the new queue's get_waiting_queue_size() method to determine if processing should stop.
536-536: LGTM! Request queue access properly delegated.
The statistics collection correctly accesses the request queue through the new interface.
553-554: LGTM! Queue size retrieval properly delegated.
The iteration statistics correctly retrieve the queue size from the new ExecutorRequestQueue.
667-668: LGTM! Queue latency metrics properly delegated.
The performance statistics correctly retrieve queue latency metrics from the new queue implementation.
828-829: LGTM! Queue latency retrieval in executor loop.
The executor loop correctly retrieves queue latency metrics from ExecutorRequestQueue.
948-951: LGTM! Benchmark queue size check properly delegated.
The benchmark logic correctly uses the new queue's size method for monitoring.
972-973: LGTM! Queue latency in overlap executor loop.
The overlap executor loop correctly retrieves queue latency from the new implementation.
1107-1115: LGTM! Fetch new requests properly refactored.
The _fetch_new_requests method is correctly refactored to delegate all operations to ExecutorRequestQueue. The method properly:
- Fetches new requests from the queue
- Extends the active requests list
- Updates shutdown state
- Retrieves expected number of active requests
1405-1405: LGTM! Canceled request IDs check properly delegated.
The check for canceled request IDs correctly uses the new queue's method.
1408-1409: LGTM! Waiting queue update properly delegated.
The waiting queue update operation is correctly delegated to ExecutorRequestQueue.
1413-1413: LGTM! Canceled request ID lookup properly delegated.
The canceled request ID check correctly uses the new queue's method.
1423-1423: LGTM! Clear canceled request IDs properly delegated.
The operation to clear canceled request IDs is correctly delegated to ExecutorRequestQueue.
1515-1516: LGTM! Active requests update pattern is correct.
The pattern of clearing and extending the active requests list is appropriate and maintains the same functionality while being more explicit about the operation.
| PR_Github #12417 [ run ] triggered by Bot | 
| PR_Github #12417 [ run ] completed with state  | 
Signed-off-by: Shunkang <[email protected]> Signed-off-by: Shreyas Misra <[email protected]>
Signed-off-by: Shunkang <[email protected]> Signed-off-by: Ransiki Zhang <[email protected]>
Refactor the fetching request logic
Description
In the existing implementation, the request fetching logic has grown increasingly long. This change refactors that logic into ExecutorRequestQueue, where it is much easier to test and to integrate more advanced fetching policies.
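The delegation this refactor introduces can be sketched as below. The class and method names (ExecutorRequestQueue, enqueue_request, get_waiting_queue_size) come from the review comments above, but the bodies are illustrative only, not the project's actual implementation.

```python
import threading
from collections import deque


class ExecutorRequestQueue:
    """Owns the waiting queue so the executor no longer has to."""

    def __init__(self):
        self._lock = threading.Lock()
        self._queue = deque()
        self._next_id = 0

    def enqueue_request(self, request):
        with self._lock:
            req_id = self._next_id
            self._next_id += 1
            self._queue.append((req_id, request))
            return req_id

    def get_waiting_queue_size(self):
        with self._lock:
            return len(self._queue)


class PyExecutor:
    """Keeps its public API and forwards queue operations."""

    def __init__(self):
        self.executor_request_queue = ExecutorRequestQueue()

    def enqueue_request(self, request):
        return self.executor_request_queue.enqueue_request(request)
```

Because the queue is a self-contained object, it can be unit-tested (as tests/unittest/_torch/test_executor_request_queue.py does) without standing up a full executor.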
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.
run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]
Launch build/test pipelines. All previously running jobs will be killed.
--disable-fail-fast (OPTIONAL): Disable fail fast on build/tests/infra failures.
--skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
--stage-list "A10-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
--only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
--disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
--add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.
--post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".
For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.