Commit d39bcb6

[nvbugs/5274894] fix: Moving finished context requests to generation (NVIDIA#4576)
fix: Moving finished context requests to generation

- Unfinished chunked context requests appear at end of context requests vector.
- Replaced std::find_if with std::partition to find the correct position to move finished context requests to generation.

Signed-off-by: Robin Kobus <[email protected]>
1 parent 3d083b6 commit d39bcb6

1 file changed, +3 -2 lines changed

cpp/tensorrt_llm/batch_manager/utils/inflightBatchingUtils.cpp

Lines changed: 3 additions & 2 deletions
@@ -60,8 +60,9 @@ void moveFinishedContextRequestsToGeneration(ScheduledRequests& scheduledRequest
 
     auto& contextRequests = scheduledRequests.contextRequests;
     auto& generationRequests = scheduledRequests.generationRequests;
-    auto firstFinished = std::find_if(
-        contextRequests.begin(), contextRequests.end(), [](auto const& llmReq) { return llmReq->isContextFinished(); });
+
+    auto firstFinished = std::partition(contextRequests.begin(), contextRequests.end(),
+        [](auto const& llmReq) { return !llmReq->isContextFinished(); });
     TLLM_LOG_DEBUG(
         "Moving %ld finished context requests to generation.", std::distance(firstFinished, contextRequests.end()));
     generationRequests.insert(generationRequests.begin(), std::make_move_iterator(firstFinished),
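
The difference between the two calls can be reproduced in isolation. The sketch below is not the TensorRT-LLM code: `Request`, `RequestVector`, `countMovedWithFindIf`, and `moveFinishedToGeneration` are hypothetical stand-ins, and the `erase` call after the insert is an assumption, since the diff is cut off before that point. It only illustrates why `std::find_if` picks the wrong split point when an unfinished chunked request sits after finished ones, and how `std::partition` avoids that.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <vector>

// Hypothetical stand-in for the real request type; only the predicate matters here.
struct Request
{
    int id;
    bool contextFinished;

    bool isContextFinished() const
    {
        return contextFinished;
    }
};

using RequestVector = std::vector<std::shared_ptr<Request>>;

// Old approach: the range [firstFinished, end) is assumed to hold only finished
// requests. That assumption breaks when an unfinished request follows a finished one.
// Takes the vector by value because it only counts and does not modify the caller's data.
inline std::size_t countMovedWithFindIf(RequestVector requests)
{
    auto firstFinished = std::find_if(
        requests.begin(), requests.end(), [](auto const& req) { return req->isContextFinished(); });
    return static_cast<std::size_t>(std::distance(firstFinished, requests.end()));
}

// New approach: std::partition reorders the vector so that all unfinished requests
// come first; the returned iterator marks the start of the finished ones.
inline void moveFinishedToGeneration(RequestVector& context, RequestVector& generation)
{
    auto firstFinished = std::partition(
        context.begin(), context.end(), [](auto const& req) { return !req->isContextFinished(); });
    generation.insert(
        generation.begin(), std::make_move_iterator(firstFinished), std::make_move_iterator(context.end()));
    // Assumed follow-up step (not visible in the diff): drop the moved-from entries.
    context.erase(firstFinished, context.end());
    // Only unfinished requests may remain in the context vector.
    assert(std::none_of(
        context.begin(), context.end(), [](auto const& req) { return req->isContextFinished(); }));
}
```

With a context vector of two finished requests followed by one unfinished chunked request, `countMovedWithFindIf` reports three requests to move, which wrongly includes the unfinished one, while `moveFinishedToGeneration` moves exactly the two finished requests. Note that `std::partition` does not guarantee the relative order of elements within each group; `std::stable_partition` would be the alternative if order mattered.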
