[nvbugs/5274894] fix: Moving finished context requests to generation (NVIDIA#4576)

Funatiq · web-flow · commit d39bcb6b4079 · 2025-05-22T17:49:40.000+02:00
fix: Moving finished context requests to generation

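- Unfinished chunked context requests appear at the end of the context requests vector, so moving everything from the first finished request onward (found with std::find_if) also moved those unfinished requests to generation.
- Replaced std::find_if with std::partition, which reorders the vector so that unfinished requests come first and returns an iterator to the first finished request, i.e. the correct start of the range to move to generation.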
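A minimal, self-contained sketch of the difference, assuming a hypothetical `Request` type in place of the real LlmRequest (only `isContextFinished()` matters here). The final `erase` step is an assumption about what follows the `insert` shown in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <vector>

// Hypothetical stand-in for LlmRequest; only the "context finished" flag matters.
struct Request
{
    int id;
    bool contextFinished;

    bool isContextFinished() const
    {
        return contextFinished;
    }
};

using RequestPtr = std::shared_ptr<Request>;
using RequestVector = std::vector<RequestPtr>;

// Old approach: treats everything from the first finished request to the end as
// finished. Returns how many requests would be moved to generation.
std::size_t countMovedWithFindIf(RequestVector const& contextRequests)
{
    auto firstFinished = std::find_if(contextRequests.begin(), contextRequests.end(),
        [](auto const& req) { return req->isContextFinished(); });
    return static_cast<std::size_t>(std::distance(firstFinished, contextRequests.end()));
}

// New approach: partition so unfinished requests come first, then move the
// finished tail to the front of the generation requests.
void moveFinishedToGeneration(RequestVector& contextRequests, RequestVector& generationRequests)
{
    auto firstFinished = std::partition(contextRequests.begin(), contextRequests.end(),
        [](auto const& req) { return !req->isContextFinished(); });
    generationRequests.insert(generationRequests.begin(), std::make_move_iterator(firstFinished),
        std::make_move_iterator(contextRequests.end()));
    // Assumed cleanup: drop the moved-from (now null) pointers from the context vector.
    contextRequests.erase(firstFinished, contextRequests.end());
}
```

With context requests `{0: finished, 1: finished, 2: unfinished chunk}`, `countMovedWithFindIf` reports 3 requests (including the unfinished chunk), while `moveFinishedToGeneration` moves only requests 0 and 1 and leaves request 2 in the context vector. Note that `std::partition` does not preserve the relative order within either group; `std::stable_partition` would, at some extra cost, if ordering mattered.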

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
diff --git a/cpp/tensorrt_llm/batch_manager/utils/inflightBatchingUtils.cpp b/cpp/tensorrt_llm/batch_manager/utils/inflightBatchingUtils.cpp
@@ -60,8 +60,9 @@ void moveFinishedContextRequestsToGeneration(ScheduledRequests& scheduledRequest
 
     auto& contextRequests = scheduledRequests.contextRequests;
     auto& generationRequests = scheduledRequests.generationRequests;
-    auto firstFinished = std::find_if(
-        contextRequests.begin(), contextRequests.end(), [](auto const& llmReq) { return llmReq->isContextFinished(); });
+
+    auto firstFinished = std::partition(contextRequests.begin(), contextRequests.end(),
+        [](auto const& llmReq) { return !llmReq->isContextFinished(); });
     TLLM_LOG_DEBUG(
         "Moving %ld finished context requests to generation.", std::distance(firstFinished, contextRequests.end()));
     generationRequests.insert(generationRequests.begin(), std::make_move_iterator(firstFinished),