[Core] Bookkeeping optimization: Vectorize updates #25801

Jialin · 2025-09-27T06:38:04Z

Purpose

Currently, GPUModelRunner._bookkeeping_sync interleaves NumPy updates with Python logic, which is inefficient: the many scattered, per-element tensor and NumPy array updates consume a significant amount of time.

In this change, we vectorize the tensor and NumPy updates:

compute the update indices and values in a Python for loop
apply bulk updates to the tensors and NumPy arrays in a vectorized fashion
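A minimal NumPy-only sketch of the two-phase pattern above. The function names and data layout (a 2-D token_ids buffer, a per-request num_tokens counter, and sampled holding each request's new token IDs) are hypothetical illustrations, not the actual _bookkeeping_sync code. The first function does one small write per token. The second gathers indices and values in Python and then applies them with a single fancy-indexed assignment.

```python
import numpy as np


def append_tokens_scattered(token_ids, num_tokens, sampled):
    # Hypothetical baseline: one tiny NumPy write per token, interleaved
    # with Python control flow.
    for req_idx, new_ids in enumerate(sampled):
        for tok in new_ids:
            token_ids[req_idx, num_tokens[req_idx]] = tok
            num_tokens[req_idx] += 1


def append_tokens_vectorized(token_ids, num_tokens, sampled):
    # Hypothetical vectorized version.
    # Phase 1: compute update indices and values with plain Python lists.
    rows, cols, vals = [], [], []
    for req_idx, new_ids in enumerate(sampled):
        start = int(num_tokens[req_idx])
        rows.extend([req_idx] * len(new_ids))
        cols.extend(range(start, start + len(new_ids)))
        vals.extend(new_ids)
    # Phase 2: apply all token writes with one fancy-indexed assignment...
    if vals:
        token_ids[rows, cols] = vals
    # ...and all counter updates with one vectorized add.
    # Assumes sampled has exactly one entry per row of num_tokens.
    num_tokens += np.array([len(ids) for ids in sampled], dtype=num_tokens.dtype)


token_ids = np.zeros((3, 8), dtype=np.int64)
num_tokens = np.array([2, 0, 5], dtype=np.int64)
sampled = [[7], [], [1, 2, 3]]  # variable length, e.g. after spec-decode acceptance
append_tokens_vectorized(token_ids, num_tokens, sampled)
print(num_tokens.tolist())    # [3, 0, 8]
print(token_ids[2].tolist())  # [0, 0, 0, 0, 0, 1, 2, 3]
```

The same idea should carry over to torch tensors, which also accept index lists or tensors in advanced-indexing assignment. The gain comes from replacing many tiny per-element writes with a few bulk calls, not from changing the Python loop itself.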

Test Plan & Test Result

Correctness

VLLM_USE_V1=1 python3 examples/offline_inference/spec_decode.py  --method ngram  --model-dir meta-llama/Llama-3.1-8B-Instruct  --prompt_lookup_min 2  --prompt_lookup_max 5  --num_spec_tokens 5  --dataset-name hf  --dataset-path philschmid/mt-bench  --num-prompts 80  --print-output

Output is exactly the same before and after the change:
--------------------------------------------------
total_num_output_tokens: 17069
num_drafts: 2548
num_draft_tokens: 12711
num_accepted_tokens: 2587
mean acceptance length: 2.02
--------------------------------------------------
acceptance at token 0: 0.43
acceptance at token 1: 0.25
acceptance at token 2: 0.15
acceptance at token 3: 0.10
acceptance at token 4: 0.07
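As a quick consistency check on the metrics above, assume the script reports mean acceptance length as 1 + num_accepted_tokens / num_drafts (one bonus token per step plus the average number of accepted draft tokens), and each per-position rate as accepted-at-position / num_drafts. Under those assumptions the reported numbers agree with each other:

```python
# Reported metrics copied from the run above.
num_drafts = 2548
num_draft_tokens = 12711
num_accepted_tokens = 2587
per_position = [0.43, 0.25, 0.15, 0.10, 0.07]

# Assumed formula: 1 bonus token per step plus mean accepted draft tokens.
mean_acceptance_length = 1 + num_accepted_tokens / num_drafts
print(f"mean acceptance length: {mean_acceptance_length:.2f}")  # 2.02

# If each rate is accepted_at_position / num_drafts, the rates should sum to
# roughly num_accepted_tokens / num_drafts (up to 2-decimal rounding).
print(f"sum of per-position rates: {sum(per_position):.2f}")       # 1.00
print(f"accepted per draft: {num_accepted_tokens / num_drafts:.2f}")  # 1.02

# Average draft length sits just under num_spec_tokens = 5.
print(f"draft tokens per draft: {num_draft_tokens / num_drafts:.2f}")  # 4.99
```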

Optimization
~3x speedup with this change, per the trace

Per gpt-oss AIME 2025 eval runs:

Total bookkeeping elapsed time is reduced by 60%+
Bookkeeping elapsed time distribution is less skewed
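For reference, cutting elapsed time by a fraction r corresponds to a speedup of 1 / (1 - r). A 60% reduction is therefore a 2.5x speedup, and a 3x speedup is roughly a 67% reduction, so the trace and eval figures are in the same range even though they come from different measurements. The helper names below are illustrative only:

```python
def speedup_from_reduction(reduction: float) -> float:
    """Speedup factor implied by cutting elapsed time by `reduction` (0 <= r < 1)."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def reduction_from_speedup(speedup: float) -> float:
    """Fraction of elapsed time removed by a given speedup factor (> 0)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


print(f"{speedup_from_reduction(0.60):.2f}x")  # 2.50x
print(f"{reduction_from_speedup(3.0):.0%}")    # 67%
```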


Signed-off-by: Jialin Ouyang <[email protected]>

Bookkeeping optimization

d470fde

Signed-off-by: Jialin Ouyang <[email protected]>

Jialin marked this pull request as ready for review September 27, 2025 06:38

Jialin requested review from WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners September 27, 2025 06:38

mergify bot added the v1 label Sep 27, 2025
