@Jialin Jialin commented Sep 27, 2025

Purpose

Currently, GPUModelRunner._bookkeeping_sync interleaves NumPy updates with Python logic, which is inefficient: the scattered tensor and NumPy array updates consume a significant amount of time.

This change vectorizes the tensor and NumPy updates:

  • compute the update indices and values in a Python for loop
  • apply bulk updates to the tensor and NumPy array so the writes are vectorized
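
For readers who want a concrete picture of the pattern, here is a minimal NumPy-only sketch. It is not the code from this PR: the function names, the `token_ids` and `num_tokens` arrays, and the `sampled` input are all hypothetical stand-ins, and the real method also touches tensors and other state. The sketch only contrasts one small array write per request with collecting indices in the loop and doing a single bulk write afterwards.

```python
import numpy as np


def scattered_updates(token_ids, num_tokens, sampled):
    """Baseline (hypothetical): one small NumPy write per request in the loop."""
    for req_idx, new_ids in enumerate(sampled):
        if not new_ids:
            continue
        start = int(num_tokens[req_idx])
        end = start + len(new_ids)
        token_ids[req_idx, start:end] = new_ids
        num_tokens[req_idx] = end


def bulk_updates(token_ids, num_tokens, sampled):
    """Sketch of the vectorized pattern: gather indices first, write once."""
    rows, cols, vals = [], [], []
    updated_reqs, updated_counts = [], []
    # Step 1: the Python loop only computes indices and values.
    for req_idx, new_ids in enumerate(sampled):
        if not new_ids:
            continue
        start = int(num_tokens[req_idx])
        n = len(new_ids)
        rows.extend([req_idx] * n)
        cols.extend(range(start, start + n))
        vals.extend(new_ids)
        updated_reqs.append(req_idx)
        updated_counts.append(n)
    # Step 2: apply all writes with a single indexed assignment per array.
    if rows:
        token_ids[rows, cols] = vals
        # Each request index appears at most once, so the in-place add is safe.
        num_tokens[updated_reqs] += np.asarray(updated_counts, dtype=num_tokens.dtype)
```

Both functions are meant to produce the same final arrays; the second simply moves the array writes out of the per-request loop.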

Test Plan & Test Result

Correctness

VLLM_USE_V1=1 python3 examples/offline_inference/spec_decode.py \
  --method ngram \
  --model-dir meta-llama/Llama-3.1-8B-Instruct \
  --prompt_lookup_min 2 \
  --prompt_lookup_max 5 \
  --num_spec_tokens 5 \
  --dataset-name hf \
  --dataset-path philschmid/mt-bench \
  --num-prompts 80 \
  --print-output

Output is exactly the same before and after the change
--------------------------------------------------
total_num_output_tokens: 17069
num_drafts: 2548
num_draft_tokens: 12711
num_accepted_tokens: 2587
mean acceptance length: 2.02
--------------------------------------------------
acceptance at token 0: 0.43
acceptance at token 1: 0.25
acceptance at token 2: 0.15
acceptance at token 3: 0.10
acceptance at token 4: 0.07

Optimization
Per the profiling trace, the change gives a ~3x speedup.

Per gpt-oss AIME 2025 eval runs:

  • total bookkeeping elapsed time is reduced by 60%+
  • the bookkeeping elapsed time distribution is less skewed

Signed-off-by: Jialin Ouyang <[email protected]>
@Jialin Jialin marked this pull request as ready for review September 27, 2025 06:38
@mergify mergify bot added the v1 label Sep 27, 2025