
@zixi-qi (Collaborator) commented Sep 15, 2025

Purpose

Port the suffix decoding implementation from ArcticInference (https://github.com/snowflakedb/ArcticInference) to vLLM main, so that suffix decoding can be tested without depending on ArcticInference.
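A minimal Python sketch of the core idea behind suffix decoding may help reviewers, under simplifying assumptions. Previously seen token sequences are indexed in a frequency-counted suffix trie. The longest recent suffix of the current context is then matched against that trie, and the most frequent continuation is proposed as the draft. The `SuffixTrie` class, its `insert`/`propose` methods, and the `max_depth`/`max_match` parameters are hypothetical illustrations, not the ArcticInference or vLLM API. The sketch proposes only a single path and has no tree-shaped speculation, eviction, or per-request caches.

```python
class SuffixTrie:
    """Hypothetical toy suffix trie over token IDs (not the ArcticInference/vLLM API).

    Every suffix of each inserted sequence, truncated to ``max_depth`` tokens,
    is stored as a path from the root; each node counts how often it was reached.
    """

    def __init__(self, max_depth: int = 8):
        self.max_depth = max_depth
        self.root: dict = {"count": 0, "children": {}}

    def insert(self, tokens: list[int]) -> None:
        # Add every suffix tokens[i:i + max_depth] as a path, bumping node counts.
        for start in range(len(tokens)):
            node = self.root
            for tok in tokens[start:start + self.max_depth]:
                child = node["children"].setdefault(tok, {"count": 0, "children": {}})
                child["count"] += 1
                node = child

    def _find(self, pattern: list[int]):
        # Walk the trie along `pattern`; return the final node or None if absent.
        node = self.root
        for tok in pattern:
            node = node["children"].get(tok)
            if node is None:
                return None
        return node

    def propose(self, context: list[int], num_tokens: int, max_match: int = 4) -> list[int]:
        # Try the longest recent suffix of the context first, then shorter ones.
        for length in range(min(max_match, len(context)), 0, -1):
            node = self._find(context[-length:])
            if node is None:
                continue
            draft = []
            # Greedily follow the most frequently seen child (single draft path).
            while len(draft) < num_tokens and node["children"]:
                tok, node = max(node["children"].items(), key=lambda kv: kv[1]["count"])
                draft.append(tok)
            if draft:
                return draft
        return []  # No match: propose nothing for this step.


trie = SuffixTrie(max_depth=8)
trie.insert([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 5])
print(trie.propose([7, 1, 2, 3], num_tokens=2))
```

Here `[1, 2, 3]` was followed by `4` twice and by `5` once, so the frequency counts steer the draft toward `4`. Real suffix decoding also weighs how far to speculate and can propose a tree of candidates, which this sketch leaves out.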

Test Plan

Run the end-to-end (e2e) and unit tests for suffix-decoding-based speculative decoding.

Test Result

  • E2E test

Suffix decoding:

VLLM_USE_V1=1 python examples/offline_inference/spec_decode.py --num_spec_tokens 1 --num_prompts 80 --dataset-name hf --dataset-path philschmid/mt-bench --method suffix

Adding requests: 100%|██████████████████████████████████████████████████████████████████| 80/80 [00:00<00:00, 9848.96it/s]
Processed prompts: 100%|████████| 80/80 [00:04<00:00, 17.43it/s, est. speed input: 1755.10 toks/s, output: 3703.51 toks/s]
--------------------------------------------------
total_num_output_tokens: 16993
num_drafts: 1986
num_draft_tokens: 1986
num_accepted_tokens: 1226
mean acceptance length: 1.62
--------------------------------------------------
acceptance at token 0: 0.62

N-gram (for comparison):

VLLM_USE_V1=1 python examples/offline_inference/spec_decode.py --num_spec_tokens 1 --num_prompts 80 --dataset-name hf --dataset-path philschmid/mt-bench --method ngram

Adding requests: 100%|██████████████████████████████████████████████████████████████████| 80/80 [00:00<00:00, 9968.34it/s]
Processed prompts: 100%|████████| 80/80 [00:04<00:00, 18.30it/s, est. speed input: 1842.36 toks/s, output: 3915.31 toks/s]
--------------------------------------------------
total_num_output_tokens: 17114
num_drafts: 3426
num_draft_tokens: 3426
num_accepted_tokens: 1696
mean acceptance length: 1.50
--------------------------------------------------
acceptance at token 0: 0.50
  • Unit tests
pytest tests/v1/e2e/test_spec_decode.py -k suffix -v

tests/v1/e2e/test_spec_decode.py::test_suffix_correctness PASSED                                                                                                                                                                                                                [ 25%]
tests/v1/e2e/test_spec_decode.py::test_suffix_with_configs[suffix_config0] PASSED                                                                                                                                                                                               [ 50%]
tests/v1/e2e/test_spec_decode.py::test_suffix_with_configs[suffix_config1] PASSED                                                                                                                                                                                               [ 75%]
tests/v1/e2e/test_spec_decode.py::test_suffix_with_configs[suffix_config2] PASSED  
pytest tests/v1/spec_decode/test_suffix_tree_cpp.py -v

tests/v1/spec_decode/test_suffix_tree_cpp.py::TestSuffixTreeCpp::test_basic_operations PASSED                                                                                                                                                                                   [ 11%]
tests/v1/spec_decode/test_suffix_tree_cpp.py::TestSuffixTreeCpp::test_append_operations PASSED                                                                                                                                                                                  [ 22%]
tests/v1/spec_decode/test_suffix_tree_cpp.py::TestSuffixTreeCpp::test_multiple_sequences PASSED                                                                                                                                                                                 [ 33%]
tests/v1/spec_decode/test_suffix_tree_cpp.py::TestSuffixTreeCpp::test_speculation_parameters PASSED                                                                                                                                                                             [ 44%]
tests/v1/spec_decode/test_suffix_tree_cpp.py::TestSuffixTreeCpp::test_integrity_check PASSED                                                                                                                                                                                    [ 55%]
tests/v1/spec_decode/test_suffix_tree_cpp.py::TestSuffixTreeCpp::test_memory_estimation PASSED                                                                                                                                                                                  [ 66%]
tests/v1/spec_decode/test_suffix_tree_cpp.py::TestSuffixTreeCpp::test_empty_sequences PASSED                                                                                                                                                                                    [ 77%]
tests/v1/spec_decode/test_suffix_tree_cpp.py::TestSuffixTreeCpp::test_large_sequences PASSED                                                                                                                                                                                    [ 88%]
tests/v1/spec_decode/test_suffix_tree_cpp.py::TestSuffixTreeCpp::test_tree_vs_path_speculation PASSED 
pytest tests/v1/spec_decode/test_suffix_cache.py -v

tests/v1/spec_decode/test_suffix_cache.py::TestSuffixCache::test_basic_operations PASSED                                                                                                                                                                                        [ 12%]
tests/v1/spec_decode/test_suffix_cache.py::TestSuffixCache::test_multiple_requests PASSED                                                                                                                                                                                       [ 25%]
tests/v1/spec_decode/test_suffix_cache.py::TestSuffixCache::test_cache_eviction PASSED                                                                                                                                                                                          [ 37%]
tests/v1/spec_decode/test_suffix_cache.py::TestSuffixCache::test_pattern_matching PASSED                                                                                                                                                                                        [ 50%]
tests/v1/spec_decode/test_suffix_cache.py::TestSuffixCache::test_empty_patterns PASSED                                                                                                                                                                                          [ 62%]
tests/v1/spec_decode/test_suffix_cache.py::TestSuffixCache::test_invalid_operations PASSED                                                                                                                                                                                      [ 75%]
tests/v1/spec_decode/test_suffix_cache.py::TestSuffixCache::test_max_depth_handling PASSED                                                                                                                                                                                      [ 87%]
tests/v1/spec_decode/test_suffix_cache.py::TestSuffixCache::test_speculation_parameters PASSED                                                                                                                                                                                  [100%]
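The summary metrics in the e2e results can be checked by hand. The sketch below assumes that mean acceptance length is computed as 1 + num_accepted_tokens / num_drafts. The "+ 1" counts the token the target model always produces at each verification step. Both runs' reported values are consistent with that formula. `acceptance_stats` is a hypothetical helper, not part of examples/offline_inference/spec_decode.py.

```python
def acceptance_stats(num_drafts: int, num_accepted_tokens: int) -> tuple[float, float]:
    """Hypothetical helper: (acceptance rate, mean acceptance length)."""
    # Fraction of proposed draft tokens that were accepted per draft.
    acceptance_rate = num_accepted_tokens / num_drafts if num_drafts else 0.0
    # Each verification step emits the accepted draft tokens plus one token
    # from the target model, hence the "+ 1".
    mean_acceptance_length = 1 + acceptance_rate
    return acceptance_rate, mean_acceptance_length


# Counts copied from the e2e runs (--num_spec_tokens 1).
for method, drafts, accepted in [("suffix", 1986, 1226), ("ngram", 3426, 1696)]:
    rate, mal = acceptance_stats(drafts, accepted)
    print(f"{method}: acceptance at token 0 = {rate:.2f}, mean acceptance length = {mal:.2f}")
```

With --num_spec_tokens 1 each draft holds exactly one token, so the acceptance rate equals "acceptance at token 0". The same arithmetic shows that suffix decoding proposed fewer drafts than n-gram (1986 vs. 3426) but had a higher per-draft acceptance rate (0.62 vs. 0.50).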



mergify bot added the documentation, ci/build, speculative-decoding, and v1 labels on Sep 15, 2025
pytorch-bot commented Sep 15, 2025

No ciflow labels are configured for this repo.
For information on how to enable the CIFlow bot, see this wiki.

@zixi-qi zixi-qi closed this Sep 15, 2025
@zixi-qi zixi-qi reopened this Sep 16, 2025
mergify bot commented Sep 17, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @zixi-qi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: zixi-qi <[email protected]>
Signed-off-by: qizixi <[email protected]>
@zixi-qi (Collaborator, Author) commented Sep 30, 2025

The official implementation was added in #25784.

@zixi-qi zixi-qi closed this Sep 30, 2025