Merged

195 commits
7937c2f
Add files via upload: Add fused MoE kernel tuning configs (fp8_w8a8) fo…
sunyicode0012 May 19, 2025
8171221
[Misc] Fix typo (#18330)
Unprincess17 May 19, 2025
dc1440c
Neuron up mistral (#18222)
aws-satyajith May 19, 2025
258bf62
fix CUDA_check redefinition in #17918 (#18287)
luccafong May 19, 2025
d565e09
[neuron] fix authorization issue (#18364)
liangfu May 19, 2025
f07a673
[Misc] Allow `AutoWeightsLoader` to skip loading weights with specifi…
Isotr0py May 20, 2025
9609327
[Core] [Bugfix]: tensor parallel with prompt embeds (#18171)
Nan2018 May 20, 2025
d981396
[release] Change dockerhub username for TPU release (#18389)
khluu May 20, 2025
bca55b5
[Bugfix] fix adding bias twice in ipex GPTQ quantization (#18363)
rand-fly May 20, 2025
1b1e8e0
[doc] update env variable export (#18391)
reidliu41 May 20, 2025
6b35cb1
[Misc] Add LoRA code owner (#18387)
jeejeelee May 20, 2025
d6c86d0
Update cpu.txt (#18398)
princepride May 20, 2025
8684770
[CI] Add mteb testing to test the accuracy of the embedding model (#1…
noooop May 20, 2025
be48360
[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure T…
wulipc May 20, 2025
8f55962
[Misc] refactor prompt embedding examples (#18405)
reidliu41 May 20, 2025
f4a8a37
[Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356)
mgoin May 20, 2025
e1f5a71
[Model] use AutoWeightsLoader for bloom (#18300)
calvin0327 May 20, 2025
980a172
[Kernel] update comment for KV shape in unified triton attn (#18099)
haochengxia May 20, 2025
23baa21
fix:Build torch wheel inline rather than picking from nightly (#18351)
dilipgb May 20, 2025
3b17ea2
[TPU] Re-enable the Pallas MoE kernel (#18025)
mgoin May 21, 2025
0c15c2e
[Bugfix] config.head_dim is now explicitly set to None (#18432)
gshtras May 21, 2025
92247c5
[Bug] Fix moe_sum signature (#18440)
bnellnm May 21, 2025
ad0012a
Revert "[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processin…
DarkLight1337 May 21, 2025
d06dd72
[Bugfix][Failing Test] Fix nixl connector test when promt size < bloc…
wwl2755 May 21, 2025
cd8dfc6
[Misc] MultiConnector._connectors type (#18423)
NickLucche May 21, 2025
5d7f545
[Frontend] deprecate `--device` arg (#18399)
kebe7jun May 21, 2025
907f935
[V1] Fix general plugins not loaded in engine for multiproc (#18326)
sarckk May 21, 2025
107f5fc
[Misc] refactor disaggregated-prefill-v1 example (#18474)
reidliu41 May 21, 2025
61acfc4
[Bugfix][Failing Test] Fix test_events.py (#18460)
rabi May 21, 2025
eca1869
[MODEL] FalconH1 (#18406)
dhiaEddineRhaiem May 21, 2025
c154d89
[Doc] fix arg docstring in linear layers (#18410)
giantcroc May 21, 2025
c6c10ca
[Bugfix] Reduce moe_sum test size to avoid OOM (#18484)
bnellnm May 21, 2025
371376f
[Build] fix Dockerfile shell (#18402)
kebe7jun May 21, 2025
2b16104
[Misc] Update deprecation message for `--enable-reasoning` (#18404)
Zerohertz May 21, 2025
16af49c
Merge remote-tracking branch 'upstream/main'
gshtras May 21, 2025
dd5fa7e
[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1…
hyoon1 May 21, 2025
7c1213e
Remove incorrect env value
gshtras May 21, 2025
bb0a311
Revert "[v1] Support multiple KV cache groups in GPU model runner (#1…
markmc May 21, 2025
94d8ec8
[FEAT][ROCm] Upgrade AITER MLA v1 backend (#18338)
vllmellm May 21, 2025
1f07954
[Bugfix] Consistent ascii handling in tool parsers (#17704)
schoennenbeck May 21, 2025
20bd6f4
[FalconH1] Fix output dtype in RMSNorm fallback path for Falcon-H1 (e…
dhiaEddineRhaiem May 22, 2025
176d62e
[MISC] update project urls in pyproject.toml (#18519)
andyxning May 22, 2025
6e0fd34
[CI] Fix race condition with StatelessProcessGroup.barrier (#18506)
russellb May 22, 2025
acb54ca
Intialize io_thread_pool attribute in the beginning. (#18331)
rabi May 22, 2025
d022115
[Bugfix] Inconsistent token calculation compared to HF in llava famil…
cyr0930 May 22, 2025
cf5984b
[BugFix][DP] Send DP wave completion only from `dp_rank==0` (#18502)
njhill May 22, 2025
5179777
[Bugfix][Model] Make Olmo2Model weight loading return loaded weights …
2015aroras May 22, 2025
db5a29b
[Bugfix] Fix LoRA test (#18518)
jeejeelee May 22, 2025
23b67b3
[Doc] Fix invalid JSON in example args (#18527)
DarkLight1337 May 22, 2025
e2d7d31
[Neuron] Update Dockerfile.neuron to use latest neuron release (2.23)…
aws-satyajith May 22, 2025
ebed81f
Update default neuron config for speculation (#18274)
elaineyz May 22, 2025
fa72f9a
Order sequence ids + config update to support specifying custom quant…
elaineyz May 22, 2025
f6037d1
[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure T…
wulipc May 22, 2025
a35a494
[Bugfix] Add kwargs to RequestOutput __init__ to be forward compatibl…
lk-chen May 22, 2025
ca86a7c
[CI/Build] Update bamba test model location (#18544)
hmellor May 22, 2025
7107502
[Doc] Support --stream arg in openai_completion_client.py script (#18…
googs1025 May 22, 2025
4e04ece
[Bugfix] Use random hidden states in dummy sampler run (#18543)
abmfy May 22, 2025
3f50523
[Doc] Add stream flag for chat completion example (#18524)
calvin0327 May 22, 2025
93f7167
[BugFix][CPU] Fix x86 SHM distributed module initialization (#18536)
bigPYJ1151 May 22, 2025
cb506ec
[Misc] improve Automatic Prefix Caching example (#18554)
reidliu41 May 22, 2025
54631f8
[Misc] Call `ndarray.tobytes()` directly instead of `ndarray.data.tob…
lgeiger May 22, 2025
1f3a120
[Bugfix] make `test_openai_schema.py` pass (#18224)
davidxia May 22, 2025
721fb9b
[Platform] Move platform check to right place (#18470)
wangxiyuan May 22, 2025
f8d2cc5
[Compile][Platform] Make PiecewiseBackend pluggable and extendable (#…
MengqingCao May 22, 2025
6e588da
[Build/CI] Fix CUDA 11.8 build (#17679)
tlrmchlsmth May 22, 2025
7b9d832
[Tool] Add NIXL installation script (#18172)
lk-chen May 22, 2025
a04720b
[V1][Spec Decode][Bugfix] Load quantize weights for EAGLE (#18290)
ekagra-ranjan May 22, 2025
c91fe7b
[Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_…
wukaixingxp May 22, 2025
c32e249
[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter seriali…
sangstar May 23, 2025
46791e1
[AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.…
rasmith May 23, 2025
04eb88d
Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569)
huachenheli May 23, 2025
c6b636f
[V1][Spec Decoding] Use model_loader.get_model() to load models (#18273)
markmc May 23, 2025
4b0da7b
Enable hybrid attention models for Transformers backend (#18494)
hmellor May 23, 2025
fae453f
[Misc] refactor: simplify input validation and num_requests handling …
googs1025 May 23, 2025
93ecb81
[BugFix] Increase TP execute_model timeout (#18558)
njhill May 23, 2025
e44d8ce
[Bugfix] Set `KVTransferConfig.engine_id` in post_init (#18576)
lk-chen May 23, 2025
583507d
[Spec Decode] Make EAGLE3 draft token ID mapping optional (#18488)
benchislett May 23, 2025
ed5d408
[Neuron] Remove bypass on EAGLEConfig and add a test (#18514)
elaineyz May 23, 2025
4be2255
[Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use …
tishizaki May 23, 2025
9c1baa5
[Misc] Replace `cuda` hard code with `current_platform` (#16983)
shen-shanshan May 23, 2025
60cad94
[Hardware] correct method signatures for HPU,ROCm,XPU (#18551)
andyxning May 23, 2025
4c61134
[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal …
RonaldBXu May 23, 2025
71ea614
[Feature]Add async tensor parallelism using compilation pass (#17882)
cascade812 May 23, 2025
54af915
[Doc] Update quickstart and install for cu128 using `--torch-backend=…
mgoin May 23, 2025
b046cf7
[Feature][V1]: suupports cached_tokens in response usage (#18149)
chaunceyjiang May 23, 2025
d0bc2f8
[Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 c…
zzzyq May 23, 2025
a1fe24d
Migrate docs from Sphinx to MkDocs (#18145)
hmellor May 23, 2025
fbb13a2
Revert "[V1] [Bugfix] eagle bugfix and enable correct lm_head for mul…
DarkLight1337 May 23, 2025
4ce64e2
[Bugfix][Model] Fix baichuan model loader for tp (#18597)
MengqingCao May 23, 2025
e493e48
[V0][Bugfix] Fix parallel sampling performance regression when guided…
shadeMe May 23, 2025
6526e05
Add myself as docs code owner (#18605)
hmellor May 23, 2025
7ab056c
[Hardware][CPU] Update intel_extension_for_pytorch 2.7.0 and move to …
yankay May 23, 2025
cd821ea
[CI] fix kv_cache_type argument (#18594)
andyxning May 23, 2025
38a95cb
[Doc] Fix indent of contributing to vllm (#18611)
Zerohertz May 23, 2025
2edb533
Replace `{func}` with mkdocs style links (#18610)
hmellor May 23, 2025
6dd51c7
[CI/Build] Fix V1 flag being set in entrypoints tests (#18598)
DarkLight1337 May 23, 2025
52fb23f
Fix examples with code blocks in docs (#18609)
hmellor May 23, 2025
6220f3c
[Bugfix] Fix transformers model impl ignored for mixtral quant (#18602)
tristanleclercq May 23, 2025
d4c2919
Include private attributes in API documentation (#18614)
hmellor May 23, 2025
2cd1fa4
[Misc] add Haystack integration (#18601)
reidliu41 May 23, 2025
1068556
[Bugfix][Build/CI] Fixup CUDA compiler version check for CUDA_SUPPORT…
simon-mo May 23, 2025
5221815
[Doc] Fix markdown list indentation for MkDocs rendering (#18620)
Zerohertz May 23, 2025
022d8ab
[Doc] Use a different color for the announcement (#18616)
DarkLight1337 May 23, 2025
6a7988c
Refactor pplx init logic to make it modular (prepare for deepep) (#18…
youkaichao May 23, 2025
3d28ad3
Fix figures in design doc (#18612)
hmellor May 23, 2025
9520a98
[Docs] Change mkdocs to not use directory urls (#18622)
mgoin May 23, 2025
6550114
[v1] Redo "Support multiple KV cache groups in GPU model runner (#179…
heheda12345 May 23, 2025
8ddd1cf
[Doc] fix list formatting (#18624)
davidxia May 23, 2025
273cb3b
[Doc] Fix top-level API links/docs (#18621)
DarkLight1337 May 23, 2025
15b45ff
[Doc] Avoid documenting dynamic / internal modules (#18626)
DarkLight1337 May 23, 2025
371f7e4
[Doc] Fix broken links and unlinked docs, add shortcuts to home sideb…
DarkLight1337 May 23, 2025
2628a69
[V1] Support Deepseek MTP (#18435)
YaoJiayi May 23, 2025
1645b60
Use prebuilt FlashInfer x86_64 PyTorch 2.7 CUDA 12.8 wheel for CI (#1…
huydhn May 23, 2025
0ddf88e
[CI] Enable test_initialization to run on V1 (#16736)
mgoin May 23, 2025
7d92164
[Doc] Update references to doc files (#18637)
DarkLight1337 May 23, 2025
f203673
[ModelOpt] Introduce VLLM_MAX_TOKENS_PER_EXPERT_FP4_MOE env var to co…
pavanimajety May 23, 2025
4fc1bf8
[Bugfix] Migrate to REGEX Library to prevent catastrophic backtrackin…
Crucifixion-Fxl May 23, 2025
2b10ba7
[Bugfix][Nixl] Fix Preemption Bug (#18631)
robertgshaw2-redhat May 23, 2025
45ab403
config.py: Clarify that only local GGUF checkpoints are supported. (#…
MathieuBordere May 24, 2025
ec82c3e
FIX MOE issue in AutoRound format (#18586)
wenhuach21 May 24, 2025
d55e446
[V1][Spec Decode] Small refactors to improve eagle bookkeeping perfor…
zixi-qi May 24, 2025
441dc63
[Frontend] improve vllm serve --help display (#18643)
reidliu41 May 24, 2025
a859320
[Model] Add support for Qwen2.5-Omni-7B-AWQ (Qwen2_5OmniForConditiona…
Nalkey May 24, 2025
c1e4a40
[V1][Spec Decode] Support multi-layer eagle draft model (#18030)
zixi-qi May 24, 2025
07458a5
[Doc] Update README links, mark external links (#18635)
DarkLight1337 May 24, 2025
e77dc4b
[MISC][pre-commit] Add pre-commit check for triton import (#17716)
MengqingCao May 24, 2025
ef1dd68
[Doc] Fix indentation problems in V0 Paged Attention docs (#18659)
DarkLight1337 May 24, 2025
6d166a8
[Doc] Add community links (#18657)
DarkLight1337 May 24, 2025
2cd4d58
[Model] use AutoWeightsLoader for gpt2 (#18625)
ztang2370 May 24, 2025
1cb194a
[Doc] Reorganize user guide (#18661)
DarkLight1337 May 24, 2025
2e67057
[CI/Build] `chmod +x` to `cleanup_pr_body.sh` (#18650)
DarkLight1337 May 24, 2025
4ceafb6
[MISC] typo fix and clean import (#18664)
andyxning May 24, 2025
b9018a3
[BugFix] Fix import error for fused_moe (#18642)
wangxiyuan May 24, 2025
2807271
[CI] enforce import regex instead of re (#18665)
aarnphm May 24, 2025
9ea7f1a
fix(regression): clone from reference items (#18662)
aarnphm May 24, 2025
b554ab7
[CI/Build] fix permission denied issue (#18645)
reidliu41 May 24, 2025
6825d9a
[BugFix][Spec Decode] Improve Prefix Caching Logic in Speculative Dec…
WoosukKwon May 25, 2025
7891fdf
[V1] Fix _pickle.PicklingError: Can't pickle <class 'transformers_mod…
eicherseiji May 25, 2025
6c6dcd8
[MISC] correct signature for LoaderFunction (#18670)
andyxning May 25, 2025
cebc22f
[Misc]Replace `cuda` hard code with `current_platform` in Ray (#14668)
noemotiovon May 25, 2025
6ab681b
[Misc][ModelScope] Change to use runtime VLLM_USE_MODELSCOPE (#18655)
MengqingCao May 25, 2025
75f8175
[VLM] Initialize video input support for InternVL models (#18499)
Isotr0py May 25, 2025
6393454
Speed up the `kernels/quantization/` tests (#18669)
mgoin May 25, 2025
44073a7
[BUGFIX] catch subclass first for try...except (#18672)
andyxning May 25, 2025
503f848
[Misc] Reduce logs on startup (#18649)
DarkLight1337 May 25, 2025
624b77a
[doc] fix broken links (#18671)
reidliu41 May 25, 2025
279f854
[doc] improve readability (#18675)
reidliu41 May 25, 2025
f2faac7
[Bugfix] Fix cpu usage and cache hit stats reporting on cpu environme…
zzzyq May 25, 2025
35be8fa
[CI/build] fix no regex (#18676)
reidliu41 May 25, 2025
3a886bd
[Misc] small improve (#18680)
reidliu41 May 25, 2025
57fd13a
[Bugfix] Fix profiling dummy data for Pixtral (#18677)
DarkLight1337 May 25, 2025
6071e98
[Core][Multimodal] Convert PIL Image to array without data copy when …
lgeiger May 25, 2025
fba0642
[CI/Build][Doc] Update `gte-Qwen2-1.5B-instruct` usage (#18683)
DarkLight1337 May 26, 2025
8820821
[Misc] Fixed the abnormally high TTFT issue in the PD disaggregation …
zhaohaidao May 26, 2025
abd4030
refactor: simplify request handler, use positive condition check for …
googs1025 May 26, 2025
561b77a
[Bugfix] Fix the lm_head in gpt_bigcode in lora mode (#6357)
maxdebayser May 26, 2025
4ea62c0
[CI] add missing argument (#18694)
andyxning May 26, 2025
4b7740a
[GH] Add issue template for reporting CI failures (#18696)
DarkLight1337 May 26, 2025
65523a0
[Doc] Fix issue template format (#18699)
DarkLight1337 May 26, 2025
61a45e7
[Bugfix] Fix Mistral-format models with sliding window (#18693)
DarkLight1337 May 26, 2025
38b13df
[CI/Build] Replace `math.isclose` with `pytest.approx` (#18703)
DarkLight1337 May 26, 2025
5a2c76c
[CI] fix dump_input for str type (#18697)
andyxning May 26, 2025
6d68030
[Model] Add support for YARN in NemotronNAS models (#18427)
Naveassaf May 26, 2025
0877750
[CI/Build] Split pooling and generation extended language models test…
Isotr0py May 26, 2025
e76be06
[Hardware][Intel-Gaudi] [CI/Build] Add tensor parallel size = 2 test …
ldurejko May 26, 2025
0665e29
[Misc] add AutoGen integration (#18712)
reidliu41 May 26, 2025
243eb91
[Bugfix]: handle hf-xet CAS error when loading Qwen3 weights in vLLM …
YanWuHao May 26, 2025
9553fdb
[Doc] Improve API docs (#18713)
DarkLight1337 May 26, 2025
82e2339
[Doc] Move examples and further reorganize user guide (#18666)
DarkLight1337 May 26, 2025
a869bac
[Bugfix] Fix Llama GGUF initialization (#18717)
DarkLight1337 May 26, 2025
e7523c2
[V1][Sampler] Improve performance of FlashInfer sampling by sampling …
lgeiger May 26, 2025
27bebcd
Convert `examples` to `ruff-format` (#18400)
hmellor May 26, 2025
0eebd74
[Model][Gemma3] Simplify image input validation (#18710)
lgeiger May 27, 2025
1f88dbd
[Misc] improve web section group title display (#18684)
reidliu41 May 27, 2025
1f1b1bc
[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646)
Isotr0py May 27, 2025
b50602d
[Model][Gemma3] Cast image pixel values already on CPU (#18732)
lgeiger May 27, 2025
d260f79
[FEAT] [ROCm] Upgrade AITER Fused MoE kernels. (#18271)
vllmellm May 27, 2025
25a817f
[Doc] Update OOT model docs (#18742)
DarkLight1337 May 27, 2025
753944f
[Doc] Update reproducibility doc and example (#18741)
DarkLight1337 May 27, 2025
fc6d0c2
[Misc] improve docs (#18734)
reidliu41 May 27, 2025
a547aeb
feat(rocm-support): support mamba2 on rocm (#18565)
almersawi May 27, 2025
bbd9a84
[Hardware][Intel-Gaudi] [CI/Build] Fix multiple containers using the …
ldurejko May 27, 2025
4693a34
[Doc] cleanup deprecated flag for doc (#18715)
calvin0327 May 27, 2025
c24b157
Minor fix about MooncakeStoreConnector (#18721)
maobaolong May 27, 2025
e0f0ff8
[Build] fix cpu build missing libtbbmalloc.so (#18744)
kebe7jun May 27, 2025
6881107
[BUG FIX] minicpm (#18739)
huangyuxiang03 May 27, 2025
a68e293
[Doc] Convert Sphinx directives ( `{class}`, `{meth}`, `{attr}`, ...…
Zerohertz May 27, 2025
4318c05
[CI/Build] Remove imports of built-in `re` (#18750)
DarkLight1337 May 27, 2025
06a0338
[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17…
markmc May 27, 2025
aaa4ac1
Disable prefix cache by default for benchmark (#18639)
cascade812 May 27, 2025
6b6d496
optimize get_kv_cache_torch_dtype (#18531)
chunxiaozheng May 27, 2025
696259c
[Core] Automatically cast multi-modal input dtype (#18756)
DarkLight1337 May 27, 2025
5873877
[Bugfix] Mistral tool calling when content is list (#18729)
mgoin May 27, 2025
1c450a5
Merge remote-tracking branch 'upstream/main' into upstream_merge_2025…
gshtras May 27, 2025
d5e35a9
Merge remote-tracking branch 'origin/main' into upstream_merge_2025_0…
gshtras May 27, 2025
5 changes: 0 additions & 5 deletions .buildkite/pyproject.toml
@@ -6,11 +6,6 @@

[tool.ruff]
line-length = 88
exclude = [
# External file, leaving license intact
"examples/other/fp8/quantizer/quantize.py",
"vllm/vllm_flash_attn/flash_attn_interface.pyi"
]

[tool.ruff.lint.per-file-ignores]
"vllm/third_party/**" = ["ALL"]
2 changes: 1 addition & 1 deletion .buildkite/release-pipeline.yaml
@@ -64,7 +64,7 @@ steps:
- "docker push vllm/vllm-tpu:$BUILDKITE_COMMIT"
plugins:
- docker-login#v3.0.0:
username: vllm
username: vllmbot
password-env: DOCKERHUB_TOKEN
env:
DOCKER_BUILDKIT: "1"
12 changes: 7 additions & 5 deletions .buildkite/scripts/hardware_ci/run-hpu-test.sh
@@ -10,15 +10,17 @@ docker build -t hpu-test-env -f docker/Dockerfile.hpu .
# Setup cleanup
# certain versions of HPU software stack have a bug that can
# override the exit code of the script, so we need to use
# separate remove_docker_container and remove_docker_container_and_exit
# separate remove_docker_containers and remove_docker_containers_and_exit
# functions, while other platforms only need one remove_docker_container
# function.
EXITCODE=1
remove_docker_container() { docker rm -f hpu-test || true; }
remove_docker_container_and_exit() { remove_docker_container; exit $EXITCODE; }
trap remove_docker_container_and_exit EXIT
remove_docker_container
remove_docker_containers() { docker rm -f hpu-test || true; docker rm -f hpu-test-tp2 || true; }
remove_docker_containers_and_exit() { remove_docker_containers; exit $EXITCODE; }
trap remove_docker_containers_and_exit EXIT
remove_docker_containers

# Run the image and launch offline inference
docker run --runtime=habana --name=hpu-test --network=host -e HABANA_VISIBLE_DEVICES=all -e VLLM_SKIP_WARMUP=true --entrypoint="" hpu-test-env python3 examples/offline_inference/basic/generate.py --model facebook/opt-125m
docker run --runtime=habana --name=hpu-test-tp2 --network=host -e HABANA_VISIBLE_DEVICES=all -e VLLM_SKIP_WARMUP=true --entrypoint="" hpu-test-env python3 examples/offline_inference/basic/generate.py --model facebook/opt-125m --tensor-parallel-size 2

EXITCODE=$?
13 changes: 11 additions & 2 deletions .buildkite/scripts/hardware_ci/run-neuron-test.sh
@@ -11,13 +11,14 @@ container_name="neuron_$(tr -dc A-Za-z0-9 < /dev/urandom | head -c 10; echo)"
HF_CACHE="$(realpath ~)/huggingface"
mkdir -p "${HF_CACHE}"
HF_MOUNT="/root/.cache/huggingface"
HF_TOKEN=$(aws secretsmanager get-secret-value --secret-id "ci/vllm-neuron/hf-token" --region us-west-2 --query 'SecretString' --output text | jq -r .VLLM_NEURON_CI_HF_TOKEN)

NEURON_COMPILE_CACHE_URL="$(realpath ~)/neuron_compile_cache"
mkdir -p "${NEURON_COMPILE_CACHE_URL}"
NEURON_COMPILE_CACHE_MOUNT="/root/.cache/neuron_compile_cache"

# Try building the docker image
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws

# prune old image and containers to save disk space, and only once a day
# by using a timestamp file in tmp.
@@ -47,8 +48,16 @@ trap remove_docker_container EXIT
docker run --rm -it --device=/dev/neuron0 --network bridge \
-v "${HF_CACHE}:${HF_MOUNT}" \
-e "HF_HOME=${HF_MOUNT}" \
-e "HF_TOKEN=${HF_TOKEN}" \
-v "${NEURON_COMPILE_CACHE_URL}:${NEURON_COMPILE_CACHE_MOUNT}" \
-e "NEURON_COMPILE_CACHE_URL=${NEURON_COMPILE_CACHE_MOUNT}" \
--name "${container_name}" \
${image_name} \
/bin/bash -c "python3 /workspace/vllm/examples/offline_inference/neuron.py && python3 -m pytest /workspace/vllm/tests/neuron/1_core/ -v --capture=tee-sys && python3 -m pytest /workspace/vllm/tests/neuron/2_core/ -v --capture=tee-sys"
/bin/bash -c "
python3 /workspace/vllm/examples/offline_inference/neuron.py;
python3 -m pytest /workspace/vllm/tests/neuron/1_core/ -v --capture=tee-sys;
for f in /workspace/vllm/tests/neuron/2_core/*.py; do
echo 'Running test file: '$f;
python3 -m pytest \$f -v --capture=tee-sys;
done
"
40 changes: 26 additions & 14 deletions .buildkite/test-pipeline.yaml
@@ -33,14 +33,13 @@ steps:

- label: Documentation Build # 2min
mirror_hardwares: [amdexperimental]
working_dir: "/vllm-workspace/test_docs/docs"
working_dir: "/vllm-workspace/test_docs"
fast_check: true
no_gpu: True
commands:
- pip install -r ../../requirements/docs.txt
- SPHINXOPTS=\"-W\" make html
# Check API reference (if it fails, you may have missing mock imports)
- grep \"sig sig-object py\" build/html/api/vllm/vllm.sampling_params.html
- pip install -r ../requirements/docs.txt
# TODO: add `--strict` once warnings in docstrings are fixed
- mkdocs build

- label: Async Engine, Inputs, Utils, Worker Test # 24min
mirror_hardwares: [amdexperimental]
@@ -59,6 +58,7 @@
- pytest -v -s async_engine # AsyncLLMEngine
- NUM_SCHEDULER_STEPS=4 pytest -v -s async_engine/test_async_llm_engine.py
- pytest -v -s test_inputs.py
- pytest -v -s test_outputs.py
- pytest -v -s multimodal
- pytest -v -s test_utils.py # Utils
- pytest -v -s worker # Worker
@@ -128,7 +128,7 @@
- pytest -v -s entrypoints/llm/test_generate.py # it needs a clean process
- pytest -v -s entrypoints/llm/test_generate_multiple_loras.py # it needs a clean process
- VLLM_USE_V1=0 pytest -v -s entrypoints/llm/test_guided_generate.py # it needs a clean process
- pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_oot_registration.py --ignore=entrypoints/openai/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/correctness/ --ignore=entrypoints/openai/test_openai_schema.py
- pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_chat_with_tool_reasoning.py --ignore=entrypoints/openai/test_oot_registration.py --ignore=entrypoints/openai/test_tensorizer_entrypoint.py --ignore=entrypoints/openai/correctness/
- pytest -v -s entrypoints/test_chat_utils.py
- VLLM_USE_V1=0 pytest -v -s entrypoints/offline_mode # Needs to avoid interference with other tests

@@ -141,6 +141,7 @@
- vllm/core/
- tests/distributed/test_utils
- tests/distributed/test_pynccl
- tests/distributed/test_events
- tests/spec_decode/e2e/test_integration_dist_tp4
- tests/compile/test_basic_correctness
- examples/offline_inference/rlhf.py
@@ -159,6 +160,7 @@
- pytest -v -s distributed/test_utils.py
- pytest -v -s compile/test_basic_correctness.py
- pytest -v -s distributed/test_pynccl.py
- pytest -v -s distributed/test_events.py
- pytest -v -s spec_decode/e2e/test_integration_dist_tp4.py
# TODO: create a dedicated test section for multi-GPU example tests
# when we have multiple distributed example tests
@@ -224,6 +226,7 @@
- pytest -v -s v1/test_serial_utils.py
- pytest -v -s v1/test_utils.py
- pytest -v -s v1/test_oracle.py
- pytest -v -s v1/test_metrics_reader.py
# TODO: accuracy does not match, whether setting
# VLLM_USE_FLASHINFER_SAMPLER or not on H100.
- pytest -v -s v1/e2e
@@ -248,7 +251,7 @@
- python3 offline_inference/vision_language.py --seed 0
- python3 offline_inference/vision_language_embedding.py --seed 0
- python3 offline_inference/vision_language_multi_image.py --seed 0
- VLLM_USE_V1=0 python3 other/tensorize_vllm_model.py --model facebook/opt-125m serialize --serialized-directory /tmp/ --suffix v1 && python3 other/tensorize_vllm_model.py --model facebook/opt-125m deserialize --path-to-tensors /tmp/vllm/facebook/opt-125m/v1/model.tensors
- VLLM_USE_V1=0 python3 others/tensorize_vllm_model.py --model facebook/opt-125m serialize --serialized-directory /tmp/ --suffix v1 && python3 others/tensorize_vllm_model.py --model facebook/opt-125m deserialize --path-to-tensors /tmp/vllm/facebook/opt-125m/v1/model.tensors
- python3 offline_inference/encoder_decoder.py
- python3 offline_inference/encoder_decoder_multimodal.py --model-type whisper --seed 0
- python3 offline_inference/basic/classify.py
@@ -320,6 +323,7 @@
- pytest -v -s compile/test_fusion.py
- pytest -v -s compile/test_silu_mul_quant_fusion.py
- pytest -v -s compile/test_sequence_parallelism.py
- pytest -v -s compile/test_async_tp.py

- label: PyTorch Fullgraph Smoke Test # 9min
mirror_hardwares: [amdexperimental, amdproduction]
@@ -397,10 +401,12 @@
source_file_dependencies:
- vllm/model_executor/model_loader
- tests/tensorizer_loader
- tests/entrypoints/openai/test_tensorizer_entrypoint.py
commands:
- apt-get update && apt-get install -y curl libsodium23
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -v -s tensorizer_loader
- pytest -v -s entrypoints/openai/test_tensorizer_entrypoint.py

- label: Benchmarks # 9min
mirror_hardwares: [amdexperimental, amdproduction]
@@ -479,10 +485,7 @@
- pytest -v -s models/test_registry.py
- pytest -v -s models/test_utils.py
- pytest -v -s models/test_vision.py
# V1 Test: https://github.com/vllm-project/vllm/issues/14531
- VLLM_USE_V1=0 pytest -v -s models/test_initialization.py -k 'not llama4 and not plamo2'
- VLLM_USE_V1=0 pytest -v -s models/test_initialization.py -k 'llama4'
- VLLM_USE_V1=0 pytest -v -s models/test_initialization.py -k 'plamo2'
- pytest -v -s models/test_initialization.py

- label: Language Models Test (Standard)
mirror_hardwares: [amdexperimental]
@@ -496,16 +499,25 @@
- pip freeze | grep -E 'torch'
- pytest -v -s models/language -m core_model

- label: Language Models Test (Extended)
- label: Language Models Test (Extended Generation) # 1hr20min
mirror_hardwares: [amdexperimental]
optional: true
source_file_dependencies:
- vllm/
- tests/models/language
- tests/models/language/generation
commands:
# Install causal-conv1d for plamo2 models here, as it is not compatible with pip-compile.
- pip install 'git+https://github.com/Dao-AILab/[email protected]'
- pytest -v -s models/language -m 'not core_model'
- pytest -v -s models/language/generation -m 'not core_model'

- label: Language Models Test (Extended Pooling) # 36min
mirror_hardwares: [amdexperimental]
optional: true
source_file_dependencies:
- vllm/
- tests/models/language/pooling
commands:
- pytest -v -s models/language/pooling -m 'not core_model'

- label: Multi-Modal Models Test (Standard)
mirror_hardwares: [amdexperimental]
6 changes: 3 additions & 3 deletions .github/ISSUE_TEMPLATE/400-bug-report.yml
@@ -81,14 +81,14 @@ body:
required: true
- type: markdown
attributes:
value: >
⚠️ Please separate bugs of `transformers` implementation or usage from bugs of `vllm`. If you think anything is wrong with the models' output:
value: |
⚠️ Please separate bugs of `transformers` implementation or usage from bugs of `vllm`. If you think anything is wrong with the model's output:

- Try the counterpart of `transformers` first. If the error appears, please go to [their issues](https://github.com/huggingface/transformers/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc).

- If the error only appears in vllm, please provide the detailed script of how you run `transformers` and `vllm`, also highlight the difference and what you expect.

Thanks for contributing 🎉!
Thanks for reporting 🙏!
- type: checkboxes
id: askllm
attributes:
69 changes: 69 additions & 0 deletions .github/ISSUE_TEMPLATE/450-ci-failure.yml
@@ -0,0 +1,69 @@
name: 🧪 CI failure report
description: Report a failing test.
title: "[CI Failure]: "
labels: ["ci-failure"]

body:
- type: markdown
attributes:
value: >
#### Include the name of the failing Buildkite step and test file in the title.
- type: input
attributes:
label: Name of failing test
description: |
Paste in the fully-qualified name of the failing test from the logs.
placeholder: |
`path/to/test_file.py::test_name[params]`
validations:
required: true
- type: checkboxes
attributes:
label: Basic information
description: Select all items that apply to the failing test.
options:
- label: Flaky test
- label: Can reproduce locally
- label: Caused by external libraries (e.g. bug in `transformers`)
- type: textarea
attributes:
label: 🧪 Describe the failing test
description: |
Please provide a clear and concise description of the failing test.
placeholder: |
A clear and concise description of the failing test.

```
The error message you got, with the full traceback and the error logs with [dump_input.py:##] if present.
```
validations:
required: true
- type: textarea
attributes:
label: 📝 History of failing test
description: |
Since when did the test start to fail?
You can look up its history via [Buildkite Test Suites](https://buildkite.com/organizations/vllm/analytics/suites/ci-1/tests?branch=main).

If you have time, identify the PR that caused the test to fail on main. You can do so via the following methods:

- Use Buildkite Test Suites to find the PR where the test failure first occurred, and reproduce the failure locally.

- Run [`git bisect`](https://git-scm.com/docs/git-bisect) locally.

- Manually unblock Buildkite steps for suspected PRs on main and check the results. (authorized users only)
placeholder: |
Approximate timeline and/or problematic PRs

A link to the Buildkite analytics of the failing test (if available)
validations:
required: true
- type: textarea
attributes:
label: CC List.
description: >
The list of people you want to CC. Usually, this includes those who worked on the PR that failed the test.
- type: markdown
attributes:
value: >
Thanks for reporting 🙏!
6 changes: 2 additions & 4 deletions .github/mergify.yml
@@ -58,7 +58,7 @@ pull_request_rules:
- files~=^benchmarks/structured_schemas/
- files=benchmarks/benchmark_serving_structured_output.py
- files=benchmarks/run_structured_output_benchmark.sh
- files=docs/source/features/structured_outputs.md
- files=docs/features/structured_outputs.md
- files=examples/offline_inference/structured_outputs.py
- files=examples/online_serving/openai_chat_completion_structured_outputs.py
- files=examples/online_serving/openai_chat_completion_structured_outputs_with_reasoning.py
@@ -135,9 +135,7 @@
- files~=^tests/entrypoints/openai/tool_parsers/
- files=tests/entrypoints/openai/test_chat_with_tool_reasoning.py
- files~=^vllm/entrypoints/openai/tool_parsers/
- files=docs/source/features/tool_calling.md
- files=docs/source/getting_started/examples/openai_chat_completion_client_with_tools.md
- files=docs/source/getting_started/examples/chat_with_tools.md
- files=docs/features/tool_calling.md
- files~=^examples/tool_chat_*
- files=examples/offline_inference/chat_with_tools.py
- files=examples/online_serving/openai_chat_completion_client_with_tools_required.py
2 changes: 1 addition & 1 deletion .github/scripts/cleanup_pr_body.sh
@@ -26,7 +26,7 @@ sed -i '/\*\*BEFORE SUBMITTING, PLEASE READ.*\*\*/,$d' "${NEW}"

# Remove HTML <details> section that includes <summary> text of "PR Checklist (Click to Expand)"
python3 - <<EOF
import re
import regex as re

with open("${NEW}", "r") as file:
content = file.read()
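Several changes in this PR replace the standard library `re` module with the third-party `regex` package (see the `cleanup_pr_body.sh` diff above, the new pre-commit hooks below, and the commit "[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking"). A minimal sketch of the motivation, assuming a reasonably recent `regex` release; the pattern and input are illustrative, not taken from the vLLM scripts:

```python
# `regex` is drop-in compatible with `re` for common usage, but it can
# also bound runaway backtracking with a timeout, a `regex`-only feature.
import regex as re

pathological = r"(a+)+$"   # classic catastrophic-backtracking pattern
subject = "a" * 40 + "!"   # almost-matching input that triggers the blowup

try:
    re.match(pathological, subject, timeout=1.0)  # give up after 1 second
    print("matched (or cleanly failed) in time")
except TimeoutError:
    print("regex bailed out instead of hanging the job")
```

With the stdlib `re`, the same call can take exponentially long; the `timeout` guard lets scripts fail fast instead.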
7 changes: 6 additions & 1 deletion .github/workflows/cleanup_pr_body.yml
@@ -20,7 +20,12 @@ jobs:
with:
python-version: '3.12'

- name: Install Python dependencies
run: |
python3 -m pip install --upgrade pip
python3 -m pip install regex

- name: Update PR description
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: .github/scripts/cleanup_pr_body.sh "${{ github.event.number }}"
run: bash .github/scripts/cleanup_pr_body.sh "${{ github.event.number }}"
6 changes: 1 addition & 5 deletions .gitignore
@@ -77,11 +77,6 @@ instance/
# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/
docs/source/getting_started/examples/
docs/source/api/vllm

# PyBuilder
.pybuilder/
target/
@@ -151,6 +146,7 @@

# mkdocs documentation
/site
docs/examples

# mypy
.mypy_cache/
18 changes: 17 additions & 1 deletion .pre-commit-config.yaml
@@ -17,7 +17,7 @@ repos:
- id: ruff
args: [--output-format, github, --fix]
- id: ruff-format
files: ^(.buildkite|benchmarks)/.*
files: ^(.buildkite|benchmarks|examples)/.*
- repo: https://github.com/codespell-project/codespell
rev: v2.4.1
hooks:
@@ -39,6 +39,7 @@
rev: v0.9.29
hooks:
- id: pymarkdown
exclude: '.*\.inc\.md'
args: [fix]
- repo: https://github.com/rhysd/actionlint
rev: v1.7.7
@@ -127,6 +128,21 @@
name: Update Dockerfile dependency graph
entry: tools/update-dockerfile-graph.sh
language: script
- id: enforce-import-regex-instead-of-re
name: Enforce import regex as re
entry: python tools/enforce_regex_import.py
language: python
types: [python]
pass_filenames: false
additional_dependencies: [regex]
# forbid directly import triton
- id: forbid-direct-triton-import
name: "Forbid direct 'import triton'"
entry: python tools/check_triton_import.py
language: python
types: [python]
pass_filenames: false
additional_dependencies: [regex]
# Keep `suggestion` last
- id: suggestion
name: Suggestion
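The two new hooks run repository scripts (`tools/enforce_regex_import.py` and `tools/check_triton_import.py`) whose contents are not part of this diff. A hypothetical sketch of the kind of scan such a hook performs; the helper below is illustrative only, not the actual vLLM tooling, and the replacement hints in its messages are assumptions:

```python
# Hypothetical pre-commit style checker: walk the tracked Python files and
# flag forbidden import forms. The real vLLM scripts may differ.
import subprocess
import sys

import regex as re  # the repo itself enforces `regex` over stdlib `re`

FORBIDDEN = [
    (re.compile(r"^\s*import re(\s|$)"), "import `regex as re` instead of stdlib `re`"),
    (re.compile(r"^\s*import triton(\s|$)"), "do not import `triton` directly"),
]


def main() -> int:
    # `git ls-files` keeps the scan to files actually tracked by the repo.
    files = subprocess.check_output(
        ["git", "ls-files", "*.py"], text=True
    ).splitlines()
    violations = 0
    for path in files:
        with open(path, encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                for pattern, hint in FORBIDDEN:
                    if pattern.search(line):
                        print(f"{path}:{lineno}: {line.strip()} ({hint})")
                        violations += 1
    return 1 if violations else 0


if __name__ == "__main__":
    sys.exit(main())
```

Returning a non-zero exit code is what makes pre-commit report the hook as failed.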