Commit d6a8d63

[Misc] format patch to make the code clear
Signed-off-by: wangxiyuan <[email protected]>
1 parent 90aabae · commit d6a8d63

16 files changed: +33 -39 lines changed

docs/source/developer_guide/versioning_policy.md

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ Usually, each minor version of vLLM (such as 0.7) will correspond to a vllm-asce
 
 For main branch, vllm-ascend should works with vLLM main branch and latest 1 or 2 release version. So to ensure the backward compatibility, we will do the following:
 - Both main branch and target vLLM release is tested by Ascend E2E CI. For example, currently, vLLM main branch and vLLM 0.8.4 are tested now.
-- For code changes, we will make sure that the changes are compatible with the latest 1 or 2 vLLM release version as well. In this case, vllm-ascend introduced a version check machinism inner the code. It'll check the version of installed vLLM pacakge first to decide which code logic to use. If users hit the `InvalidVersion` error, it sometimes means that they have installed an dev/editable version of vLLM package. In this case, we provide the env variable `VLLM_VERSION` to let users specify the version of vLLM package to use.
+- For code changes, we will make sure that the changes are compatible with the latest 1 or 2 vLLM release version as well. In this case, vllm-ascend introduced a version check machinism inner the code. It'll check the version of installed vLLM package first to decide which code logic to use. If users hit the `InvalidVersion` error, it sometimes means that they have installed an dev/editable version of vLLM package. In this case, we provide the env variable `VLLM_VERSION` to let users specify the version of vLLM package to use.
 - For documentation changes, we will make sure that the changes are compatible with the latest 1 or 2 vLLM release version as well. Note should be added if there are any breaking changes.
 
 ## Document Branch Policy
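The `VLLM_VERSION` override mentioned in the changed line is set in the environment before vLLM is imported. A minimal sketch of its use, assuming a dev/editable vLLM install that should be treated as a 0.8.4 release (the version and model name below are illustrative):

```python
# Minimal sketch: pin the vLLM version that vllm-ascend's version check sees.
# This must happen before vllm is imported, since the plugin inspects the
# installed package version at import time. "0.8.4" is an illustrative value.
import os

os.environ["VLLM_VERSION"] = "0.8.4"

from vllm import LLM  # noqa: E402  (imported after the env var on purpose)

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")  # model name is illustrative
```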

docs/source/faqs.md

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ Currently, only 1P1D is supported by vllm. For vllm-ascend, it'll be done by [th
 
 ### 10. Does vllm-ascend support quantization method?
 
-Currently, w8a8 quantization is already supported by vllm-ascend originally on v0.8.4rc2 or heigher, If you're using vllm 0.7.3 version, w8a8 quantization is supporeted with the integration of vllm-ascend and mindie-turbo, please use `pip install vllm-ascend[mindie-turbo]`.
+Currently, w8a8 quantization is already supported by vllm-ascend originally on v0.8.4rc2 or higher, If you're using vllm 0.7.3 version, w8a8 quantization is supporeted with the integration of vllm-ascend and mindie-turbo, please use `pip install vllm-ascend[mindie-turbo]`.
 
 ### 11. How to run w8a8 DeepSeek model?
 
docs/source/tutorials/multi_npu_quantization.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 
 ## Run docker container:
 :::{note}
-w8a8 quantization feature is supported by v0.8.4rc2 or highter
+w8a8 quantization feature is supported by v0.8.4rc2 or higher
 :::
 
 ```{code-block} bash

docs/source/user_guide/release_notes.md

Lines changed: 2 additions & 2 deletions
@@ -10,8 +10,8 @@ This is the second release candidate of v0.8.4 for vllm-ascend. Please follow th
 - DeepSeek V3/R1 works with DP, TP and MTP now. Please note that it's still in experimental status. Let us know if you hit any problem. [#429](https://github.com/vllm-project/vllm-ascend/pull/429) [#585](https://github.com/vllm-project/vllm-ascend/pull/585) [#626](https://github.com/vllm-project/vllm-ascend/pull/626) [#636](https://github.com/vllm-project/vllm-ascend/pull/636) [#671](https://github.com/vllm-project/vllm-ascend/pull/671)
 
 ### Core
-- ACLGraph feature is supported with V1 engine now. It's disabled by default because this feature rely on CANN 8.1 release. We'll make it avaiable by default in the next release [#426](https://github.com/vllm-project/vllm-ascend/pull/426)
-- Upgrade PyTorch to 2.5.1. vLLM Ascend no longer relies on the dev version of torch-npu now. Now users don't need to install the torch-npu by hand. The 2.5.1 version of torch-npu will be installed automaticlly. [#661](https://github.com/vllm-project/vllm-ascend/pull/661)
+- ACLGraph feature is supported with V1 engine now. It's disabled by default because this feature rely on CANN 8.1 release. We'll make it available by default in the next release [#426](https://github.com/vllm-project/vllm-ascend/pull/426)
+- Upgrade PyTorch to 2.5.1. vLLM Ascend no longer relies on the dev version of torch-npu now. Now users don't need to install the torch-npu by hand. The 2.5.1 version of torch-npu will be installed automatically. [#661](https://github.com/vllm-project/vllm-ascend/pull/661)
 
 ### Other
 - MiniCPM model works now. [#645](https://github.com/vllm-project/vllm-ascend/pull/645)

tests/singlecard/spec_decode/test_spec_decode_worker.py

Lines changed: 0 additions & 3 deletions
@@ -589,7 +589,6 @@ def test_empty_input_batch(k: int, batch_size: int,
 
 @pytest.mark.parametrize("acceptance_sampler_method",
                          ["rejection_sampler", "typical_acceptance_sampler"])
-@pytest.mark.skip_global_cleanup
 def test_init_device(acceptance_sampler_method: str):
     """Verify SpecDecodeWorker invokes proposer/scorer worker init_device, as
     well as other GPU initialization.
@@ -646,7 +645,6 @@ def test_initialize_cache(acceptance_sampler_method):
 @pytest.mark.parametrize('draft_kv_size_bytes', [0, 2 * 2 * 768, 2 * 2 * 4096])
 @pytest.mark.parametrize("acceptance_sampler_method",
                          ["rejection_sampler", "typical_acceptance_sampler"])
-@pytest.mark.skip_global_cleanup
 def test_determine_num_available_blocks(available_gpu_blocks: int,
                                         available_cpu_blocks: int,
                                         target_cache_block_size_bytes: int,
@@ -685,7 +683,6 @@ def test_determine_num_available_blocks(available_gpu_blocks: int,
 @pytest.mark.parametrize('target_cache_block_size_bytes',
                          [2 * 2 * 4096, 2 * 2 * 8192])
 @pytest.mark.parametrize('draft_kv_size_bytes', [0, 2 * 2 * 768, 2 * 2 * 4096])
-@pytest.mark.skip_global_cleanup
 def test_split_num_cache_blocks_evenly(available_gpu_blocks: int,
                                        target_cache_block_size_bytes: int,
                                        draft_kv_size_bytes: int):

vllm_ascend/__init__.py

Lines changed: 1 addition & 2 deletions
@@ -18,10 +18,9 @@
 
 def register():
     """Register the NPU platform."""
-
     return "vllm_ascend.platform.NPUPlatform"
 
 
 def register_model():
-    from .models import register_model
+    from vllm_ascend.models import register_model
     register_model()
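For context, vLLM finds these two hooks through Python entry points, so changing their import style does not affect how they are discovered. A minimal sketch of the packaging wiring, assuming the entry-point group names from vLLM's plugin convention (illustrative, not copied from this repo's setup.py):

```python
# Illustrative setup.py fragment: vLLM loads out-of-tree platforms from the
# "vllm.platform_plugins" group and extra model registrations from
# "vllm.general_plugins". The plugin names ("ascend", ...) are assumptions.
from setuptools import setup

setup(
    name="vllm_ascend",
    entry_points={
        "vllm.platform_plugins": ["ascend = vllm_ascend:register"],
        "vllm.general_plugins": [
            "ascend_enhanced_model = vllm_ascend:register_model",
        ],
    },
)
```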

vllm_ascend/models/__init__.py

Lines changed: 7 additions & 4 deletions
@@ -2,10 +2,13 @@
 
 
 def register_model():
-    from .deepseek_mtp import CustomDeepSeekMTP  # noqa: F401
-    from .deepseek_v2 import CustomDeepseekV2ForCausalLM  # noqa: F401
-    from .deepseek_v2 import CustomDeepseekV3ForCausalLM  # noqa: F401
-    from .qwen2_vl import CustomQwen2VLForConditionalGeneration  # noqa: F401
+    from vllm_ascend.models.deepseek_mtp import CustomDeepSeekMTP  # noqa: F401
+    from vllm_ascend.models.deepseek_v2 import \
+        CustomDeepseekV2ForCausalLM  # noqa: F401
+    from vllm_ascend.models.deepseek_v2 import \
+        CustomDeepseekV3ForCausalLM  # noqa: F401
+    from vllm_ascend.models.qwen2_vl import \
+        CustomQwen2VLForConditionalGeneration  # noqa: F401
 
     ModelRegistry.register_model(
         "DeepSeekMTPModel",

vllm_ascend/models/deepseek_mtp.py

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@
 from vllm.model_executor.sampling_metadata import SamplingMetadata
 from vllm.sequence import IntermediateTensors
 
-from .deepseek_v2 import CustomDeepseekV2DecoderLayer
+from vllm_ascend.models.deepseek_v2 import CustomDeepseekV2DecoderLayer
 
 
 class CustomDeepSeekMultiTokenPredictorLayer(DeepSeekMultiTokenPredictorLayer):

vllm_ascend/patch/__init__.py

Lines changed: 8 additions & 8 deletions
@@ -90,14 +90,14 @@
 # ===============
 # ** File: worker/patch_common/patch_metrics.py **
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-#   1. `vllm.spec_decode.metrics.AsyncMetricsCollector.maybe_collect_rejsample_metrics`
+#   1. `vllm.spec_decode.metrics.AsyncMetricsCollector._copy_rejsample_metrics_async`
 #    Why:
 #       There are cuda hard code (current_platform.is_cuda_alike()) in
-#       `AsyncMetricsCollector.maybe_collect_rejsample_metrics`
+#       `AsyncMetricsCollector._copy_rejsample_metrics_async`
 #    How:
 #       Change to use `current_platform.Event` to determine whether to return None
-#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
-#       https://github.com/vllm-project/vllm/pull/14411
+#    Related PR (if no, explain why):
+#       Need a PR to vllm to fix the issue.
 #    Future Plan:
 #       Revert it when the related pr is merged in vllm.
 #
@@ -110,7 +110,7 @@
 #       However float32 is not supported in cann rope op, thus we keep this patch
 #    How:
 #       Removed the dtype convert operations in forward
-#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
+#    Related PR (if no, explain why):
 #       NO, only for npu due to rope op.
 #    Future Plan:
 #       Keep this patch in vllm-ascend.
@@ -126,7 +126,7 @@
 #       - support attention metadata register to the set supported spec decode
 #       - offer a api in platform to determine whether spec decode is supported,
 #         and deprecate is_cuda_alike in it.
-#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
+#    Related PR (if no, explain why):
 #       - https://github.com/vllm-project/vllm/pull/15195
 #       - https://github.com/vllm-project/vllm-ascend/pull/395
 #    Future Plan:
@@ -138,7 +138,7 @@
 #       vLLM `Remove Sampler from Model Code` so vllm-ascend needs adapt to this change.
 #    How:
 #       Use vLLM 0.8.4 method to patch it.
-#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
+#    Related PR (if no, explain why):
 #       - https://github.com/vllm-project/vllm/pull/15195
 #       - https://github.com/vllm-project/vllm-ascend/pull/395
 #    Future Plan:
@@ -153,7 +153,7 @@
 #       `FlashAttentionMetadata`
 #    How:
 #       ditto
-#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
+#    Related PR (if no, explain why):
 #       - https://github.com/vllm-project/vllm/pull/15195
 #       - https://github.com/vllm-project/vllm-ascend/pull/395
 #    Future Plan:
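Each patch file referenced in this comment block follows the same import-time monkey-patch pattern: define a substitute function, then rebind the attribute on the vLLM class. A schematic sketch, reusing the method name from the notes above (the body is a placeholder, not the real implementation):

```python
# Schematic of the patch pattern used by worker/patch_common/patch_metrics.py.
# Importing a module like this rebinds one attribute on a vLLM class; the
# placeholder body below only marks where the NPU-friendly logic would go.
from vllm.spec_decode.metrics import AsyncMetricsCollector


def _copy_rejsample_metrics_async(self):
    # Placeholder: the real patch swaps the CUDA-only event handling for
    # `current_platform.Event`, as described in the "How" note above.
    raise NotImplementedError


AsyncMetricsCollector._copy_rejsample_metrics_async = _copy_rejsample_metrics_async
```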

vllm_ascend/patch/worker/patch_common/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@
 # limitations under the License.
 #
 
+import vllm_ascend.patch.worker.patch_common.patch_cache_engine  # noqa
 import vllm_ascend.patch.worker.patch_common.patch_metrics  # noqa
 import vllm_ascend.patch.worker.patch_common.patch_minicpm  # noqa
 import vllm_ascend.patch.worker.patch_common.patch_multi_step_worker  # noqa
