Commit d6a8d63

[Misc] format patch to make the code clear
Signed-off-by: wangxiyuan <[email protected]>
1 parent 90aabae · commit d6a8d63

16 files changed: +33 -39 lines changed

docs/source/developer_guide/versioning_policy.md

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ Usually, each minor version of vLLM (such as 0.7) will correspond to a vllm-asce
 
 For main branch, vllm-ascend should works with vLLM main branch and latest 1 or 2 release version. So to ensure the backward compatibility, we will do the following:
 - Both main branch and target vLLM release is tested by Ascend E2E CI. For example, currently, vLLM main branch and vLLM 0.8.4 are tested now.
-- For code changes, we will make sure that the changes are compatible with the latest 1 or 2 vLLM release version as well. In this case, vllm-ascend introduced a version check machinism inner the code. It'll check the version of installed vLLM pacakge first to decide which code logic to use. If users hit the `InvalidVersion` error, it sometimes means that they have installed an dev/editable version of vLLM package. In this case, we provide the env variable `VLLM_VERSION` to let users specify the version of vLLM package to use.
+- For code changes, we will make sure that the changes are compatible with the latest 1 or 2 vLLM release version as well. In this case, vllm-ascend introduced a version check machinism inner the code. It'll check the version of installed vLLM package first to decide which code logic to use. If users hit the `InvalidVersion` error, it sometimes means that they have installed an dev/editable version of vLLM package. In this case, we provide the env variable `VLLM_VERSION` to let users specify the version of vLLM package to use.
 - For documentation changes, we will make sure that the changes are compatible with the latest 1 or 2 vLLM release version as well. Note should be added if there are any breaking changes.
 
 ## Document Branch Policy
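The `VLLM_VERSION` override mentioned in the changed line is set in the environment before vLLM is imported. A minimal sketch of its use, assuming a dev/editable vLLM install that should be treated as a 0.8.4 release (the version and model name below are illustrative):

```python
# Minimal sketch: pin the vLLM version that vllm-ascend's version check sees.
# This must happen before vllm is imported, since the plugin inspects the
# installed package version at import time. "0.8.4" is an illustrative value.
import os

os.environ["VLLM_VERSION"] = "0.8.4"

from vllm import LLM  # noqa: E402  (imported after the env var on purpose)

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")  # model name is illustrative
```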

docs/source/faqs.md

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ Currently, only 1P1D is supported by vllm. For vllm-ascend, it'll be done by [th
 
 ### 10. Does vllm-ascend support quantization method?
 
-Currently, w8a8 quantization is already supported by vllm-ascend originally on v0.8.4rc2 or heigher, If you're using vllm 0.7.3 version, w8a8 quantization is supporeted with the integration of vllm-ascend and mindie-turbo, please use `pip install vllm-ascend[mindie-turbo]`.
+Currently, w8a8 quantization is already supported by vllm-ascend originally on v0.8.4rc2 or higher, If you're using vllm 0.7.3 version, w8a8 quantization is supporeted with the integration of vllm-ascend and mindie-turbo, please use `pip install vllm-ascend[mindie-turbo]`.
 
 ### 11. How to run w8a8 DeepSeek model?
 
docs/source/tutorials/multi_npu_quantization.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 
 ## Run docker container:
 :::{note}
-w8a8 quantization feature is supported by v0.8.4rc2 or highter
+w8a8 quantization feature is supported by v0.8.4rc2 or higher
 :::
 
 ```{code-block} bash

docs/source/user_guide/release_notes.md

Lines changed: 2 additions & 2 deletions
@@ -10,8 +10,8 @@ This is the second release candidate of v0.8.4 for vllm-ascend. Please follow th
 - DeepSeek V3/R1 works with DP, TP and MTP now. Please note that it's still in experimental status. Let us know if you hit any problem. [#429](https://github.com/vllm-project/vllm-ascend/pull/429) [#585](https://github.com/vllm-project/vllm-ascend/pull/585) [#626](https://github.com/vllm-project/vllm-ascend/pull/626) [#636](https://github.com/vllm-project/vllm-ascend/pull/636) [#671](https://github.com/vllm-project/vllm-ascend/pull/671)
 
 ### Core
-- ACLGraph feature is supported with V1 engine now. It's disabled by default because this feature rely on CANN 8.1 release. We'll make it avaiable by default in the next release [#426](https://github.com/vllm-project/vllm-ascend/pull/426)
-- Upgrade PyTorch to 2.5.1. vLLM Ascend no longer relies on the dev version of torch-npu now. Now users don't need to install the torch-npu by hand. The 2.5.1 version of torch-npu will be installed automaticlly. [#661](https://github.com/vllm-project/vllm-ascend/pull/661)
+- ACLGraph feature is supported with V1 engine now. It's disabled by default because this feature rely on CANN 8.1 release. We'll make it available by default in the next release [#426](https://github.com/vllm-project/vllm-ascend/pull/426)
+- Upgrade PyTorch to 2.5.1. vLLM Ascend no longer relies on the dev version of torch-npu now. Now users don't need to install the torch-npu by hand. The 2.5.1 version of torch-npu will be installed automatically. [#661](https://github.com/vllm-project/vllm-ascend/pull/661)
 
 ### Other
 - MiniCPM model works now. [#645](https://github.com/vllm-project/vllm-ascend/pull/645)

tests/singlecard/spec_decode/test_spec_decode_worker.py

Lines changed: 0 additions & 3 deletions
@@ -589,7 +589,6 @@ def test_empty_input_batch(k: int, batch_size: int,
 
 @pytest.mark.parametrize("acceptance_sampler_method",
                          ["rejection_sampler", "typical_acceptance_sampler"])
-@pytest.mark.skip_global_cleanup
 def test_init_device(acceptance_sampler_method: str):
     """Verify SpecDecodeWorker invokes proposer/scorer worker init_device, as
     well as other GPU initialization.
@@ -646,7 +645,6 @@ def test_initialize_cache(acceptance_sampler_method):
 @pytest.mark.parametrize('draft_kv_size_bytes', [0, 2 * 2 * 768, 2 * 2 * 4096])
 @pytest.mark.parametrize("acceptance_sampler_method",
                          ["rejection_sampler", "typical_acceptance_sampler"])
-@pytest.mark.skip_global_cleanup
 def test_determine_num_available_blocks(available_gpu_blocks: int,
                                         available_cpu_blocks: int,
                                         target_cache_block_size_bytes: int,
@@ -685,7 +683,6 @@ def test_determine_num_available_blocks(available_gpu_blocks: int,
 @pytest.mark.parametrize('target_cache_block_size_bytes',
                          [2 * 2 * 4096, 2 * 2 * 8192])
 @pytest.mark.parametrize('draft_kv_size_bytes', [0, 2 * 2 * 768, 2 * 2 * 4096])
-@pytest.mark.skip_global_cleanup
 def test_split_num_cache_blocks_evenly(available_gpu_blocks: int,
                                        target_cache_block_size_bytes: int,
                                        draft_kv_size_bytes: int):

vllm_ascend/__init__.py

Lines changed: 1 addition & 2 deletions
@@ -18,10 +18,9 @@
 
 def register():
     """Register the NPU platform."""
-
     return "vllm_ascend.platform.NPUPlatform"
 
 
 def register_model():
-    from .models import register_model
+    from vllm_ascend.models import register_model
     register_model()
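For context, vLLM finds these two hooks through Python entry points, so changing their import style does not affect how they are discovered. A minimal sketch of the packaging wiring, assuming the entry-point group names from vLLM's plugin convention (illustrative, not copied from this repo's setup.py):

```python
# Illustrative setup.py fragment: vLLM loads out-of-tree platforms from the
# "vllm.platform_plugins" group and extra model registrations from
# "vllm.general_plugins". The plugin names ("ascend", ...) are assumptions.
from setuptools import setup

setup(
    name="vllm_ascend",
    entry_points={
        "vllm.platform_plugins": ["ascend = vllm_ascend:register"],
        "vllm.general_plugins": [
            "ascend_enhanced_model = vllm_ascend:register_model",
        ],
    },
)
```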

vllm_ascend/models/__init__.py

Lines changed: 7 additions & 4 deletions
@@ -2,10 +2,13 @@
 
 
 def register_model():
-    from .deepseek_mtp import CustomDeepSeekMTP  # noqa: F401
-    from .deepseek_v2 import CustomDeepseekV2ForCausalLM  # noqa: F401
-    from .deepseek_v2 import CustomDeepseekV3ForCausalLM  # noqa: F401
-    from .qwen2_vl import CustomQwen2VLForConditionalGeneration  # noqa: F401
+    from vllm_ascend.models.deepseek_mtp import CustomDeepSeekMTP  # noqa: F401
+    from vllm_ascend.models.deepseek_v2 import \
+        CustomDeepseekV2ForCausalLM  # noqa: F401
+    from vllm_ascend.models.deepseek_v2 import \
+        CustomDeepseekV3ForCausalLM  # noqa: F401
+    from vllm_ascend.models.qwen2_vl import \
+        CustomQwen2VLForConditionalGeneration  # noqa: F401
 
     ModelRegistry.register_model(
         "DeepSeekMTPModel",

vllm_ascend/models/deepseek_mtp.py

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@
 from vllm.model_executor.sampling_metadata import SamplingMetadata
 from vllm.sequence import IntermediateTensors
 
-from .deepseek_v2 import CustomDeepseekV2DecoderLayer
+from vllm_ascend.models.deepseek_v2 import CustomDeepseekV2DecoderLayer
 
 
 class CustomDeepSeekMultiTokenPredictorLayer(DeepSeekMultiTokenPredictorLayer):

vllm_ascend/patch/__init__.py

Lines changed: 8 additions & 8 deletions
@@ -90,14 +90,14 @@
 # ===============
 # ** File: worker/patch_common/patch_metrics.py **
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-#   1. `vllm.spec_decode.metrics.AsyncMetricsCollector.maybe_collect_rejsample_metrics`
+#   1. `vllm.spec_decode.metrics.AsyncMetricsCollector._copy_rejsample_metrics_async`
 #    Why:
 #       There are cuda hard code (current_platform.is_cuda_alike()) in
-#       `AsyncMetricsCollector.maybe_collect_rejsample_metrics`
+#       `AsyncMetricsCollector._copy_rejsample_metrics_async`
 #    How:
 #       Change to use `current_platform.Event` to determine whether to return None
-#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
-#       https://github.com/vllm-project/vllm/pull/14411
+#    Related PR (if no, explain why):
+#       Need a PR to vllm to fix the issue.
 #    Future Plan:
 #       Revert it when the related pr is merged in vllm.
 #
@@ -110,7 +110,7 @@
 #       However float32 is not supported in cann rope op, thus we keep this patch
 #    How:
 #       Removed the dtype convert operations in forward
-#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
+#    Related PR (if no, explain why):
 #       NO, only for npu due to rope op.
 #    Future Plan:
 #       Keep this patch in vllm-ascend.
@@ -126,7 +126,7 @@
 #       - support attention metadata register to the set supported spec decode
 #       - offer a api in platform to determine whether spec decode is supported,
 #         and deprecate is_cuda_alike in it.
-#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
+#    Related PR (if no, explain why):
 #       - https://github.com/vllm-project/vllm/pull/15195
 #       - https://github.com/vllm-project/vllm-ascend/pull/395
 #    Future Plan:
@@ -138,7 +138,7 @@
 #       vLLM `Remove Sampler from Model Code` so vllm-ascend needs adapt to this change.
 #    How:
 #       Use vLLM 0.8.4 method to patch it.
-#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
+#    Related PR (if no, explain why):
 #       - https://github.com/vllm-project/vllm/pull/15195
 #       - https://github.com/vllm-project/vllm-ascend/pull/395
 #    Future Plan:
@@ -153,7 +153,7 @@
 #       `FlashAttentionMetadata`
 #    How:
 #       ditto
-#    Related PR (if no, explain why): 1. refused by vllm. 2. vllm doesn't support 3. prepare to submit....
+#    Related PR (if no, explain why):
 #       - https://github.com/vllm-project/vllm/pull/15195
 #       - https://github.com/vllm-project/vllm-ascend/pull/395
 #    Future Plan:
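Each patch file referenced in this comment block follows the same import-time monkey-patch pattern: define a substitute function, then rebind the attribute on the vLLM class. A schematic sketch, reusing the method name from the notes above (the body is a placeholder, not the real implementation):

```python
# Schematic of the patch pattern used by worker/patch_common/patch_metrics.py.
# Importing a module like this rebinds one attribute on a vLLM class; the
# placeholder body below only marks where the NPU-friendly logic would go.
from vllm.spec_decode.metrics import AsyncMetricsCollector


def _copy_rejsample_metrics_async(self):
    # Placeholder: the real patch swaps the CUDA-only event handling for
    # `current_platform.Event`, as described in the "How" note above.
    raise NotImplementedError


AsyncMetricsCollector._copy_rejsample_metrics_async = _copy_rejsample_metrics_async
```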

vllm_ascend/patch/worker/patch_common/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@
 # limitations under the License.
 #
 
+import vllm_ascend.patch.worker.patch_common.patch_cache_engine  # noqa
 import vllm_ascend.patch.worker.patch_common.patch_metrics  # noqa
 import vllm_ascend.patch.worker.patch_common.patch_minicpm  # noqa
 import vllm_ascend.patch.worker.patch_common.patch_multi_step_worker  # noqa
