Support 310P MoE #1327

Angazenn · 2025-06-20T14:59:01Z

What this PR does / why we need it?

This PR is an extension of #914 and #1204 that adds support for PanguProMoE on the 310P platform. Currently, fused_experts_310p is implemented only for PanguProMoE.
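For readers unfamiliar with the term, a "fused experts" routine in an MoE layer generally combines routing with the per-expert computation. The sketch below illustrates that general idea in plain Python for a single token. It is not the `fused_experts_310p` code from this PR: the function names, the toy experts, and the choice to renormalize the top-k weights are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token, router_logits, experts, top_k):
    """Route one token to its top_k experts and mix their outputs.

    Hypothetical reference routine: real kernels batch tokens and
    fuse these steps, but the arithmetic is the same in spirit.
    """
    probs = softmax(router_logits)
    # Indices of the top_k experts, highest routing probability first.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Assumption: renormalize the selected weights so they sum to 1.
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(token)
    for i in chosen:
        weight = probs[i] / norm
        expert_out = experts[i](token)
        out = [o + weight * v for o, v in zip(out, expert_out)]
    return out, chosen


# Toy experts (hypothetical): each one scales its input by a constant.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]

out, chosen = moe_forward([1.0, -1.0], [0.1, 2.0, 0.3, 1.0], experts, top_k=2)
print(chosen, out)
```

A production implementation would operate on batched tensors and dispatch tokens per expert; this version only shows the select, weight, and combine steps.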

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: angazenn <[email protected]>

…1333)

### What this PR does / why we need it?
Add initial experimental support for Ascend 310P. This patch squashes the PRs below into one to help validation:

- #914
- #1318
- #1327

### Does this PR introduce _any_ user-facing change?
Users can run vLLM on the Atlas 300I Duo series.

### How was this patch tested?
CI passed with:

- E2E image build for 310P
- CI test on A2 with e2e test and long-term test
- Unit tests are missing because they need a real 310P image; they will be added in a separate PR later.
- Manual e2e tests:
  - Qwen2.5-7b-instruct, Qwen2.5-0.5b, Qwen3-0.6B, Qwen3-4B, Qwen3-8B: #914 (comment)
  - Pangu MoGE 72B

The patch has been tested locally on Ascend 310P hardware to ensure that the changes do not break existing functionality and that the new features work as intended.

#### ENV information
CANN, NNAL version: 8.1.RC1

> [!IMPORTANT]
> PTA 2.5.1 version >= torch_npu-2.5.1.post1.dev20250528 to support NZ format and calling NNAL operators on 310P

#### Code example
##### Build vllm-ascend from source code
```shell
# download source code as vllm-ascend
cd vllm-ascend
export SOC_VERSION=Ascend310P3
pip install -v -e .
cd ..
```

##### Run offline inference
```python
from vllm import LLM, SamplingParams

# English translations of the two prompts (each appears twice):
#   "Is the boiling point of water 100 degrees Celsius? Please answer yes or no."
#   "If the underarm temperature is 38 degrees Celsius, does this person have a fever? Please answer yes or no."
prompts = [
    "水的沸点是100摄氏度吗？请回答是或者否。",
    "若腋下体温为38摄氏度，请问这人是否发烧？请回答是或者否。",
    "水的沸点是100摄氏度吗？请回答是或者否。",
    "若腋下体温为38摄氏度，请问这人是否发烧？请回答是或者否。",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.0, top_p=0.95, max_tokens=10)

# Create an LLM.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    max_model_len=4096,
    max_num_seqs=4,
    dtype="float16",  # IMPORTANT: some ATB ops do not support bf16 on 310P
    disable_custom_all_reduce=True,
    trust_remote_code=True,
    tensor_parallel_size=2,
    compilation_config={"custom_ops": ["none", "+rms_norm", "+rotary_embedding"]},
)

# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

---------

Signed-off-by: Vincent Yuan <[email protected]>
Signed-off-by: Yikun Jiang <[email protected]>
Signed-off-by: angazenn <[email protected]>
Co-authored-by: Vincent Yuan <[email protected]>
Co-authored-by: angazenn <[email protected]>
Co-authored-by: wangxiyuan <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: shen-shanshan <[email protected]>
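The ENV note in the commit message requires a torch_npu build at or after 2.5.1.post1.dev20250528. A minimal sketch of checking that requirement before launching is shown below. The helper names (`version_key`, `meets_minimum`) are hypothetical, and the parser only handles the `release[.postN][.devN]` shape seen in that version string (no pre-release tags such as `rc1`); it is an illustration, not part of vllm-ascend.

```python
import re

# Matches versions shaped like "2.5.1", "2.5.1.post1", "2.5.1.post1.dev20250528".
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:\.post(\d+))?(?:\.dev(\d+))?$")

# Minimum build taken from the ENV note in the commit message.
MIN_TORCH_NPU = "2.5.1.post1.dev20250528"


def version_key(version):
    """Return a sortable key for a 'release[.postN][.devN]' version string."""
    public = version.split("+", 1)[0]  # drop any local suffix such as "+gitabc"
    match = _VERSION_RE.match(public)
    if match is None:
        raise ValueError(f"unsupported version format: {version!r}")
    release = [int(part) for part in match.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()  # treat "2.5.0" and "2.5" as equal
    # No post segment sorts before any post release.
    post = int(match.group(2)) if match.group(2) is not None else -1
    # A dev build sorts before the corresponding final build.
    dev = int(match.group(3)) if match.group(3) is not None else float("inf")
    return (tuple(release), post, dev)


def meets_minimum(installed, minimum=MIN_TORCH_NPU):
    """True if the installed version is at least the minimum version."""
    return version_key(installed) >= version_key(minimum)


print(meets_minimum("2.5.1.post1.dev20250619"))
print(meets_minimum("2.5.1"))
```

In practice the installed version string could be obtained with `importlib.metadata.version("torch_npu")`, and a full PEP 440 parser such as the one in the `packaging` library would be the more robust choice when it is available.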

Yikun · 2025-06-21T01:07:43Z

Many thanks for your contributions. I squashed all 310P-related commits, crediting you as co-author, into #1333.


github-actions bot added the module:ops label Jun 20, 2025

Angazenn force-pushed the 310p branch 2 times, most recently from 9c587ce to 0792d0d Compare June 20, 2025 15:28

support 310p moe

0228e41

Signed-off-by: angazenn <[email protected]>

Angazenn force-pushed the 310p branch from 0792d0d to 0228e41 Compare June 20, 2025 15:42

This was referenced Jun 20, 2025

[Platform] Add initial experimental support for Atlas 300I series #1333

Merged

[release] 0.9.1rc1 release checklist #1315

Closed

Yikun closed this Jun 21, 2025
