Conversation

@wangxiyuan (Collaborator) commented Jun 5, 2025

Fix the ascend config check logic:

  1. refactor `check_ascend_config` to make the rules clear (a rough sketch follows the list):
    1. torchair graph mode should not work with `enforce_eager=True`
    2. aclgraph should not work with torchair graph mode
  2. add a `refresh` config for the RLHF case
  3. fix a typo in the model runner
  4. change the `expert_tensor_parallel_size` default to 0 to keep the same behaviour as before
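
A rough sketch of the intended checks (names such as `check_ascend_config` and `get_ascend_config` follow the snippets quoted later in this thread; the actual code and error messages in vllm-ascend may differ):

```python
import vllm.envs as envs


def check_ascend_config(vllm_config, enforce_eager, ascend_config):
    # In vllm-ascend the config is fetched via a global accessor
    # (get_ascend_config()); it is passed in here to keep the sketch
    # self-contained.
    if ascend_config.torchair_graph_config.enabled:
        # Rule 1: torchair graph mode must not run with enforce_eager=True.
        if enforce_eager:
            raise RuntimeError(
                "Can't enable torchair graph mode with enforce_eager=True.")
    elif (envs.VLLM_USE_V1 and vllm_config.model_config is not None
          and not enforce_eager):
        # Rule 2: the elif keeps aclgraph and torchair graph mutually
        # exclusive. aclgraph doesn't work with deepseek models and is
        # only well tested with qwen on the V1 engine.
        model_type = vllm_config.model_config.hf_config.model_type
        if "deepseek" in model_type:
            raise NotImplementedError(
                "ACL graph does not support deepseek; "
                "try torchair graph mode instead.")
```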

@wangxiyuan wangxiyuan force-pushed the fix_ascend_config branch from 372e466 to b9a4e95 Compare June 5, 2025 15:35
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jun 5, 2025
@Yikun (Collaborator) left a comment

LGTM if CI passed

@wangxiyuan wangxiyuan force-pushed the fix_ascend_config branch 3 times, most recently from 324099c to d1439b7 Compare June 5, 2025 15:51

```diff
  # for V1 Engine, aclgraph doesn't work with deepseek model and only qwen model is well tested.
- if envs.VLLM_USE_V1 and vllm_config.model_config is not None and not enforce_eager:
+ if envs.VLLM_USE_V1 and vllm_config.model_config is not None and not enforce_eager and not ascend_config.torchair_graph_config.enabled:
```
@NeverRaR (Contributor) commented Jun 5, 2025
This check should be moved to `platform.py:check_and_update_config`.

```python
model_type = vllm_config.model_config.hf_config.model_type
if "deepseek" not in model_type:
    raise NotImplementedError(
        "Torchair graph mode only works with deepseek model.")
```

should be:

```python
if ascend_config.torchair_graph_config.enabled:
    if envs.VLLM_MLA_DISABLE:
        logger.warning(
            "Torchair graph mode is still experimental and not supported for V1 without mla currently, "
            "it has been disabled automatically.")
        ascend_config.torchair_graph_config.enabled = False
    elif vllm_config.model_config:
        model_type = vllm_config.model_config.hf_config.model_type
        if "deepseek" not in model_type:
            raise NotImplementedError(
                "Torchair graph mode only works with deepseek model.")
```

That is: change `ascend_config.ascend_scheduler_config.enabled = False` to `ascend_config.torchair_graph_config.enabled = False`, and `if vllm_config.model_config:` to `elif vllm_config.model_config:`.
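
For context on the `platform.py` suggestion above: vLLM exposes a per-platform `check_and_update_config` hook that is invoked once while the engine config is finalized. A minimal sketch of where the model-type check could live (the `NPUPlatform` body here is schematic; vllm-ascend's real class carries much more logic):

```python
from vllm.config import VllmConfig
from vllm.platforms.interface import Platform


class NPUPlatform(Platform):
    """Schematic subclass for illustration only."""

    @classmethod
    def check_and_update_config(cls, vllm_config: VllmConfig) -> None:
        # A single, early place for platform-specific validation.
        additional_config = vllm_config.additional_config or {}
        torchair_enabled = additional_config.get(
            "torchair_graph_config", {}).get("enabled", False)
        if torchair_enabled and vllm_config.model_config is not None:
            model_type = vllm_config.model_config.hf_config.model_type
            if "deepseek" not in model_type:
                raise NotImplementedError(
                    "Torchair graph mode only works with deepseek model.")
```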

@wangxiyuan wangxiyuan force-pushed the fix_ascend_config branch from d1439b7 to 3eefdc9 Compare June 6, 2025 01:04
@wangxiyuan (Collaborator, Author) commented

@NeverRaR Thanks for your review. I've updated the check logic to make it clearer. Please take a look again.

| Name | Type | Default | Description |
|------|------|---------|-------------|
| `torchair_graph_config` | dict | `{}` | The config options for torchair graph mode |
| `ascend_scheduler_config` | dict | `{}` | The config options for the ascend scheduler |
| `expert_tensor_parallel_size` | str | `1` | Expert tensor parallel size for the model to use. |
| `refresh` | bool | `false` | Whether to refresh the global ascend config content. This value is usually used in the RLHF case. |
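
These options are passed to vLLM through the `additional_config` engine argument. A hypothetical usage sketch (the model name is a placeholder):

```python
from vllm import LLM

llm = LLM(
    model="some/deepseek-model",  # placeholder, not a real model id
    additional_config={
        "torchair_graph_config": {"enabled": True},
        # Force re-initialization of the global ascend config; see the
        # RLHF discussion below.
        "refresh": True,
    },
)
```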
A Collaborator commented:

QQ: Does vLLM have a similar config?

@wangxiyuan (Collaborator, Author) commented Jun 6, 2025

No, this value is only used for RLHF. For verl or some other framework, the case is:

  1. verl loads and updates the vllm config
  2. verl starts LLM with an additional config in external_executor mode

In the first step, the ascend config has been initialized; then, in the second step, the additional config would be skipped.

To solve the problem, we let verl pass `refresh`, then we can regenerate the config.
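
A minimal sketch of how such a `refresh` flag could gate re-initialization (the real `init_ascend_config` in vllm-ascend may differ; `AscendConfig` here is a schematic stand-in):

```python
class AscendConfig:
    """Schematic stand-in for vllm-ascend's real AscendConfig."""

    def __init__(self, vllm_config):
        self.additional_config = vllm_config.additional_config or {}


_ASCEND_CONFIG = None  # module-level singleton, created on first init


def init_ascend_config(vllm_config):
    global _ASCEND_CONFIG
    additional_config = vllm_config.additional_config or {}
    refresh = additional_config.get("refresh", False)
    # Step 1 (verl loading/updating the vllm config) already created the
    # singleton; without refresh=True, the additional config passed in
    # step 2 would be silently ignored.
    if _ASCEND_CONFIG is None or refresh:
        _ASCEND_CONFIG = AscendConfig(vllm_config)
    return _ASCEND_CONFIG
```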

@wangxiyuan wangxiyuan force-pushed the fix_ascend_config branch 5 times, most recently from 830e333 to 8cf5adf Compare June 6, 2025 08:47
@wangxiyuan wangxiyuan force-pushed the fix_ascend_config branch from 8cf5adf to 755edcf Compare June 6, 2025 09:08
@wangxiyuan wangxiyuan merged commit dab19d5 into vllm-project:main Jun 6, 2025
23 checks passed
@wangxiyuan wangxiyuan deleted the fix_ascend_config branch June 9, 2025 01:25
venus-taibai pushed a commit to venus-taibai/vllm-ascend that referenced this pull request Jun 18, 2025
…llm-project#1092)

Merge branch wengang/cherry-pick-1029-1092 of [email protected]:Theta/vllm-ascend.git into dev-v0.9.0604
https://code.alipay.com/Theta/vllm-ascend/pull_requests/108

Reviewed-by: 子宏 <[email protected]>


* [Misc] Refactor additional_config (vllm-project#1029)
* [BugFix] Fix ascend config check (vllm-project#1092)
* [Misc] Update benchmark scripts
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
Fix the ascend config check logic:
1. refactor check_ascend_config to make it clear:
    1. torchair graph should not work with enforce_eager=True
    2. aclgraph should not work with torchair graph
2. add refresh config for rlhf case
3. fix a typo in model runner
4. change expert_tensor_parallel_size default to 0 to keep the same as before

Signed-off-by: wangxiyuan <[email protected]>
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
Fix the ascend config check logic:
1. refactor check_ascend_config to make it clear:
    1. torchair graph should not work with enforce_eager=True
    2. aclgraph should not work with torchair graph
2. add refresh config for rlhf case
3. fix a typo in model runner
4. change expert_tensor_parallel_size default to 0 to keep the same as before

Signed-off-by: wangxiyuan <[email protected]>