Conversation
@notkisk commented on Jul 29, 2025

#7453
Implements comprehensive support for EXAONE 4.0 models (32B and 1.2B variants) in DeepSpeed's inference v2 framework.

Key features:

  • Hybrid attention mechanism with a 3:1 sliding-window-to-full-attention ratio
  • QK-Reorder-Norm support for custom normalization ordering
  • Conditional RoPE application (skipped for global attention layers)
  • Grouped Query Attention (40 query heads, 8 key-value heads)
  • Full compatibility with ZeRO optimization stages
  • Parameter mapping between HuggingFace and DeepSpeed formats
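
The 3:1 hybrid schedule and the conditional-RoPE rule above can be sketched as follows. This is an illustrative assumption, not the PR's actual code: the exact convention (which layer positions get full attention, the helper names) is made up here for clarity.

```python
# Hypothetical sketch of 3:1 hybrid layer-type detection. The convention
# that every 4th layer is full attention is an assumption for illustration.
from enum import Enum


class LayerType(Enum):
    SLIDING_WINDOW = "sliding_attention"
    FULL = "full_attention"


def layer_type(layer_idx: int, full_attn_interval: int = 4) -> LayerType:
    """With a 3:1 sliding-to-full ratio, every 4th layer uses full attention."""
    if (layer_idx + 1) % full_attn_interval == 0:
        return LayerType.FULL
    return LayerType.SLIDING_WINDOW


def uses_rope(layer_idx: int) -> bool:
    """Per the feature list, RoPE is skipped on global (full) attention layers."""
    return layer_type(layer_idx) is LayerType.SLIDING_WINDOW
```

Over a 32-layer model this schedule yields 24 sliding-window layers and 8 full-attention layers, matching the 3:1 ratio.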

Implementation includes:

  • ExaoneTransformerContainer and ExaoneNonTransformerContainer for parameter management
  • ExaoneInferenceModel with layer type detection and hybrid attention logic
  • ExaonePolicy for model instantiation and container orchestration
  • Comprehensive unit test suite with 14 test cases
  • Integration with existing DeepSpeed inference v2 architecture

Validated with EXAONE-4.0-32B and EXAONE-4.0-1.2B models from HuggingFace.
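
The HuggingFace-to-DeepSpeed parameter mapping can be illustrated with a minimal name-splitting helper. The function and regex below are hypothetical simplifications of what a container map does when routing checkpoint names to transformer vs. non-transformer containers; they are not the PR's implementation.

```python
# Hypothetical sketch: route an HF parameter name to (layer index, local name),
# or to the non-transformer container when it has no per-layer prefix.
import re

LAYER_PREFIX = re.compile(r"^model\.layers\.(\d+)\.")


def split_param_name(name: str):
    """Return (layer_idx, local_name) for transformer params,
    or (None, name) for non-transformer params such as embeddings."""
    m = LAYER_PREFIX.match(name)
    if m is None:
        return None, name
    return int(m.group(1)), name[m.end():]
```

A transformer weight like model.layers.3.self_attn.q_proj.weight resolves to layer 3, while model.embed_tokens.weight falls through to the non-transformer container.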

@notkisk force-pushed the feature/exaone-4.0-support branch from 0f7375e to 299d96a on July 29, 2025 01:54
@notkisk (Author) commented on Jul 29, 2025

@hwchen2017 @tohtana @tjruwase @loadams Please take a look!

@notkisk force-pushed the feature/exaone-4.0-support branch from d6d4e0e to 11792c2 on July 29, 2025 15:55
@notkisk (Author) commented on Jul 29, 2025

@loadams

@notkisk requested a review from @loadams on July 30, 2025 11:44
@notkisk force-pushed the feature/exaone-4.0-support branch 2 times, most recently from 6663bf8 to 0b346ec on August 10, 2025 14:04
@hwchen2017 (Contributor) commented on Aug 12, 2025

Hi @notkisk, I tried to test your code and got the following error:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/deepspeed/hongwei/test.py", line 23, in <module>
[rank0]:     pipe = pipeline("LGAI-EXAONE/EXAONE-4.0-1.2B")
[rank0]:   File "/home/deepspeed/hongwei/hwenv/lib/python3.10/site-packages/mii/api.py", line 231, in pipeline
[rank0]:     inference_engine = load_model(model_config)
[rank0]:   File "/home/deepspeed/hongwei/hwenv/lib/python3.10/site-packages/mii/modeling/models.py", line 17, in load_model
[rank0]:     inference_engine = build_hf_engine(
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/engine_factory.py", line 142, in build_hf_engine
[rank0]:     return InferenceEngineV2(policy, engine_config)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/engine_v2.py", line 83, in __init__
[rank0]:     self._model = self._policy.build_model(self._config, self._base_mp_group)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 157, in build_model
[rank0]:     self.populate_model_parameters()
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 199, in populate_model_parameters
[rank0]:     container_map.map_param(name, parameter)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 78, in map_param
[rank0]:     self._non_transformer_params.set_dependency(name, parameter)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/layer_container_base.py", line 318, in set_dependency
[rank0]:     setattr(target_param, target_dependency_name, dep_value)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/parameter_base.py", line 39, in param_setter
[rank0]:     self.complete_component()
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/parameter_base.py", line 164, in complete_component
[rank0]:     finalized_param = self.finalize()
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/common_parameters/embedding_parameters.py", line 26, in finalize
[rank0]:     return self.inference_model.transform_embedding_param(self.params)
[rank0]:   File "/home/deepspeed/hongwei/hwenv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 211, in __getattribute__
[rank0]:     return super().__getattribute__(key)
[rank0]: AttributeError: 'Exaone4Config' object has no attribute 'transform_embedding_param'

Can you show me how you verified the code? Also, you could contribute the test code to the DeepSpeed examples.

map.set_transformer_params(['model.layers'], transformer_containers)

# Create non-transformer container for embedding/output/norm parameters
map.set_non_transformer_params(ExaoneNonTransformerContainer(self._model_config))
Review comment (Contributor) on the snippet above:

Looks like the parameter is supposed to be self.model.
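
A minimal stand-in reproduction of the failure mode in the traceback, using simplified, assumed class names (not DeepSpeed's real classes): the non-transformer container was constructed with the HF config, so parameter finalization looks up transform_embedding_param on an Exaone4Config instead of the inference model, raising the AttributeError shown above.

```python
# Simplified stand-ins for illustration; real DeepSpeed classes differ.
class Exaone4Config:
    """Stand-in for the HuggingFace config object (has no transform hooks)."""
    pass


class ExaoneInferenceModel:
    """Stand-in for the inference model, which provides the transform hook."""
    def transform_embedding_param(self, param):
        return param


class NonTransformerContainer:
    def __init__(self, inference_model):
        self.inference_model = inference_model

    def finalize_embedding(self, param):
        # Raises AttributeError when inference_model is actually a config,
        # mirroring the traceback in the review comment above.
        return self.inference_model.transform_embedding_param(param)


buggy = NonTransformerContainer(Exaone4Config())        # what the PR passed
fixed = NonTransformerContainer(ExaoneInferenceModel())  # what the review suggests
```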

- Added @pytest.mark.inference_v2 markers to all test methods in test_exaone.py
- This ensures the tests are included in CI workflow runs for inference v2
- Tests will now run automatically with the nv-a6000.yml workflow

Signed-off-by: notkisk <[email protected]>
@notkisk force-pushed the feature/exaone-4.0-support branch from 0b346ec to f0fcaf5 on August 12, 2025 14:00
@notkisk marked this pull request as draft on August 12, 2025 14:26