Conversation
@notkisk commented on Jul 29, 2025

#7453
Implements comprehensive support for EXAONE 4.0 models (32B and 1.2B variants) in DeepSpeed's inference v2 framework.

Key features:

  • Hybrid attention mechanism with a 3:1 sliding-window-to-full-attention ratio
  • QK-Reorder-Norm support for custom normalization ordering
  • Conditional RoPE application (skipped for global attention layers)
  • Grouped Query Attention (40 query heads, 8 key-value heads)
  • Full compatibility with ZeRO optimization stages
  • Parameter mapping between HuggingFace and DeepSpeed formats
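
The 3:1 hybrid schedule and the conditional-RoPE rule above can be sketched as follows. This is an illustrative assumption, not the PR's actual code: the exact convention (which layer positions get full attention, the helper names) is made up here for clarity.

```python
# Hypothetical sketch of 3:1 hybrid layer-type detection. The convention
# that every 4th layer is full attention is an assumption for illustration.
from enum import Enum


class LayerType(Enum):
    SLIDING_WINDOW = "sliding_attention"
    FULL = "full_attention"


def layer_type(layer_idx: int, full_attn_interval: int = 4) -> LayerType:
    """With a 3:1 sliding-to-full ratio, every 4th layer uses full attention."""
    if (layer_idx + 1) % full_attn_interval == 0:
        return LayerType.FULL
    return LayerType.SLIDING_WINDOW


def uses_rope(layer_idx: int) -> bool:
    """Per the feature list, RoPE is skipped on global (full) attention layers."""
    return layer_type(layer_idx) is LayerType.SLIDING_WINDOW
```

Over a 32-layer model this schedule yields 24 sliding-window layers and 8 full-attention layers, matching the 3:1 ratio.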

Implementation includes:

  • ExaoneTransformerContainer and ExaoneNonTransformerContainer for parameter management
  • ExaoneInferenceModel with layer type detection and hybrid attention logic
  • ExaonePolicy for model instantiation and container orchestration
  • Comprehensive unit test suite with 14 test cases
  • Integration with existing DeepSpeed inference v2 architecture

Validated with EXAONE-4.0-32B and EXAONE-4.0-1.2B models from HuggingFace.
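
The HuggingFace-to-DeepSpeed parameter mapping can be illustrated with a minimal name-splitting helper. The function and regex below are hypothetical simplifications of what a container map does when routing checkpoint names to transformer vs. non-transformer containers; they are not the PR's implementation.

```python
# Hypothetical sketch: route an HF parameter name to (layer index, local name),
# or to the non-transformer container when it has no per-layer prefix.
import re

LAYER_PREFIX = re.compile(r"^model\.layers\.(\d+)\.")


def split_param_name(name: str):
    """Return (layer_idx, local_name) for transformer params,
    or (None, name) for non-transformer params such as embeddings."""
    m = LAYER_PREFIX.match(name)
    if m is None:
        return None, name
    return int(m.group(1)), name[m.end():]
```

A transformer weight like model.layers.3.self_attn.q_proj.weight resolves to layer 3, while model.embed_tokens.weight falls through to the non-transformer container.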

@notkisk force-pushed the feature/exaone-4.0-support branch from 0f7375e to 299d96a on July 29, 2025 01:54
@notkisk (Author) commented on Jul 29, 2025

@hwchen2017 @tohtana @tjruwase @loadams Please take a look!

@notkisk force-pushed the feature/exaone-4.0-support branch from d6d4e0e to 11792c2 on July 29, 2025 15:55
@notkisk (Author) commented on Jul 29, 2025

@loadams

@notkisk requested a review from @loadams on July 30, 2025 11:44
@notkisk force-pushed the feature/exaone-4.0-support branch 2 times, most recently from 6663bf8 to 0b346ec on August 10, 2025 14:04
@hwchen2017 (Contributor) commented on Aug 12, 2025

Hi @notkisk, I tried to test your code and got the following error:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/deepspeed/hongwei/test.py", line 23, in <module>
[rank0]:     pipe = pipeline("LGAI-EXAONE/EXAONE-4.0-1.2B")
[rank0]:   File "/home/deepspeed/hongwei/hwenv/lib/python3.10/site-packages/mii/api.py", line 231, in pipeline
[rank0]:     inference_engine = load_model(model_config)
[rank0]:   File "/home/deepspeed/hongwei/hwenv/lib/python3.10/site-packages/mii/modeling/models.py", line 17, in load_model
[rank0]:     inference_engine = build_hf_engine(
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/engine_factory.py", line 142, in build_hf_engine
[rank0]:     return InferenceEngineV2(policy, engine_config)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/engine_v2.py", line 83, in __init__
[rank0]:     self._model = self._policy.build_model(self._config, self._base_mp_group)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 157, in build_model
[rank0]:     self.populate_model_parameters()
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 199, in populate_model_parameters
[rank0]:     container_map.map_param(name, parameter)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 78, in map_param
[rank0]:     self._non_transformer_params.set_dependency(name, parameter)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/layer_container_base.py", line 318, in set_dependency
[rank0]:     setattr(target_param, target_dependency_name, dep_value)
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/parameter_base.py", line 39, in param_setter
[rank0]:     self.complete_component()
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/parameter_base.py", line 164, in complete_component
[rank0]:     finalized_param = self.finalize()
[rank0]:   File "/home/deepspeed/hongwei/DeepSpeed/deepspeed/inference/v2/model_implementations/common_parameters/embedding_parameters.py", line 26, in finalize
[rank0]:     return self.inference_model.transform_embedding_param(self.params)
[rank0]:   File "/home/deepspeed/hongwei/hwenv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 211, in __getattribute__
[rank0]:     return super().__getattribute__(key)
[rank0]: AttributeError: 'Exaone4Config' object has no attribute 'transform_embedding_param'

Can you show me how you verified the code? Also, you could contribute the test code to the DeepSpeed examples.

map.set_transformer_params(['model.layers'], transformer_containers)

# Create non-transformer container for embedding/output/norm parameters
map.set_non_transformer_params(ExaoneNonTransformerContainer(self._model_config))
Review comment (Contributor) on the snippet above:

Looks like the parameter is supposed to be self.model.
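
A minimal stand-in reproduction of the failure mode in the traceback, using simplified, assumed class names (not DeepSpeed's real classes): the non-transformer container was constructed with the HF config, so parameter finalization looks up transform_embedding_param on an Exaone4Config instead of the inference model, raising the AttributeError shown above.

```python
# Simplified stand-ins for illustration; real DeepSpeed classes differ.
class Exaone4Config:
    """Stand-in for the HuggingFace config object (has no transform hooks)."""
    pass


class ExaoneInferenceModel:
    """Stand-in for the inference model, which provides the transform hook."""
    def transform_embedding_param(self, param):
        return param


class NonTransformerContainer:
    def __init__(self, inference_model):
        self.inference_model = inference_model

    def finalize_embedding(self, param):
        # Raises AttributeError when inference_model is actually a config,
        # mirroring the traceback in the review comment above.
        return self.inference_model.transform_embedding_param(param)


buggy = NonTransformerContainer(Exaone4Config())        # what the PR passed
fixed = NonTransformerContainer(ExaoneInferenceModel())  # what the review suggests
```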

- Added @pytest.mark.inference_v2 markers to all test methods in test_exaone.py
- This ensures the tests are included in CI workflow runs for inference v2
- Tests will now run automatically with the nv-a6000.yml workflow

Signed-off-by: notkisk <[email protected]>
@notkisk force-pushed the feature/exaone-4.0-support branch from 0b346ec to f0fcaf5 on August 12, 2025 14:00
@notkisk marked this pull request as draft on August 12, 2025 14:26