
[RFC]: Decoupling vLLM Configuration from Hugging Face #24384

@charlotte12l


Motivation.

Currently, vLLM assumes that all models follow the Hugging Face format. Configuration is parsed directly into a transformers.PretrainedConfig instance, which is then embedded into ModelConfig as hf_config. This tight coupling introduces several problems:

  • Poor extensibility: non-HF models (e.g., Mistral-native) cannot be integrated cleanly. Their configuration must first be awkwardly adapted into a PretrainedConfig-like object.

  • Maintenance overhead: many fields in PretrainedConfig are irrelevant to inference, but vLLM does not clearly separate used from unused fields. Users who want to support their own model formats must carefully map them into HF’s schema, and this kind of manual mapping is fragile and error-prone.

  • Under-specified, inconsistently named critical fields (added on Nov 1): fields vLLM relies on at runtime, such as num_kv_heads, num_experts, and max_model_len, are not emphasized or standardized in PretrainedConfig. Different models use different names, forcing ModelConfig to perform bespoke mappings.

  • Missing architecture hints force runtime introspection (added on Nov 1): useful architecture details (e.g., the attention type of each layer, needed for KV-cache initialization) are not guaranteed to be available in PretrainedConfig (perhaps some are now, but what if other hints are needed in the future?). Today we infer them at runtime (e.g., via forward_context), adding complexity and risk. Getting such fields into HF first, and then into every model, slows vLLM development.

This proposal aims to resolve these issues by introducing a clean separation of concerns:

  • Define a standardized, vLLM‑native configuration schema that specifies exactly the fields the engine needs.
  • Provide pluggable parsers that translate external configuration formats (HF, Mistral‑native, GGUF, etc.) into that schema.
  • Simplify code paths: fewer runtime probes (e.g., no per‑layer attention discovery via forward_context) and earlier detection of missing information.

Proposed Change.

1. Unified Configuration Schema

Introduce a new class, tentatively named ModelArchitectureConfig (or a better name lol), that contains only the essential fields required by vLLM for inference:

from typing import List


class ModelArchitectureConfig:
    """Standardized, vLLM-native schema containing only the fields
    the engine needs for inference."""

    architectures: List[str]
    model_type: str
    hidden_size: int
    num_hidden_layers: int
    num_attention_heads: int
    head_dim: int
    vocab_size: int

    def __init__(
        self,
        architectures: List[str],
        model_type: str,
        hidden_size: int,
        num_hidden_layers: int,
        num_attention_heads: int,
        head_dim: int,
        vocab_size: int,
        **kwargs,
    ):
        self.architectures = architectures
        self.model_type = model_type
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_attention_heads = num_attention_heads
        self.head_dim = head_dim
        self.vocab_size = vocab_size
        # Keep any remaining model-specific fields as-is.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def validate(self) -> None:
        """Check that the required fields are present and consistent."""
        ...
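
A quick construction sketch (hypothetical values), showing that model-specific extras are retained via **kwargs:

cfg = ModelArchitectureConfig(
    architectures=["LlamaForCausalLM"],
    model_type="llama",
    hidden_size=4096,
    num_hidden_layers=32,
    num_attention_heads=32,
    head_dim=128,
    vocab_size=32000,
    rope_theta=10000.0,  # extra field, kept as-is via **kwargs
)
cfg.validate()
assert cfg.rope_theta == 10000.0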

Current structure:

VLLMConfig
 └── ModelConfig
       └── hf_config: PretrainedConfig

Proposed structure:

VLLMConfig
 └── ModelConfig
       ├── model_architecture_config: ModelArchitectureConfig   # always present
       └── hf_config: PretrainedConfig                          # optional

  • Engine logic consumes only model_architecture_config.
  • hf_config is retained when loading from Hugging Face, but becomes optional.
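
A minimal sketch of the resulting ModelConfig shape (the dataclass form is an assumption; field names follow the tree above):

from dataclasses import dataclass
from typing import Optional

from transformers import PretrainedConfig


@dataclass
class ModelConfig:
    # Always present; engine logic consumes only this.
    model_architecture_config: ModelArchitectureConfig
    # Populated only when the model was loaded from Hugging Face.
    hf_config: Optional[PretrainedConfig] = None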

2. Parsers

With ModelArchitectureConfig, each model format can be supported through a dedicated parser. A parser guarantees that the fields vLLM requires at runtime are correctly extracted, while leaving the remaining model-specific fields untouched.

Similar to #24277, we can do:

from pathlib import Path
from typing import Optional, Union

# ConfigParserBase and register_config_parser are introduced in #24277.

class HFConfigParser(ConfigParserBase):

    def parse(self,
              model: Union[str, Path],
              trust_remote_code: bool,
              revision: Optional[str] = None,
              code_revision: Optional[str] = None,
              **kwargs) -> ModelArchitectureConfig:
        ...


@register_config_parser("custom_config_parser")
class CustomConfigParser(ConfigParserBase):

    def parse(self,
              model: Union[str, Path],
              trust_remote_code: bool,
              revision: Optional[str] = None,
              code_revision: Optional[str] = None,
              **kwargs) -> ModelArchitectureConfig:
        raise NotImplementedError
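
For illustration, a trimmed-down HF parse body might look like the following. This is a sketch only: it reads a local config.json and maps HF field names onto the schema; the real parser would reuse vLLM's existing HF loading paths and honor trust_remote_code and revisions. The helper name is hypothetical.

import json
from pathlib import Path
from typing import Union


def parse_hf_config_json(model: Union[str, Path]) -> ModelArchitectureConfig:
    # Hypothetical helper: read a local config.json and map HF field
    # names onto the unified schema.
    raw = json.loads((Path(model) / "config.json").read_text())
    return ModelArchitectureConfig(
        architectures=raw["architectures"],
        model_type=raw["model_type"],
        hidden_size=raw["hidden_size"],
        num_hidden_layers=raw["num_hidden_layers"],
        num_attention_heads=raw["num_attention_heads"],
        # HF configs do not always set head_dim explicitly.
        head_dim=raw.get("head_dim",
                         raw["hidden_size"] // raw["num_attention_heads"]),
        vocab_size=raw["vocab_size"],
        # Pass everything else through as model-specific extras.
        **{k: v for k, v in raw.items()
           if k not in ModelArchitectureConfig.__annotations__},
    )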

3. Engine surface change

Where the engine previously reached for model_config.hf_config, switch to model_config.model_architecture_config.
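
For example, at a hypothetical call site:

# Before: engine code reads HF-shaped fields directly.
num_layers = model_config.hf_config.num_hidden_layers

# After: engine code consumes only the unified schema.
num_layers = model_config.model_architecture_config.num_hidden_layers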

4. Migration Plan (updated on Nov 1)

A. Add the ModelArchitectureConfig interface, along with HFArchitectureConfigParser and MistralArchitectureConfigParser, to parse trained model params into ModelArchitectureConfig.
B. Update ModelConfig to include model_architecture_config and make hf_config optional.
C. During KV-cache initialization, call model_architecture_config.layer_attention_types to get each layer’s attention type. Also refactor get_kv_cache_spec a bit so it can derive the spec from an uninitialized module (see the sketch after this list).
D. Gradually replace calls to hf_config in the vLLM runtime with model_architecture_config.
E. Gradually replace calls to hf_config in model_executor/models/*.py with model_architecture_config.
F. Keep hf_config but mark it as deprecated, populating it only if the source was a Hugging Face model.
G. Completely remove hf_config.
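
A rough sketch of step C, assuming layer_attention_types is a per-layer list of type tags (the exact representation is still open):

def get_kv_cache_spec(model_config) -> dict[int, str]:
    # Derive each layer's KV-cache spec from the config alone,
    # without needing an initialized module.
    arch = model_config.model_architecture_config
    specs: dict[int, str] = {}
    for layer_idx, attn_type in enumerate(arch.layer_attention_types):
        # e.g. "full_attention" vs. "sliding_window"
        specs[layer_idx] = attn_type
    return specs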

Feedback Period.

1-2 weeks

CC List.

@22quinn @zhuohan123 @yeqcharlotte @houseroad @simon-mo

Any Other Things.

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
