MptForCausalLM.from_pretrained gives error 'dict' object has no attribute 'softmax_scale' #25114

### System Info

Creating an MPT model:
```
model = MptForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    torch_dtype=torch.bfloat16,
    use_cache=False,
    init_device=f"cuda:{local_rank}",
    attn_config=dict(attn_impl="flash", softmax_scale=None),  # triton, flash
)
```
This raises the following error:
```
  File "/home/anton/personal/stanford_alpaca-replit/env/lib/python3.10/site-packages/transformers/models/mpt/modeling_mpt.py", line 258, in __init__
    self.attn = MptAttention(config)
  File "/home/anton/personal/stanford_alpaca-replit/env/lib/python3.10/site-packages/transformers/models/mpt/modeling_mpt.py", line 137, in __init__
    self.softmax_scale = config.attn_config.softmax_scale
AttributeError: 'dict' object has no attribute 'softmax_scale'
```

### Reproduction

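```
import torch
from transformers import MptForCausalLM

# model_args, training_args and local_rank are defined elsewhere in the training script (not shown).
model = MptForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    torch_dtype=torch.bfloat16,
    use_cache=False,
    init_device=f"cuda:{local_rank}",
    # Passing attn_config as a plain dict is what leads to the AttributeError.
    attn_config=dict(attn_impl="flash", softmax_scale=None),  # triton, flash
)
```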

### Expected behavior

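`config.attn_config` should be an `MptAttentionConfig` instance. Instead, when `attn_config` is passed to `from_pretrained` as a dict, it stays a plain dict, so attribute access such as `config.attn_config.softmax_scale` in `MptAttention.__init__` fails.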
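To illustrate the failure mode without loading a model, here is a minimal sketch in plain Python. `AttnConfigSketch` and `coerce_attn_config` are hypothetical stand-ins written for this example; they are not the `transformers` classes and not necessarily how the library does (or should) handle this. The sketch only shows why attribute access on a dict fails and how coercing the dict into a config object would avoid it.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class AttnConfigSketch:
    """Hypothetical stand-in for an attention config object (not the transformers class)."""

    attn_impl: str = "torch"
    softmax_scale: Optional[float] = None


def coerce_attn_config(value: Union[None, dict, AttnConfigSketch]) -> AttnConfigSketch:
    """Return an attribute-style config regardless of how the caller passed it."""
    if value is None:
        return AttnConfigSketch()
    if isinstance(value, dict):
        # Unpack the dict into the config object so fields become attributes.
        return AttnConfigSketch(**value)
    return value


raw = dict(attn_impl="flash", softmax_scale=None)

# A plain dict supports raw["softmax_scale"] but not raw.softmax_scale.
try:
    raw.softmax_scale
    dict_error = None
except AttributeError as exc:
    dict_error = str(exc)

# After coercion, attribute access works as the attention module expects.
cfg = coerce_attn_config(raw)
print(dict_error)
print(cfg.attn_impl, cfg.softmax_scale)
```

The `try` block reproduces the same `AttributeError` message seen in the traceback, while `cfg.softmax_scale` on the coerced object works.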
