MptForCausalLM.from_pretrained gives error 'dict' object has no attribute 'softmax_scale' #25114

@abacaj

System Info

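Creating an MPT model as follows:

    import torch
    from transformers import MptForCausalLM

    # model_args, training_args and local_rank come from the surrounding training script (assumed)
    model = MptForCausalLM.from_pretrained(
        model_args.model_name_or_path,
        cache_dir=training_args.cache_dir,
        torch_dtype=torch.bfloat16,
        use_cache=False,
        init_device=f"cuda:{local_rank}",
        attn_config=dict(attn_impl="flash", softmax_scale=None),  # attn_impl: "triton" or "flash"
    )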

This fails with the following error:

  File "/home/anton/personal/stanford_alpaca-replit/env/lib/python3.10/site-packages/transformers/models/mpt/modeling_mpt.py", line 258, in __init__
    self.attn = MptAttention(config)
  File "/home/anton/personal/stanford_alpaca-replit/env/lib/python3.10/site-packages/transformers/models/mpt/modeling_mpt.py", line 137, in __init__
    self.softmax_scale = config.attn_config.softmax_scale
AttributeError: 'dict' object has no attribute 'softmax_scale'
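The traceback shows that config.attn_config is the raw dict passed to from_pretrained rather than an attention config object. Below is a minimal, library-free sketch of how that can happen. It assumes that from_pretrained first builds the config and then applies keyword arguments that match existing config attributes with a plain setattr, so any dict-to-object conversion done in the config's __init__ is skipped. ToyAttentionConfig, ToyConfig and load_with_overrides are hypothetical stand-ins, not the real transformers classes.

```python
class ToyAttentionConfig:
    """Hypothetical stand-in for an attention sub-config."""

    def __init__(self, attn_impl="torch", softmax_scale=None):
        self.attn_impl = attn_impl
        self.softmax_scale = softmax_scale


class ToyConfig:
    """Hypothetical stand-in for a model config."""

    def __init__(self, attn_config=None):
        # The constructor converts None or a dict into a ToyAttentionConfig...
        if attn_config is None:
            attn_config = ToyAttentionConfig()
        elif isinstance(attn_config, dict):
            attn_config = ToyAttentionConfig(**attn_config)
        self.attn_config = attn_config


def load_with_overrides(**overrides):
    """Build a default config, then apply keyword overrides with setattr
    (assumed to mirror how from_pretrained applies matching kwargs)."""
    config = ToyConfig()
    for key, value in overrides.items():
        if hasattr(config, key):
            setattr(config, key, value)  # ...but plain setattr skips that conversion
    return config


config = load_with_overrides(attn_config=dict(attn_impl="flash", softmax_scale=None))
print(type(config.attn_config).__name__)  # dict
try:
    config.attn_config.softmax_scale
except AttributeError as err:
    print(err)  # 'dict' object has no attribute 'softmax_scale'
```

Passing the same dict directly to ToyConfig(attn_config=...) would go through the conversion and work; only the override path leaves a dict behind.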

Reproduction

Run the MptForCausalLM.from_pretrained call shown above, which passes attn_config as a plain dict.

Expected behavior

config.attn_config should be an MptAttentionConfig instance. Instead it is a plain dict, so attribute access such as config.attn_config.softmax_scale fails.
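One way to get the expected behavior is to normalize the value every time attn_config is assigned, not only in __init__. The sketch below does this with a property setter, so a dict is converted even when it arrives later through setattr. It uses the same hypothetical ToyAttentionConfig and ToyConfig stand-ins, not the real transformers classes, and is only an illustration of the idea, not the actual fix in the library.

```python
class ToyAttentionConfig:
    """Hypothetical stand-in for an attention sub-config."""

    def __init__(self, attn_impl="torch", softmax_scale=None):
        self.attn_impl = attn_impl
        self.softmax_scale = softmax_scale


class ToyConfig:
    """Hypothetical config that coerces attn_config on every assignment."""

    def __init__(self, attn_config=None):
        self.attn_config = attn_config  # routed through the property setter below

    @property
    def attn_config(self):
        return self._attn_config

    @attn_config.setter
    def attn_config(self, value):
        # Normalize None or a dict into a ToyAttentionConfig, whether the value
        # comes from __init__ or from a later setattr call.
        if value is None:
            value = ToyAttentionConfig()
        elif isinstance(value, dict):
            value = ToyAttentionConfig(**value)
        self._attn_config = value


config = ToyConfig()
setattr(config, "attn_config", dict(attn_impl="flash", softmax_scale=None))
print(type(config.attn_config).__name__)  # ToyAttentionConfig
print(config.attn_config.softmax_scale)  # None
```

As an untested workaround in the meantime, passing an MptAttentionConfig instance (defined in transformers.models.mpt.configuration_mpt) instead of a dict may sidestep the error, since attribute access would then work.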
