Description
System Info
Creating model MPT:

model = MptForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    torch_dtype=torch.bfloat16,
    use_cache=False,
    init_device=f"cuda:{local_rank}",
    attn_config=dict(attn_impl="flash", softmax_scale=None),  # triton, flash
)
This gives the following error:
File "/home/anton/personal/stanford_alpaca-replit/env/lib/python3.10/site-packages/transformers/models/mpt/modeling_mpt.py", line 258, in __init__
self.attn = MptAttention(config)
File "/home/anton/personal/stanford_alpaca-replit/env/lib/python3.10/site-packages/transformers/models/mpt/modeling_mpt.py", line 137, in __init__
self.softmax_scale = config.attn_config.softmax_scale
AttributeError: 'dict' object has no attribute 'softmax_scale'
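The traceback suggests that the attn_config dict passed through from_pretrained ends up stored on the config as a plain dict rather than being converted to an MptAttentionConfig, so the attribute access inside MptAttention.__init__ fails. A minimal illustration of that failure mode, using the same dict as the snippet above:

# Minimal illustration: a plain dict only supports item access,
# not the attribute access used in modeling_mpt.py.
attn_config = dict(attn_impl="flash", softmax_scale=None)

print(attn_config["softmax_scale"])  # works: prints None
print(attn_config.softmax_scale)     # AttributeError: 'dict' object has no attribute 'softmax_scale'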
Who can help?
Information
Tasks
Reproduction
model = MptForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    torch_dtype=torch.bfloat16,
    use_cache=False,
    init_device=f"cuda:{local_rank}",
    attn_config=dict(attn_impl="flash", softmax_scale=None),  # triton, flash
)
Expected behavior
from_pretrained should parse the attn_config override into an MptAttentionConfig; instead it is stored on the config as a plain dict object, which breaks attribute access in MptAttention.
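A possible workaround, sketched below, is to build the config explicitly and hand it to from_pretrained instead of passing attn_config as a keyword argument. This is only a sketch: it assumes MptAttentionConfig (in transformers.models.mpt.configuration_mpt) accepts the attn_impl and softmax_scale keywords, reuses model_args/training_args from the reproduction snippet above, and omits the MosaicML-specific init_device argument.

import torch
from transformers import MptConfig, MptForCausalLM
from transformers.models.mpt.configuration_mpt import MptAttentionConfig

# Build the attention sub-config as a proper config object instead of a dict.
attn_config = MptAttentionConfig(attn_impl="flash", softmax_scale=None)

# Load the base config, then attach the attention config before model creation.
config = MptConfig.from_pretrained(model_args.model_name_or_path)
config.attn_config = attn_config
config.use_cache = False

model = MptForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    config=config,
    cache_dir=training_args.cache_dir,
    torch_dtype=torch.bfloat16,
)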