🚀 The feature, motivation and pitch
#9160 first introduced `AutoWeightsLoader` to recursively call `load_weights` on sub-modules. This lets composite models (most notably multi-modal models) use language backbones (`*Model` classes such as `LlamaModel`) without having to repeat their weight loading logic.
Currently, `load_weights` is only implemented in a few language backbones. It would be great to standardize this approach and apply it to all language backbones in vLLM. The steps to do this are pretty straightforward:
- Move the existing `load_weights` function from `*ForCausalLM` to `*Model`.
- Create a new `load_weights` function in `*ForCausalLM` that loads the weights using `AutoWeightsLoader`.
- Move any logic in `*Model.load_weights` that only applies to `*ForCausalLM` back to `*ForCausalLM.load_weights`. Usually, this involves `lm_head`. (A sketch of the resulting structure is shown below.)
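As a rough illustration (not taken from the vLLM source), the end state looks something like the sketch below. `MyModel` and `MyForCausalLM` are hypothetical names, and the backbone's per-parameter loop is heavily simplified compared to the real backbones, which also handle stacked-parameter mapping, quantization, and so on:

```python
# Hedged sketch only; class names are hypothetical and signatures may
# differ slightly between vLLM versions.
from collections.abc import Iterable

import torch
from torch import nn

from vllm.model_executor.models.utils import AutoWeightsLoader


class MyModel(nn.Module):
    """Language backbone (the *Model class, e.g. LlamaModel)."""

    def __init__(self) -> None:
        super().__init__()
        self.embed_tokens = nn.Embedding(32, 16)  # placeholder sizes

    def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]) -> set[str]:
        # The per-parameter loading logic that used to live in
        # *ForCausalLM.load_weights moves here (simplified: real backbones
        # also handle stacked-parameter mapping, quantization, etc.).
        params = dict(self.named_parameters())
        loaded: set[str] = set()
        for name, weight in weights:
            params[name].data.copy_(weight)
            loaded.add(name)
        return loaded


class MyForCausalLM(nn.Module):
    """Top-level model (the *ForCausalLM class)."""

    def __init__(self) -> None:
        super().__init__()
        self.model = MyModel()
        self.lm_head = nn.Linear(16, 32, bias=False)  # placeholder sizes

    def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]) -> set[str]:
        # Delegate to the backbone via AutoWeightsLoader, which recursively
        # calls load_weights on sub-modules. Logic that only applies to
        # *ForCausalLM (e.g. handling lm_head) stays here.
        loader = AutoWeightsLoader(self)
        return loader.load_weights(weights)
```

With this in place, a composite model can instantiate the backbone directly and reuse its `load_weights` instead of duplicating the loading logic.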
For reference, you can look at the existing implementations for models such as Llama, Gemma2/3, Qwen2 and ChatGLM.
To avoid scope creep, I suggest opening a PR that updates only a few models at a time.
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.