@@ -12,201 +12,249 @@ Alongside each architecture, we include some popular models that use it.
 Decoder-only Language Models
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 .. list-table::
-  :widths: 25 25 50 5
+  :widths: 25 25 50 5 5
   :header-rows: 1
 
   * - Architecture
     - Models
     - Example HuggingFace Models
     - :ref:`LoRA <lora>`
+    - :ref:`PP <distributed_serving>`
   * - :code:`AquilaForCausalLM`
     - Aquila, Aquila2
     - :code:`BAAI/Aquila-7B`, :code:`BAAI/AquilaChat-7B`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`ArcticForCausalLM`
     - Arctic
     - :code:`Snowflake/snowflake-arctic-base`, :code:`Snowflake/snowflake-arctic-instruct`, etc.
     -
+    - ✅︎
   * - :code:`BaiChuanForCausalLM`
     - Baichuan2, Baichuan
     - :code:`baichuan-inc/Baichuan2-13B-Chat`, :code:`baichuan-inc/Baichuan-7B`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`BloomForCausalLM`
     - BLOOM, BLOOMZ, BLOOMChat
     - :code:`bigscience/bloom`, :code:`bigscience/bloomz`, etc.
     -
+    - ✅︎
   * - :code:`ChatGLMModel`
     - ChatGLM
     - :code:`THUDM/chatglm2-6b`, :code:`THUDM/chatglm3-6b`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`CohereForCausalLM`
     - Command-R
     - :code:`CohereForAI/c4ai-command-r-v01`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`DbrxForCausalLM`
     - DBRX
     - :code:`databricks/dbrx-base`, :code:`databricks/dbrx-instruct`, etc.
     -
+    - ✅︎
   * - :code:`DeciLMForCausalLM`
     - DeciLM
     - :code:`Deci/DeciLM-7B`, :code:`Deci/DeciLM-7B-instruct`, etc.
     -
+    - ✅︎
   * - :code:`DeepseekForCausalLM`
     - DeepSeek
     - :code:`deepseek-ai/deepseek-llm-67b-base`, :code:`deepseek-ai/deepseek-llm-7b-chat` etc.
     -
+    - ✅︎
   * - :code:`DeepseekV2ForCausalLM`
     - DeepSeek-V2
     - :code:`deepseek-ai/DeepSeek-V2`, :code:`deepseek-ai/DeepSeek-V2-Chat` etc.
     -
+    - ✅︎
   * - :code:`ExaoneForCausalLM`
     - EXAONE-3
     - :code:`LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`FalconForCausalLM`
     - Falcon
     - :code:`tiiuae/falcon-7b`, :code:`tiiuae/falcon-40b`, :code:`tiiuae/falcon-rw-7b`, etc.
     -
+    - ✅︎
   * - :code:`GemmaForCausalLM`
     - Gemma
     - :code:`google/gemma-2b`, :code:`google/gemma-7b`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`Gemma2ForCausalLM`
     - Gemma2
     - :code:`google/gemma-2-9b`, :code:`google/gemma-2-27b`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`GPT2LMHeadModel`
     - GPT-2
     - :code:`gpt2`, :code:`gpt2-xl`, etc.
     -
+    - ✅︎
   * - :code:`GPTBigCodeForCausalLM`
     - StarCoder, SantaCoder, WizardCoder
     - :code:`bigcode/starcoder`, :code:`bigcode/gpt_bigcode-santacoder`, :code:`WizardLM/WizardCoder-15B-V1.0`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`GPTJForCausalLM`
     - GPT-J
     - :code:`EleutherAI/gpt-j-6b`, :code:`nomic-ai/gpt4all-j`, etc.
     -
+    - ✅︎
   * - :code:`GPTNeoXForCausalLM`
     - GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM
     - :code:`EleutherAI/gpt-neox-20b`, :code:`EleutherAI/pythia-12b`, :code:`OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5`, :code:`databricks/dolly-v2-12b`, :code:`stabilityai/stablelm-tuned-alpha-7b`, etc.
     -
+    - ✅︎
   * - :code:`GraniteForCausalLM`
     - PowerLM
     - :code:`ibm/PowerLM-3b` etc.
     - ✅︎
+    - ✅︎
   * - :code:`GraniteMoeForCausalLM`
     - PowerMoE
     - :code:`ibm/PowerMoE-3b` etc.
     - ✅︎
+    - ✅︎
   * - :code:`InternLMForCausalLM`
     - InternLM
     - :code:`internlm/internlm-7b`, :code:`internlm/internlm-chat-7b`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`InternLM2ForCausalLM`
     - InternLM2
     - :code:`internlm/internlm2-7b`, :code:`internlm/internlm2-chat-7b`, etc.
     -
+    - ✅︎
   * - :code:`JAISLMHeadModel`
     - Jais
     - :code:`core42/jais-13b`, :code:`core42/jais-13b-chat`, :code:`core42/jais-30b-v3`, :code:`core42/jais-30b-chat-v3`, etc.
     -
+    - ✅︎
   * - :code:`JambaForCausalLM`
     - Jamba
     - :code:`ai21labs/AI21-Jamba-1.5-Large`, :code:`ai21labs/AI21-Jamba-1.5-Mini`, :code:`ai21labs/Jamba-v0.1`, etc.
     - ✅︎
+    -
   * - :code:`LlamaForCausalLM`
     - Llama 3.1, Llama 3, Llama 2, LLaMA, Yi
     - :code:`meta-llama/Meta-Llama-3.1-405B-Instruct`, :code:`meta-llama/Meta-Llama-3.1-70B`, :code:`meta-llama/Meta-Llama-3-70B-Instruct`, :code:`meta-llama/Llama-2-70b-hf`, :code:`01-ai/Yi-34B`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`MiniCPMForCausalLM`
     - MiniCPM
     - :code:`openbmb/MiniCPM-2B-sft-bf16`, :code:`openbmb/MiniCPM-2B-dpo-bf16`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`MiniCPM3ForCausalLM`
     - MiniCPM3
     - :code:`openbmb/MiniCPM3-4B`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`MistralForCausalLM`
     - Mistral, Mistral-Instruct
     - :code:`mistralai/Mistral-7B-v0.1`, :code:`mistralai/Mistral-7B-Instruct-v0.1`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`MixtralForCausalLM`
     - Mixtral-8x7B, Mixtral-8x7B-Instruct
     - :code:`mistralai/Mixtral-8x7B-v0.1`, :code:`mistralai/Mixtral-8x7B-Instruct-v0.1`, :code:`mistral-community/Mixtral-8x22B-v0.1`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`MPTForCausalLM`
     - MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter
     - :code:`mosaicml/mpt-7b`, :code:`mosaicml/mpt-7b-storywriter`, :code:`mosaicml/mpt-30b`, etc.
     -
+    - ✅︎
   * - :code:`NemotronForCausalLM`
     - Nemotron-3, Nemotron-4, Minitron
     - :code:`nvidia/Minitron-8B-Base`, :code:`mgoin/Nemotron-4-340B-Base-hf-FP8`, etc.
     - ✅︎
-  * - :code:`OLMoEForCausalLM`
-    - OLMoE
-    - :code:`allenai/OLMoE-1B-7B-0924`, :code:`allenai/OLMoE-1B-7B-0924-Instruct`, etc.
-    -
+    - ✅︎
   * - :code:`OLMoForCausalLM`
     - OLMo
     - :code:`allenai/OLMo-1B-hf`, :code:`allenai/OLMo-7B-hf`, etc.
     -
+    - ✅︎
+  * - :code:`OLMoEForCausalLM`
+    - OLMoE
+    - :code:`allenai/OLMoE-1B-7B-0924`, :code:`allenai/OLMoE-1B-7B-0924-Instruct`, etc.
+    - ✅︎
+    - ✅︎
   * - :code:`OPTForCausalLM`
     - OPT, OPT-IML
     - :code:`facebook/opt-66b`, :code:`facebook/opt-iml-max-30b`, etc.
     -
+    - ✅︎
   * - :code:`OrionForCausalLM`
     - Orion
     - :code:`OrionStarAI/Orion-14B-Base`, :code:`OrionStarAI/Orion-14B-Chat`, etc.
     -
+    - ✅︎
   * - :code:`PhiForCausalLM`
     - Phi
     - :code:`microsoft/phi-1_5`, :code:`microsoft/phi-2`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`Phi3ForCausalLM`
     - Phi-3
     - :code:`microsoft/Phi-3-mini-4k-instruct`, :code:`microsoft/Phi-3-mini-128k-instruct`, :code:`microsoft/Phi-3-medium-128k-instruct`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`Phi3SmallForCausalLM`
     - Phi-3-Small
     - :code:`microsoft/Phi-3-small-8k-instruct`, :code:`microsoft/Phi-3-small-128k-instruct`, etc.
     -
+    - ✅︎
   * - :code:`PhiMoEForCausalLM`
     - Phi-3.5-MoE
     - :code:`microsoft/Phi-3.5-MoE-instruct`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`PersimmonForCausalLM`
     - Persimmon
     - :code:`adept/persimmon-8b-base`, :code:`adept/persimmon-8b-chat`, etc.
     -
+    - ✅︎
   * - :code:`QWenLMHeadModel`
     - Qwen
     - :code:`Qwen/Qwen-7B`, :code:`Qwen/Qwen-7B-Chat`, etc.
     -
+    - ✅︎
   * - :code:`Qwen2ForCausalLM`
     - Qwen2
     - :code:`Qwen/Qwen2-beta-7B`, :code:`Qwen/Qwen2-beta-7B-Chat`, etc.
     - ✅︎
+    - ✅︎
   * - :code:`Qwen2MoeForCausalLM`
     - Qwen2MoE
     - :code:`Qwen/Qwen1.5-MoE-A2.7B`, :code:`Qwen/Qwen1.5-MoE-A2.7B-Chat`, etc.
     -
+    - ✅︎
   * - :code:`StableLmForCausalLM`
     - StableLM
     - :code:`stabilityai/stablelm-3b-4e1t`, :code:`stabilityai/stablelm-base-alpha-7b-v2`, etc.
     -
+    - ✅︎
   * - :code:`Starcoder2ForCausalLM`
     - Starcoder2
     - :code:`bigcode/starcoder2-3b`, :code:`bigcode/starcoder2-7b`, :code:`bigcode/starcoder2-15b`, etc.
     -
+    - ✅︎
   * - :code:`SolarForCausalLM`
-    - EXAONE-3
+    - Solar Pro
     - :code:`upstage/solar-pro-preview-instruct`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`XverseForCausalLM`
-    - Xverse
+    - XVERSE
     - :code:`xverse/XVERSE-7B-Chat`, :code:`xverse/XVERSE-13B-Chat`, :code:`xverse/XVERSE-65B-Chat`, etc.
-    -
+    - ✅︎
+    - ✅︎
 
 .. note::
   Currently, the ROCm version of vLLM supports Mistral and Mixtral only for context lengths up to 4096.
@@ -217,94 +265,111 @@ Multimodal Language Models
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 .. list-table::
-  :widths: 25 25 25 25 5
+  :widths: 25 25 25 25 5 5
   :header-rows: 1
 
   * - Architecture
     - Models
     - Modalities
     - Example HuggingFace Models
     - :ref:`LoRA <lora>`
+    - :ref:`PP <distributed_serving>`
   * - :code:`Blip2ForConditionalGeneration`
     - BLIP-2
     - Image\ :sup:`E`
     - :code:`Salesforce/blip2-opt-2.7b`, :code:`Salesforce/blip2-opt-6.7b`, etc.
     -
+    - ✅︎
   * - :code:`ChameleonForConditionalGeneration`
     - Chameleon
     - Image
     - :code:`facebook/chameleon-7b` etc.
     -
+    - ✅︎
   * - :code:`FuyuForCausalLM`
     - Fuyu
     - Image
     - :code:`adept/fuyu-8b` etc.
     -
+    - ✅︎
   * - :code:`InternVLChatModel`
     - InternVL2
     - Image\ :sup:`E+`
     - :code:`OpenGVLab/InternVL2-4B`, :code:`OpenGVLab/InternVL2-8B`, etc.
     -
+    - ✅︎
   * - :code:`LlavaForConditionalGeneration`
     - LLaVA-1.5
     - Image\ :sup:`E+`
     - :code:`llava-hf/llava-1.5-7b-hf`, :code:`llava-hf/llava-1.5-13b-hf`, etc.
     -
+    - ✅︎
   * - :code:`LlavaNextForConditionalGeneration`
     - LLaVA-NeXT
     - Image\ :sup:`E+`
     - :code:`llava-hf/llava-v1.6-mistral-7b-hf`, :code:`llava-hf/llava-v1.6-vicuna-7b-hf`, etc.
     -
+    - ✅︎
   * - :code:`LlavaNextVideoForConditionalGeneration`
     - LLaVA-NeXT-Video
     - Video
     - :code:`llava-hf/LLaVA-NeXT-Video-7B-hf`, etc.
     -
+    - ✅︎
   * - :code:`LlavaOnevisionForConditionalGeneration`
     - LLaVA-Onevision
     - Image\ :sup:`+` / Video
     - :code:`llava-hf/llava-onevision-qwen2-7b-ov-hf`, :code:`llava-hf/llava-onevision-qwen2-0.5b-ov-hf`, etc.
     -
+    - ✅︎
   * - :code:`MiniCPMV`
     - MiniCPM-V
     - Image\ :sup:`+`
     - :code:`openbmb/MiniCPM-V-2` (see note), :code:`openbmb/MiniCPM-Llama3-V-2_5`, :code:`openbmb/MiniCPM-V-2_6`, etc.
-    -
+    - ✅︎
+    - ✅︎
   * - :code:`MllamaForConditionalGeneration`
     - Llama 3.2
     - Image
     - :code:`meta-llama/Llama-3.2-90B-Vision-Instruct`, :code:`meta-llama/Llama-3.2-11B-Vision`, etc.
     -
+    -
   * - :code:`PaliGemmaForConditionalGeneration`
     - PaliGemma
     - Image\ :sup:`E`
     - :code:`google/paligemma-3b-pt-224`, :code:`google/paligemma-3b-mix-224`, etc.
     -
+    - ✅︎
   * - :code:`Phi3VForCausalLM`
     - Phi-3-Vision, Phi-3.5-Vision
     - Image\ :sup:`E+`
     - :code:`microsoft/Phi-3-vision-128k-instruct`, :code:`microsoft/Phi-3.5-vision-instruct` etc.
     -
+    - ✅︎
   * - :code:`PixtralForConditionalGeneration`
     - Pixtral
     - Image\ :sup:`+`
     - :code:`mistralai/Pixtral-12B-2409`
     -
+    - ✅︎
   * - :code:`QWenLMHeadModel`
     - Qwen-VL
     - Image\ :sup:`E+`
     - :code:`Qwen/Qwen-VL`, :code:`Qwen/Qwen-VL-Chat`, etc.
     -
+    - ✅︎
   * - :code:`Qwen2VLForConditionalGeneration`
     - Qwen2-VL
     - Image\ :sup:`E+` / Video\ :sup:`+`
     - :code:`Qwen/Qwen2-VL-2B-Instruct`, :code:`Qwen/Qwen2-VL-7B-Instruct`, :code:`Qwen/Qwen2-VL-72B-Instruct`, etc.
     -
+    - ✅︎
   * - :code:`UltravoxModel`
     - Ultravox
     - Audio\ :sup:`E+`
     - :code:`fixie-ai/ultravox-v0_3`
     -
+    - ✅︎
 
 | :sup:`E` Pre-computed embeddings can be inputted for this modality.
 | :sup:`+` Multiple items can be inputted per text prompt for this modality.