(new-model-registration)=

# Model Registration

vLLM relies on a model registry to determine how to run each model.
A list of pre-registered architectures can be found on the [Supported Models](#supported-models) page.

If your model is not on this list, you must register it with vLLM.
This page provides detailed instructions on how to do so.
## Built-in models

To add a model directly to the vLLM library, start by forking our [GitHub repository](https://github.com/vllm-project/vllm) and then [building it from source](#build-from-source).
This gives you the ability to modify the codebase and test your model.

After you have implemented your model (see [tutorial](#new-model-basic)), put it into the <gh-dir:vllm/model_executor/models> directory.
Then, add your model class to `_VLLM_MODELS` in <gh-file:vllm/model_executor/models/registry.py> so that it is automatically registered upon importing vLLM.
You should also include an example HuggingFace repository for this model in <gh-file:tests/models/registry.py> to run the unit tests.
Finally, update the [Supported Models](#supported-models) documentation page to promote your model!

```{important}
The list of models in each section should be maintained in alphabetical order.
```

## Out-of-tree models

You can load an external model using a plugin without modifying the vLLM codebase.

```{seealso}
[vLLM's Plugin System](#plugin-system)
```

To register the model, use the following code:

```python
from vllm import ModelRegistry
from your_code import YourModelForCausalLM

ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)
```

If your model imports modules that initialize CUDA, consider lazy-importing it to avoid errors like `RuntimeError: Cannot re-initialize CUDA in forked subprocess`:

```python
from vllm import ModelRegistry

ModelRegistry.register_model("YourModelForCausalLM", "your_code:YourModelForCausalLM")
```

```{important}
If your model is a multimodal model, ensure the model class implements the {class}`~vllm.model_executor.models.interfaces.SupportsMultiModal` interface.
Read more about that [here](#enabling-multimodal-inputs).
```

```{note}
While you can place these snippets directly in a script that uses `vllm.LLM`, the recommended approach is to put them in a vLLM plugin. This ensures compatibility with various vLLM features like distributed inference and the API server.
```
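As a minimal sketch of that recommended plugin setup (the package name `your_code` and function name `register` are illustrative), the plugin exposes a registration function through the `vllm.general_plugins` entry-point group:

```python
# your_code/__init__.py -- a minimal out-of-tree model plugin
# (illustrative names). Expose the function below via the
# "vllm.general_plugins" entry-point group, e.g. in pyproject.toml:
#
#   [project.entry-points."vllm.general_plugins"]
#   register_your_model = "your_code:register"


def register():
    # Import inside the function so that merely installing the plugin
    # does not pull in vLLM (or CUDA) at interpreter startup.
    from vllm import ModelRegistry

    # The lazy "module:class" string form defers importing the model
    # module until it is actually needed, which also keeps forked
    # worker processes from re-initializing CUDA.
    ModelRegistry.register_model(
        "YourModelForCausalLM", "your_code:YourModelForCausalLM"
    )
```

vLLM discovers and calls such entry points on startup, so the model is registered in every process, including the API server and distributed workers.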