Motivation.
There is an increasing need to customize vLLM, including:
- out-of-tree model registration, where users want to register their model outside of the vLLM repo. This is partially fulfilled by [Core] enable out-of-tree model register #3871, but users later found that it does not work in a distributed setting with ray: [Bug]: Ray distributed backend does not support out-of-tree models via ModelRegistry APIs #5657
- custom executor class, already added in [Core] Allow specifying custom Executor #6557
- custom scheduler, requested in [RFC]: Replaceable Scheduler #7123
- custom tensor parallel implementation, requested in [RFC]: Model architecture plugins #7124
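To make the first use case concrete, here is a rough sketch of the plugin pattern: a standalone package whose import has the side effect of registering a custom model class. The `ModelRegistry` below is a minimal stand-in for illustration, not vLLM's actual registry API, and `vllm_my_model` / `MyCustomForCausalLM` are hypothetical names.

```python
from typing import Dict, Type


class ModelRegistry:
    """Minimal stand-in for an out-of-tree model registration point."""

    _models: Dict[str, Type] = {}

    @classmethod
    def register_model(cls, arch: str, model_cls: Type) -> None:
        # Map an architecture name to a user-provided model class.
        cls._models[arch] = model_cls

    @classmethod
    def get(cls, arch: str) -> Type:
        return cls._models[arch]


# --- contents of a hypothetical plugin module `vllm_my_model` ---
class MyCustomForCausalLM:
    """Placeholder for a user-defined model implementation."""


def register() -> None:
    # Runs when the plugin module is imported, so registration happens
    # before vLLM resolves the model architecture.
    ModelRegistry.register_model("MyCustomForCausalLM", MyCustomForCausalLM)


register()
```

The key property is that all customization happens as an import-time side effect, so the plugin works the same way in single-process and distributed settings as long as every worker imports it.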
Usually, the request is to swap out some functions or classes in vLLM, or to call some functions before vLLM runs the model. While implementing these in vLLM is not difficult, the maintenance burden grows.
To satisfy the growing need for customization, I propose introducing a vLLM plugin system.
It is inspired by the pytest community, where a plugin is a standalone pypi package, e.g. https://pypi.org/project/pytest-forked/ .
#7130 is a draft implementation, where I added a new env var VLLM_PLUGINS. It works similarly to the operating system's LD_PRELOAD: a colon-separated list of Python modules to import.
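The loading mechanism could be sketched roughly as follows (`load_plugins` is a hypothetical helper name for illustration, not the draft's actual code; see #7130 for the real implementation):

```python
import importlib
import logging
import os

logger = logging.getLogger(__name__)


def load_plugins() -> None:
    """Import every module listed in VLLM_PLUGINS.

    The variable is a colon-separated list of Python module names,
    mirroring how LD_PRELOAD consumes a colon-separated list of
    shared objects.
    """
    plugins = os.environ.get("VLLM_PLUGINS", "")
    # filter(None, ...) drops empty entries from stray/trailing colons.
    for name in filter(None, plugins.split(":")):
        logger.info("Loading plugin module: %s", name)
        importlib.import_module(name)
```

A user would then run e.g. `VLLM_PLUGINS=vllm_my_model python -m vllm.entrypoints.openai.api_server ...`, and each listed module is imported before the model is loaded.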
One of the most important concerns is guarding against the risk of arbitrary code execution. When a user serves a model using vLLM, endpoint users cannot activate plugins, so this does not suffer from code-injection risk. However, there is indeed a risk if the user runs vLLM in an untrusted environment. In this case:
- we require the plugin package name to start with vllm_, so that vLLM users do not accidentally add irrelevant modules to execute
- we explicitly log the plugin modules vLLM is using, so that vLLM users can easily see if any unexpected code is executed
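These two safeguards can be sketched as follows (`validate_plugin_names` is a hypothetical helper name for illustration, not the draft's actual code):

```python
import logging
import os

logger = logging.getLogger(__name__)


def validate_plugin_names() -> list:
    """Enforce the two safeguards before any plugin is imported:
    require the vllm_ prefix, and log every module that will run."""
    names = [n for n in os.environ.get("VLLM_PLUGINS", "").split(":") if n]
    for name in names:
        if not name.startswith("vllm_"):
            # Refuse anything outside the vllm_ namespace so an
            # unrelated module cannot be pulled in by accident.
            raise ValueError(
                f"Refusing to load plugin {name!r}: plugin modules "
                "must start with the 'vllm_' prefix")
        # Make the executed code visible in the server logs.
        logger.info("vLLM will import plugin module: %s", name)
    return names
```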
With these efforts, the security level should be the same as LD_PRELOAD's. And since LD_PRELOAD has existed for so many years, I think VLLM_PLUGINS should be acceptable in terms of security risk.
Proposed Change.
See #7130 for the draft implementation.
Feedback Period.
No response
CC List.
No response
Any Other Things.
No response