docs: Add tutorial on deploying vLLM model with KServe #2586
Conversation
Signed-off-by: Yuan Tang <[email protected]>
cc @WoosukKwon @zhuohan123 @Yard1 @simon-mo Could you review this when you get a chance? Thanks!
Friendly ping!
Thanks for adding this guide! @terrytangyuan FYI, this tutorial is based on KServe + vLLM. AFAIK, KServe doesn't support the OpenAI schema yet and that work is in progress, so how about waiting for it to be done so we can have an official guide for deploying the vLLM OpenAI API server with KServe?
Thanks for taking a look and for your thoughtful response! vLLM is already supported via kserve/kserve#3415. While OpenAI schema compatibility is in progress, I think this documentation is useful for correcting the misunderstanding that KServe does not work with vLLM (many community users have asked about this). Perhaps I can remove the specific example YAML but keep the link in this doc, so that it points to the KServe docs that we will continuously update going forward. WDYT?
Good idea! I think this is something we should add to the doc, since KServe is popular for serving orchestration with k8s. cc @simon-mo
I just updated the PR. PTAL. Thank you!
LGTM - this will need approval from the vLLM folks, but I will let them know.
docs: Add tutorial on deploying vLLM model with KServe (#2586) Signed-off-by: Yuan Tang <[email protected]>
The original source of the example can be found here: https://kserve.github.io/website/latest/modelserving/v1beta1/llm/vllm/
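For orientation, a deployment along these lines is expressed as a KServe `InferenceService` manifest. The sketch below is illustrative only, assuming the KServe Hugging Face serving runtime (which uses vLLM as its backend); the service name, model ID, runtime arguments, and resource values are assumptions, so consult the linked KServe docs for the maintained example.

```yaml
# Minimal sketch of a KServe InferenceService serving an LLM via the
# Hugging Face runtime (vLLM backend). The name, model ID, args, and
# resource values are illustrative assumptions, not the official example.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-llama2          # hypothetical service name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface           # selects the Hugging Face serving runtime
      args:
        - --model_name=llama2                        # name the server exposes (assumed)
        - --model_id=meta-llama/Llama-2-7b-chat-hf   # Hugging Face model to load (assumed)
      resources:
        requests:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
        limits:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
```

Deploying such a manifest would follow the usual flow: `kubectl apply -f inferenceservice.yaml`, then wait for the InferenceService to report `READY` before sending inference requests to its endpoint.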