
Conversation

terrytangyuan (Contributor):

The original source of the example can be found here: https://kserve.github.io/website/latest/modelserving/v1beta1/llm/vllm/
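For reference, that guide deploys vLLM as a custom predictor behind a KServe InferenceService resource. The sketch below shows the general shape of such a manifest; the container image, model name, and resource values are illustrative placeholders, not the exact YAML from the guide or from this PR.

```shell
# Sketch only: serve vLLM behind a KServe InferenceService (custom predictor).
# <YOUR-VLLM-IMAGE> and the model/resource values are placeholders; see the
# linked KServe documentation for the maintained example manifest.
kubectl apply -f - <<'EOF'
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: vllm-llama-2-7b
spec:
  predictor:
    containers:
      - name: kserve-container
        image: <YOUR-VLLM-IMAGE>   # placeholder: an image that starts a vLLM server
        args:
          - --model=meta-llama/Llama-2-7b-chat-hf
        resources:
          requests:
            nvidia.com/gpu: "1"
          limits:
            nvidia.com/gpu: "1"
EOF
```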

terrytangyuan (Contributor, Author):

cc @WoosukKwon @zhuohan123 @Yard1 @simon-mo Could you review this when you get a chance? Thanks!

terrytangyuan (Contributor, Author):

Friendly ping!

ywang96 (Member) commented Mar 1, 2024:

Thanks for adding this guide! @terrytangyuan

FYI - this tutorial is based on KServe + the vLLM /generate API server, which is kept only for demo purposes; the OpenAI-compatible API server is what will be supported going forward for production use.

AFAIK KServe doesn't support the OpenAI schema yet (it's WIP), so how about waiting for that work to land so we can have an official guide for deploying the vLLM OpenAI API server with KServe?
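To illustrate the distinction between the two servers mentioned above: the demo-only /generate endpoint takes a plain prompt plus sampling parameters, while the OpenAI-compatible server follows the OpenAI completions schema. A rough sketch, assuming a vLLM server running locally on its default port and a placeholder model name and prompt:

```shell
# Demo-only server (python -m vllm.entrypoints.api_server): plain /generate endpoint.
curl http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "San Francisco is a", "max_tokens": 16, "temperature": 0}'

# OpenAI-compatible server (python -m vllm.entrypoints.openai.api_server):
# requests follow the OpenAI /v1/completions schema.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-2-7b-chat-hf", "prompt": "San Francisco is a", "max_tokens": 16}'
```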

terrytangyuan (Contributor, Author) commented Mar 1, 2024:

Thanks for taking a look, and for your thoughtful response! vLLM is already supported in KServe via kserve/kserve#3415.

While OpenAI schema compatibility is in progress, I think this documentation is useful for correcting the misunderstanding that KServe does not work with vLLM (many community users have asked about this). Perhaps I can remove the specific example YAML but keep the link in this doc, so that it points to the KServe docs, which we will keep updating going forward. WDYT?

ywang96 (Member) commented Mar 1, 2024:

> Thanks for taking a look, and for your thoughtful response! vLLM is already supported in KServe via kserve/kserve#3415.
>
> While OpenAI schema compatibility is in progress, I think this documentation is useful for correcting the misunderstanding that KServe does not work with vLLM (many community users have asked about this). Perhaps I can remove the specific example YAML but keep the link in this doc, so that it points to the KServe docs, which we will keep updating going forward. WDYT?

Good idea! I think this is something we should add to the doc since KServe is popular for serving orchestration with k8s. cc @simon-mo

terrytangyuan (Contributor, Author):

I just updated the PR. PTAL. Thank you!

ywang96 (Member) left a review comment:


LGTM - will need an approval from vLLM folks but I will let them know.

simon-mo merged commit 49d849b into vllm-project:main on Mar 1, 2024.
terrytangyuan deleted the kserve branch on March 1, 2024 at 19:07.
xjpang pushed a commit to xjpang/vllm that referenced this pull request on Mar 4, 2024.