Motivation.
OpenVINO is an open-source solution for inference of deep learning models, including LLMs. OpenVINO supports Intel and ARM CPUs, Intel integrated and discrete GPUs, and NPUs, and has a good reputation as a production-ready solution for client and server scenarios. The idea is to create an OpenVINO backend for vLLM that initially supports x86 CPUs as the primary device; other devices can be enabled later.
Thanks to the Optimum Intel HuggingFace extension (https://github.com/huggingface/optimum-intel), the OpenVINO vLLM backend can support a wide range of models, including those listed at https://docs.vllm.ai/en/stable/models/supported_models.html
OpenVINO provides better performance than the current vLLM CPU implementation, which will be demonstrated in the integration PR. In addition, the OpenVINO implementation of the PagedAttention operation supports modern vLLM features such as chunked prefill and prefix caching.
Proposed Change.
Introduce an OpenVINO vLLM backend, which:
- Loads the model via the optimum-intel extension for HuggingFace (https://github.com/huggingface/optimum-intel), as shown in the first sketch after this list
- (Optional step) Compresses model weights to a low-bit format
- Automatically converts the PyTorch model to the OpenVINO IR representation, which contains the PagedAttention operation
- Provides custom implementations of the model loader, model runner, and cache manager to hide OpenVINO API details (see the second sketch after this list)
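
As a rough illustration of the first three steps, here is a minimal sketch using the public optimum-intel API. It assumes optimum-intel is installed with OpenVINO support (`pip install optimum[openvino]`) and uses a placeholder model ID; the PagedAttention-specific IR conversion mentioned above happens inside the vLLM backend and is not part of this standalone snippet.

```python
# Minimal sketch: load a HuggingFace model through optimum-intel with
# on-the-fly OpenVINO IR conversion and optional weight compression.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model ID

# export=True converts the PyTorch model to OpenVINO IR on the fly;
# load_in_8bit=True applies optional low-bit weight compression.
model = OVModelForCausalLM.from_pretrained(model_id, export=True, load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```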
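
And a hypothetical sketch of how the backend components could wrap the OpenVINO runtime behind vLLM-style abstractions; all class and method names below are illustrative assumptions, not the actual integration code.

```python
# Hypothetical sketch: hiding OpenVINO API details behind vLLM-style
# components. Names and shapes are illustrative assumptions.
import openvino as ov

class OpenVINOCacheManager:
    """Allocates KV-cache blocks as OpenVINO tensors (illustrative)."""
    def __init__(self, num_blocks: int, block_shape: tuple):
        self.key_cache = ov.Tensor(ov.Type.f32, (num_blocks, *block_shape))
        self.value_cache = ov.Tensor(ov.Type.f32, (num_blocks, *block_shape))

class OpenVINOModelRunner:
    """Compiles the exported IR and executes one scheduler step (illustrative)."""
    def __init__(self, model_path: str, device: str = "CPU"):
        core = ov.Core()
        self.compiled = core.compile_model(model_path, device)
        self.request = self.compiled.create_infer_request()

    def execute(self, inputs: dict):
        # Forward pass; the PagedAttention op inside the IR reads and
        # writes the KV cache managed above.
        return self.request.infer(inputs)
```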
Feedback Period.
No response
CC List.
@WoosukKwon @zhuohan123 @Yard1
Any Other Things.
A wide range of OpenVINO customers are awaiting integration of the OpenVINO vLLM backend into the upstream vLLM repository.