Description
I found an article, Accelerating Generative AI with PyTorch II: GPT, Fast.
The optimizations used in that article are summarized below.
I gave gpt-fast a quick try, and the improvement is huge.
codellama-python-7b, 2×A10 (24 GB):

| inference setup | speed (tokens/s) |
|---|---|
| vLLM fp16 | 45.2 |
| gpt-fast fp16 | 66.5 |
| gpt-fast int8 | 105.1 |
| gpt-fast int4 | 145.9 |
PS: the output quality with int4 is terrible.
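For context on what the int8 row refers to: gpt-fast uses weight-only quantization, where weights are stored in low precision with per-output-channel scales and dequantized on the fly inside the matmul. Below is a minimal sketch of the int8 variant; the names `WeightOnlyInt8Linear` and `quantize_linears` are my own illustration, not gpt-fast's or vLLM's actual code.

```python
# Minimal sketch of int8 weight-only quantization (gpt-fast style).
# Illustrative only: not gpt-fast's or vLLM's actual implementation.
import torch
import torch.nn as nn


class WeightOnlyInt8Linear(nn.Module):
    def __init__(self, linear: nn.Linear):
        super().__init__()
        w = linear.weight.data                                   # (out_features, in_features)
        scales = (w.abs().amax(dim=1, keepdim=True) / 127.0).clamp(min=1e-8)
        self.register_buffer("weight", torch.round(w / scales).to(torch.int8))
        self.register_buffer("scales", scales.to(w.dtype))
        self.bias = linear.bias

    def forward(self, x):
        # Dequantize per output channel at matmul time. Weight memory
        # traffic drops roughly 2x vs fp16, which is where the
        # memory-bound decode speedup comes from.
        w = self.weight.to(x.dtype) * self.scales
        return nn.functional.linear(x, w, self.bias)


def quantize_linears(model: nn.Module) -> nn.Module:
    """Swap every nn.Linear in-place for the int8 weight-only version."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, WeightOnlyInt8Linear(child))
        else:
            quantize_linears(child)
    return model
```

This sketch says nothing about the int4 case; 4-bit schemes need extra tricks (grouping, etc.) and are where the quality problems I saw presumably come from.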
I'm curious: can these optimizations be used in vLLM?
I can see some existing discussion about these optimizations, but it doesn't look like they will land in the short term (because of some issues specific to vLLM?):
- torch.compile (a minimal compile sketch follows this list)
  - +34% higher throughput?
  - Compiled the model with torch.compile, unfortunately without performance improvements
- quantization
  - Add GPTQ support (I tried a version before, but it didn't work well)
- Speculative Decoding (a toy verification-loop sketch is at the end of this post)
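As a concrete reference for the torch.compile item above, here is a minimal sketch of compiling a Hugging Face causal LM's forward pass. The checkpoint name and greedy generation call are illustrative and have nothing to do with vLLM's internals; in practice, dynamic shapes and KV-cache handling can cause graph breaks, which may be part of why the earlier attempt saw no improvement.

```python
# Minimal sketch: torch.compile on a HF causal LM's forward pass.
# Checkpoint name and generation settings are illustrative, not vLLM code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-Python-hf"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = (
    AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
    .cuda()
    .eval()
)

# "reduce-overhead" enables CUDA graphs, which is where much of the
# decode-time win in the gpt-fast writeup comes from. Note: graph breaks
# from dynamic KV-cache shapes can erase the benefit.
model.forward = torch.compile(model.forward, mode="reduce-overhead")

inputs = tok("def quicksort(arr):", return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```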
vLLM is a great project!! I really hope to see these optimizations in vLLM, and I'd also like to understand the difficulties that still exist :)
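Since Speculative Decoding is listed above without detail, here is a toy, greedy-only sketch of the draft-then-verify loop. `draft_model`, `target_model`, and `speculative_step` are hypothetical names, the models are assumed to return raw logits of shape (batch, seq_len, vocab), there is no KV cache, and real implementations use probabilistic acceptance, so treat this purely as an illustration of the idea.

```python
# Toy, greedy-only speculative decoding step. Hypothetical names; the
# draft/target models are assumed to map token ids -> logits
# of shape (batch, seq_len, vocab). Not vLLM's or gpt-fast's API.
import torch


@torch.no_grad()
def speculative_step(draft_model, target_model, input_ids, k=4):
    """Propose k tokens with the draft model, then verify them with a
    single forward pass of the target model (greedy acceptance only)."""
    prompt_len = input_ids.shape[1]

    # 1) Draft k tokens autoregressively with the small model (no KV cache here).
    draft_ids = input_ids
    for _ in range(k):
        logits = draft_model(draft_ids)[:, -1, :]
        next_id = logits.argmax(dim=-1, keepdim=True)
        draft_ids = torch.cat([draft_ids, next_id], dim=-1)
    proposed = draft_ids[:, prompt_len:]                                  # (1, k)

    # 2) One target forward pass over prompt + proposed tokens.
    target_logits = target_model(draft_ids)                               # (1, L+k, V)
    # Target's greedy choice at each position covering a proposal.
    target_pred = target_logits[:, prompt_len - 1:-1, :].argmax(dim=-1)   # (1, k)

    # 3) Accept the longest prefix on which draft and target agree.
    matches = (proposed == target_pred)[0].long()
    n_accept = int(matches.cumprod(dim=0).sum().item())
    accepted = proposed[:, :n_accept]

    # 4) The target model contributes one extra token either way:
    #    its own prediction right after the accepted prefix.
    if n_accept < k:
        bonus = target_pred[:, n_accept:n_accept + 1]
    else:
        bonus = target_logits[:, -1:, :].argmax(dim=-1)
    return torch.cat([input_ids, accepted, bonus], dim=-1)
```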