[torch.compile] initial integration #8949
Conversation
Simple test on H100.

Throughput:

```
$ # main branch
$ python benchmarks/benchmark_throughput.py --input-len 256 --output-len 256 --model meta-llama/Meta-Llama-3-8B
Throughput: 28.99 requests/s, 14843.59 tokens/s
$ # this branch
$ python benchmarks/benchmark_throughput.py --input-len 256 --output-len 256 --model meta-llama/Meta-Llama-3-8B
Throughput: 28.89 requests/s, 14792.03 tokens/s
$ # this branch
$ VLLM_TORCH_COMPILE_LEVEL=2 python benchmarks/benchmark_throughput.py --input-len 256 --output-len 256 --model meta-llama/Meta-Llama-3-8B
Throughput: 29.90 requests/s, 15309.14 tokens/s
```

About 3.5% throughput improvement with `VLLM_TORCH_COMPILE_LEVEL=2`.

Single request serving (Output token throughput (tok/s)):
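To illustrate the mechanism being benchmarked, here is a minimal, hypothetical sketch of how an env var like `VLLM_TORCH_COMPILE_LEVEL` can gate `torch.compile`. The function and level semantics below are illustrative assumptions, not vLLM's actual implementation.

```python
import os

import torch
import torch.nn.functional as F

# Hypothetical: level 0 disables compilation; any higher level enables it.
# vLLM's real level semantics may differ.
COMPILE_LEVEL = int(os.environ.get("VLLM_TORCH_COMPILE_LEVEL", "0"))


def maybe_compile(fn):
    """Apply torch.compile only when a compile level is requested."""
    if COMPILE_LEVEL >= 1:
        # dynamic=True avoids recompiles when batch/sequence sizes vary,
        # which is common in LLM serving.
        return torch.compile(fn, dynamic=True)
    return fn


@maybe_compile
def silu_and_mul(x: torch.Tensor) -> torch.Tensor:
    # A typical fused-activation pattern in LLM MLP blocks.
    gate, up = x.chunk(2, dim=-1)
    return F.silu(gate) * up


out = silu_and_mul(torch.randn(4, 16))
print(tuple(out.shape))  # (4, 8)
```

With `COMPILE_LEVEL=0` (the default here), the decorator is a no-op, matching the "main branch"-like baseline in the benchmark above.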
Pipeline parallel: when I enable pipeline parallel, there's a dynamo error (cc @anijain2305). It turns out to be caused by Line 1152 in f13a07b; when I change it to normal …

Tensor parallel: when I enable tensor parallel, it runs but the output is wrong. I'm still investigating.
Seen in vllm-project/vllm#8949 (Pull Request resolved: pytorch#137044)
Closing, as this has been moved to #9058.
TODOs (can be future PRs):