Your current environment
Due to project requirements, I'm still on vLLM 0.6.3.
How would you like to use vllm
I noticed that with CUDA Graph enabled, the output is correct at TP=1 (tensor parallelism) but becomes garbled at TP=2. Has anyone encountered this issue? Is there an existing issue that addresses and resolves it? Thank you very much!
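
For reference, here is a minimal sketch of how I'm invoking the model with the 0.6.3 offline API; the model path and prompt are placeholders, not my actual setup:

```python
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is"]  # placeholder prompt
sampling_params = SamplingParams(temperature=0.0, max_tokens=32)

# CUDA Graph is enabled by default (enforce_eager=False).
# tensor_parallel_size=1 -> correct output; tensor_parallel_size=2 -> garbled.
llm = LLM(model="/path/to/model", tensor_parallel_size=2)

for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)

# The only workaround I'm aware of: enforce_eager=True disables CUDA Graph
# capture entirely, which avoids the garbling at the cost of decode latency.
# llm = LLM(model="/path/to/model", tensor_parallel_size=2, enforce_eager=True)
```

Running with `enforce_eager=True` does produce correct output, so the problem appears specific to CUDA Graph capture under tensor parallelism rather than to the model itself.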