Your current environment
vllm version: '0.5.0.post1'
🐛 Describe the bug
When I set tensor_parallel_size=1, everything works well.
But if I set tensor_parallel_size>1, the following error occurs:
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method.
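For reference, a minimal sketch of the kind of script that triggers this for me (the model name and prompt are placeholders, not the exact ones I am using):

```python
from vllm import LLM, SamplingParams

# Works with tensor_parallel_size=1; raises the RuntimeError above when set to 2
llm = LLM(model="facebook/opt-1.3b", tensor_parallel_size=2)

sampling_params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], sampling_params)
print(outputs[0].outputs[0].text)
```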
After adding the following at the top of my script:

```python
import torch
import multiprocessing

torch.multiprocessing.set_start_method('spawn')
```

the same RuntimeError still occurs.
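In case it is relevant, the workaround I am planning to try next (not verified yet, and based on my possibly incorrect reading of vLLM's environment variables) is forcing the worker start method via VLLM_WORKER_MULTIPROC_METHOD instead of calling set_start_method myself:

```python
import os

# Assumption: vLLM reads VLLM_WORKER_MULTIPROC_METHOD to choose how tensor-parallel
# workers are started; set it before creating the engine so they use 'spawn'.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

from vllm import LLM

llm = LLM(model="facebook/opt-1.3b", tensor_parallel_size=2)  # placeholder model name
```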