
[Usage]: EngineCore v1 fails to initialize but Engine0 works perfectly #25837

@vamejiag

Description


Your current environment

OS: WSL 2 on Windows 11
Python: 3.10
CUDA: 12.8
GPU: NVIDIA RTX A500 Laptop GPU
Driver: 573.44
vLLM version: 0.10.2

How would you like to use vllm

When trying to initialize the vLLM V1 engine (EngineCore) on WSL with the facebook/opt-125m model, the engine fails during startup. The V0 engine works fine on the same system.

I have already tried limiting parallelism (export MAX_JOBS=1) and other job limits, but the errors persist, e.g. "Unable to register cuDNN/cuFFT/cuBLAS factory".

Steps to reproduce (minimal code snippet):

from vllm import LLM
llm = LLM(model="facebook/opt-125m")
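For completeness, the V0 output further below comes from the same script with the V0 engine forced via the VLLM_USE_V1 environment variable; a minimal sketch of how I switch engines (the variable must be set before vllm is imported):

import os
os.environ["VLLM_USE_V1"] = "0"  # force the V0 engine; leave unset for the default V1 engine

from vllm import LLM
llm = LLM(model="facebook/opt-125m")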

Error log
OUTPUT USING ENGINE V1:
2025-09-28 15:47:00.704737: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-09-28 15:47:00.704813: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-09-28 15:47:00.758139: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-09-28 15:47:00.863219: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-09-28 15:47:01.841256: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
INFO 09-28 15:47:06 [__init__.py:216] Automatically detected platform cuda.
INFO 09-28 15:47:07 [utils.py:328] non-default args: {'disable_log_stats': True, 'model': 'facebook/opt-125m'}
INFO 09-28 15:47:17 [__init__.py:742] Resolved architecture: OPTForCausalLM
torch_dtype is deprecated! Use dtype instead!
INFO 09-28 15:47:17 [__init__.py:1815] Using max model len 2048
INFO 09-28 15:47:19 [scheduler.py:222] Chunked prefill is enabled with max_num_batched_tokens=8192.
(EngineCore_DP0 pid=17374) INFO 09-28 15:47:20 [core.py:654] Waiting for init message from front-end.
(EngineCore_DP0 pid=17374) INFO 09-28 15:47:20 [core.py:76] Initializing a V1 LLM engine (v0.10.2) with config: model='facebook/opt-125m', speculative_config=None, tokenizer='facebook/opt-125m', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=facebook/opt-125m, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":1,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
(EngineCore_DP0 pid=17374) WARNING 09-28 15:47:21 [interface.py:391] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] EngineCore failed to start.
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] Traceback (most recent call last):
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 709, in run_engine_core
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 505, in __init__
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 82, in __init__
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] self._init_executor()
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 48, in _init_executor
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] self.collective_rpc("init_device")
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/utils/__init__.py", line 3060, in run_method
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] return func(*args, **kwargs)
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 611, in init_device
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] self.worker.init_device() # type: ignore
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 180, in init_device
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] raise ValueError(
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] ValueError: Free memory on device (3.22/4.0 GiB) on startup is less than desired GPU memory utilization (0.9, 3.6 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
(EngineCore_DP0 pid=17374) Process EngineCore_DP0:
(EngineCore_DP0 pid=17374) Traceback (most recent call last):
(EngineCore_DP0 pid=17374) File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=17374) self.run()
(EngineCore_DP0 pid=17374) File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=17374) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 722, in run_engine_core
(EngineCore_DP0 pid=17374) raise e
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 709, in run_engine_core
(EngineCore_DP0 pid=17374) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 505, in __init__
(EngineCore_DP0 pid=17374) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 82, in __init__
(EngineCore_DP0 pid=17374) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=17374) self._init_executor()
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 48, in _init_executor
(EngineCore_DP0 pid=17374) self.collective_rpc("init_device")
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_DP0 pid=17374) answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/utils/__init__.py", line 3060, in run_method
(EngineCore_DP0 pid=17374) return func(*args, **kwargs)
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 611, in init_device
(EngineCore_DP0 pid=17374) self.worker.init_device() # type: ignore
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 180, in init_device
(EngineCore_DP0 pid=17374) raise ValueError(
(EngineCore_DP0 pid=17374) ValueError: Free memory on device (3.22/4.0 GiB) on startup is less than desired GPU memory utilization (0.9, 3.6 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
(EngineCore_DP0 pid=17374) Exception ignored in: <function ExecutorBase.__del__ at 0x7f26b5178160>
(EngineCore_DP0 pid=17374) Traceback (most recent call last):
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 237, in __del__
(EngineCore_DP0 pid=17374) self.shutdown()
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 76, in shutdown
(EngineCore_DP0 pid=17374) worker.shutdown()
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 528, in shutdown
(EngineCore_DP0 pid=17374) self.worker.shutdown()
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 675, in shutdown
(EngineCore_DP0 pid=17374) self.model_runner.ensure_kv_transfer_shutdown()
(EngineCore_DP0 pid=17374) AttributeError: 'NoneType' object has no attribute 'ensure_kv_transfer_shutdown'
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/vmejia/projects/loreal/llm/test.py", line 13, in
llm = LLM(model="facebook/opt-125m")
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 282, in init
self.llm_engine = LLMEngine.from_engine_args(
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 493, in from_engine_args
return engine_cls.from_vllm_config(
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/llm_engine.py", line 134, in from_vllm_config
return cls(vllm_config=vllm_config,
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/llm_engine.py", line 111, in init
self.engine_core = EngineCoreClient.make_client(
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 80, in make_client
return SyncMPClient(vllm_config, executor_class, log_stats)
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 602, in init
super().init(
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 448, in init
with launch_core_engines(vllm_config, executor_class,
File "/usr/lib/python3.10/contextlib.py", line 142, in exit
next(self.gen)
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 729, in launch_core_engines
wait_for_engine_startup(
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 782, in wait_for_engine_startup
raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
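The actual V1 failure is the startup free-memory check: 3.22 GiB free is less than the 3.6 GiB implied by the default gpu_memory_utilization of 0.9 on this 4 GiB GPU. Below is a minimal sketch of the workaround the error message itself suggests (lowering gpu_memory_utilization so the requested budget fits within the reported free memory); I have not verified whether this is the intended fix, or why V1 needs it when V0 does not:

from vllm import LLM
# Request roughly 2.8 GiB (0.7 x 4.0 GiB), below the 3.22 GiB reported free at startup.
llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.7)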

OUTPUT USING ENGINE V0 (export VLLM_USE_V1=0):

2025-09-28 15:56:48.273556: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-09-28 15:56:48.273620: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-09-28 15:56:48.338346: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-09-28 15:56:48.451647: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-09-28 15:56:49.531749: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
INFO 09-28 15:56:54 [__init__.py:216] Automatically detected platform cuda.
INFO 09-28 15:56:55 [utils.py:328] non-default args: {'disable_log_stats': True, 'model': 'facebook/opt-125m'}
INFO 09-28 15:57:08 [__init__.py:742] Resolved architecture: OPTForCausalLM
torch_dtype is deprecated! Use dtype instead!
INFO 09-28 15:57:08 [__init__.py:1815] Using max model len 2048
INFO 09-28 15:57:10 [llm_engine.py:221] Initializing a V0 LLM engine (v0.10.2) with config: model='facebook/opt-125m', speculative_config=None, tokenizer='facebook/opt-125m', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=None, served_model_name=facebook/opt-125m, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":null,"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":0,"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":256,"local_cache_dir":null}, use_cached_outputs=False,
WARNING 09-28 15:57:12 [interface.py:391] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
INFO 09-28 15:57:12 [cuda.py:456] Using Flash Attention backend.
[W928 15:57:14.436008376 ProcessGroupNCCL.cpp:981] Warning: TORCH_NCCL_AVOID_RECORD_STREAMS is the default now, this environment variable is thus deprecated. (function operator())
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
INFO 09-28 15:57:14 [parallel_state.py:1165] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 09-28 15:57:14 [model_runner.py:1051] Starting to load model facebook/opt-125m...
INFO 09-28 15:57:14 [weight_utils.py:348] Using model weights format ['*.safetensors', '*.bin', '*.pt']
Loading pt checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 2.56it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 2.56it/s]

INFO 09-28 15:57:15 [default_loader.py:268] Loading weights took 0.39 seconds
INFO 09-28 15:57:16 [model_runner.py:1083] Model loading took 0.2389 GiB and 0.978913 seconds
INFO 09-28 15:57:17 [worker.py:290] Memory profiling takes 0.89 seconds
INFO 09-28 15:57:17 [worker.py:290] the current vLLM instance can use total_gpu_memory (4.00GiB) x gpu_memory_utilization (0.90) = 3.60GiB
INFO 09-28 15:57:17 [worker.py:290] model weights take 0.24GiB; non_torch_memory takes 0.03GiB; PyTorch activation peak memory takes 0.47GiB; the rest of the memory reserved for KV Cache is 2.87GiB.
INFO 09-28 15:57:17 [executor_base.py:114] # cuda blocks: 5218, # CPU blocks: 7281
INFO 09-28 15:57:17 [executor_base.py:119] Maximum concurrency for 2048 tokens per request: 40.77x
INFO 09-28 15:57:18 [model_runner.py:1355] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing gpu_memory_utilization or switching to eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
Capturing CUDA graph shapes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:23<00:00, 1.46it/s]
INFO 09-28 15:57:42 [model_runner.py:1507] Graph capturing finished in 22 secs, took 0.05 GiB
INFO 09-28 15:57:42 [worker.py:467] Free memory on device (3.23/4.0 GiB) on startup. Desired GPU memory utilization is (0.9, 3.6 GiB). Actual usage is 0.24 GiB for weight, 0.47 GiB for peak activation, 0.03 GiB for non-torch memory, and 0.05 GiB for CUDAGraph memory. Replace gpu_memory_utilization config with --kv-cache-memory=2867842457 to fit into requested memory, or --kv-cache-memory=2467542016 to fully utilize gpu memory. Current kv cache memory in use is 3077819801 bytes.
INFO 09-28 15:57:42 [llm_engine.py:420] init engine (profile, create kv cache, warmup model) took 26.35 seconds
INFO 09-28 15:57:42 [llm.py:295] Supported_tasks: ['generate']
INFO 09-28 15:57:42 [__init__.py:36] No IOProcessor plugins requested by the model
Adding requests: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 899.68it/s]
Processed prompts: 100%|██████████████████████████████████████████████████| 4/4 [00:00<00:00, 15.66it/s, est. speed input: 101.82 toks/s, output: 250.63 toks/s]
Prompt: 'Hello, my name is', Generated text: " Stephan, I'm a 20 year old, I'm from the Netherlands and I"
Prompt: 'The president of the United States is', Generated text: ' at war with the Chinese government, and the military is too strong.\n\n'
Prompt: 'The capital of France is', Generated text: ' getting a train that will travel through the heart of Paris to a business meeting at'
Prompt: 'The future of AI is', Generated text: " secure: Facebook and Google\nThe future of AI is secure. That's the"

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
