
[Usage]: EngineCore v1 fails to initialize but Engine0 works perfectly #25837

@vamejiag

Description


Your current environment

OS: WSL 2 on Windows 11
Python: 3.10
CUDA: 12.8
GPU: NVIDIA RTX A500 Laptop GPU
Driver: 573.44
vLLM version: 0.10.2

How would you like to use vllm

When trying to initialize the vLLM V1 engine (EngineCore) on WSL with the facebook/opt-125m model, the engine fails during startup. The V0 engine works fine on the same system.

I have already tried limiting parallelism (export MAX_JOBS=1) and other job limits, but the errors persist, e.g. "Unable to register cuDNN/cuFFT/cuBLAS factory".

Steps to reproduce (minimal code snippet):

from vllm import LLM
llm = LLM(model="facebook/opt-125m")
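For completeness, the V0 output further below comes from the same script with the V0 engine forced via the VLLM_USE_V1 environment variable; a minimal sketch of how I switch engines (the variable must be set before vllm is imported):

import os
os.environ["VLLM_USE_V1"] = "0"  # force the V0 engine; leave unset for the default V1 engine

from vllm import LLM
llm = LLM(model="facebook/opt-125m")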

Error log
OUTPUT USING ENGINE V1:
2025-09-28 15:47:00.704737: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-09-28 15:47:00.704813: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-09-28 15:47:00.758139: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-09-28 15:47:00.863219: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-09-28 15:47:01.841256: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
INFO 09-28 15:47:06 [__init__.py:216] Automatically detected platform cuda.
INFO 09-28 15:47:07 [utils.py:328] non-default args: {'disable_log_stats': True, 'model': 'facebook/opt-125m'}
INFO 09-28 15:47:17 [__init__.py:742] Resolved architecture: OPTForCausalLM
torch_dtype is deprecated! Use dtype instead!
INFO 09-28 15:47:17 [__init__.py:1815] Using max model len 2048
INFO 09-28 15:47:19 [scheduler.py:222] Chunked prefill is enabled with max_num_batched_tokens=8192.
(EngineCore_DP0 pid=17374) INFO 09-28 15:47:20 [core.py:654] Waiting for init message from front-end.
(EngineCore_DP0 pid=17374) INFO 09-28 15:47:20 [core.py:76] Initializing a V1 LLM engine (v0.10.2) with config: model='facebook/opt-125m', speculative_config=None, tokenizer='facebook/opt-125m', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=facebook/opt-125m, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":1,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
(EngineCore_DP0 pid=17374) WARNING 09-28 15:47:21 [interface.py:391] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] EngineCore failed to start.
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] Traceback (most recent call last):
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 709, in run_engine_core
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 505, in __init__
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 82, in __init__
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] self._init_executor()
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 48, in _init_executor
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] self.collective_rpc("init_device")
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/utils/__init__.py", line 3060, in run_method
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] return func(*args, **kwargs)
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 611, in init_device
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] self.worker.init_device() # type: ignore
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 180, in init_device
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] raise ValueError(
(EngineCore_DP0 pid=17374) ERROR 09-28 15:47:24 [core.py:718] ValueError: Free memory on device (3.22/4.0 GiB) on startup is less than desired GPU memory utilization (0.9, 3.6 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
(EngineCore_DP0 pid=17374) Process EngineCore_DP0:
(EngineCore_DP0 pid=17374) Traceback (most recent call last):
(EngineCore_DP0 pid=17374) File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=17374) self.run()
(EngineCore_DP0 pid=17374) File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=17374) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 722, in run_engine_core
(EngineCore_DP0 pid=17374) raise e
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 709, in run_engine_core
(EngineCore_DP0 pid=17374) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 505, in __init__
(EngineCore_DP0 pid=17374) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 82, in __init__
(EngineCore_DP0 pid=17374) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=17374) self._init_executor()
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 48, in _init_executor
(EngineCore_DP0 pid=17374) self.collective_rpc("init_device")
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
(EngineCore_DP0 pid=17374) answer = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/utils/__init__.py", line 3060, in run_method
(EngineCore_DP0 pid=17374) return func(*args, **kwargs)
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 611, in init_device
(EngineCore_DP0 pid=17374) self.worker.init_device() # type: ignore
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 180, in init_device
(EngineCore_DP0 pid=17374) raise ValueError(
(EngineCore_DP0 pid=17374) ValueError: Free memory on device (3.22/4.0 GiB) on startup is less than desired GPU memory utilization (0.9, 3.6 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
(EngineCore_DP0 pid=17374) Exception ignored in: <function ExecutorBase.__del__ at 0x7f26b5178160>
(EngineCore_DP0 pid=17374) Traceback (most recent call last):
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 237, in __del__
(EngineCore_DP0 pid=17374) self.shutdown()
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 76, in shutdown
(EngineCore_DP0 pid=17374) worker.shutdown()
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 528, in shutdown
(EngineCore_DP0 pid=17374) self.worker.shutdown()
(EngineCore_DP0 pid=17374) File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 675, in shutdown
(EngineCore_DP0 pid=17374) self.model_runner.ensure_kv_transfer_shutdown()
(EngineCore_DP0 pid=17374) AttributeError: 'NoneType' object has no attribute 'ensure_kv_transfer_shutdown'
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/vmejia/projects/loreal/llm/test.py", line 13, in
llm = LLM(model="facebook/opt-125m")
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 282, in init
self.llm_engine = LLMEngine.from_engine_args(
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 493, in from_engine_args
return engine_cls.from_vllm_config(
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/llm_engine.py", line 134, in from_vllm_config
return cls(vllm_config=vllm_config,
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/llm_engine.py", line 111, in init
self.engine_core = EngineCoreClient.make_client(
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 80, in make_client
return SyncMPClient(vllm_config, executor_class, log_stats)
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 602, in init
super().init(
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 448, in init
with launch_core_engines(vllm_config, executor_class,
File "/usr/lib/python3.10/contextlib.py", line 142, in exit
next(self.gen)
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 729, in launch_core_engines
wait_for_engine_startup(
File "/home/vmejia/projects/loreal/loreal-dipi/.venv/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 782, in wait_for_engine_startup
raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
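The actual V1 failure is the startup free-memory check: 3.22 GiB free is less than the 3.6 GiB implied by the default gpu_memory_utilization of 0.9 on this 4 GiB GPU. Below is a minimal sketch of the workaround the error message itself suggests (lowering gpu_memory_utilization so the requested budget fits within the reported free memory); I have not verified whether this is the intended fix, or why V1 needs it when V0 does not:

from vllm import LLM
# Request roughly 2.8 GiB (0.7 x 4.0 GiB), below the 3.22 GiB reported free at startup.
llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.7)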

OUTPUT USING ENGINE V0 (export VLLM_USE_V1=0):

2025-09-28 15:56:48.273556: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-09-28 15:56:48.273620: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-09-28 15:56:48.338346: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-09-28 15:56:48.451647: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-09-28 15:56:49.531749: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
INFO 09-28 15:56:54 [__init__.py:216] Automatically detected platform cuda.
INFO 09-28 15:56:55 [utils.py:328] non-default args: {'disable_log_stats': True, 'model': 'facebook/opt-125m'}
INFO 09-28 15:57:08 [__init__.py:742] Resolved architecture: OPTForCausalLM
torch_dtype is deprecated! Use dtype instead!
INFO 09-28 15:57:08 [__init__.py:1815] Using max model len 2048
INFO 09-28 15:57:10 [llm_engine.py:221] Initializing a V0 LLM engine (v0.10.2) with config: model='facebook/opt-125m', speculative_config=None, tokenizer='facebook/opt-125m', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=None, served_model_name=facebook/opt-125m, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":null,"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":0,"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":256,"local_cache_dir":null}, use_cached_outputs=False,
WARNING 09-28 15:57:12 [interface.py:391] Using 'pin_memory=False' as WSL is detected. This may slow down the performance.
INFO 09-28 15:57:12 [cuda.py:456] Using Flash Attention backend.
[W928 15:57:14.436008376 ProcessGroupNCCL.cpp:981] Warning: TORCH_NCCL_AVOID_RECORD_STREAMS is the default now, this environment variable is thus deprecated. (function operator())
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
INFO 09-28 15:57:14 [parallel_state.py:1165] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 09-28 15:57:14 [model_runner.py:1051] Starting to load model facebook/opt-125m...
INFO 09-28 15:57:14 [weight_utils.py:348] Using model weights format ['*.safetensors', '*.bin', '*.pt']
Loading pt checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 2.56it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 2.56it/s]

INFO 09-28 15:57:15 [default_loader.py:268] Loading weights took 0.39 seconds
INFO 09-28 15:57:16 [model_runner.py:1083] Model loading took 0.2389 GiB and 0.978913 seconds
INFO 09-28 15:57:17 [worker.py:290] Memory profiling takes 0.89 seconds
INFO 09-28 15:57:17 [worker.py:290] the current vLLM instance can use total_gpu_memory (4.00GiB) x gpu_memory_utilization (0.90) = 3.60GiB
INFO 09-28 15:57:17 [worker.py:290] model weights take 0.24GiB; non_torch_memory takes 0.03GiB; PyTorch activation peak memory takes 0.47GiB; the rest of the memory reserved for KV Cache is 2.87GiB.
INFO 09-28 15:57:17 [executor_base.py:114] # cuda blocks: 5218, # CPU blocks: 7281
INFO 09-28 15:57:17 [executor_base.py:119] Maximum concurrency for 2048 tokens per request: 40.77x
INFO 09-28 15:57:18 [model_runner.py:1355] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing gpu_memory_utilization or switching to eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
Capturing CUDA graph shapes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:23<00:00, 1.46it/s]
INFO 09-28 15:57:42 [model_runner.py:1507] Graph capturing finished in 22 secs, took 0.05 GiB
INFO 09-28 15:57:42 [worker.py:467] Free memory on device (3.23/4.0 GiB) on startup. Desired GPU memory utilization is (0.9, 3.6 GiB). Actual usage is 0.24 GiB for weight, 0.47 GiB for peak activation, 0.03 GiB for non-torch memory, and 0.05 GiB for CUDAGraph memory. Replace gpu_memory_utilization config with --kv-cache-memory=2867842457 to fit into requested memory, or --kv-cache-memory=2467542016 to fully utilize gpu memory. Current kv cache memory in use is 3077819801 bytes.
INFO 09-28 15:57:42 [llm_engine.py:420] init engine (profile, create kv cache, warmup model) took 26.35 seconds
INFO 09-28 15:57:42 [llm.py:295] Supported_tasks: ['generate']
INFO 09-28 15:57:42 [__init__.py:36] No IOProcessor plugins requested by the model
Adding requests: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 899.68it/s]
Processed prompts: 100%|██████████████████████████████████████████████████| 4/4 [00:00<00:00, 15.66it/s, est. speed input: 101.82 toks/s, output: 250.63 toks/s]
Prompt: 'Hello, my name is', Generated text: " Stephan, I'm a 20 year old, I'm from the Netherlands and I"
Prompt: 'The president of the United States is', Generated text: ' at war with the Chinese government, and the military is too strong.\n\n'
Prompt: 'The capital of France is', Generated text: ' getting a train that will travel through the heart of Paris to a business meeting at'
Prompt: 'The future of AI is', Generated text: " secure: Facebook and Google\nThe future of AI is secure. That's the"

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
