
[Usage]: Qwen3-Reranker with RAGFlow raises an error #25659

@ooodwbooo

Description


Your current environment

Qwen3-Reranker served with vLLM raises an error when called from RAGFlow.

  vllm-openai-8002:
    runtime: nvidia
    # Use GPU 1 only
    deploy:
      resources:
        reservations:
          devices:
            - device_ids: ["1"]
              capabilities: ["gpu"]
              driver: "nvidia"
    environment:
      - CUDA_VISIBLE_DEVICES=1
    # command: --model /models/safetensors/Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 --served-model-name Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 --gpu-memory-utilization 0.75 --kv-cache-dtype fp8 --max_model_len 61440 --max-num-batched-tokens 61440
    command: >
      --model /models/safetensors/Qwen/Qwen3-Reranker-4B 
      --served-model-name Qwen/Qwen3-Reranker-4B  
      --gpu-memory-utilization 0.7
      --hf_overrides '{"architectures":["Qwen3ForSequenceClassification"],"classifier_from_token": ["no", "yes"],"is_original_qwen3_reranker": true}'
    volumes:
      - ./models/.cache/huggingface:/root/.cache/huggingface
      - ./models/safetensors:/models/safetensors
    dns:
      - 8.8.8.8
    ports:
      - 8002:8000
    ipc: host
    image: vllm/vllm-openai:v0.10.1.1
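
The failure can also be triggered without RAGFlow by sending a rerank request directly to the server. Below is a minimal sketch in Python, assuming the container above is reachable on localhost:8002 and that the request/response fields follow vLLM's Jina-style /v1/rerank schema; the return_documents field is the one the warning below reports as ignored:

    import requests

    # Minimal rerank request against the vLLM server defined in the compose file above.
    # Assumes the container is reachable on localhost:8002 (see the `ports:` mapping).
    payload = {
        "model": "Qwen/Qwen3-Reranker-4B",
        "query": "What is the capital of France?",
        "documents": [
            "Paris is the capital of France.",
            "The Great Wall is in China.",
        ],
        # RAGFlow also sends this field; vLLM logs it as ignored (see the warning below).
        "return_documents": True,
    }

    resp = requests.post("http://localhost:8002/v1/rerank", json=payload, timeout=60)
    resp.raise_for_status()
    for item in resp.json()["results"]:
        print(item["index"], item["relevance_score"])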
(APIServer pid=1) WARNING 09-25 01:41:57 [protocol.py:81] The following fields were present in the request but ignored: {'return_documents'}
(EngineCore_0 pid=268) ERROR 09-25 01:41:58 [dump_input.py:69] Dumping input data for V1 LLM engine (v0.10.1.1) with config: model='/models/safetensors/Qwen/Qwen3-Reranker-4B', speculative_config=None, tokenizer='/models/safetensors/Qwen/Qwen3-Reranker-4B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=40960, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen/Qwen3-Reranker-4B, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, pooler_config=PoolerConfig(pooling_type='LAST', normalize=None, dimensions=None, activation=None, softmax=None, step_tag_id=None, returned_token_ids=None, enable_chunked_processing=None, max_embed_len=None), compilation_config={"level":3,"debug_dump_path":"","cache_dir":"/root/.cache/vllm/torch_compile_cache/b51f96a49b","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":1,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":"/root/.cache/vllm/torch_compile_cache/b51f96a49b/rank_0_0/backbone"}, 

(EngineCore_0 pid=268) ERROR 09-25 01:41:58 [dump_input.py:76] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=rerank-cde819ff5e504189aa1a75c300dea704-9,prompt_token_ids_len=103,mm_kwargs=[],mm_hashes=[],mm_positions=[],sampling_params=None,block_ids=([3, 165, 166, 167, 168, 169, 170],),num_computed_tokens=16,lora_request=None), NewRequestData(req_id=rerank-cde819ff5e504189aa1a75c300dea704-10,prompt_token_ids_len=175,mm_kwargs=[],mm_hashes=[],mm_positions=[],sampling_params=None,block_ids=([3, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180],),num_computed_tokens=16,lora_request=None), NewRequestData(req_id=rerank-cde819ff5e504189aa1a75c300dea704-11,prompt_token_ids_len=323,mm_kwargs=[],mm_hashes=[],mm_positions=[],sampling_params=None,block_ids=([3, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200],),num_computed_tokens=16,lora_request=None), NewRequestData(req_id=rerank-cde819ff5e504189aa1a75c300dea704-12,prompt_token_ids_len=187,mm_kwargs=[],mm_hashes=[],mm_positions=[],sampling_params=None,block_ids=([3, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211],),num_computed_tokens=16,lora_request=None), NewRequestData(req_id=rerank-cde819ff5e504189aa1a75c300dea704-13,prompt_token_ids_len=192,mm_kwargs=[],mm_hashes=[],mm_positions=[],sampling_params=None,block_ids=([3, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222],),num_computed_tokens=16,lora_request=None), NewRequestData(req_id=rerank-cde819ff5e504189aa1a75c300dea704-14,prompt_token_ids_len=323,mm_kwargs=[],mm_hashes=[],mm_positions=[],sampling_params=None,block_ids=([3, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242],),num_computed_tokens=16,lora_request=None), NewRequestData(req_id=rerank-cde819ff5e504189aa1a75c300dea704-15,prompt_token_ids_len=216,mm_kwargs=[],mm_hashes=[],mm_positions=[],sampling_params=None,block_ids=([3, 116, 157, 158, 159, 160, 161, 243, 244, 245, 246, 247, 248, 249],),num_computed_tokens=112,lora_request=None), NewRequestData(req_id=rerank-cde819ff5e504189aa1a75c300dea704-16,prompt_token_ids_len=235,mm_kwargs=[],mm_hashes=[],mm_positions=[],sampling_params=None,block_ids=([3, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263],),num_computed_tokens=16,lora_request=None), NewRequestData(req_id=rerank-cde819ff5e504189aa1a75c300dea704-17,prompt_token_ids_len=476,mm_kwargs=[],mm_hashes=[],mm_positions=[],sampling_params=None,block_ids=([3, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292],),num_computed_tokens=16,lora_request=None), NewRequestData(req_id=rerank-cde819ff5e504189aa1a75c300dea704-18,prompt_token_ids_len=306,mm_kwargs=[],mm_hashes=[],mm_positions=[],sampling_params=None,block_ids=([3, 293, 294, 295],),num_computed_tokens=16,lora_request=None)], scheduled_cached_reqs=CachedRequestData(req_ids=['rerank-cde819ff5e504189aa1a75c300dea704-8'], resumed_from_preemption=[false], new_token_ids=[], new_block_ids=[[[164]]], num_computed_tokens=[130]), num_scheduled_tokens={rerank-cde819ff5e504189aa1a75c300dea704-12: 171, rerank-cde819ff5e504189aa1a75c300dea704-15: 104, rerank-cde819ff5e504189aa1a75c300dea704-14: 307, rerank-cde819ff5e504189aa1a75c300dea704-17: 460, rerank-cde819ff5e504189aa1a75c300dea704-11: 307, rerank-cde819ff5e504189aa1a75c300dea704-10: 159, rerank-cde819ff5e504189aa1a75c300dea704-8: 22, 
rerank-cde819ff5e504189aa1a75c300dea704-9: 87, rerank-cde819ff5e504189aa1a75c300dea704-13: 176, rerank-cde819ff5e504189aa1a75c300dea704-18: 36, rerank-cde819ff5e504189aa1a75c300dea704-16: 219}, total_num_scheduled_tokens=2048, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={}, num_common_prefix_blocks=[1], finished_req_ids=['rerank-cde819ff5e504189aa1a75c300dea704-5', 'rerank-cde819ff5e504189aa1a75c300dea704-2', 'rerank-cde819ff5e504189aa1a75c300dea704-4', 'rerank-cde819ff5e504189aa1a75c300dea704-6', 'rerank-cde819ff5e504189aa1a75c300dea704-7', 'rerank-cde819ff5e504189aa1a75c300dea704-3'], free_encoder_input_ids=[], structured_output_request_ids={}, grammar_bitmask=null, kv_connector_metadata=null)
(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [dump_input.py:79] Dumping scheduler stats: SchedulerStats(num_running_reqs=13, num_waiting_reqs=40, step_counter=0, current_wave=0, kv_cache_usage=0.014106050305914386, prefix_cache_stats=PrefixCacheStats(reset=False, requests=12, queries=2708, hits=320), spec_decoding_stats=None, num_corrupted_reqs=0)

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702] EngineCore encountered a fatal error.

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702] Traceback (most recent call last):

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 693, in run_engine_core

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]     engine_core.run_busy_loop()

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 720, in run_busy_loop

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]     self._process_engine_step()

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 745, in _process_engine_step

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]     outputs, model_executed = self.step_fn()

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]                               ^^^^^^^^^^^^^^

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 288, in step

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]     model_output = self.execute_model_with_error_logging(

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 274, in execute_model_with_error_logging

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]     raise err

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 265, in execute_model_with_error_logging

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]     return model_fn(scheduler_output)

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]            ^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 87, in execute_model

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]     output = self.collective_rpc("execute_model",

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]     answer = run_method(self.driver_worker, method, args, kwargs)

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3007, in run_method

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]     return func(*args, **kwargs)

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]            ^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]     return func(*args, **kwargs)

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]            ^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 362, in execute_model

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]     output = self.model_runner.execute_model(scheduler_output,

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]     return func(*args, **kwargs)

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]            ^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1522, in execute_model

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]     max_query_len) = (self._prepare_inputs(scheduler_output))

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 712, in _prepare_inputs

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]     tokens = [scheduler_output.num_scheduled_tokens[i] for i in req_ids]

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702]               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^

(EngineCore_0 pid=268) ERROR 09-25 01:51:45 [core.py:702] KeyError: None
(APIServer pid=1) ERROR 09-25 01:51:45 [async_llm.py:430] AsyncLLM output_handler failed.

(APIServer pid=1) ERROR 09-25 01:51:45 [async_llm.py:430] Traceback (most recent call last):

(APIServer pid=1) ERROR 09-25 01:51:45 [async_llm.py:430]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 389, in output_handler

(APIServer pid=1) ERROR 09-25 01:51:45 [async_llm.py:430]     outputs = await engine_core.get_output_async()

(APIServer pid=1) ERROR 09-25 01:51:45 [async_llm.py:430]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(APIServer pid=1) ERROR 09-25 01:51:45 [async_llm.py:430]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 843, in get_output_async

(APIServer pid=1) ERROR 09-25 01:51:45 [async_llm.py:430]     raise self._format_exception(outputs) from None

(APIServer pid=1) ERROR 09-25 01:51:45 [async_llm.py:430] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.

(EngineCore_0 pid=268) Process EngineCore_0:

(EngineCore_0 pid=268) Traceback (most recent call last):

(EngineCore_0 pid=268)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap

(EngineCore_0 pid=268)     self.run()

(EngineCore_0 pid=268)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run

(EngineCore_0 pid=268)     self._target(*self._args, **self._kwargs)

(EngineCore_0 pid=268)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 704, in run_engine_core

(EngineCore_0 pid=268)     raise e

(EngineCore_0 pid=268)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 693, in run_engine_core

(EngineCore_0 pid=268)     engine_core.run_busy_loop()

(EngineCore_0 pid=268)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 720, in run_busy_loop

(EngineCore_0 pid=268)     self._process_engine_step()

(EngineCore_0 pid=268)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 745, in _process_engine_step

(EngineCore_0 pid=268)     outputs, model_executed = self.step_fn()

(EngineCore_0 pid=268)                               ^^^^^^^^^^^^^^

(EngineCore_0 pid=268)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 288, in step

(EngineCore_0 pid=268)     model_output = self.execute_model_with_error_logging(

(EngineCore_0 pid=268)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=268)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 274, in execute_model_with_error_logging

(EngineCore_0 pid=268)     raise err

(EngineCore_0 pid=268)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 265, in execute_model_with_error_logging

(EngineCore_0 pid=268)     return model_fn(scheduler_output)

(EngineCore_0 pid=268)            ^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=268)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 87, in execute_model

(EngineCore_0 pid=268)     output = self.collective_rpc("execute_model",

(EngineCore_0 pid=268)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=268)   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc

(EngineCore_0 pid=268)     answer = run_method(self.driver_worker, method, args, kwargs)

(EngineCore_0 pid=268)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=268)   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3007, in run_method

(EngineCore_0 pid=268)     return func(*args, **kwargs)

(EngineCore_0 pid=268)            ^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=268)   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context

(EngineCore_0 pid=268)     return func(*args, **kwargs)

(EngineCore_0 pid=268)            ^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=268)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 362, in execute_model

(EngineCore_0 pid=268)     output = self.model_runner.execute_model(scheduler_output,

(EngineCore_0 pid=268)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=268)   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context

(EngineCore_0 pid=268)     return func(*args, **kwargs)

(EngineCore_0 pid=268)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_0 pid=268)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1522, in execute_model

(EngineCore_0 pid=268)     max_query_len) = (self._prepare_inputs(scheduler_output))

(EngineCore_0 pid=268)                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=268)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 712, in _prepare_inputs

(EngineCore_0 pid=268)     tokens = [scheduler_output.num_scheduled_tokens[i] for i in req_ids]

(EngineCore_0 pid=268)               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^

(EngineCore_0 pid=268) KeyError: None

(APIServer pid=1) INFO:     172.24.0.1:51940 - "POST /v1/rerank HTTP/1.1" 500 Internal Server Error

(APIServer pid=1) INFO:     Shutting down

(APIServer pid=1) INFO:     Waiting for application shutdown.

(APIServer pid=1) INFO:     Application shutdown complete.

(APIServer pid=1) INFO:     Finished server process [1]
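
To check whether the crash also reproduces outside the OpenAI-compatible server path, here is a hedged offline sketch using vLLM's score task with the same model path and hf_overrides as the compose file above; the task="score" / llm.score() interface is assumed from vLLM's Qwen3-Reranker example, not taken from this report:

    from vllm import LLM

    # Offline reproduction sketch with the same overrides as the compose command above.
    # Model path and hf_overrides are copied from the configuration; the score-task
    # interface is assumed from vLLM's Qwen3-Reranker example.
    llm = LLM(
        model="/models/safetensors/Qwen/Qwen3-Reranker-4B",
        task="score",
        gpu_memory_utilization=0.7,
        hf_overrides={
            "architectures": ["Qwen3ForSequenceClassification"],
            "classifier_from_token": ["no", "yes"],
            "is_original_qwen3_reranker": True,
        },
    )

    query = "What is the capital of France?"
    documents = [
        "Paris is the capital of France.",
        "The Great Wall is in China.",
    ]

    # score() pairs the query with each document and returns one relevance score per pair.
    outputs = llm.score(query, documents)
    for doc, out in zip(documents, outputs):
        print(out.outputs.score, doc)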

How would you like to use vllm

I want to run inference of a [specific model](put link here). I don't know how to integrate it with vllm.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
