
[Bug]: "500 Internal Server Error" after upgrade to v0.5.4 #7290

@tonyaw

Description


Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

After I upgraded to v0.5.4, I started getting "500 Internal Server Error" responses.
Here is the manifest snippet I use to start vLLM:

      containers:
      - name: 8x7b-open
        image: vllm/vllm-openai:v0.5.4
        command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
        args: ["--model", "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4", "--host", "0.0.0.0", "--port", "8080", "--tensor-parallel-size", "2", "--seed", "42", "--trust-remote-code"]
        securityContext:
          privileged: true
        ports:
        - containerPort: 8080
        env:
        - name: OMP_NUM_THREADS
          value: "2"
        volumeMounts:
          - mountPath: "/root/.cache"
            name: ceph-volume
        resources:
          limits:
            cpu: '12'
            memory: 200Gi
            nvidia.com/gpu: '2'
          requests:
            cpu: '12'
            memory: 200Gi
            nvidia.com/gpu: '2'
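
For what it's worth, since the error in the traceback below is the process running out of file descriptors, one workaround I would expect to help (an assumption on my part, not verified against this exact setup) is raising the soft open-files limit before the server starts. The container already runs privileged, so a shell wrapper in the manifest should be able to do it; the 65536 value is an illustrative guess:

        command: ["/bin/sh", "-c"]
        args: ["ulimit -n 65536 && exec python3 -m vllm.entrypoints.openai.api_server --model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 --host 0.0.0.0 --port 8080 --tensor-parallel-size 2 --seed 42 --trust-remote-code"]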

Traceback from the server log:

INFO:     10.254.17.246:59936 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 189, in create_chat_completion
    generator = await openai_serving_chat.create_chat_completion(
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 185, in create_chat_completion
    return await self.chat_completion_full_generator(
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 436, in chat_completion_full_generator
    async for res in result_generator:
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 196, in generate
    with self.socket() as socket:
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 59, in socket
    socket = self.context.socket(zmq.constants.DEALER)
  File "/usr/local/lib/python3.10/dist-packages/zmq/sugar/context.py", line 354, in socket
    socket_class(  # set PYTHONTRACEMALLOC=2 to get the calling frame
  File "/usr/local/lib/python3.10/dist-packages/zmq/_future.py", line 218, in __init__
    super().__init__(context, socket_type, **kwargs)  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/zmq/sugar/socket.py", line 156, in __init__
    super().__init__(
  File "_zmq.py", line 690, in zmq.backend.cython._zmq.Socket.__init__
zmq.error.ZMQError: Too many open files
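
The last frames show the vLLM RPC client opening a new ZeroMQ DEALER socket for every request (`vllm/entrypoints/openai/rpc/client.py`, line 59), and `zmq.error.ZMQError: Too many open files` means the process has hit its open-files limit (RLIMIT_NOFILE). Here is a minimal sketch of that failure mode, independent of vLLM; the lowered limit of 64 and all names are mine, for illustration only:

    import resource
    import zmq

    # Artificially lower the soft open-files limit so exhaustion happens quickly.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (64, hard))

    ctx = zmq.Context()
    sockets = []
    try:
        while True:
            # Each DEALER socket holds at least one file descriptor; created
            # per request and kept open, they eventually exhaust the limit.
            sockets.append(ctx.socket(zmq.DEALER))
    except zmq.ZMQError as exc:
        # Fails with "Too many open files", as in the traceback above.
        print(f"ZMQError after {len(sockets)} sockets: {exc}")
    finally:
        for s in sockets:
            s.close(linger=0)
        ctx.term()

Presumably this surfaced only after upgrading because v0.5.4 moved the OpenAI frontend onto this ZeroMQ-based RPC path; raising `ulimit -n` defers the failure, but sockets opened per request faster than they are closed would still hit any limit eventually.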

Labels: bug (Something isn't working), stale (Over 90 days of inactivity)
