
[Bug]: "500 Internal Server Error" after upgrade to v0.5.4 #7290

@tonyaw

Description


Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

After I upgraded to v0.5.4, I started getting "500 Internal Server Error" responses.
Here is the manifest snippet I use to start vLLM:

      containers:
      - name: 8x7b-open
        image: vllm/vllm-openai:v0.5.4
        command: ["python3", "-m", "vllm.entrypoints.openai.api_server"]
        args: ["--model", "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4", "--host", "0.0.0.0", "--port", "8080", "--tensor-parallel-size", "2", "--seed", "42", "--trust-remote-code"]
        securityContext:
          privileged: true
        ports:
        - containerPort: 8080
        env:
        - name: OMP_NUM_THREADS
          value: "2"
        volumeMounts:
          - mountPath: "/root/.cache"
            name: ceph-volume
        resources:
          limits:
            cpu: '12'
            memory: 200Gi
            nvidia.com/gpu: '2'
          requests:
            cpu: '12'
            memory: 200Gi
            nvidia.com/gpu: '2'
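
For what it's worth, since the error in the traceback below is the process running out of file descriptors, one workaround I would expect to help (an assumption on my part, not verified against this exact setup) is raising the soft open-files limit before the server starts. The container already runs privileged, so a shell wrapper in the manifest should be able to do it; the 65536 value is an illustrative guess:

        command: ["/bin/sh", "-c"]
        args: ["ulimit -n 65536 && exec python3 -m vllm.entrypoints.openai.api_server --model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 --host 0.0.0.0 --port 8080 --tensor-parallel-size 2 --seed 42 --trust-remote-code"]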

Traceback from the server log:

INFO:     10.254.17.246:59936 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 189, in create_chat_completion
    generator = await openai_serving_chat.create_chat_completion(
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 185, in create_chat_completion
    return await self.chat_completion_full_generator(
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 436, in chat_completion_full_generator
    async for res in result_generator:
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 196, in generate
    with self.socket() as socket:
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/rpc/client.py", line 59, in socket
    socket = self.context.socket(zmq.constants.DEALER)
  File "/usr/local/lib/python3.10/dist-packages/zmq/sugar/context.py", line 354, in socket
    socket_class(  # set PYTHONTRACEMALLOC=2 to get the calling frame
  File "/usr/local/lib/python3.10/dist-packages/zmq/_future.py", line 218, in __init__
    super().__init__(context, socket_type, **kwargs)  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/zmq/sugar/socket.py", line 156, in __init__
    super().__init__(
  File "_zmq.py", line 690, in zmq.backend.cython._zmq.Socket.__init__
zmq.error.ZMQError: Too many open files
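
The last frames show the vLLM RPC client opening a new ZeroMQ DEALER socket for every request (`vllm/entrypoints/openai/rpc/client.py`, line 59), and `zmq.error.ZMQError: Too many open files` means the process has hit its open-files limit (RLIMIT_NOFILE). Here is a minimal sketch of that failure mode, independent of vLLM; the lowered limit of 64 and all names are mine, for illustration only:

    import resource
    import zmq

    # Artificially lower the soft open-files limit so exhaustion happens quickly.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (64, hard))

    ctx = zmq.Context()
    sockets = []
    try:
        while True:
            # Each DEALER socket holds at least one file descriptor; created
            # per request and kept open, they eventually exhaust the limit.
            sockets.append(ctx.socket(zmq.DEALER))
    except zmq.ZMQError as exc:
        # Fails with "Too many open files", as in the traceback above.
        print(f"ZMQError after {len(sockets)} sockets: {exc}")
    finally:
        for s in sockets:
            s.close(linger=0)
        ctx.term()

Presumably this surfaced only after upgrading because v0.5.4 moved the OpenAI frontend onto this ZeroMQ-based RPC path; raising `ulimit -n` defers the failure, but sockets opened per request faster than they are closed would still hit any limit eventually.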

Labels: bug (Something isn't working), stale (Over 90 days of inactivity)
