Your current environment
Output of `python collect_env.py` not provided.
🐛 Describe the bug
@robertgshaw2-neuralmagic, @njhill
I am running vLLM at commit 6653040, which includes #7394.
Reproducer:
# vllm serve meta-llama/Meta-Llama-3-8B-Instruct --disable-log-requests
import openai
import asyncio

N = 800

client = openai.AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def generate_streaming(prompt: str):
    # Stream completion chunks for a single prompt.
    async for req_output in await client.completions.create(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        prompt=prompt,
        stream=True,
    ):
        yield req_output.choices[0].text

async def generate_output(prompt: str):
    final_output = None
    async for output in generate_streaming(prompt):
        final_output = output
    return final_output

async def main():
    # Fire N concurrent streaming requests.
    prompts = [str(i) for i in range(N)]
    async with asyncio.TaskGroup() as tg:
        tasks = [tg.create_task(generate_output(prompt)) for prompt in prompts]

asyncio.run(main())
Error message:
| Traceback (most recent call last):
| File ".venv/lib/python3.11/site-packages/starlette/responses.py", line 261, in wrap
| await func()
| File ".venv/lib/python3.11/site-packages/starlette/responses.py", line 250, in stream_response
| async for chunk in self.body_iterator:
| File "vllm/vllm/entrypoints/openai/serving_completion.py", line 231, in completion_stream_generator
| async for prompt_idx, res in result_generator:
| File "vllm/vllm/utils.py", line 468, in merge_async_iterators
| item = await d
| ^^^^^^^
| File "vllm/vllm/entrypoints/openai/rpc/client.py", line 424, in generate
| await self.abort(request_id)
| File "vllm/vllm/entrypoints/openai/rpc/client.py", line 350, in abort
| await self._send_one_way_rpc_request(
| File "vllm/vllm/entrypoints/openai/rpc/client.py", line 256, in _send_one_way_rpc_request
| with self.to_proxy_socket() as socket:
| File "/usr/lib/python3.11/contextlib.py", line 137, in __enter__
| return next(self.gen)
| ^^^^^^^^^^^^^^
| File "vllm/vllm/entrypoints/openai/rpc/client.py", line 195, in to_proxy_socket
| socket = self.context.socket(zmq.constants.DEALER)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File ".venv/lib/python3.11/site-packages/zmq/sugar/context.py", line 354, in socket
| socket_class( # set PYTHONTRACEMALLOC=2 to get the calling frame
| File ".venv/lib/python3.11/site-packages/zmq/_future.py", line 218, in __init__
| super().__init__(context, socket_type, **kwargs) # type: ignore
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File ".venv/lib/python3.11/site-packages/zmq/sugar/socket.py", line 156, in __init__
| super().__init__(
| File "_zmq.py", line 690, in zmq.backend.cython._zmq.Socket.__init__
| zmq.error.ZMQError: Too many open files
This is arguably not normal online-serving traffic; that said, with --disable-frontend-multiprocessing the server handles N=8192 without issue.
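As a stop-gap (it does not address the per-request socket churn itself), raising the soft RLIMIT_NOFILE of the server process pushes the failure point out. A minimal sketch, assuming the hard limit is already high enough:

import resource

# Raise the soft file-descriptor limit up to the hard limit
# (no extra privileges required as long as soft <= hard).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print(f"RLIMIT_NOFILE soft limit raised from {soft} to {hard}")

The equivalent shell-level workaround is `ulimit -n <limit>` before launching `vllm serve`.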
strace shows a large number of eventfd2 calls, which might be related to https://www.mail-archive.com/[email protected]/msg31244.html:
730059 eventfd2(0, EFD_CLOEXEC) = 976
730059 eventfd2(0, EFD_CLOEXEC) = 977
730059 eventfd2(0, EFD_CLOEXEC) = 978
730059 eventfd2(0, EFD_CLOEXEC) = 979
730059 eventfd2(0, EFD_CLOEXEC) = 980
730059 eventfd2(0, EFD_CLOEXEC) = 981
730059 eventfd2(0, EFD_CLOEXEC) = 982
730059 eventfd2(0, EFD_CLOEXEC) = 983
730059 eventfd2(0, EFD_CLOEXEC) = 984
730059 eventfd2(0, EFD_CLOEXEC) = 985
730059 eventfd2(0, EFD_CLOEXEC) = 986
730059 eventfd2(0, EFD_CLOEXEC) = 987
730059 eventfd2(0, EFD_CLOEXEC) = 988
730059 eventfd2(0, EFD_CLOEXEC) = 989
730059 eventfd2(0, EFD_CLOEXEC) = 990
730059 eventfd2(0, EFD_CLOEXEC) = 991
730059 eventfd2(0, EFD_CLOEXEC) = 992
730059 eventfd2(0, EFD_CLOEXEC) = 993
730059 eventfd2(0, EFD_CLOEXEC) = 994
730059 eventfd2(0, EFD_CLOEXEC) = 995
730059 eventfd2(0, EFD_CLOEXEC) = 996
730059 eventfd2(0, EFD_CLOEXEC) = 997
730059 eventfd2(0, EFD_CLOEXEC) = 998
730059 eventfd2(0, EFD_CLOEXEC) = 999
730059 eventfd2(0, EFD_CLOEXEC) = 1000
730059 eventfd2(0, EFD_CLOEXEC) = 1001
730059 eventfd2(0, EFD_CLOEXEC) = 1002
730059 eventfd2(0, EFD_CLOEXEC) = 1003
730059 eventfd2(0, EFD_CLOEXEC <unfinished ...>
730059 <... eventfd2 resumed>) = 1004
730059 eventfd2(0, EFD_CLOEXEC) = 1005
730059 eventfd2(0, EFD_CLOEXEC) = 1006
730059 eventfd2(0, EFD_CLOEXEC) = 1007
730059 eventfd2(0, EFD_CLOEXEC) = 1008
730059 eventfd2(0, EFD_CLOEXEC) = 1009
730059 eventfd2(0, EFD_CLOEXEC) = 1010
730059 eventfd2(0, EFD_CLOEXEC) = 1011
730059 eventfd2(0, EFD_CLOEXEC) = 1012
730059 eventfd2(0, EFD_CLOEXEC) = 1013
730059 eventfd2(0, EFD_CLOEXEC) = 1014
730059 eventfd2(0, EFD_CLOEXEC) = 1015
730059 eventfd2(0, EFD_CLOEXEC) = 1016
730059 eventfd2(0, EFD_CLOEXEC) = 1017
730059 eventfd2(0, EFD_CLOEXEC) = 1018
730059 eventfd2(0, EFD_CLOEXEC) = 1019
730059 eventfd2(0, EFD_CLOEXEC) = 1020
730059 eventfd2(0, EFD_CLOEXEC) = 1021
730059 eventfd2(0, EFD_CLOEXEC) = 1022
730059 eventfd2(0, EFD_CLOEXEC) = 1023
730059 eventfd2(0, EFD_CLOEXEC) = -1 EMFILE (Too many open files)
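Each socket created on a zmq.asyncio context allocates internal file descriptors (including eventfds used for signaling), so opening a fresh DEALER socket per call, as to_proxy_socket() does in the traceback above, makes fd usage scale with the number of in-flight requests. The sketch below is illustrative only (not vLLM code, with a hypothetical ipc endpoint) and contrasts per-call sockets with a single long-lived socket:

import zmq
import zmq.asyncio

ctx = zmq.asyncio.Context()

# Pattern from the traceback: a new DEALER socket per RPC call.
# With hundreds of concurrent requests, fds pile up until EMFILE.
async def rpc_per_call(endpoint: str, msg: bytes) -> bytes:
    sock = ctx.socket(zmq.DEALER)
    try:
        sock.connect(endpoint)
        await sock.send(msg)
        return await sock.recv()
    finally:
        sock.close(linger=0)

# Alternative: one socket reused across calls keeps the fd count constant
# (a real implementation would also need to correlate replies with requests).
shared_sock = ctx.socket(zmq.DEALER)
shared_sock.connect("ipc:///tmp/example_rpc")  # hypothetical endpoint

async def rpc_shared(msg: bytes) -> bytes:
    await shared_sock.send(msg)
    return await shared_sock.recv()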