Skip to content

[Bug]: Requests aren't aborted when client disconnects if user provided a middleware #10087

@tomeras91

Description

@tomeras91

In #7111, @njhill did a great job to make sure requests are aborted if they are cancelled by the client. This works great and can really ease the load on the model engine in cases of timeouts / other client disconnects.

Unfortunately, this mechanism doesn't work if a BaseHTTPMiddleware is added to the server, which is the case when a use provides a function to add as middleware (via the --middleware cmd arg of vllm serve). In this case, some weird starlette behavior causes request.is_disconnected() to return False, even if the client disconnects. This is discussed in Kludex/starlette#2094.
One of the suggested solutions in that discussion (for example, in Kludex/starlette#2094 (comment) and Kludex/starlette#2094 (comment)) is to change the mechanism used to check if the request was actually disconnected. Copied verbatim from that discussion, use this helper instead of request.is_disconnected():

async def my_is_disconnected(request: Request) -> bool:
    assert hasattr(request, '_is_disconnected'), "private API in starlette changed"
    assert isinstance(request._is_disconnected, bool), "private API in starlette changed"

    if request._is_disconnected:
        return True

    message: Message = {}

    # this timeout may need to be tweaked.
    # Ideally request.receive() is non-blocking, in which case the timeout doesn't matter.
    # But the ASGI spec seems to imply it should be a blocking callable, as it doesn't specify
    # what should be returned in case of no new messages. In this situation you can return an
    # empty dict, but given that it's against the spec it seems inadvisable.
    with anyio.move_on_after(0.01):
        message = await request.receive()

    if message.get("type") == "http.disconnect":
        request._is_disconnected = True

    return request._is_disconnected

In vLLM, the is_cancelled callable is passed to iterate_with_cancellation or to merge_async_iterators. In all cases, request.is_disconnected is used as that callable. So it should be pretty easy to replace it with the helper function instead. The only issue I see is with the timeout in anyio.move_on_after.

So WDYT? Is it possible to tweak the is_cancelled callable so requests will be aborted upon client disconnect also when the user provides a middleware function?

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions