-
-
Notifications
You must be signed in to change notification settings - Fork 11.3k
Description
In #7111, @njhill did a great job to make sure requests are aborted if they are cancelled by the client. This works great and can really ease the load on the model engine in cases of timeouts / other client disconnects.
Unfortunately, this mechanism doesn't work if a BaseHTTPMiddleware is added to the server, which is the case when a use provides a function to add as middleware (via the --middleware cmd arg of vllm serve). In this case, some weird starlette behavior causes request.is_disconnected() to return False, even if the client disconnects. This is discussed in Kludex/starlette#2094.
One of the suggested solutions in that discussion (for example, in Kludex/starlette#2094 (comment) and Kludex/starlette#2094 (comment)) is to change the mechanism used to check if the request was actually disconnected. Copied verbatim from that discussion, use this helper instead of request.is_disconnected():
async def my_is_disconnected(request: Request) -> bool: assert hasattr(request, '_is_disconnected'), "private API in starlette changed" assert isinstance(request._is_disconnected, bool), "private API in starlette changed" if request._is_disconnected: return True message: Message = {} # this timeout may need to be tweaked. # Ideally request.receive() is non-blocking, in which case the timeout doesn't matter. # But the ASGI spec seems to imply it should be a blocking callable, as it doesn't specify # what should be returned in case of no new messages. In this situation you can return an # empty dict, but given that it's against the spec it seems inadvisable. with anyio.move_on_after(0.01): message = await request.receive() if message.get("type") == "http.disconnect": request._is_disconnected = True return request._is_disconnected
In vLLM, the is_cancelled callable is passed to iterate_with_cancellation or to merge_async_iterators. In all cases, request.is_disconnected is used as that callable. So it should be pretty easy to replace it with the helper function instead. The only issue I see is with the timeout in anyio.move_on_after.
So WDYT? Is it possible to tweak the is_cancelled callable so requests will be aborted upon client disconnect also when the user provides a middleware function?