Set up a simulator for the LLMD request flow through Envoy
- Client -> Envoy: The client sends a simple JSON request.
- Envoy -> EPP: Envoy's ext_proc filter passes the request to the EPP mock (the external processor).
- EPP -> Envoy: The EPP mock builds the first part of the multipart body (containing the request_payload) and sends it back to Envoy. Crucially, it tells Envoy not to close the request stream to the upstream (vLLM) yet (see the multipart framing sketch after this list).
- Envoy -> vLLM (Prefill): Envoy forwards this first multipart chunk to the vLLM mock instance. The TCP connection and HTTP request stream remain open, waiting for more data.
- vLLM Processing (The Long Part): The vLLM mock server receives the request_payload and begins fake-processing it. This is the long step (a sleep of 100 ms to 30 s). The server's request-reading code is effectively paused, waiting for the rest of the request stream to arrive (a mock server sketch appears after this list).
- vLLM -> Envoy (Response Started): Once vLLM finishes the fake prefill and generates the first token, it sends its response headers (e.g., HTTP/1.1 200 OK) and a randomly generated first token back to Envoy.
- Envoy -> EPP (Response Notification): The ext_proc filter intercepts this response from vLLM and forwards it to the EPP.
- EPP Gets Metadata: Seeing the first token, the EPP generates a random string to use as metadata.
- EPP -> Envoy (Injects Part 2): The EPP uses its ext_proc control channel to send a new instruction to Envoy: "For that original request stream you're holding open, here is the second multipart part (the metadata) and the final boundary. Send it now and close the stream." (See the trailer-part sketch after this list.)
- Envoy -> vLLM (Request Finished): Envoy writes the second multipart part and the closing boundary to the still-open request stream to vLLM.
- vLLM Receives Full Request: The vLLM server's code, which was waiting for the request stream to finish, now receives the metadata and the end-of-stream signal. It has everything it needs and proceeds with the decode phase, generating the rest of the mock response to signal that it has finished.
- Response Flow: The rest of the response tokens flow from vLLM through Envoy to the client as normal.
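
The sketch below shows one way to build the first multipart chunk that the EPP mock hands back to Envoy. It is a minimal illustration, not the EPP's actual code: the `multipart/form-data` framing, the boundary string, and the example client JSON are assumptions; only the field name `request_payload` comes from the flow above. Note that no closing boundary is emitted, which is what keeps the request stream to vLLM open.

```python
import json

# Placeholder boundary; the simulator can pick any value as long as it matches
# the Content-Type header Envoy sends upstream.
BOUNDARY = "llmd-sim-boundary"


def client_json() -> bytes:
    """The simple JSON body the client sends (illustrative payload only)."""
    return json.dumps({"model": "mock-model", "prompt": "hello world"}).encode()


def opening_chunk(request_payload: bytes, boundary: str = BOUNDARY) -> bytes:
    """First multipart part, built by the EPP mock from the client's JSON.

    Deliberately omits the closing boundary so the request stream to vLLM
    stays open for the metadata part injected later.
    """
    return (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="request_payload"\r\n'
        "Content-Type: application/json\r\n"
        "\r\n"
    ).encode() + request_payload + b"\r\n"


if __name__ == "__main__":
    print(opening_chunk(client_json()).decode())
```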
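
A companion sketch for the part the EPP injects after it sees the first token: a random string as metadata, followed by the closing boundary that finally ends the request body. The field name `metadata` and the boundary are the same placeholders as above.

```python
import secrets

BOUNDARY = "llmd-sim-boundary"  # must match the boundary used for the first part


def random_metadata() -> bytes:
    """Random string standing in for whatever a real EPP would attach."""
    return secrets.token_hex(16).encode()


def closing_chunk(metadata: bytes, boundary: str = BOUNDARY) -> bytes:
    """Second part plus the final boundary; writing this ends the stream."""
    return (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="metadata"\r\n'
        "\r\n"
    ).encode() + metadata + f"\r\n--{boundary}--\r\n".encode()


if __name__ == "__main__":
    print(closing_chunk(random_metadata()).decode())
```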
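
Finally, a runnable sketch of the vLLM mock, assuming aiohttp and assuming the server may begin streaming its response while the request body is still open (HTTP/1.1 allows this, and the flow above depends on it). The multipart handling is deliberately naive: the first read is treated as the request_payload part, and everything up to end-of-stream is treated as the metadata part. The `/v1/completions` path, token format, and timings are placeholders.

```python
import asyncio
import random

from aiohttp import web


async def completions(request: web.Request) -> web.StreamResponse:
    # Phase 1: whatever Envoy has written so far is taken to be the first
    # multipart part (boundary + headers + request_payload).
    first_part = await request.content.readany()
    print(f"prefill input: {len(first_part)} bytes")

    # Fake prefill: the long step (100 ms to 30 s); shrink the upper bound
    # while testing if the waits get tedious.
    await asyncio.sleep(random.uniform(0.1, 30.0))

    # Start the response and emit a randomly generated first token while the
    # request stream is still open.
    resp = web.StreamResponse(status=200)
    resp.content_type = "text/plain"
    await resp.prepare(request)
    await resp.write(f"token-{random.randint(0, 9999)}\n".encode())

    # Phase 2: block until Envoy appends the metadata part and the closing
    # boundary, i.e. until the request body reaches end-of-stream.
    metadata_part = await request.content.read()
    print(f"metadata + closing boundary: {len(metadata_part)} bytes")

    # Fake decode: stream a few more tokens, then signal completion.
    for _ in range(5):
        await asyncio.sleep(0.05)
        await resp.write(f"token-{random.randint(0, 9999)}\n".encode())
    await resp.write(b"[DONE]\n")
    await resp.write_eof()
    return resp


app = web.Application()
app.add_routes([web.post("/v1/completions", completions)])

if __name__ == "__main__":
    web.run_app(app, port=8000)
```

With these pieces the two-phase body write can also be exercised without Envoy or the EPP: open a raw connection to the mock, send `opening_chunk(client_json())`, wait for the first token to come back, then send `closing_chunk(random_metadata())` and read the remaining tokens.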