Conversation

@amacaskill commented Oct 27, 2025

Description


We recently added changes to support GCS for the Run:ai Model Streamer in vLLM. The last step was installing the Run:ai Model Streamer inside the vLLM image. PR 26464 did this for the GPU image, but the installation of the runai-model-streamer module was never added to the TPU Dockerfile. Without it, customers must make this change locally and build a custom TPU vLLM image in order to use the Run:ai Model Streamer.
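For illustration, the missing TPU Dockerfile step would look roughly like the sketch below. The exact package names, extras, and pinned versions are assumptions; the authoritative install line is whatever the GPU Dockerfile gained in PR 26464.

```dockerfile
# Hypothetical sketch of the missing TPU Dockerfile step. Package names and
# the GCS plugin are assumptions -- mirror the exact install line added to
# the GPU Dockerfile in PR 26464.
RUN pip install --no-cache-dir runai-model-streamer runai-model-streamer-gcs
```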

No bug or issue has been filed for this, since Run:ai Model Streamer support for GCS is still pending release on the GPU side.

Tests


Verified that the image still builds and that the Run:ai Model Streamer can be used to load the model for a vLLM inference server:

# Build the Docker image. DOCKER_BUILDKIT=1 may be unnecessary depending on your Docker version. Tested with and without v7x:
DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile . -t vllm-tpu-runai --load
DOCKER_BUILDKIT=1 docker build --build-arg IS_FOR_V7X=true -f docker/Dockerfile . -t vllm-tpu-runai-2 --load
# Push the image to Artifact Registry
export REGION_NAME=us-central1
export PROJECT_ID=my-project
gcloud artifacts repositories create vllm-tpu --repository-format=docker --location=$REGION_NAME && \
gcloud auth configure-docker $REGION_NAME-docker.pkg.dev && \
docker image tag vllm-tpu-runai $REGION_NAME-docker.pkg.dev/$PROJECT_ID/vllm-tpu/vllm-tpu-runai:latest && \
docker push $REGION_NAME-docker.pkg.dev/$PROJECT_ID/vllm-tpu/vllm-tpu-runai:latest
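Once the image is pushed, a server could be launched from it with vLLM's `runai_streamer` load format. This is a hedged sketch, not part of the tested commands: the bucket and model path (`gs://my-bucket/my-model`) and the `docker run` flags are placeholders.

```shell
# Hypothetical follow-up (bucket, model path, and docker flags are
# placeholders): launch vLLM from the pushed image, loading weights from
# GCS via the Run:ai Model Streamer.
REGION_NAME=us-central1
PROJECT_ID=my-project
IMAGE="$REGION_NAME-docker.pkg.dev/$PROJECT_ID/vllm-tpu/vllm-tpu-runai:latest"
# Sanity-check the fully qualified tag before running:
echo "$IMAGE"
# docker run --net host "$IMAGE" \
#   vllm serve gs://my-bucket/my-model --load-format runai_streamer
```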

Checklist

Before submitting this PR, please make sure:
[x] I have performed a self-review of my code.
[x] I have added necessary comments in my code, particularly in hard-to-understand areas.
[x] I have made or will make corresponding changes to any relevant documentation.

@amacaskill force-pushed the install-runai-streamer branch 4 times, most recently from c04d3e2 to 9f75b5a on October 28, 2025 at 16:29
@bvrockwell (Collaborator) commented:

Thanks for wanting to enable this feature!

We need tests (I assume across different TP values for different model sizes) that show this is both (1) correct/accurate and (2) performant.

@py4 @vipannalla @manojkris @jcyang43 to comment if there's any guidance we can share.

@amacaskill force-pushed the install-runai-streamer branch from 9f75b5a to 68557aa on November 7, 2025 at 23:50
@amacaskill changed the title from "Install runai-model-streamer module in Dockerfile" to "Implement runai model streamer for MODEL_IMPL_TYPE=flax_nnx" on Nov 7, 2025
@amacaskill marked this pull request as draft on November 7, 2025 at 23:52
