Implement runai model streamer for MODEL_IMPL_TYPE=flax_nnx #955
Description
We recently added GCS support for the RunAI model streamer in vLLM. The final step was installing the runai-model-streamer package in the vLLM image. PR 26464 did this for the GPU Dockerfile, but the installation was missed in the TPU Dockerfile. Without the package in the TPU image, customers must apply the change locally and build a custom TPU vLLM image in order to use the RunAI model streamer.
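The missing installation can be sketched as a Dockerfile fragment mirroring the GPU image change; the package names for the streamer and its GCS backend are assumptions here, not the actual diff, and should be checked against the GPU Dockerfile from PR 26464:

```dockerfile
# Hypothetical sketch: install the RunAI model streamer in the TPU image,
# mirroring the GPU Dockerfile change. Package names are assumptions.
RUN pip install --no-cache-dir \
    runai-model-streamer \
    runai-model-streamer-gcs
```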
No bug or issue has been filed for this, as RunAI model streamer support for GCS is still pending release for GPU.
Tests
Tested that the image still builds, and that the RunAI model streamer can be used to load the model for a vLLM inference server.
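For reference, loading a model through the streamer looks along these lines; the GCS bucket path is a placeholder, while the `--load-format runai_streamer` flag comes from vLLM's existing streamer support:

```shell
# Hypothetical invocation: serve a model from GCS via the RunAI model streamer.
# gs://my-bucket/my-model is a placeholder path.
vllm serve gs://my-bucket/my-model --load-format runai_streamer
```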
Checklist
Before submitting this PR, please make sure:
[x] I have performed a self-review of my code.
[x] I have necessary comments in my code, particularly in hard-to-understand areas.
[x] I have made or will make corresponding changes to any relevant documentation.