- 
                Notifications
    You must be signed in to change notification settings 
- Fork 663
build: enable kvbm in vllm container #2763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
b6614ea    to
    92a0143      
    Compare
  
    Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
1bc68b7    to
    9c7fd55      
    Compare
  
    Signed-off-by: Anant Sharma <[email protected]>
| WalkthroughRefactors container/Dockerfile.vllm to parameterize Python via ARG, reuse a prebuilt virtual environment from the framework stage, copy uv from the framework, and install runtime wheels from a wheelhouse. Updates container/build.sh to force ENABLE_KVBM for VLLM builds. Re-enables pytest marks and adds a vllm mark to a determinism test. Changes
 Sequence Diagram(s)sequenceDiagram
    autonumber
    participant Builder as Build System
    participant Framework as Docker Stage: framework
    participant DynamoBase as Docker Stage: dynamo_base
    participant Runtime as Docker Stage: runtime
    Builder->>Framework: Build with ARG PYTHON_VERSION
    Framework->>Framework: apt-get install python${PYTHON_VERSION}-dev
    Framework->>Framework: Build venv (VIRTUAL_ENV) + install framework/vLLM deps
    Framework->>Framework: Provide uv/uvx binaries
    Builder->>DynamoBase: Build wheelhouse
    DynamoBase->>DynamoBase: Produce /opt/dynamo/wheelhouse
    Builder->>Runtime: Start runtime stage with ARG PYTHON_VERSION
    Runtime->>Runtime: apt-get install python${PYTHON_VERSION}-dev
    Framework-->>Runtime: COPY VIRTUAL_ENV (entire venv)
    Framework-->>Runtime: COPY /bin/uv, /bin/uvx
    DynamoBase-->>Runtime: COPY /opt/dynamo/wheelhouse
    Runtime->>Runtime: uv pip install wheelhouse (ai_dynamo_runtime, ai_dynamo, nixl, benchmarks)
    note over Runtime: Runtime reuses prebuilt venv and adds runtime wheels
sequenceDiagram
    autonumber
    participant Dev as Invoker
    participant BuildSh as container/build.sh
    participant Docker as docker build
    Dev->>BuildSh: FRAMEWORK=VLLM ./build.sh
    BuildSh->>BuildSh: if FRAMEWORK == "VLLM": ENABLE_KVBM=true
    note over BuildSh: Echo notice about forcing enable_kvbm
    BuildSh->>Docker: docker build ... --build-arg ENABLE_KVBM=true ...
    note over Docker: Potential duplicate arg entries if appended twice
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
 Poem
 Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit: 
 SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type  Other keywords and placeholders
 Status, Documentation and Community
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️  Outside diff range comments (1)
container/Dockerfile.vllm (1)
257-263: Hard-coded cp312 wheel tag breaks when PYTHON_VERSION ≠ 3.12This defeats your new PYTHON_VERSION arg. Resolve the Python tag dynamically at build-time.
Apply:
-RUN uv pip install \ - /opt/dynamo/wheelhouse/ai_dynamo_runtime*cp312*.whl \ - /opt/dynamo/wheelhouse/ai_dynamo*any.whl \ - /opt/dynamo/wheelhouse/nixl/nixl*.whl \ - /opt/dynamo/benchmarks && \ - rm -rf /opt/dynamo/benchmarks +RUN PY_TAG="$(python - <<'PY'\nimport sys;print(f\"cp{sys.version_info.major}{sys.version_info.minor}\")\nPY\n)" && \ + RUNTIME_WHL="$(ls /opt/dynamo/wheelhouse/ai_dynamo_runtime*${PY_TAG}*.whl | head -n1)" && \ + test -n "${RUNTIME_WHL}" && \ + uv pip install \ + "${RUNTIME_WHL}" \ + /opt/dynamo/wheelhouse/ai_dynamo*any.whl \ + /opt/dynamo/wheelhouse/nixl/nixl*.whl \ + /opt/dynamo/benchmarks && \ + rm -rf /opt/dynamo/benchmarks
🧹 Nitpick comments (4)
container/build.sh (1)
583-586: Force-enable KVBM for VLLM is fine; guard against duplicate build-argsIf this block ever moves above an earlier ENABLE_KVBM addition, BUILD_ARGS could include multiple ENABLE_KVBM entries. Safer to de-duplicate when appending.
Apply:
if [[ $FRAMEWORK == "VLLM" ]]; then echo "Forcing enable_kvbm to true in vLLM image build" ENABLE_KVBM=true fi - -if [ ! -z ${ENABLE_KVBM} ]; then +if [ ! -z ${ENABLE_KVBM} ]; then echo "Enabling the KVBM in the ai-dynamo-runtime" - BUILD_ARGS+=" --build-arg ENABLE_KVBM=${ENABLE_KVBM} " + case " $BUILD_ARGS " in + *" --build-arg ENABLE_KVBM="*) : ;; # already present + *) BUILD_ARGS+=" --build-arg ENABLE_KVBM=${ENABLE_KVBM} " ;; + esac ficontainer/Dockerfile.vllm (3)
93-101: Pin uv image for reproducible buildsCOPY-ing from ghcr.io/astral-sh/uv:latest harms reproducibility. Pin a version tag or digest.
Example:
-COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/ +COPY --from=ghcr.io/astral-sh/uv:0.4.20 /uv /uvx /bin/
183-199: Runtime should not need -dev headersIn the runtime image, python${PYTHON_VERSION}-dev pulls headers and extra deps. Since you copy a prebuilt venv and are not compiling wheels here, python${PYTHON_VERSION} (or python3-minimal) should suffice.
- python${PYTHON_VERSION}-dev \ + python${PYTHON_VERSION} \If any JIT components need headers at runtime, ignore this suggestion.
217-230: Optional: verify Prometheus download integrityFetching binaries via curl without checksum verification is risky. Consider validating the SHA256 for the selected version.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- container/Dockerfile.vllm(4 hunks)
- container/build.sh(1 hunks)
- tests/kvbm/test_determinism.py(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: nnshah1
PR: ai-dynamo/dynamo#2489
File: container/deps/vllm/install_vllm.sh:151-152
Timestamp: 2025-08-18T16:52:15.659Z
Learning: The VLLM_PRECOMPILED_WHEEL_LOCATION environment variable, when exported, automatically triggers vLLM's build system to use the precompiled wheel instead of building from source, even when using standard `uv pip install .` commands in container/deps/vllm/install_vllm.sh.
📚 Learning: 2025-08-30T20:43:49.593Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.593Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.
Applied to files:
- container/Dockerfile.vllm
🔇 Additional comments (5)
tests/kvbm/test_determinism.py (2)
777-777: Add vLLM marker is correct; avoids TRT-LLM path on mixed environmentsGood scoping. This test spins up vLLM-specific serve args; marking it vllm prevents accidental TRT-LLM runs.
36-39: The marker list under[tool.pytest.ini_options]inpyproject.tomldoesn’t includee2e,slow,nightly, orgpu_1. You’ll need to add those to avoid PytestUnknownMarkWarning, and ensure your CI has a GPU test lane.container/Dockerfile.vllm (3)
72-79: Parameterizing Python: OK in framework stageInstalling python${PYTHON_VERSION}-dev here makes sense for native builds and venv creation.
250-253: Copying venv across stages: good; ensure interpreter ABI matchesCopying the entire venv is efficient. It relies on the same Python ABI being present in runtime. Your apt install of python${PYTHON_VERSION} above covers that; just ensure PYTHON_VERSION in build args matches the venv’s interpreter version.
You can enforce this at build-time:
ENV VIRTUAL_ENV=/opt/dynamo/venv RUN test -x "${VIRTUAL_ENV}/bin/python" && \ - "${VIRTUAL_ENV}/bin/python" -c "import sys; assert f'{sys.version_info.major}.{sys.version_info.minor}' == '${PYTHON_VERSION}', 'VENV interpreter mismatch'" + "${VIRTUAL_ENV}/bin/python" -c "import sys; assert f'{sys.version_info.major}.{sys.version_info.minor}' == '${PYTHON_VERSION}', 'VENV interpreter mismatch'"
354-359: Local-dev ownership fixes: nice touchCopying the venv with chown avoids the recursive chown cost; good optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: Anant Sharma <[email protected]> Signed-off-by: nnshah1 <[email protected]>
Overview:
Changes:
closes: OPS-766
Summary by CodeRabbit
New Features
Refactor
Tests