Skip to content

Conversation

@nv-anants
Copy link
Contributor

@nv-anants nv-anants commented Aug 28, 2025

Overview:

Changes:

  • Enable KVBM feature in dynamo vLLM runtime container
  • Enable test_determinism_with_cache_reset to run in 1GPU testing
  • Copy whole venv from framework image, it was missing vllm and other binaries installed in vevn

closes: OPS-766

Summary by CodeRabbit

  • New Features

    • Enable KV cache block management (KVBM) by default in vLLM images for improved out-of-the-box performance.
  • Refactor

    • Streamlined image build to reuse a prebuilt Python environment and wheel-based installs, reducing duplication and improving build consistency.
    • Standardized Python version handling across build stages for more reliable runtime behavior.
  • Tests

    • Re-enabled slow, nightly, and GPU tests for broader coverage.
    • Added a vLLM-specific marker to relevant determinism testing.

Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 2, 2025

Walkthrough

Refactors container/Dockerfile.vllm to parameterize Python via ARG, reuse a prebuilt virtual environment from the framework stage, copy uv from the framework, and install runtime wheels from a wheelhouse. Updates container/build.sh to force ENABLE_KVBM for VLLM builds. Re-enables pytest marks and adds a vllm mark to a determinism test.

Changes

Cohort / File(s) Summary
Dockerfile: Python/venv/uv refactor
container/Dockerfile.vllm
Adds ARG PYTHON_VERSION in framework and runtime stages; switches to python${PYTHON_VERSION}-dev; removes a duplicate ARG; copies uv from framework instead of external image; copies full VIRTUAL_ENV from framework into runtime; copies wheelhouse from dynamo_base and installs wheels with uv; removes creating a new venv/site-packages-only copy in runtime.
Build script: force KVBM for vLLM
container/build.sh
Adds condition(s) to set ENABLE_KVBM=true when FRAMEWORK==VLLM, logging the action; placement may lead to duplicated BUILD_ARGS entries.
Tests: re-enable and tag
tests/kvbm/test_determinism.py
Uncomments module-level marks: e2e, slow, nightly, gpu_1; adds @pytest.mark.vllm to test_determinism_with_cache_reset.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Builder as Build System
    participant Framework as Docker Stage: framework
    participant DynamoBase as Docker Stage: dynamo_base
    participant Runtime as Docker Stage: runtime

    Builder->>Framework: Build with ARG PYTHON_VERSION
    Framework->>Framework: apt-get install python${PYTHON_VERSION}-dev
    Framework->>Framework: Build venv (VIRTUAL_ENV) + install framework/vLLM deps
    Framework->>Framework: Provide uv/uvx binaries

    Builder->>DynamoBase: Build wheelhouse
    DynamoBase->>DynamoBase: Produce /opt/dynamo/wheelhouse

    Builder->>Runtime: Start runtime stage with ARG PYTHON_VERSION
    Runtime->>Runtime: apt-get install python${PYTHON_VERSION}-dev
    Framework-->>Runtime: COPY VIRTUAL_ENV (entire venv)
    Framework-->>Runtime: COPY /bin/uv, /bin/uvx
    DynamoBase-->>Runtime: COPY /opt/dynamo/wheelhouse
    Runtime->>Runtime: uv pip install wheelhouse (ai_dynamo_runtime, ai_dynamo, nixl, benchmarks)
    note over Runtime: Runtime reuses prebuilt venv and adds runtime wheels
Loading
sequenceDiagram
    autonumber
    participant Dev as Invoker
    participant BuildSh as container/build.sh
    participant Docker as docker build

    Dev->>BuildSh: FRAMEWORK=VLLM ./build.sh
    BuildSh->>BuildSh: if FRAMEWORK == "VLLM": ENABLE_KVBM=true
    note over BuildSh: Echo notice about forcing enable_kvbm
    BuildSh->>Docker: docker build ... --build-arg ENABLE_KVBM=true ...
    note over Docker: Potential duplicate arg entries if appended twice
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

I thump my paw on fresh-built ground,
A venv hops stages, wheels abound.
UV shines bright, no extra chase,
KVBM’s flagged, we pick up pace.
Tests awake beneath moon’s brim—
Determinism? A tidy whim. 🐇✨


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
container/Dockerfile.vllm (1)

257-263: Hard-coded cp312 wheel tag breaks when PYTHON_VERSION ≠ 3.12

This defeats your new PYTHON_VERSION arg. Resolve the Python tag dynamically at build-time.

Apply:

-RUN uv pip install \
-    /opt/dynamo/wheelhouse/ai_dynamo_runtime*cp312*.whl \
-    /opt/dynamo/wheelhouse/ai_dynamo*any.whl \
-    /opt/dynamo/wheelhouse/nixl/nixl*.whl \
-    /opt/dynamo/benchmarks && \
-    rm -rf /opt/dynamo/benchmarks
+RUN PY_TAG="$(python - <<'PY'\nimport sys;print(f\"cp{sys.version_info.major}{sys.version_info.minor}\")\nPY\n)" && \
+    RUNTIME_WHL="$(ls /opt/dynamo/wheelhouse/ai_dynamo_runtime*${PY_TAG}*.whl | head -n1)" && \
+    test -n "${RUNTIME_WHL}" && \
+    uv pip install \
+      "${RUNTIME_WHL}" \
+      /opt/dynamo/wheelhouse/ai_dynamo*any.whl \
+      /opt/dynamo/wheelhouse/nixl/nixl*.whl \
+      /opt/dynamo/benchmarks && \
+    rm -rf /opt/dynamo/benchmarks
🧹 Nitpick comments (4)
container/build.sh (1)

583-586: Force-enable KVBM for VLLM is fine; guard against duplicate build-args

If this block ever moves above an earlier ENABLE_KVBM addition, BUILD_ARGS could include multiple ENABLE_KVBM entries. Safer to de-duplicate when appending.

Apply:

 if [[ $FRAMEWORK == "VLLM" ]]; then
     echo "Forcing enable_kvbm to true in vLLM image build"
     ENABLE_KVBM=true
 fi
-
-if [  ! -z ${ENABLE_KVBM} ]; then
+if [ ! -z ${ENABLE_KVBM} ]; then
     echo "Enabling the KVBM in the ai-dynamo-runtime"
-    BUILD_ARGS+=" --build-arg ENABLE_KVBM=${ENABLE_KVBM} "
+    case " $BUILD_ARGS " in
+      *" --build-arg ENABLE_KVBM="*) : ;; # already present
+      *) BUILD_ARGS+=" --build-arg ENABLE_KVBM=${ENABLE_KVBM} " ;;
+    esac
 fi
container/Dockerfile.vllm (3)

93-101: Pin uv image for reproducible builds

COPY-ing from ghcr.io/astral-sh/uv:latest harms reproducibility. Pin a version tag or digest.

Example:

-COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
+COPY --from=ghcr.io/astral-sh/uv:0.4.20 /uv /uvx /bin/

183-199: Runtime should not need -dev headers

In the runtime image, python${PYTHON_VERSION}-dev pulls headers and extra deps. Since you copy a prebuilt venv and are not compiling wheels here, python${PYTHON_VERSION} (or python3-minimal) should suffice.

-        python${PYTHON_VERSION}-dev \
+        python${PYTHON_VERSION} \

If any JIT components need headers at runtime, ignore this suggestion.


217-230: Optional: verify Prometheus download integrity

Fetching binaries via curl without checksum verification is risky. Consider validating the SHA256 for the selected version.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 58d2699 and fbae747.

📒 Files selected for processing (3)
  • container/Dockerfile.vllm (4 hunks)
  • container/build.sh (1 hunks)
  • tests/kvbm/test_determinism.py (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: nnshah1
PR: ai-dynamo/dynamo#2489
File: container/deps/vllm/install_vllm.sh:151-152
Timestamp: 2025-08-18T16:52:15.659Z
Learning: The VLLM_PRECOMPILED_WHEEL_LOCATION environment variable, when exported, automatically triggers vLLM's build system to use the precompiled wheel instead of building from source, even when using standard `uv pip install .` commands in container/deps/vllm/install_vllm.sh.
📚 Learning: 2025-08-30T20:43:49.593Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.593Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.

Applied to files:

  • container/Dockerfile.vllm
🔇 Additional comments (5)
tests/kvbm/test_determinism.py (2)

777-777: Add vLLM marker is correct; avoids TRT-LLM path on mixed environments

Good scoping. This test spins up vLLM-specific serve args; marking it vllm prevents accidental TRT-LLM runs.


36-39: The marker list under [tool.pytest.ini_options] in pyproject.toml doesn’t include e2e, slow, nightly, or gpu_1. You’ll need to add those to avoid PytestUnknownMarkWarning, and ensure your CI has a GPU test lane.

container/Dockerfile.vllm (3)

72-79: Parameterizing Python: OK in framework stage

Installing python${PYTHON_VERSION}-dev here makes sense for native builds and venv creation.


250-253: Copying venv across stages: good; ensure interpreter ABI matches

Copying the entire venv is efficient. It relies on the same Python ABI being present in runtime. Your apt install of python${PYTHON_VERSION} above covers that; just ensure PYTHON_VERSION in build args matches the venv’s interpreter version.

You can enforce this at build-time:

 ENV VIRTUAL_ENV=/opt/dynamo/venv
 RUN test -x "${VIRTUAL_ENV}/bin/python" && \
-    "${VIRTUAL_ENV}/bin/python" -c "import sys; assert f'{sys.version_info.major}.{sys.version_info.minor}' == '${PYTHON_VERSION}', 'VENV interpreter mismatch'"
+    "${VIRTUAL_ENV}/bin/python" -c "import sys; assert f'{sys.version_info.major}.{sys.version_info.minor}' == '${PYTHON_VERSION}', 'VENV interpreter mismatch'"

354-359: Local-dev ownership fixes: nice touch

Copying the venv with chown avoids the recursive chown cost; good optimization.

Copy link
Contributor

@oandreeva-nv oandreeva-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@saturley-hall saturley-hall merged commit 316e884 into main Sep 2, 2025
15 of 16 checks passed
@saturley-hall saturley-hall deleted the anants/vllm-kvbm branch September 2, 2025 18:43
dillon-cullinan pushed a commit that referenced this pull request Sep 5, 2025
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025
Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: nnshah1 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants