Skip to content

Conversation

csahithi
Copy link
Contributor

@csahithi csahithi commented Sep 26, 2025

Purpose

Currently only x86 images are being pushed as nightly builds made available through #24102.
This PR helps push multiarch manifests instead as nightly builds.

The nightly build cleanup script is also throwing access token related errors - https://buildkite.com/vllm/release/builds/8648#01997f3f-1fee-409b-87a2-b9cab746378e
This PR fixes these issues with the cleanup script.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Sahithi Chigurupati <[email protected]>
@mergify mergify bot added the ci/build label Sep 26, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the CI pipeline to push multi-arch Docker manifests for nightly builds and fixes an issue with the nightly build cleanup script. The changes to create and push multi-arch manifests in .buildkite/release-pipeline.yaml are correct. The fix in .buildkite/scripts/cleanup-nightly-builds.sh properly authenticates with the Docker Hub API. However, I've identified a critical security vulnerability in the cleanup script where a secret token is exposed in the build logs and have provided a suggestion to resolve it.

Comment on lines +24 to +27
BEARER_TOKEN=$(curl -s -X POST \
-H "Content-Type: application/json" \
-d "{\"username\": \"$DOCKERHUB_USERNAME\", \"password\": \"$DOCKERHUB_TOKEN\"}" \
"https://hub.docker.com/v2/users/login" | jq -r '.token')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Due to set -ex at the top of the script, the curl command will be printed to the build logs, exposing the DOCKERHUB_TOKEN. This is a critical security vulnerability. You should temporarily disable command tracing around this sensitive command to prevent the token from being logged.

Suggested change
BEARER_TOKEN=$(curl -s -X POST \
-H "Content-Type: application/json" \
-d "{\"username\": \"$DOCKERHUB_USERNAME\", \"password\": \"$DOCKERHUB_TOKEN\"}" \
"https://hub.docker.com/v2/users/login" | jq -r '.token')
set +x
BEARER_TOKEN=$(curl -s -X POST \
-H "Content-Type: application/json" \
-d "{\"username\": \"$DOCKERHUB_USERNAME\", \"password\": \"$DOCKERHUB_TOKEN\"}" \
"https://hub.docker.com/v2/users/login" | jq -r '.token')
set -x

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address this comment

@simon-mo
Copy link
Collaborator

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting

Comment on lines 152 to 156
- "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7"
- "docker pull public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT"
- "docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT vllm/vllm-openai:nightly"
- "docker tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT vllm/vllm-openai:nightly-$BUILDKITE_COMMIT"
- "docker push vllm/vllm-openai:nightly"
- "docker push vllm/vllm-openai:nightly-$BUILDKITE_COMMIT"
- "docker manifest create vllm/vllm-openai:nightly public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-x86_64 public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-aarch64 --amend"
- "docker manifest create vllm/vllm-openai:nightly-$BUILDKITE_COMMIT public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-x86_64 public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-aarch64 --amend"
- "docker manifest push vllm/vllm-openai:nightly"
- "docker manifest push vllm/vllm-openai:nightly-$BUILDKITE_COMMIT"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Manifest pushes lack per-arch images in DockerHub

The nightly step now creates and pushes a multi-arch manifest that references the images stored in ECR but no longer pulls and retags those per‑architecture images into the vllm/vllm-openai repository before pushing. docker manifest push cannot upload the underlying manifests/layers from another registry, so the command will either fail with manifest unknown or publish a manifest whose digests are missing in DockerHub. In either case, pulling vllm/vllm-openai:nightly will fail for all users. The pipeline still needs to copy each architecture image into DockerHub before creating the manifest.

Useful? React with 👍 / 👎.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 you are no longer pushing the image here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed the issue by pushing the images first

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants