[cuda 13][aarch64][CI] Adding CI steps to build arm64 cuda13 nightly wheels and images #28983
base: main
@@ -15,6 +15,21 @@ steps:
    env:
      DOCKER_BUILDKIT: "1"

  - label: "Build arm64 wheel - CUDA 13.0"
    depends_on: ~
    id: build-wheel-arm64-cuda-13-0
    agents:
      queue: arm64_cpu_queue_postmerge
    commands:
      # #NOTE: torch_cuda_arch_list is derived from upstream PyTorch build files here:
      # https://github.com/pytorch/pytorch/blob/main/.ci/aarch64_linux/aarch64_ci_build.sh#L7
      - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=13.0.1 --build-arg VLLM_MAIN_CUDA_VERSION=13.0 --build-arg BUILD_BASE_IMAGE=nvidia/cuda:13.0.1-devel-ubuntu22.04 --build-arg torch_cuda_arch_list='8.7 8.9 9.0 10.0+PTX 12.0' --tag vllm-ci:build-image --target build --progress plain -f docker/Dockerfile ."
Contributor
The
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=13.0.1 --build-arg VLLM_MAIN_CUDA_VERSION=13.0 --build-arg torch_cuda_arch_list='8.7 8.9 9.0 10.0+PTX 12.0' --tag vllm-ci:build-image --target build --progress plain -f docker/Dockerfile ."
      - "mkdir artifacts"
      - "docker run --rm -v $(pwd)/artifacts:/artifacts_host vllm-ci:build-image bash -c 'cp -r dist /artifacts_host && chmod -R a+rw /artifacts_host'"
      - "bash .buildkite/scripts/upload-wheels.sh"
    env:
      DOCKER_BUILDKIT: "1"

  # aarch64 build
  - label: "Build arm64 CPU wheel"
    depends_on: ~
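The CUDA 13.0 wheel step sets `CUDA_VERSION=13.0.1`, `VLLM_MAIN_CUDA_VERSION=13.0` and `BUILD_BASE_IMAGE=nvidia/cuda:13.0.1-devel-ubuntu22.04` as three separate literals, and the command quoted in the first review comment has no `BUILD_BASE_IMAGE` argument at all. A minimal bash sketch of deriving all three from one version string so they cannot drift apart follows. The helpers `cuda_major_minor` and `cuda_build_args` are hypothetical and not part of this PR or of vLLM. The `nvidia/cuda:<version>-devel-ubuntu22.04` naming is assumed from the step itself, and the sketch only echoes the resulting `docker build` command instead of running it.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not part of vLLM's Buildkite scripts.
set -euo pipefail

# Print "major.minor" for a "major.minor.patch" version, e.g. 13.0.1 -> 13.0.
cuda_major_minor() {
  local v="$1"
  local major="${v%%.*}"
  local rest="${v#*.}"
  printf '%s.%s\n' "$major" "${rest%%.*}"
}

# Print the CUDA-related build args as KEY=VALUE lines, all derived from one
# full CUDA version. The base-image naming pattern is an assumption copied
# from the nvidia/cuda:13.0.1-devel-ubuntu22.04 value used in the step.
cuda_build_args() {
  local cuda_version="$1"
  printf 'CUDA_VERSION=%s\n' "$cuda_version"
  printf 'VLLM_MAIN_CUDA_VERSION=%s\n' "$(cuda_major_minor "$cuda_version")"
  printf 'BUILD_BASE_IMAGE=nvidia/cuda:%s-devel-ubuntu22.04\n' "$cuda_version"
}

# Usage (dry run): turn each KEY=VALUE line into its own --build-arg entry.
args=()
while IFS= read -r kv; do
  args+=(--build-arg "$kv")
done < <(cuda_build_args "13.0.1")
echo docker build "${args[@]}" --target build -f docker/Dockerfile .
```

Collecting the pairs into an array keeps each `--build-arg` value a separate argv entry, which avoids re-quoting one long command string.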
@@ -93,6 +108,16 @@ steps:
      - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.9.1 --build-arg FLASHINFER_AOT_COMPILE=true --build-arg torch_cuda_arch_list='8.7 8.9 9.0 10.0+PTX 12.0' --build-arg INSTALL_KV_CONNECTORS=true --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-$(uname -m) --target vllm-openai --progress plain -f docker/Dockerfile ."
      - "docker push public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-$(uname -m)"

  - label: "Build release image (arm64) - CUDA 13.0"
    depends_on: ~
    id: build-release-image-arm64-cuda-13-0
    agents:
      queue: arm64_cpu_queue_postmerge
    commands:
      - "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7"
      - "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=13.0.1 --build-arg BUILD_BASE_IMAGE=nvidia/cuda:13.0.1-devel-ubuntu22.04 --build-arg FLASHINFER_AOT_COMPILE=true --build-arg torch_cuda_arch_list='8.7 8.9 9.0 10.0+PTX 12.0' --build-arg INSTALL_KV_CONNECTORS=true --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-$(uname -m)-cuda13.0 --target vllm-openai --progress plain -f docker/Dockerfile ."
Contributor
Similar to the wheel build step, setting
Contributor
The
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=13.0.1 --build-arg FLASHINFER_AOT_COMPILE=true --build-arg torch_cuda_arch_list='8.7 8.9 9.0 10.0+PTX 12.0' --build-arg INSTALL_KV_CONNECTORS=true --tag public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-$(uname -m)-cuda13.0 --target vllm-openai --progress plain -f docker/Dockerfile ."
      - "docker push public.ecr.aws/q9t5s3a7/vllm-release-repo:$BUILDKITE_COMMIT-$(uname -m)-cuda13.0"

  # Add job to create multi-arch manifest
  - label: "Create multi-arch manifest"
    depends_on:
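Every CUDA build in this diff passes `torch_cuda_arch_list='8.7 8.9 9.0 10.0+PTX 12.0'`, following the upstream PyTorch aarch64 CI script cited in the `#NOTE` comment. Under PyTorch's `TORCH_CUDA_ARCH_LIST` convention (as implemented in `torch.utils.cpp_extension`), each entry requests device code (SASS) for that compute capability, and a `+PTX` suffix also embeds PTX that the driver can JIT-compile for newer GPUs. The bash sketch below expands the list into nvcc `-gencode` flags in that style. The function names are hypothetical, and whether vLLM's CMake build maps the list exactly this way is an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers mirroring PyTorch's TORCH_CUDA_ARCH_LIST
# handling; vLLM's own CMake logic may differ.
set -euo pipefail

# Translate one entry such as "8.7" or "10.0+PTX" into -gencode flags, one per line.
arch_to_gencode() {
  local entry="$1"
  local cap="${entry%+PTX}"          # strip an optional +PTX suffix
  local num="${cap%%.*}${cap#*.}"    # "10.0" -> "100", "8.7" -> "87"
  printf -- '-gencode=arch=compute_%s,code=sm_%s\n' "$num" "$num"
  if [[ "$entry" == *+PTX ]]; then
    # +PTX: also embed PTX for this virtual architecture.
    printf -- '-gencode=arch=compute_%s,code=compute_%s\n' "$num" "$num"
  fi
}

# Expand a whole list; entries may be separated by spaces or semicolons.
arch_list_to_gencode() {
  local list="${1//;/ }"
  local entry
  for entry in $list; do
    arch_to_gencode "$entry"
  done
}

arch_list_to_gencode '8.7 8.9 9.0 10.0+PTX 12.0'
# -gencode=arch=compute_87,code=sm_87
# -gencode=arch=compute_89,code=sm_89
# -gencode=arch=compute_90,code=sm_90
# -gencode=arch=compute_100,code=sm_100
# -gencode=arch=compute_100,code=compute_100
# -gencode=arch=compute_120,code=sm_120
```

Only `10.0` carries `+PTX` here, so it is the single entry that contributes a PTX target in addition to SASS.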
Setting `CUDA_VERSION=13.0.1` for this `arm64` build will likely cause it to fail. The `docker/Dockerfile` has a hardcoded PyTorch version for CUDA 12.8 (`torch==2.8.0.dev20250318+cu128`) for `arm64` platforms (see `docker/Dockerfile` lines 344-352). The build process will attempt to find this `cu128` package in the `cu130` PyTorch index, which will not work. To fix this, the hardcoded PyTorch version in `docker/Dockerfile` needs to be updated or made dynamic to support CUDA 13.0.
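The failure mode described in this comment, a wheel pinned with a `+cu128` local version being looked up in the `cu130` index, could be caught before the Docker build starts. A minimal bash sketch follows. It assumes the `cu<major><minor>` index-tag convention the comment refers to (CUDA 12.8 maps to `cu128`, 13.0 to `cu130`). `cuda_index_tag`, `pin_local_tag` and `check_pin_matches_cuda` are hypothetical helpers, not code from the PR or the Dockerfile.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical pre-build check, not part of vLLM.
set -euo pipefail

# Map a CUDA version to a PyTorch index tag: 12.8.1 -> cu128, 13.0.1 -> cu130.
cuda_index_tag() {
  local v="$1"
  local major="${v%%.*}"
  local rest="${v#*.}"
  printf 'cu%s%s\n' "$major" "${rest%%.*}"
}

# Extract the local version label from a pin such as
# "torch==2.8.0.dev20250318+cu128" (prints "cu128"; prints nothing if no "+").
pin_local_tag() {
  local pin="$1"
  if [[ "$pin" == *+* ]]; then
    printf '%s\n' "${pin##*+}"
  fi
}

# Return non-zero if the pinned wheel targets a different CUDA index.
check_pin_matches_cuda() {
  local pin="$1" cuda_version="$2"
  local want have
  want="$(cuda_index_tag "$cuda_version")"
  have="$(pin_local_tag "$pin")"
  if [[ -n "$have" && "$have" != "$want" ]]; then
    echo "mismatch: $pin targets $have, but CUDA_VERSION=$cuda_version uses the $want index" >&2
    return 1
  fi
}

# The case from the review comment: a cu128 pin combined with CUDA 13.0.1.
if ! check_pin_matches_cuda "torch==2.8.0.dev20250318+cu128" "13.0.1"; then
  echo "would fail: update the pinned torch version or derive it from CUDA_VERSION"
fi
```

Deriving the pin from `CUDA_VERSION`, as the comment suggests, removes the mismatch at its source; a check like this only surfaces it earlier than the package lookup would.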