update to cuda 12.9 #24020
Signed-off-by: youkaichao <[email protected]>
Code Review
This pull request updates the CUDA version from 12.8.1 to 12.9.1 across the build configurations. The changes in .buildkite/release-pipeline.yaml and the CUDA_VERSION argument in docker/Dockerfile are consistent with this goal. However, I've identified a critical issue in the docker/Dockerfile where a hardcoded PyTorch version for arm64 builds is still pointing to a CUDA 12.8-specific build, which will likely cause the arm64 build to fail with the updated CUDA version. This needs to be addressed.
    # docs/assets/contributing/dockerfile-stages-dependency.png

    -ARG CUDA_VERSION=12.8.1
    +ARG CUDA_VERSION=12.9.1
While updating CUDA_VERSION to 12.9.1 is the goal of this PR, this change will likely break the arm64 build due to a hardcoded PyTorch version string.
In this file at line 349, the PyTorch version for arm64 is hardcoded as torch==2.8.0.dev20250318+cu128. The +cu128 suffix indicates it's built for CUDA 12.8.
With CUDA_VERSION set to 12.9.1, the build script will correctly look for packages in the CUDA 12.9 index (.../cu129), but it will try to install a package for CUDA 12.8, which will likely fail.
To fix this, the hardcoded torch version string needs to be updated to a version compatible with CUDA 12.9 (e.g., ...dev...+cu129). The torchvision version on the same line might also need to be updated accordingly. This is a critical issue that needs to be addressed to ensure the arm64 build succeeds.
Since this part of the file is not in the diff, I cannot provide a direct code suggestion, but the line to change is:
349: "torch==2.8.0.dev20250318+cu128" "torchvision==0.22.0.dev20250319" ; \
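The mismatch described in this review comment can be checked mechanically. The following is a minimal sketch, not part of the PR or of vLLM's build scripts: the helper names `cuda_tag`, `local_tag`, and `mismatched_pins` are hypothetical, and it assumes the CUDA tag is formed from the major and minor version (12.9.1 becomes `cu129`) and appears as the suffix after `+` in the pinned requirement.

```python
import re
from typing import List, Optional


def cuda_tag(cuda_version: str) -> str:
    """Map a CUDA version such as '12.9.1' to a wheel tag such as 'cu129'."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def local_tag(requirement: str) -> Optional[str]:
    """Return the suffix after '+' in a pin like 'torch==X+cu128', or None if absent."""
    match = re.search(r"\+([A-Za-z0-9.]+)$", requirement.strip())
    return match.group(1) if match else None


def mismatched_pins(cuda_version: str, requirements: List[str]) -> List[str]:
    """List pins whose '+' suffix disagrees with the tag for cuda_version."""
    expected = cuda_tag(cuda_version)
    result = []
    for req in requirements:
        tag = local_tag(req)
        # Pins without a '+' suffix carry no CUDA tag, so they are not flagged.
        if tag is not None and tag != expected:
            result.append(req)
    return result


# The two pins quoted in the review comment above.
pins = [
    "torch==2.8.0.dev20250318+cu128",
    "torchvision==0.22.0.dev20250319",
]

print(mismatched_pins("12.8.1", pins))  # nothing flagged under the old CUDA version
print(mismatched_pins("12.9.1", pins))  # the torch pin is flagged under 12.9.1
```

A check like this only compares strings; it does not confirm that a matching `+cu129` wheel actually exists in the package index for arm64.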
The arm build (https://buildkite.com/vllm/release/builds/7808/steps/canvas?sid=01990371-cf21-43a1-bc1f-7b51b716a488) was broken by the PyTorch 2.8 update in #20358, because it picks up a CPU-only build of PyTorch.
Signed-off-by: youkaichao <[email protected]>
Let's see if the release pipeline turns green then.
Thank you! That explains why the build suddenly required numa.
I think so. Switching to the CUDA 12.9 build looks more reasonable.
Closing, as #23960 covers more aspects.
Purpose
PyTorch 2.8 is only available for CUDA 12.9 on arm64 platforms.
Test Plan
Test Result