Skip to content

Conversation

@Lunderberg
Copy link
Contributor

@Lunderberg Lunderberg commented Dec 29, 2023

The updates to the cutlass kernels made in TVM PR #16244 require symbols provided in cuda 7.5+. While the cuda architecture is specified by setting NVCC_FLAGS in the CMakeLists.txt for each kernel, cmake 3.18+ also sets it based on the
CMAKE_CUDA_ARCHITECTURES value. If not set, cmake will explicitly pass the compute capability as nvidia's default of 5.2, EVEN IF it has already been specified in NVCC_FLAGS. Because the kernels cannot compile with compute capability of 5.2, this causes compilation errors.

By setting CMAKE_CUDA_ARCHITECTURES to OFF, cmake does not add 5.2 as a target architecture.

See https://cmake.org/cmake/help/latest/policy/CMP0104.html for details on CMake's policy for CUDA architecture flags.

See https://stackoverflow.com/a/28933055 for the default CUDA architecture for each version of CUDA.

The updates to the cutlass kernels made in TVM PR apache#16244 require
symbols provided in cuda 7.5+.  While the cuda architecture is
specified by setting `NVCC_FLAGS` in the `CMakeLists.txt` for each
kernel, cmake 3.18+ also sets it based on the
`CMAKE_CUDA_ARCHITECTURES` value.  If not set, cmake will explicitly
pass the compute capability as nvidia's default of 5.2, *EVEN IF* it
has already been specified in `NVCC_FLAGS`.  Because the kernels
cannot compile with compute capability of 5.2, this causes
compilation errors.

By setting `CMAKE_CUDA_ARCHITECTURES` to `OFF`, cmake does not add
5.2 as a target architecture.

See https://cmake.org/cmake/help/latest/policy/CMP0104.html for
details on CMake's policy for CUDA architecture flags.

See https://cmake.org/cmake/help/latest/policy/CMP0104.html for the
default CUDA architecture for each version of CUDA.
@junrushao
Copy link
Member

I encountered similar problems and thanks for the fix! Maybe @vinx13 could take a look? Thanks!

@vinx13
Copy link
Member

vinx13 commented Jan 2, 2024

This may also change behavior of compile other cuda sources other than cutlass_fpA_intB_gemm kernels. What about doing the following:

  • Remove the hardcoded architectures in NVCC flags in cutlass_fpA_intB_gemm so that it depends on the new cmake behavior of setting CMAKE_CUDA_ARCHITECTURES
  • Provide a default value native if CMAKE_CUDA_ARCHITECTURES is not defined (which is generally larger than 52 for not-too-old gpus)

@Lunderberg
Copy link
Contributor Author

@vinx13 I agree that is the better long-term option. However, to avoid it will require dropping support for cmake versions older than 3.18. For these versions, the NVCC_FLAGS options are the only way to change the target architecture. The CMAKE_CUDA_ARCHITECTURES option was only added in 3.18 (and the native option is only available in cmake 3.24+). It looks like we currently set cmake_minimum_required(VERSION 3.18) in the TVM CMakeLists.txt. The cutlass kernels have older version requirement, but those should be updatable.

That said, I think the long-term switch from NVCC_FLAGS to CMAKE_CUDA_ARCHITECTURES should be in a later PR. Updating the cutlass repos to use CMAKE_CUDA_ARCHITECTURES would be a multi-step process, first updating them and later updating TVM to point to the new versions. Because this currently prevents compiling on TVM main, I would like to add this PR first, before the breakage impacts too many others.

Should resolve CI failure by pulling in bugfix from
apache#16322
@Lunderberg Lunderberg merged commit 163c7ac into apache:unity Jan 2, 2024
@Lunderberg Lunderberg deleted the unity_cmake_compatibility_fix branch January 3, 2024 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants