@avik-pal (Contributor) commented Sep 1, 2025

The current release fails with

WARNING: could not import APIUtils.@checked into LibNCCL
ERROR: LoadError: UndefVarError: `@checked` not defined in `NCCL.LibNCCL`
Stacktrace:
 [1] top-level scope
   @ :0
 [2] include(mod::Module, _path::String)
   @ Base ./Base.jl:562
 [3] include(x::String)
   @ NCCL /mnt/.julia/packages/NCCL/CwvFb/src/NCCL.jl:1
 [4] top-level scope
   @ /mnt/.julia/packages/NCCL/CwvFb/src/NCCL.jl:5
 [5] include
   @ ./Base.jl:562 [inlined]
 [6] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::Nothing)
   @ Base ./loading.jl:2881
 [7] top-level scope
   @ stdin:6
in expression starting at /mnt/.julia/packages/NCCL/CwvFb/src/libnccl.jl:56
in expression starting at /mnt/.julia/packages/NCCL/CwvFb/src/libnccl.jl:1
in expression starting at /mnt/.julia/packages/NCCL/CwvFb/src/NCCL.jl:1
in expression starting at stdin:6
  ✗ NCCL
  0 dependencies successfully precompiled in 4 seconds. 103 already precompiled.

ERROR: The following 1 direct dependency failed to precompile:

NCCL 

Failed to precompile NCCL [3fe64909-d7a1-4096-9b7d-7a0f12cf0f6b] to "/mnt/.julia/compiled/v1.11/NCCL/jl_t0Ow3O".

https://buildkite.com/julialang/lux-dot-jl/builds/6534/steps/canvas?jid=019906f7-2982-40b8-8ff9-97e4675f0e15#019906f7-2982-40b8-8ff9-97e4675f0e15/356-890

cc @simonbyrne @kshyatt

src/libnccl.jl (outdated)

 const NULL = C_NULL
 const INT_MIN = typemin(Cint)

-import CUDA: @checked, CuPtr, CUstream
+import CUDA: CuPtr, CUstream, @checked
Contributor

Does this depend on a specific version of CUDA.jl?

Contributor (Author)

Honestly, I am not sure exactly where this macro is defined. @maleadt / @kshyatt might know the recommended way to import it.

@simonbyrne (Contributor)

Wasn't this fixed by #60? Do we just need to make a new release?

@simonbyrne (Contributor)

@maleadt, any chance you could make a new release? (I don't have write access to this repo.)

@avik-pal (Contributor, Author) commented Sep 2, 2025

Quoting @simonbyrne: "Wasn't this fixed by #60? Do we just need to make a new release?"

Ah yes, I missed that. We still want the changes to the prologue, though, since the changes from #60 will be overwritten once we regenerate the bindings.

@simonbyrne (Contributor)

Good point. Can you rebase?

@maleadt (Member) commented Sep 3, 2025

Quoting @simonbyrne: "I don't have write access to this repo"

I invited you to the JuliaGPU/CUDA team, which includes NCCL.jl.

@avik-pal force-pushed the ap/fixes_for_new_versions branch from f1f62eb to 54db9be on September 11, 2025 22:42
@avik-pal (Contributor, Author)

Tests are failing because we are missing the CUDA 13 builds for NCCL. Triggering one at JuliaPackaging/Yggdrasil#12061.

@avik-pal force-pushed the ap/fixes_for_new_versions branch from b1a4dbd to 3bdc737 on September 15, 2025 23:26
@avik-pal (Contributor, Author)

Once JuliaRegistries/General#138616 lands, we should have the proper JLLs.

@avik-pal changed the title from "fix: NCCL import of @checked" to "fix: NCCL import of @checked + support for CUDA 13" on Sep 16, 2025
@avik-pal requested a review from @simonbyrne on September 16, 2025 12:58