Releases: JuliaGPU/CUDA.jl

v5.8.2

28 May 06:52
8b6a2a0

CUDA v5.8.2

Diff since v5.8.1

Merged pull requests:

Closed issues:

  • Where to host extension(s) (#2735)
  • spdiagm doesn't support specified diagonal elements (#2783)
  • CUDA failed to create a diagonal matrix of CuArray(u) (#2785)

v5.8.1

14 May 14:38
a4a7af4

CUDA v5.8.1

Diff since v5.8.0

Merged pull requests:

  • CUSPARSE: Bugfixes for sparse vector broadcast. (#2780) (@maleadt)

v5.8.0

14 May 08:41

CUDA v5.8.0

Diff since v5.7.3

Merged pull requests:

Closed issues:

  • Type conversions in broadcast fails when compiling with always_inline=true (#2722)
  • cuDNN loses memory to log messages in Pluto.jl context (#2743)
  • Xgesvdp! failure when only requesting singular values (#2761)
  • CUDA 5.7.3 fails to precompile on Julia 1.12.0-beta2 (#2762)
  • aligned_sizeof with an existing identifier (#2766)
  • CUSPARSE_SPGEMM_ALG2 not working (#2768)
  • sum! throws dispatch error beyond a threshold number of rows (#2777)

v5.7.3

17 Apr 08:37
1a006ea

CUDA v5.7.3

Diff since v5.7.2

Merged pull requests:

v5.7.2

07 Apr 07:35
57e06f9

CUDA v5.7.2

Diff since v5.7.1

Merged pull requests:

Closed issues:

  • Ability to opt out of / improved automatic synchronization between tasks for shared array usage (#2617)
  • maximum(abs, CuSparseMatrixCSR) returns Inf (#2705)
  • mapreduce(f, op, A) for sparse A is wrong if f(0) =/= 0 (#2709)
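
The pitfall named in #2709 can be reproduced on the CPU without a GPU. The following is a minimal sketch using the SparseArrays standard library, not the CUDA.jl code path: it shows how a reduction that visits only stored entries goes wrong when f(0) is nonzero. The matrix, the function `f`, and the variable names are illustrative assumptions, and the correction shown applies only to the `+` reduction.

```julia
using SparseArrays

# 2x2 sparse matrix with two stored entries and two implicit zeros.
A = sparse([1, 2], [1, 2], [2.0, 3.0], 2, 2)

f(x) = x + 1   # f(0) == 1, so implicit zeros must contribute to the result

# Reducing over the stored entries only silently drops the implicit zeros.
stored_only = mapreduce(f, +, nonzeros(A))

# Dense reference: every element, including the zeros, passes through f.
reference = mapreduce(f, +, Matrix(A))

# For a `+` reduction, the stored-only result can be corrected by adding
# f(0) once per implicit zero.
n_zeros = length(A) - nnz(A)
corrected = stored_only + n_zeros * f(0.0)

println((stored_only, reference, corrected))
```

Here `stored_only` differs from `reference`, while `corrected` matches it. The same reasoning explains why `maximum(abs, A)` over stored entries alone can be wrong when all stored values are negative or the matrix has no stored entries.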

v5.7.1

21 Mar 01:09
6180d2c

CUDA v5.7.1

Diff since v5.7.0

Merged pull requests:

Closed issues:

  • GC corruption on 1.10 during cusparse/reduce tests (#2027)
  • Launch bounds interface (#2674)
  • Precompilation errors: ERROR: LoadError: invalid redefinition of constant CUSPARSE.CuSparseUpperOrUnitUpperTriangular (#2690)

v5.7.0

11 Mar 09:37

CUDA v5.7.0

Diff since v5.6.1

Merged pull requests:

Closed issues:

  • Batched strided GEMM tests fail (#151)
  • CuArrays.CURAND.curand missing methods (#141)
  • Rationals behave badly (#118)
  • Matrix inversion for CuArray (#116)
  • Dot product of a complex CuArray with a real CuArray performance (#668)
  • Sporadic cudnn/convolution test failures (#725)
  • Support for LinearAlgebra.pinv (#883)
  • Update mv!, mm!, sv! and sm! with the future release of CUSPARSE (#1610)
  • [CUSPARSE] changing size in similar returns a cpu array (#1667)
  • Mix precision sparse mul is not dispatched correctly (#1760)
  • Make CuRef(Value) behave more like Ref (#1803)
  • [cuTENSOR] Issue when contracting views of CuArrays with cuTENSOR (#2407)
  • versioninfo broken on Jetson Orin due to NVML lookup failure (#2542)
  • CUBLAS: Improve concurrency using device pointer mode (#2571)
  • NVML issues on Jetson Nano Orin (#2580)
  • Passing Symbol as an argument fails (#2590)
  • Remove kron functionality (#2602)
  • Disable or make automatic prefetching of unified memory optional (#2618)
  • Circular dependency in CUDA with Julia 1.10 (#2622)
  • Regression with nsys profile and CUDA.@profile (#2629)
  • PrecompileTools.jl with CUDA.jl causes kernels to fail to run on 1.11 (#2637)
  • Support Adjoint Sparse Matrices for CuSparseMatrixCOO (#2647)
  • Implicit stream sync in tasks serialise kernel execution (#2654)
  • Broadcasting on arrays larger than typemax(Int32) yields truncation error (#2658)
  • Problem with function in CUDA (#2667)
  • CUDA.limit errors with invalid argument (code 1, ERROR_INVALID_VALUE) (#2672)
  • CUDA.jl does not support tuples of UInt128 (#2675)
  • Can not permutedims! CuArray with length larger than typemax(Int32) (#2679)
  • Support for older GPUs (#2685)
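
Two of the issues above (#2658, #2679) involve arrays with more than typemax(Int32) elements. As a hedged, CPU-only illustration of that class of failure, the sketch below shows what happens when a 64-bit length is narrowed to Int32. It assumes the reported errors came from this kind of index narrowing, which the release notes do not confirm. The variable names are illustrative.

```julia
# One past the largest value an Int32 can hold.
n = Int64(typemax(Int32)) + 1

# A checked conversion throws instead of wrapping around.
err = try
    Int32(n)
    nothing
catch e
    e
end
println(err isa InexactError)

# An unchecked truncation (`%`) wraps silently to the most negative Int32.
wrapped = n % Int32
println(wrapped == typemin(Int32))
```

The checked conversion fails loudly, while the wrapping form would yield a negative index, which is why code that uses 32-bit indices needs an explicit path for larger arrays.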

v5.6.1

15 Jan 11:13
6ef1a3d

CUDA v5.6.1

Diff since v5.6.0

Merged pull requests:

Closed issues:

  • Add strides, implement CUDA Array Interface (#1298)
  • Restore broken CUBLAS test (#2584)
  • Issues with multiple GPUs on a single node (#2615)

v5.6.0

08 Jan 10:31
fc952a3

CUDA v5.6.0

Diff since v5.5.2

CUDA.jl v5.6 is a relatively minor release, with the most important change happening behind the scenes: GPUArrays.jl v11 has switched to KernelAbstractions.jl (#2524).

Features

  • Update to CUDA 12.6.2 (#2512)
  • CUSOLVER: support for Xgeev! (#2513), XsyevBatched (#2577), gesv! and gels! (#2406)
  • CUBLAS: added multiplication of transpose / adjoint matrices by diagonal matrices (#2518, #2538)
  • Improve handle cache performance in the presence of many short-lived tasks (#2583)
  • CUFFT: Pre-allocate the buffer required for complex-to-real FFTs only once (#2578)
  • Improved batched pointer conversion for very large batches (#2608)

Bug fixes

  • Fix findall with an empty CuArray (#2554)
  • CUBLAS: Fix use of level 1 methods with strided arrays (#2528)
  • CUSOLVER: Fix Xgesvdr! (#2556)
  • Preserve the array buffer type with more linear algebra operations (#2534)
  • Work around LinearAlgebra.jl breakage in Julia 1.11.2 concerning generic triangular (l/r)mul! (#2585)
  • Fix ambiguity of LinearAlgebra.dot (#2569)
  • Native RNG: Fixes when working with very large arrays (#2561)
  • Avoid a deadlock due to union splitting in the mapreduce kernel (#2595)
  • Fix pinning of resized CPU memory by automatically re-pinning (#2599)

Merged pull requests:

Closed issues:

  • Inference failure with sort(::CuMatrix) after loading MLDatasets (#2258)
  • Kron Support for CuSparseMatrixCSC (#2370)
  • Broadcasting a function returning an anonymous function with a constructor over CUDA arrays fails to compile, "not isbits" (#2514)
  • CuArray view has different variable type outside x inside the cuda kernel (#2516)
  • Can't build cuDNN on centos7.8 (#2517)
  • Precompile errors (#2519)
  • Precompile errors (#2520)
  • Error returned from CUDA function in CUDA-aware MPI multi-GPU test (#2522)
  • Broadcasting over random static array errors on Julia 1.11 (#2523)
  • gemm_strided_batched only using strided CUDA kernel when first matrix is transposed (#2529)
  • CUDA runtime libraries are loaded from a system path due to LD_LIBRARY_PATH being set (#2530)
  • [Bug] UnifiedMemory buffer changes during LinearAlgebra operations (#2533)
  • Improve system library warning when running under profiler (#2540)
  • Local CUDA settings not propagated to Pkg.test (#2545)
  • Out of Memory when working with Distributed for Small Matrices (#2548)
  • findall is not working with an empty vector of bool (#2553)
  • CUDA code does not return when running under VSC Debugging mode (#2558)
  • dot is quite slow in multinest Arrays (#2559)
  • UndefVarError: backend not defined in GPUArrays (#2564)
  • view() returns CuArray instead of view for 1-D CuArrays (#2566)
  • dot ambiguity (#2568)
  • InvalidIRError thrown only if critical function is not previously compiled (#2573)
  • circular dependency during precompilation (#2579)
  • Sparse MatVec Is Nondeterministic? (#2582)
  • CUDA triggers long Circular dependency list (#2586)
  • Release v5.5.3 for GPUArray v11? (#2587)
  • 'dot' gives different answers when viewing rather than slicing multidimensional arrays (#2589)
  • Scalar indexing when performing kron on two CuVectors (#2591)
  • Faster strided-batched to batched wrapper (#2592)
  • Error when copying data to pinned and resized CPU array (#2594)
  • mapreducedim! size-dependent fail when narrowing float element types (#2595)
  • Missing Enzyme.make_zero in Enzyme extension leads to incorrect behaviour (#2598)
  • 'ArgumentError: array must be non-empty' when attempting to pop idle handles from HandleCache (#2603)
  • Do a release as current one doesn't support GPUArrays v11 (#2606)

v5.5.2

26 Sep 05:51
a1db081

CUDA v5.5.2

Diff since v5.5.1

Merged pull requests: