@tam724 tam724 commented Nov 4, 2025

Closes #2952 and #2607.
The (m x 0) * (0 x n) matrix-matrix multiplication and the (m x 0) * (0) matrix-vector multiplication edge cases should probably be tested in the GPUArrays.jl test suite (for all GPU backends). I'll open a PR there (JuliaGPU/GPUArrays.jl#646).
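
For reference, the mathematically expected result of these edge cases can be sketched in a few lines of plain Python. This is only an illustration of the empty-sum convention, not CUDA.jl or GPUArrays.jl code; the helper names `matmul` and `matvec` and the explicit shape arguments are assumptions made for this sketch. With an inner dimension of 0, every output entry is a sum over zero terms, so the result should be an all-zero (m x n) matrix or a length-m zero vector.

```python
def matmul(a, b, m, k, n):
    """Reference product of an m x k matrix `a` and a k x n matrix `b`.

    Shapes are passed explicitly because a nested list cannot encode
    the column count of a matrix that has zero rows.
    """
    assert len(a) == m and all(len(row) == k for row in a)
    assert len(b) == k and all(len(row) == n for row in b)
    # Each entry sums over the k inner indices; for k == 0 the sum is empty, i.e. 0.
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def matvec(a, x, m, k):
    """Reference product of an m x k matrix `a` and a length-k vector `x`."""
    assert len(a) == m and all(len(row) == k for row in a)
    assert len(x) == k
    return [sum(a[i][p] * x[p] for p in range(k)) for i in range(m)]


m, n = 3, 4
a = [[] for _ in range(m)]  # (m x 0): m rows, no columns
b = []                      # (0 x n): no rows
c = matmul(a, b, m, 0, n)   # expected: m x n matrix of zeros
y = matvec(a, [], m, 0)     # expected: length-m vector of zeros
print(c)
print(y)
```

A backend test for this edge case would compare the device result against such a zero-filled reference rather than against whatever the output buffer previously contained.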

Contributor

@github-actions github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: c154079 | Previous: f4c05e0 | Ratio |
|---|---|---|---|
| latency/precompile | 56440455535 ns | 56743162658.5 ns | 0.99 |
| latency/ttfp | 8158073880 ns | 8292887489.5 ns | 0.98 |
| latency/import | 4496920342 ns | 4493784612 ns | 1.00 |
| integration/volumerhs | 9614290.5 ns | 9612835.5 ns | 1.00 |
| integration/byval/slices=1 | 147247 ns | 146961 ns | 1.00 |
| integration/byval/slices=3 | 426238.5 ns | 425977 ns | 1.00 |
| integration/byval/reference | 145275 ns | 145162 ns | 1.00 |
| integration/byval/slices=2 | 286621.5 ns | 286531 ns | 1.00 |
| integration/cudadevrt | 103695 ns | 103664 ns | 1.00 |
| kernel/indexing | 14623 ns | 14225 ns | 1.03 |
| kernel/indexing_checked | 15017.5 ns | 14963.5 ns | 1.00 |
| kernel/occupancy | 671.2961783439491 ns | 712.5909090909091 ns | 0.94 |
| kernel/launch | 2245.8888888888887 ns | 2140.1111111111113 ns | 1.05 |
| kernel/rand | 16277 ns | 17014 ns | 0.96 |
| array/reverse/1d | 20414 ns | 19857 ns | 1.03 |
| array/reverse/2dL_inplace | 66994 ns | 66720 ns | 1.00 |
| array/reverse/1dL | 70621 ns | 70068 ns | 1.01 |
| array/reverse/2d | 22093 ns | 21721 ns | 1.02 |
| array/reverse/1d_inplace | 9904.5 ns | 11535 ns | 0.86 |
| array/reverse/2d_inplace | 13574 ns | 13153 ns | 1.03 |
| array/reverse/2dL | 74199.5 ns | 73755 ns | 1.01 |
| array/reverse/1dL_inplace | 67050 ns | 66862 ns | 1.00 |
| array/copy | 21232 ns | 20647 ns | 1.03 |
| array/iteration/findall/int | 159783 ns | 158235 ns | 1.01 |
| array/iteration/findall/bool | 141590 ns | 139770.5 ns | 1.01 |
| array/iteration/findfirst/int | 161911 ns | 161047 ns | 1.01 |
| array/iteration/findfirst/bool | 162355 ns | 162113 ns | 1.00 |
| array/iteration/scalar | 75128 ns | 73378 ns | 1.02 |
| array/iteration/logical | 219323.5 ns | 216537 ns | 1.01 |
| array/iteration/findmin/1d | 51897 ns | 50322 ns | 1.03 |
| array/iteration/findmin/2d | 97022 ns | 96281.5 ns | 1.01 |
| array/reductions/reduce/Int64/1d | 43894 ns | 43275 ns | 1.01 |
| array/reductions/reduce/Int64/dims=1 | 55441 ns | 44878 ns | 1.24 |
| array/reductions/reduce/Int64/dims=2 | 61803 ns | 61376 ns | 1.01 |
| array/reductions/reduce/Int64/dims=1L | 89289 ns | 89018 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 88383 ns | 87717 ns | 1.01 |
| array/reductions/reduce/Float32/1d | 37343 ns | 36706 ns | 1.02 |
| array/reductions/reduce/Float32/dims=1 | 47678 ns | 41841.5 ns | 1.14 |
| array/reductions/reduce/Float32/dims=2 | 60266 ns | 59890 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1L | 52643 ns | 52369 ns | 1.01 |
| array/reductions/reduce/Float32/dims=2L | 72547.5 ns | 71845 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 43990 ns | 43034 ns | 1.02 |
| array/reductions/mapreduce/Int64/dims=1 | 46234 ns | 44568 ns | 1.04 |
| array/reductions/mapreduce/Int64/dims=2 | 61850 ns | 61598 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 89271.5 ns | 88831 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 88676 ns | 88197 ns | 1.01 |
| array/reductions/mapreduce/Float32/1d | 37376 ns | 36550 ns | 1.02 |
| array/reductions/mapreduce/Float32/dims=1 | 42040 ns | 51845 ns | 0.81 |
| array/reductions/mapreduce/Float32/dims=2 | 60371 ns | 60046 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=1L | 53317.5 ns | 52895 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2L | 72675 ns | 72274 ns | 1.01 |
| array/broadcast | 20631.5 ns | 20228 ns | 1.02 |
| array/copyto!/gpu_to_gpu | 13559 ns | 12997 ns | 1.04 |
| array/copyto!/cpu_to_gpu | 216920.5 ns | 214588 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 283648.5 ns | 283061 ns | 1.00 |
| array/accumulate/Int64/1d | 125592 ns | 124766 ns | 1.01 |
| array/accumulate/Int64/dims=1 | 84564 ns | 83121 ns | 1.02 |
| array/accumulate/Int64/dims=2 | 158606 ns | 157489 ns | 1.01 |
| array/accumulate/Int64/dims=1L | 1710160 ns | 1708744 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 967189 ns | 966369 ns | 1.00 |
| array/accumulate/Float32/1d | 109583 ns | 109029 ns | 1.01 |
| array/accumulate/Float32/dims=1 | 81043 ns | 80115 ns | 1.01 |
| array/accumulate/Float32/dims=2 | 148178.5 ns | 147066 ns | 1.01 |
| array/accumulate/Float32/dims=1L | 1618740.5 ns | 1617852.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 698802.5 ns | 697700.5 ns | 1.00 |
| array/construct | 1289.7 ns | 1284.9 ns | 1.00 |
| array/random/randn/Float32 | 49870 ns | 44088.5 ns | 1.13 |
| array/random/randn!/Float32 | 25113 ns | 24724 ns | 1.02 |
| array/random/rand!/Int64 | 27825 ns | 27197 ns | 1.02 |
| array/random/rand!/Float32 | 8914.333333333334 ns | 8847.666666666666 ns | 1.01 |
| array/random/rand/Int64 | 30447 ns | 29769 ns | 1.02 |
| array/random/rand/Float32 | 13414.5 ns | 13169 ns | 1.02 |
| array/permutedims/4d | 60045 ns | 60066.5 ns | 1.00 |
| array/permutedims/2d | 54855 ns | 53803 ns | 1.02 |
| array/permutedims/3d | 55596 ns | 54690 ns | 1.02 |
| array/sorting/1d | 2759173 ns | 2756717 ns | 1.00 |
| array/sorting/by | 3371294.5 ns | 3343987 ns | 1.01 |
| array/sorting/2d | 1089235 ns | 1080056.5 ns | 1.01 |
| cuda/synchronization/stream/auto | 1020.5 ns | 1028.4 ns | 0.99 |
| cuda/synchronization/stream/nonblocking | 7551.4 ns | 7619.4 ns | 0.99 |
| cuda/synchronization/stream/blocking | 805.1063829787234 ns | 806.3333333333334 ns | 1.00 |
| cuda/synchronization/context/auto | 1183.5 ns | 1172.7 ns | 1.01 |
| cuda/synchronization/context/nonblocking | 7267.2 ns | 7177 ns | 1.01 |
| cuda/synchronization/context/blocking | 908.8 ns | 911.1923076923077 ns | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Development

Successfully merging this pull request may close these issues.

Wrong matmul with empty matrices