Fixes zero-dim matmatmul & matvecmul #2958
Open: tam724 wants to merge 5 commits into JuliaGPU:master from tam724:zero-dim_matmatmul (+14 −2)
CUDA.jl Benchmarks
| Benchmark suite | Current: c154079 (ns) | Previous: f4c05e0 (ns) | Ratio (Current / Previous) |
|---|---|---|---|
| latency/precompile | 56440455535 | 56743162658.5 | 0.99 |
| latency/ttfp | 8158073880 | 8292887489.5 | 0.98 |
| latency/import | 4496920342 | 4493784612 | 1.00 |
| integration/volumerhs | 9614290.5 | 9612835.5 | 1.00 |
| integration/byval/slices=1 | 147247 | 146961 | 1.00 |
| integration/byval/slices=3 | 426238.5 | 425977 | 1.00 |
| integration/byval/reference | 145275 | 145162 | 1.00 |
| integration/byval/slices=2 | 286621.5 | 286531 | 1.00 |
| integration/cudadevrt | 103695 | 103664 | 1.00 |
| kernel/indexing | 14623 | 14225 | 1.03 |
| kernel/indexing_checked | 15017.5 | 14963.5 | 1.00 |
| kernel/occupancy | 671.2961783439491 | 712.5909090909091 | 0.94 |
| kernel/launch | 2245.8888888888887 | 2140.1111111111113 | 1.05 |
| kernel/rand | 16277 | 17014 | 0.96 |
| array/reverse/1d | 20414 | 19857 | 1.03 |
| array/reverse/2dL_inplace | 66994 | 66720 | 1.00 |
| array/reverse/1dL | 70621 | 70068 | 1.01 |
| array/reverse/2d | 22093 | 21721 | 1.02 |
| array/reverse/1d_inplace | 9904.5 | 11535 | 0.86 |
| array/reverse/2d_inplace | 13574 | 13153 | 1.03 |
| array/reverse/2dL | 74199.5 | 73755 | 1.01 |
| array/reverse/1dL_inplace | 67050 | 66862 | 1.00 |
| array/copy | 21232 | 20647 | 1.03 |
| array/iteration/findall/int | 159783 | 158235 | 1.01 |
| array/iteration/findall/bool | 141590 | 139770.5 | 1.01 |
| array/iteration/findfirst/int | 161911 | 161047 | 1.01 |
| array/iteration/findfirst/bool | 162355 | 162113 | 1.00 |
| array/iteration/scalar | 75128 | 73378 | 1.02 |
| array/iteration/logical | 219323.5 | 216537 | 1.01 |
| array/iteration/findmin/1d | 51897 | 50322 | 1.03 |
| array/iteration/findmin/2d | 97022 | 96281.5 | 1.01 |
| array/reductions/reduce/Int64/1d | 43894 | 43275 | 1.01 |
| array/reductions/reduce/Int64/dims=1 | 55441 | 44878 | 1.24 |
| array/reductions/reduce/Int64/dims=2 | 61803 | 61376 | 1.01 |
| array/reductions/reduce/Int64/dims=1L | 89289 | 89018 | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 88383 | 87717 | 1.01 |
| array/reductions/reduce/Float32/1d | 37343 | 36706 | 1.02 |
| array/reductions/reduce/Float32/dims=1 | 47678 | 41841.5 | 1.14 |
| array/reductions/reduce/Float32/dims=2 | 60266 | 59890 | 1.01 |
| array/reductions/reduce/Float32/dims=1L | 52643 | 52369 | 1.01 |
| array/reductions/reduce/Float32/dims=2L | 72547.5 | 71845 | 1.01 |
| array/reductions/mapreduce/Int64/1d | 43990 | 43034 | 1.02 |
| array/reductions/mapreduce/Int64/dims=1 | 46234 | 44568 | 1.04 |
| array/reductions/mapreduce/Int64/dims=2 | 61850 | 61598 | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 89271.5 | 88831 | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 88676 | 88197 | 1.01 |
| array/reductions/mapreduce/Float32/1d | 37376 | 36550 | 1.02 |
| array/reductions/mapreduce/Float32/dims=1 | 42040 | 51845 | 0.81 |
| array/reductions/mapreduce/Float32/dims=2 | 60371 | 60046 | 1.01 |
| array/reductions/mapreduce/Float32/dims=1L | 53317.5 | 52895 | 1.01 |
| array/reductions/mapreduce/Float32/dims=2L | 72675 | 72274 | 1.01 |
| array/broadcast | 20631.5 | 20228 | 1.02 |
| array/copyto!/gpu_to_gpu | 13559 | 12997 | 1.04 |
| array/copyto!/cpu_to_gpu | 216920.5 | 214588 | 1.01 |
| array/copyto!/gpu_to_cpu | 283648.5 | 283061 | 1.00 |
| array/accumulate/Int64/1d | 125592 | 124766 | 1.01 |
| array/accumulate/Int64/dims=1 | 84564 | 83121 | 1.02 |
| array/accumulate/Int64/dims=2 | 158606 | 157489 | 1.01 |
| array/accumulate/Int64/dims=1L | 1710160 | 1708744 | 1.00 |
| array/accumulate/Int64/dims=2L | 967189 | 966369 | 1.00 |
| array/accumulate/Float32/1d | 109583 | 109029 | 1.01 |
| array/accumulate/Float32/dims=1 | 81043 | 80115 | 1.01 |
| array/accumulate/Float32/dims=2 | 148178.5 | 147066 | 1.01 |
| array/accumulate/Float32/dims=1L | 1618740.5 | 1617852.5 | 1.00 |
| array/accumulate/Float32/dims=2L | 698802.5 | 697700.5 | 1.00 |
| array/construct | 1289.7 | 1284.9 | 1.00 |
| array/random/randn/Float32 | 49870 | 44088.5 | 1.13 |
| array/random/randn!/Float32 | 25113 | 24724 | 1.02 |
| array/random/rand!/Int64 | 27825 | 27197 | 1.02 |
| array/random/rand!/Float32 | 8914.333333333334 | 8847.666666666666 | 1.01 |
| array/random/rand/Int64 | 30447 | 29769 | 1.02 |
| array/random/rand/Float32 | 13414.5 | 13169 | 1.02 |
| array/permutedims/4d | 60045 | 60066.5 | 1.00 |
| array/permutedims/2d | 54855 | 53803 | 1.02 |
| array/permutedims/3d | 55596 | 54690 | 1.02 |
| array/sorting/1d | 2759173 | 2756717 | 1.00 |
| array/sorting/by | 3371294.5 | 3343987 | 1.01 |
| array/sorting/2d | 1089235 | 1080056.5 | 1.01 |
| cuda/synchronization/stream/auto | 1020.5 | 1028.4 | 0.99 |
| cuda/synchronization/stream/nonblocking | 7551.4 | 7619.4 | 0.99 |
| cuda/synchronization/stream/blocking | 805.1063829787234 | 806.3333333333334 | 1.00 |
| cuda/synchronization/context/auto | 1183.5 | 1172.7 | 1.01 |
| cuda/synchronization/context/nonblocking | 7267.2 | 7177 | 1.01 |
| cuda/synchronization/context/blocking | 908.8 | 911.1923076923077 | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Closes #2952 and #2607.
The (m x 0) * (0 x n) matmatmul and the (m x 0) * (0) matvecmul edge cases should probably be tested in the GPUArrays.jl test suite (for all GPU backends). I'll add a PR there (JuliaGPU/GPUArrays.jl#646).
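The expected semantics of these edge cases can be sketched independently of any GPU backend: when the inner dimension is 0, every entry of the result is an empty sum, so an (m x 0) * (0 x n) product is an m x n zero matrix and an (m x 0) * (0) product is a length-m zero vector. The pure-Python reference below only illustrates that rule; it is not CUDA.jl or GPUArrays.jl code, and the names `gemm` and `matvecmul` and the explicit-shape arguments are hypothetical. It follows the BLAS `gemm` convention C = alpha*A*B + beta*C, under which C need not be set on input when beta is zero, so an output buffer pre-filled with NaN must still come out as zeros.

```python
from typing import List

Matrix = List[List[float]]  # row-major nested lists: A[i][p]


def gemm(alpha: float, A: Matrix, B: Matrix, beta: float, C: Matrix,
         m: int, k: int, n: int) -> Matrix:
    """In-place C = alpha * (A @ B) + beta * C for A (m x k), B (k x n), C (m x n).

    Shapes are passed explicitly because a nested list cannot record the
    column count of a matrix with zero rows (e.g. a 0 x n matrix B).
    """
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):  # empty range when k == 0, so acc stays 0.0
                acc += A[i][p] * B[p][j]
            # BLAS convention: with beta == 0, C's old contents (even NaN) are ignored.
            old = beta * C[i][j] if beta != 0 else 0.0
            C[i][j] = alpha * acc + old
    return C


def matvecmul(A: Matrix, x: List[float], m: int, k: int) -> List[float]:
    """Reference y = A * x for A (m x k) and x of length k."""
    return [sum((A[i][p] * x[p] for p in range(k)), 0.0) for i in range(m)]


# (m x 0) * (0 x n): a 3 x 4 result that must be all zeros.
m, n = 3, 4
A: Matrix = [[] for _ in range(m)]          # 3 x 0 matrix: three empty rows
B: Matrix = []                              # 0 x 4 matrix: no rows at all
C = [[float("nan")] * n for _ in range(m)]  # garbage output buffer
print(gemm(1.0, A, B, 0.0, C, m, 0, n))
# [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]

# (m x 0) * (0): a length-3 result that must be all zeros.
print(matvecmul(A, [], m, 0))  # [0.0, 0.0, 0.0]
```

A backend test could compare its device result for both shapes against a CPU reference of this kind.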