Faster linear indexing for SubArrays, dims 1-5 #4427
|
Is |
|
Might be worthwhile to add support for C's See: http://stackoverflow.com/questions/3895081/divide-and-get-remainder-at-the-same-time, http://en.cppreference.com/w/c/numeric/math/div |
|
Hmm, you're right. I should have said that the profiler assigns that kind of blame to It is a bit of a mystery, then, why linear indexing is still so much slower even with this patch. @kmsquire, if it meant a call out to |
|
Okay, I can see that a call out to A better suggestion, then, is to add direct support to julia for |
|
Yes, we can add divmod, particularly after Keno's patch that makes tuples faster. |
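For readers following the thread today: a combined quotient-and-remainder function exists in Base as `divrem` (with `fldmod` as the floored variant). The snippet below is only an illustrative sketch of those two calls, not the code that was proposed or merged here; the specific numbers are arbitrary.

```julia
# divrem returns the truncated quotient and the matching remainder in one call,
# i.e. the same pair as (div(x, y), rem(x, y)).
q, r = divrem(17, 5)
@assert (q, r) == (div(17, 5), rem(17, 5))
@assert q * 5 + r == 17

# For negative operands, divrem truncates toward zero while fldmod floors.
@assert divrem(-17, 5) == (-3, -2)
@assert fldmod(-17, 5) == (-4, 3)
```

Returning both values as a tuple is what makes the earlier remark about tuple performance relevant.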
|
Given Jeff's observation about Timing results, C (compiled with Timing results, Julia: Not too bad on the I'm still not great at reading assembly, so I thought I'd post the two machine codes (for the C version, I'm just assuming I've got the whole thing, but I'm not certain). C (with Julia: The main thing I noticed are the |
|
One of the calls is for |
|
Thanks, @simonster; I was just about to dig into this myself. I've updated my PR to avoid the call to size, and it does make a modest difference. Certainly one gap between the array performance and subarray performance is inlining. In playing with this I discovered that gets inlined but even as simple a change as does not. Hopefully #3796 will fix this. Based on my tests this may account for about a factor of 5 in performance, but in 5d we still have an additional factor of 8 that does not make a lot of sense. One of the things that I noticed is that after each There turns out to be a more than two-fold difference in the execution time of code that calls this function: depending on whether that one line is commented out! (Note that in neither case does the result of the function depend on what happens to |
|
To elaborate on the last point: it seems completely nonsensical that an extra multiply and subtract should more than double the execution time of an algorithm that has 13 other arithmetic operations and is generating a cache miss on every 8th call. The C version demonstrates that the cache misses should be the only thing that matters: in terms of performance, the C subarray version is indistinguishable from the C plain-array version, so all these arithmetic operations must have negligible impact on performance (they are local to the CPU). Yet they dominate the performance for the Julia version. Something is fishy. |
I'm not even sure why we were generating code to check for a zero denominator – this seems to still raise an exception as expected. The other special case just isn't worth it to emit more than one instruction for div – and div/rem since x86 does both together.
Contains the following commits: 5bfd1161e * STDLIBS_BY_VERSION: Check sorted & require same minor (JuliaLang#4414) ce986129c * Fix completion on empty command (JuliaLang#4418) 8d74d35d1 * update SHA compat (JuliaLang#4436) 14c5ae327 * add a docstring to Registry module (JuliaLang#4432) bbb9e6d23 * allow `generate` into an empty directory (JuliaLang#4430) a1818b9a9 * Drop the REPL search keymap (JuliaLang#4425) baa7981c7 * do not try to update registries in an unwritable folder (JuliaLang#4429) 01690b54b * make `.path` field consistently be relative manifest and convert to project relative upon writing to a project file (JuliaLang#4427) 3306ed522 * Deduplicate suggestions in package completions (JuliaLang#4431)
Since many of the algorithms in base working on `AbstractArray`s use linear indexing, it seemed reasonable to write hand-tuned linear indexing for the most commonly used dimensions. This is still much slower than cartesian indexing, but a big improvement over what we have now. In 5 dimensions the improvement is something like 40-fold, as shown below.

Test script:

and so forth up through 5 dimensions (always with `10^6` total elements).

Before this patch:

In 5d, it's approximately 2000-fold slower to use a `SubArray`.

With this patch:

Now in 5d it's a mere 50-fold slower to use a `SubArray`. The main remaining issue is that `div` is approximately ten-fold more expensive than `*` and `/`.
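To make it concrete why `div` sits on the critical path, here is a minimal sketch of converting a linear index into a view to a linear index into its parent array, for the 3-d case. It is not the code from this patch: `parent_linear_index` is a hypothetical helper, the first-element index and strides are worked out by hand in the comments, and it uses the current `view` API rather than what was available when this PR was written.

```julia
# Hypothetical helper: map a 1-based linear index `i` into a 3-d strided view
# onto the 1-based linear index of the parent array.
function parent_linear_index(i::Int, dims::NTuple{3,Int},
                             first_index::Int, strides::NTuple{3,Int})
    # Peel off one zero-based subscript per dimension; each step needs a division.
    i2, s1 = divrem(i - 1, dims[1])
    s3, s2 = divrem(i2, dims[2])
    return first_index + s1 * strides[1] + s2 * strides[2] + s3 * strides[3]
end

A = reshape(collect(1:120), 4, 5, 6)   # A[k] == k for every linear index k
S = view(A, 2:3, 1:2:5, 2:4)           # size(S) == (2, 3, 3)

# S[1] is A[2, 1, 2], whose linear index in A is 2 + 0*4 + 1*20 = 22.
# Steps in the parent: 1 along dim 1, 2*4 = 8 along dim 2, 20 along dim 3.
for i in 1:length(S)
    @assert A[parent_linear_index(i, size(S), 22, (1, 8, 20))] == S[i]
end
```

An N-dimensional view needs N-1 such divisions per access, which is consistent with the slowdown growing with dimensionality; cartesian indexing avoids them because the subscripts are already separate.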