-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8266431: Dual-Pivot Quicksort improvements (Radix sort) #27411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
8266431: Dual-Pivot Quicksort improvements (Radix sort) #27411
Conversation
The main achievements - introduced Radix in parallel sorting which shows several times boost of performance and has linear complexity instead of n*ln(n) - improved mixed insertion sort (makes whole sorting faster) - improved merging sort for almost sorted data - optimized parallel sorting - improved step for pivot candidates and pivot partitioning - suggested better buffer allocation: if no memory, it is switched to in-place sorting with no OutOfMemoryError - minor javadoc and comment changes - extended existing tests - added tests for radix sort, heap sort, insertion sort - added benchmarking JMH tests - improved test coverage
|
👋 Welcome back VladimirIaroslavski! A progress list of the required criteria for merging this PR into |
|
❗ This change is not yet ready to be integrated. |
|
@VladimirIaroslavski The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
|
Sorting on a server with AVX512 instructions OS name: Fedora Linux 38 (Workstation Edition) OpenJDK version "24-beta" 2025-03-18 Sequential sorting
Parallel sorting
|
|
Sorting on a computer without AVX512 support OS name: Edition Windows 10 Pro OpenJDK 26-ea 2026-03-17 Sequential sorting
Parallel sorting
|
|
Nice work, Vladimir! Congratulations for all the hard work, needed to upgrade DPQS once again! I will look at this new PR. |
|
Please check your file encoding (windows vs jnix) see jcheck report! |
Webrevs
|
| * data, taking into account parallel context. | ||
| */ | ||
| boolean isLargeRandom = | ||
| // size > MIN_RADIX_SORT_SIZE && (sorter == null || bits > 0) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need an outcommented line of code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boolean isLargeRandom =
// size > MIN_RADIX_SORT_SIZE && (sorter == null || bits > 0) &&
size > MIN_RADIX_SORT_SIZE && (sorter != null && bits > 0) &&
(a[e1] > a[e2] || a[e2] > a[e3] || a[e3] > a[e4] || a[e4] > a[e5]);
This code runs Radix sort during parallel sort only.
If you want to use Radix sort during sequential or parallel sort also,
you need to switch to the first line.
Agree, both lines should contain explanations.
If we agree to move Radix sort out from this PR, these lines go away from here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered a private static final boolean USE_RADIX_SORT_WITH_PARALLEL = false;? Then we don't need to comment the code, and one could flip the behavior by altering the flag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice suggestion, will apply in the next PR if Radix sort is introduced.
|
Do you have any sense on if parallelSort is widely used? There is a lot of code and complexity in this proposal and I think important to understand if it is worth it. |
|
It would be nice to have a high-level chart showing how the actual sorting algorithm is selected based on parallel/sequential, input size and whatever else. For an outsider, it's hard to grasp the whole picture. Is radix sort used for parallel sorting only? Sorry if such a chart already exists and I missed it. If improvements of parallel sorting and sequential sorting are at least partially independent, I think it may make sense to deliver them in separate PRs. This would really simplify the review process. |
|
Alan, Tagir, Thank you for your comments! New version of sources contains optimizations of sequential and parallel sorts, such as, improvements of insertion sort, better partitioning, better pivot selection, merging sort and counting sort (for byte/char/short). I think also it makes sense to separate this PR into two pull requests. Do you agree, if I move Radix sort out from this PR and re-run benchmarking? |
Yes, splitting this into two would be good as it allow both proposals to be discussed/reviewed on their own merit. |
* Moved Radix sort out from sorting
I added the chart to the first post, thank you for advice! |
|
I moved Radix sort out from sources and re-run benchmarking of parallel sorting. Parallel sorting (without Radix sort)
|
| */ | ||
| private static final int MAX_INSERTION_SORT_SIZE = 44; | ||
| private static final int MAX_INSERTION_SORT_SIZE = 51; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this change justified by benchmarking? Why was the best value different before? Due to hardware changes or due to HotSpot/JIT changes or due to algorithmic changes below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are due to algorithmic changes - more optimized insertion sort (combination of mixed, pin and simple insertion sortings), therefore we can run insertion sort on larger arrays.
| private static <A> void sort(Class<?> elemType, A array, long offset, int low, int high, SortOperation<A> so) { | ||
| so.sort(array, low, high); | ||
| @IntrinsicCandidate | ||
| private static <T> void sort(Class<?> elemType, T a, long offset, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this method? Offset and elemType are not used here. Probably it could be inlined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These methods sort() and partition() were introduced in PR #14227 where intrinsics for sorting and partitioning based on AVX512 instructions were added. The signature of these methods are in sync with native implementations from linux/native/libsimdsort package.
* Added @java.io.Serial * Added information about the best data types for thresholds of sorting * Added comments about native implementation based on AVX512 instructions
|
Thank you for your comments and improvements. In general, looks good to me, but please note that I'm not really an expert in this area. It would be great to get the comments from more relevant people. |
|
I also sponsor this effort, I helped on benchmarking this DPQS 2 years ago... LGTM |
|
Hi expert/reviewers, @stuart-marks Stuart Marks I'd appreciate it if you find a time to review this PR related to optimization of sorting. Of course, there are a lot of changes, but each sorting of primitive was modified in the same way, so actually ~1/7 of sources should be reviewed only. If you need, we can set up a meeting at any time where I can provide an explanation for better understanding of new sources, I'm reached by mail vlv.spb.ru gmail com Paul, Sandhya, Jatin, you reviewed AVX512 optimization for sorting a few years ago, maybe you have time now, please, to review this optimization? Thank you in advance, |
The goal of the PR is to improve both, sequential and parallel, sorting of primitives.
The main achievements
introduced Radix in parallel sorting which shows several times boost of performance and has linear complexity instead of n*ln(n)
improved mixed insertion sort (makes whole sorting faster)
improved merging sort for almost sorted data
optimized parallel sorting
improved step for pivot candidates and pivot partitioning
suggested better buffer allocation: if no memory, it is switched to in-place sorting with no OutOfMemoryError
minor javadoc and comment changes
extended existing tests
added tests for radix sort, heap sort, insertion sort
added benchmarking JMH tests
improved test coverage
The summary of benchmarking:
Sequential sorting (Arrays.sort)
byte: up to 50% faster
char: 4-7 times faster
short: 2-6 times faster
int: 1.2-5 times faster
long: 1.2-5 times faster
float: 1.2-5 times faster
double: 1.2-4 times faster
Parallel sorting (Arrays.parallelSort)
int: 1.2-9 times faster
long: 1.2-9 times faster
float: 1.2-4 times faster
double: 1.2-4 times faster
AVX512 support
Vamsi Parasa suggested faster sort routines by taking advantage of AVX512 instructions, see #14227, sources of sorting were modified. Therefore, I performed benchmarking of the final version (which includes optimizations by Vamsi Parasa and optimizations from my side) on a server with CPUs supported AVX512 instructions, no regression of performance was found, see detailed benchmarking results.
High-level chart showing how the actual sorting algorithm is selected
based on parallel/sequential and input size
int/long/float/double
if size < MAX_INSERTION_SORT_SIZE(=51) => [ mixed ] insertion sort
if array is almost sorted and size > MIN_MERGING_SORT_SIZE(=512) => merging sort
if recursion depth > MAX_RECURSION_DEPTH(=64) => heap sort
if random data => two pivots quicksort, else => one pivot quicksort
byte
if size < MAX_INSERTION_SORT_SIZE(=51) => insertion sort
else => counting sort
char/short
if size > MIN_COUNTING_SORT_SIZE(640) => counting sort
if size < MAX_INSERTION_SORT_SIZE(=51) => insertion sort
if recursion depth > MAX_RECURSION_DEPTH(=64) => counting sort
if random data => two pivots quicksort, else => one pivot quicksort
parallel sorting (int/long/float/double)
if size < MIN_PARALLEL_SORT_SIZE(3K) => sequential sort
invoke parallel merge sort, depth depends on parallelism level
then switch to parallel quicksort.
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27411/head:pull/27411$ git checkout pull/27411Update a local copy of the PR:
$ git checkout pull/27411$ git pull https://git.openjdk.org/jdk.git pull/27411/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 27411View PR using the GUI difftool:
$ git pr show -t 27411Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27411.diff
Using Webrev
Link to Webrev Comment