-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Dedicated ranges::minmax vectorization that does not unnecessarily track element pointer
#4384
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Co-authored-by: A. Jiang <[email protected]>
StephanTLavavej
requested changes
Feb 11, 2024
This is a bug fix, not only a change to address a pedantic comment!
…_Val1` and `_Val2`.
Regex renamed:
`\[\]\((__m128[id]?) _First, \1 _Second\) \{ return (\w+)\(_First, _Second\); \}`
to:
`[]($1 _Val1, $1 _Val2) { return $2(_Val1, _Val2); }`
Remove comments from `_mm_min_epi16` and `_mm_max_epi16` - they're SSE2, not SSE4.1.
StephanTLavavej
approved these changes
Feb 12, 2024
|
Thanks! I resolved the remaining feedback comments, and decided to simply require SSE 4.2 for the new optimization, consistent with the previous optimization. This is a lot easier to reason about.
|
|
I'm speculatively mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed. |
CaseyCarter
approved these changes
Feb 16, 2024
|
Thanks for maximizing performance! 😹 🚀 😻 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Towards #2803
New header
Can't have
extern "C"functions that use templates, so the new header accommodates non-templateminmaxstructure variations.Sign handing
Unlike
_elementcounterpart, the integer signedness affects function selection rather than passed as runtime parameter.For
minmax_element, computing element pointer requires index, socmpgt+blendvpair is needed to capture it.cmpgtis only available in signed form, so unsigned types are supported by adjustment of the input values.For
minmaxvalues, computing only vertical min/max without the index for most of types doesn't need thecmpgt+blendvpair, as there are signed and unsignedmin/maxoperation. The only exception is 64-bit integer support: there are nomin/maxops, so still usingblendv. For consistency, enforced also by the template use, even for 64-bit integers the sign is still passed as function selectionBenchmark
_valare new results, we indirect for value thereI put output of the benchmark before and after in a single table
Some rows show order of magnitude improvement.
Some show difference within results variation.
I interpret these results as the change being pure improvement, and the row that don't show it are memory-bound rather than CPU bound, and this can change on a different machine or by reducing input data amount.