Further improve ProbabilisticMap on Avx512 #107798
Merged
+28
−28
This PR contains 3 separate changes that take advantage of Avx512's permute behavior.
The first change is skipping an AND instruction by taking advantage of the fact that `PermuteVar64x8` will only look at the bottom 6 bits (64 values) of the control.
Note that we were doing `AND 31` instead of `63` though. In this case this is okay because we know that the `charMap` is just a duplicated 256-bit lookup. The 6th bit that we aren't masking off anymore will impact whether we pick from the first 0-31, or 32-63 values, but that doesn't matter since the two are the same.
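The reasoning behind the first change can be checked with a small scalar model. This is only a sketch in Python, not the PR's C# code: `permute_var_64x8` is a hypothetical stand-in that models a 64-byte permute reading the low 6 bits of each control byte, and the table contents are arbitrary made-up values.

```python
def permute_var_64x8(table, control):
    """Scalar model of a 64-byte permute: each lane uses only the
    low 6 bits of its control byte to index the 64-byte table."""
    assert len(table) == 64
    return [table[c & 63] for c in control]


# Hypothetical 32-byte (256-bit) lookup with arbitrary contents.
lookup_256bit = [(i * 37 + 11) & 0xFF for i in range(32)]

# The 64-byte table is the same 32 bytes repeated twice.
char_map = lookup_256bit * 2

# Every possible control byte.
controls = list(range(256))

# Old approach: mask the control with 31 before the permute.
with_and = permute_var_64x8(char_map, [c & 31 for c in controls])

# New approach: no explicit mask; the permute ignores bits above the 6th.
without_and = permute_var_64x8(char_map, controls)

# The 6th bit only selects between two identical halves of the table.
print(with_and == without_and)  # True
```

Because both halves of the table are identical, dropping the explicit mask changes which half is read but never the value that comes back.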
The second change is also taking advantage of the above observation. It recognizes that the `values >>> 5` operation is emulated as `(values.AsInt32() >>> 5).AsByte() & Vector128.Create((byte)7)` since there's no instruction for `>>>` on bytes on X86. We can skip that `& 7` operation if we swap out the shuffle for a permute. This way we're again only looking at the lower 6 bits. As before, we now have bits 4/5/6 that aren't getting masked off, but that's okay since the values are duplicated 8 times.
The third change is taking advantage of the `PermuteVar64x8x2` instruction to pick alternating bytes from the two source vectors, instead of shifting the bytes around and doing a saturating pack. Since `PackUnsignedSaturate` also shuffles inputs around a bit, we needed to reverse that by calling `FixUpPackedVector512Result` (another permute) if we did find any potential matches. `PermuteVar64x8x2` keeps the input order as-is, meaning we can skip that "fix up".
Overall, it adds up to a ~20% improvement.
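The second and third changes can be illustrated the same way. The following is a scalar Python sketch, not the PR's C# code. All helper names are hypothetical, the table and input values are made up, and the choice of "even bytes" as the alternating bytes to keep is an assumption made for the demonstration.

```python
import struct


def shift_right_5_as_int32(values):
    """Shift bytes right by 5 the way the emulation does: reinterpret as
    little-endian 32-bit lanes, shift, and reinterpret back as bytes."""
    raw = bytes(values)
    out = bytearray()
    for offset in range(0, len(raw), 4):
        (lane,) = struct.unpack_from("<I", raw, offset)
        out += struct.pack("<I", lane >> 5)
    return list(out)


def permute_64x8(table, control):
    """One-source byte permute: only the low 6 bits of each index are used."""
    return [table[c & 63] for c in control]


def permute_64x8x2(lower, control, upper):
    """Two-source byte permute: bit 6 of each index picks the source,
    the low 6 bits pick the byte within it."""
    return [(upper if c & 64 else lower)[c & 63] for c in control]


def saturate_to_byte(word):
    """Treat a 16-bit value as signed and clamp it to 0..255."""
    signed = word - 0x10000 if word & 0x8000 else word
    return max(0, min(255, signed))


def pack_unsigned_saturate(a, b):
    """Saturating 16-bit to 8-bit pack that works per 128-bit lane:
    8 words from a, then 8 words from b, for each of the 4 lanes."""
    out = []
    for lane in range(0, 32, 8):
        for src in (a, b):
            out += [saturate_to_byte(w) for w in src[lane:lane + 8]]
    return out


# Second change: bits leak in from the neighboring byte, but an 8-entry
# table duplicated 8 times makes the leaked index bits irrelevant.
bit_table = [1 << i for i in range(8)]            # hypothetical 8-entry table
values = [(i * 73 + 5) & 0xFF for i in range(64)]  # arbitrary input bytes
shifted = shift_right_5_as_int32(values)
masked_lookup = [bit_table[s & 7] for s in shifted]  # lookup after "& 7"
permuted_lookup = permute_64x8(bit_table * 8, shifted)  # no explicit mask

# Third change: the pack interleaves 128-bit lanes, the permute does not.
a = list(range(0, 32))    # 32 16-bit elements
b = list(range(32, 64))
packed = pack_unsigned_saturate(a, b)
a_bytes = list(struct.pack("<32H", *a))
b_bytes = list(struct.pack("<32H", *b))
even_bytes = list(range(0, 64, 2)) + list(range(64, 128, 2))
picked = permute_64x8x2(a_bytes, even_bytes, b_bytes)
```

In this model `masked_lookup` and `permuted_lookup` are identical, `packed` begins with elements 0 to 7 of `a` followed by elements 0 to 7 of `b` (so it needs reordering), and `picked` is already in source order.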