Description
I was comparing the disassembly of VectorT.ConditionalSelect (128-bit in particular) and Sse41.BlendVariable and found that the former is optimized into the latter only if the mask is the result of a Compare* intrinsic. However, it can also be optimized when the mask is a constant (e.g. Vector.Create), since the JIT can inspect its contents during compilation. The reproduction repo is linked in the Data section. I've also implemented the optimization in the JIT here. Note that I haven't yet optimized VectorT.ConditionalSelect(Vector.Create(fieldOrVariable)), since I didn't find a way to inspect the values of a field or variable (GT_LCL_VAR and GT_LCL_FLD nodes in the JIT tree).
Configuration
BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3737/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X3D, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-preview.5.24307.3
[Host] : .NET 9.0.0 (9.0.24.30607), X64 RyuJIT AVX2
DefaultJob : .NET 9.0.0 (9.0.24.30607), X64 RyuJIT AVX2
Regression?
Not a regression.
Data
Reproduction, benchmarks, and disassembly: https://github.com/ezhevita/ConditionalSelectReproduce
Analysis
The current implementation only checks for Compare* intrinsics. However, when the mask vector is constant, the JIT can inspect its contents and prove that the mask is indeed per-element.
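To illustrate why a constant mask is enough, here is a minimal sketch in C++ that models a 128-bit vector as 16 bytes. It is not the actual JIT change: `Vec128`, `IsPerByteMask`, `BitwiseSelect`, and `BlendByHighBit` are hypothetical names invented for this example. The sketch assumes a byte-wise blend, so it checks the mask per byte. A mask whose wider elements are all-ones or all-zeros also passes this check.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 128-bit constant vector: 16 raw bytes.
using Vec128 = std::array<std::uint8_t, 16>;

// Returns true when every byte of the mask is 0x00 or 0xFF,
// i.e. the mask selects whole bytes and never splits one.
bool IsPerByteMask(const Vec128& mask)
{
    for (std::uint8_t b : mask)
    {
        if (b != 0x00 && b != 0xFF)
        {
            return false;
        }
    }
    return true;
}

// Models ConditionalSelect: a bitwise (mask & a) | (~mask & b).
Vec128 BitwiseSelect(const Vec128& mask, const Vec128& a, const Vec128& b)
{
    Vec128 result{};
    for (std::size_t i = 0; i < result.size(); i++)
    {
        result[i] = static_cast<std::uint8_t>((mask[i] & a[i]) | (~mask[i] & b[i]));
    }
    return result;
}

// Models a byte-wise variable blend: only the high bit of each mask byte
// decides the result. In this model a set high bit picks `a`; the operand
// order of the real instruction differs.
Vec128 BlendByHighBit(const Vec128& mask, const Vec128& a, const Vec128& b)
{
    Vec128 result{};
    for (std::size_t i = 0; i < result.size(); i++)
    {
        result[i] = (mask[i] & 0x80) ? a[i] : b[i];
    }
    return result;
}
```

When `IsPerByteMask` holds, the two selection functions agree for all inputs, so the cheaper blend can be substituted. When it does not hold (for example a mask byte of 0x80), the results can differ and the bitwise form has to be kept.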
Another solution might be a new API method for consumers that restricts the vector to a per-element mask.