
VectorX<T>.ConditionalSelect doesn’t get optimized for const masks on non-AVX512 platforms #104001

@ezhevita

Description

I was comparing the disassembly of VectorX<T>.ConditionalSelect (the 128-bit version in particular) with Sse41.BlendVariable and found that the former is optimized into the latter only when the mask is the result of a Compare* intrinsic. However, the same optimization is possible when the mask is a constant (e.g. created with VectorX.Create), since the JIT can inspect its contents at compile time.

The reproduction repo is linked under Data. I've also implemented the optimization in the JIT here. Note that it does not yet handle VectorX<T>.ConditionalSelect(VectorX.Create(fieldOrVariable)), since I didn't find a way to inspect the values of fields and variables (GT_LCL_VAR and GT_LCL_FLD nodes) in the JIT tree.
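Why the mask's contents matter: ConditionalSelect is defined bitwise (each result bit comes from the first value where the mask bit is 1), while a variable blend picks whole lanes based only on the most significant bit of each mask element or byte. The two agree only when every mask element is all-ones or all-zeros, which is exactly what a Compare* result guarantees. The following is a minimal scalar sketch of that difference, assuming a 128-bit vector modeled as 16 little-endian bytes and a byte-granular blend in the style of PBLENDVB; the function names are hypothetical and the operand order of the real intrinsics is not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a 128-bit vector: 16 bytes, little-endian.
using V128 = std::array<std::uint8_t, 16>;

// Bitwise select (ConditionalSelect semantics): each result bit comes from
// `a` where the mask bit is 1 and from `b` where the mask bit is 0.
V128 bitwise_select(const V128& mask, const V128& a, const V128& b) {
    V128 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = static_cast<std::uint8_t>((mask[i] & a[i]) | (~mask[i] & b[i]));
    return r;
}

// Byte-granular variable blend (PBLENDVB-style): each result byte comes
// from `a` if the mask byte's most significant bit is set, else from `b`.
V128 msb_blend(const V128& mask, const V128& a, const V128& b) {
    V128 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = (mask[i] & 0x80) ? a[i] : b[i];
    return r;
}
```

With a mask whose 32-bit elements are all 0xFFFFFFFF or 0x00000000, both functions return the same vector; with elements of 0x80000000 (only the top bit set) they differ, which is why the JIT cannot emit a blend for an arbitrary mask.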

Configuration

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3737/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X3D, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-preview.5.24307.3
  [Host]     : .NET 9.0.0 (9.0.24.30607), X64 RyuJIT AVX2
  DefaultJob : .NET 9.0.0 (9.0.24.30607), X64 RyuJIT AVX2

Regression?

Not a regression.

Data

Reproduction, benchmarks, and disassembly: https://github.com/ezhevita/ConditionalSelectReproduce

Analysis

The current implementation only checks whether the mask comes from a Compare* intrinsic. However, when the mask is a constant vector, the JIT can inspect its contents and prove that it is a per-element mask (every element is either all-zeros or all-ones).
Another option would be a new public API that lets consumers declare that the mask is a per-element mask.
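The compile-time check described above can be sketched as follows. This is a minimal illustration, not the JIT change linked in the issue: the helper name and signature are hypothetical, and it assumes the constant's raw bytes are available in little-endian order along with the element width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not actual JIT code): returns true if every element
// of a constant vector is all-zero or all-ones bits, i.e. the constant is a
// per-element mask. `bytes` holds the raw contents, `elem_size` the element
// width in bytes (1, 2, 4 or 8 for the usual element types).
bool is_per_element_mask(const std::uint8_t* bytes, std::size_t byte_count,
                         std::size_t elem_size) {
    assert(elem_size != 0 && byte_count % elem_size == 0);
    for (std::size_t start = 0; start < byte_count; start += elem_size) {
        const std::uint8_t first = bytes[start];
        // The element's first byte must already be 0x00 or 0xFF...
        if (first != 0x00 && first != 0xFF)
            return false;
        // ...and every other byte of the same element must match it.
        for (std::size_t i = start + 1; i < start + elem_size; ++i)
            if (bytes[i] != first)
                return false;
    }
    return true;
}
```

The check depends on the element width: a constant that is a valid mask for 4-byte elements (0xFFFFFFFF, 0, 0xFFFFFFFF, 0) is not one for 8-byte elements, because each 64-bit lane would mix set and cleared bytes.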

Metadata

Assignees

No one assigned

    Labels

    area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), tenet-performance (Performance related issue)