[Arm64] SIMD HW Intrinsic API scope and high level design

Most of the intrinsics with a clear and exact match to X86 have already been proposed and have open issues.

This is intended as a draft design/scoping exercise for the SIMD class to help ease further API reviews.

**Naming conventions**
+ Intrinsic names will roughly follow instruction descriptions in ARMv8 ARM tables from section C3 A64 Instruction Set Overview
+ Drop the adjectives ~~Floating~~, ~~Signed~~, ~~Unsigned~~.  These will be handled by the type system
+ Use a `*Add`, `*Subtract` suffix for accumulating forms
+ Use modifiers without abbreviation
      `Absolute`, `Halving`, `Numeric`, `Extend`, `Polynomial`, `Saturating`, `Rounding`, `Doubling`,
      `High`, `Long`, `Wide`, `Narrow`, `Upper`, `Lower`,
      `RoundEven`, `RoundZero`, `RoundPos`, `RoundNeg`, `RoundAway`
+ For example, `SQDMULH` (`Signed saturating doubling multiply returning high half`) would naturally become `SaturatingDoublingMultiplyHigh`, which would be the proposed intrinsic name
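
To make the `SQDMULH` example concrete, here is a minimal scalar sketch of what one 16-bit lane of such an intrinsic would compute. The class name `NamingSketch` and the scalar `short` signature are assumptions for illustration only; the proposal itself targets `Vector64<T>`/`Vector128<T>` operands. Note that the name carries no `Signed` adjective: signedness comes from the `short` parameter type.

```csharp
using System;

// Hypothetical scalar model, not part of the proposed API surface.
public static class NamingSketch
{
    // Models one signed 16-bit lane of SQDMULH:
    // double the full-width product, keep the high half, saturate to the lane type.
    public static short SaturatingDoublingMultiplyHigh(short left, short right)
    {
        long doubled = 2L * left * right;   // exact, cannot overflow a long
        long high = doubled >> 16;          // "returning high half"
        return (short)Math.Clamp(high, (long)short.MinValue, (long)short.MaxValue);
    }
}
```

Saturation only matters when both inputs are `short.MinValue`, where the doubled product's high half would be 32768 and is clamped to 32767.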

**Argument conventions**
+ Binary operators will take `left` and `right` arguments
+ Unary operators will take a `value` argument
+ Instructions which insert into the high half will take a source operand which is the `target` register to be inserted into.  This will typically be the first argument.  The method name will typically have the suffix `Upper`
+ Instructions with adding or subtracting accumulators will take a source operand which is the `acc` register.  This will be the left operand in the add/subtract.
+ Arguments will typically be in left-to-right order, following ARM assembly conventions.  Exceptions can and will occur, especially when copying an X86 C# API.
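
The conventions above can be sketched with plain arrays standing in for vector lanes. Everything in this block is an assumption for illustration: the class name, the method names `Negate`, `Subtract`, `MultiplyAdd` and `NarrowUpper`, and the use of `int[]`/`short[]` instead of `Vector64<T>`/`Vector128<T>`. Only the parameter names and their order reflect the proposal.

```csharp
// Hypothetical lane-by-lane models; arrays stand in for SIMD registers.
public static class ArgumentConventionSketch
{
    // Unary operator: a single `value` argument.
    public static int[] Negate(int[] value)
    {
        var result = new int[value.Length];
        for (int i = 0; i < value.Length; i++) result[i] = unchecked(-value[i]);
        return result;
    }

    // Binary operator: `left` and `right`.
    public static int[] Subtract(int[] left, int[] right)
    {
        var result = new int[left.Length];
        for (int i = 0; i < left.Length; i++) result[i] = unchecked(left[i] - right[i]);
        return result;
    }

    // Accumulating form: `acc` comes first and is the left operand of the add.
    public static int[] MultiplyAdd(int[] acc, int[] left, int[] right)
    {
        var result = new int[acc.Length];
        for (int i = 0; i < acc.Length; i++) result[i] = unchecked(acc[i] + left[i] * right[i]);
        return result;
    }

    // High-half insert: `target` comes first; its lower lanes are preserved and
    // the narrowed (truncated) lanes of `value` fill its upper half.
    public static short[] NarrowUpper(short[] target, int[] value)
    {
        int half = target.Length / 2;
        var result = (short[])target.Clone();
        for (int i = 0; i < half; i++) result[half + i] = unchecked((short)value[i]);
        return result;
    }
}
```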

**Lowering/Containment**
+ Whenever an intrinsic can easily be expressed through containment of simpler intrinsics without loss, the dedicated intrinsic should be dropped
+ If there are intermediate truncation/rounding/overflow issues, containment is ruled out, as identical results cannot be guaranteed.
+ By-element forms will typically be exposed through containment.
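
As a sketch of why intermediate overflow rules out containment, consider an unsigned halving add on 8-bit lanes. The scalar helpers below are hypothetical models (the names `Add`, `ShiftRightLogical` and `HalvingAdd` are assumed for illustration): composing a wrapping add with a shift does not give the same answer as a halving add that keeps the carry bit.

```csharp
// Hypothetical scalar models of single 8-bit lanes.
public static class ContainmentSketch
{
    // Lane-wise add wraps at 8 bits, as a vector add would.
    public static byte Add(byte left, byte right) => unchecked((byte)(left + right));

    // Logical shift right of one lane.
    public static byte ShiftRightLogical(byte value, int count) => (byte)(value >> count);

    // Halving add: the sum is formed at full precision before the shift,
    // so the carry out of bit 7 is not lost.
    public static byte HalvingAdd(byte left, byte right) => (byte)((left + right) >> 1);
}
```

For inputs 200 and 100, `HalvingAdd` yields 150, while `ShiftRightLogical(Add(200, 100), 1)` yields 22 because the intermediate sum wraps to 44. Under the rule above, such an operation would keep its own intrinsic rather than being recognized from the two-step pattern.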

**Scope/state of instructions/intrinsics** 
The outline follows `ARMv8 ARM reference Manual C3. A64 Instruction Set Overview`, with a focus on SIMD.
If the intrinsic design looks straightforward, no comments are shown.
The outline is exhaustive to allow for discussion.

- [ ] Load/Store scalar SIMD
    - [x] Load/Store scalar SIMD dotnet/runtime#24771
    - [ ] Load/Store scalar SIMD register pair
          Recommendation: 
          `ValueTuple<Vector64<A>, Vector64<B>> LoadVector64Pair<A,B>(void * address)`
          `void Store<A,B>(void * address, ValueTuple<Vector64<A>, Vector64<B>>)`
    - [ ] Load/Store scalar SIMD register Non-temporal pair
          Recommendation: 
          `ValueTuple<Vector64<A>, Vector64<B>> LoadVector64NonTemporalPair<A,B>(void * address)`
          `void StoreNonTemporal<A,B>(void * address, ValueTuple<Vector64<A>, Vector64<B>>)`
- [ ] Load/Store Vector
    - [ ] Load/Store structures (multiple structures)
          Recommendation: 
          `Vector64<A> LoadVector64<A>(void * address, Vector64<A> target)`
          `ValueTuple<Vector64<A>, ... > LoadVector64Tuple<A,B,C,D>(void * address, ValueTuple<...> target)`
          `void Store<A>(void * address, Vector64<A> target)`
          `void Store<A,B,C,D>(void * address, ValueTuple<Vector...> target)`
    - [ ] Load/Store structures (single structures)
          Recommendation: 
          `Vector64<A> LoadVector64<A>(void * address, Vector64<A> target, byte index)`
          `ValueTuple<Vector64<A>, ... > LoadVector64Tuple<A,B,C,D>(void * address, ValueTuple<...> target, byte index)`
          `void Store<A>(void * address, Vector64<A> target, byte index)`
          `void Store<A,B,C,D>(void * address, ValueTuple<Vector...> target, byte index)`
    - [x] Load single structure and replicate
          Recommendation: 
          `Vector64<A> LoadAllVector64<A>(void * address)`
          `ValueTuple<Vector64<A>, ... Vector64<D>> LoadAllVector64Tuple<A,B,C,D>(void * address)`
- [x] Floating-point conversion
    - [x] convert to floating-point
          Recommendation: 
          `Vector64<float> ConvertToVector64Single(Vector64<int> a)`
          `Vector128<double> ConvertToVector128Double(Vector128<ulong> a)`
- [x] SIMD move
- [x] SIMD arithmetic 
   - [x] Partial see dotnet/runtime#24584 for basic ops
   - [ ] Rest mostly simple application of naming conventions above
- [x] SIMD compare   
- [x] SIMD widening and narrowing arithmetic
- [x] SIMD unary arithmetic
     Use `ReverseElementBits` for `RBIT`
     Use `ReverseElementBytes` for `REV16`, `REV32`, `REV64` (separate names would make implementation slightly simpler.)
- [x] SIMD by element arithmetic
      Whenever possible treat the element as the base type & contain the `Extract` element intrinsic
- [x] SIMD permute
- [x] SIMD immediate
    Handle these when feasible by containment/lowering
- [x] SIMD shift (immediate)
- [x] SIMD floating-point and integer conversion
    `ConvertTo*`, e.g. `ConvertToSingleRoundNearest`
- [x] SIMD reduce (across vector lanes)
    Use `*Across` per ARM convention ~~(or `Horizontal*` per X86 convention.)~~
- [x] SIMD pairwise arithmetic
    `*Pairwise`
- [x] SIMD table lookup
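
To illustrate the difference between the `*Across` and `*Pairwise` suffixes in the outline, here is a minimal sketch with arrays standing in for vector lanes. The class name and the `int[]` signatures are assumptions for illustration; the lane ordering of the pairwise form follows the ARM `ADDP` vector behaviour, where pairs from the first operand fill the lower half of the result and pairs from the second operand fill the upper half.

```csharp
// Hypothetical lane-by-lane models; arrays stand in for SIMD registers.
public static class ReduceSketch
{
    // *Across: reduce every lane of one vector to a single scalar (wrapping add).
    public static int AddAcross(int[] value)
    {
        int sum = 0;
        foreach (int lane in value) sum = unchecked(sum + lane);
        return sum;
    }

    // *Pairwise: add adjacent lane pairs; `left` pairs land in the lower half
    // of the result, `right` pairs in the upper half.
    public static int[] AddPairwise(int[] left, int[] right)
    {
        int half = left.Length / 2;
        var result = new int[left.Length];
        for (int i = 0; i < half; i++)
        {
            result[i] = unchecked(left[2 * i] + left[2 * i + 1]);
            result[half + i] = unchecked(right[2 * i] + right[2 * i + 1]);
        }
        return result;
    }
}
```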


[Arm64] SIMD HW Intrinsic API scope and high level design #24790
