[Arm64] SIMD HW Intrinsic API scope and high level design #24790

@sdmaclea

Most of the intrinsics with a clear and exact X86 match have already been proposed and have open issues.

This is intended as a draft design/scoping exercise for the SIMD class to help ease further API reviews.

Naming conventions

  • Intrinsic names will roughly follow the instruction descriptions in the ARMv8 ARM tables from section C3, A64 Instruction Set Overview.
  • Drop the adjectives Floating, Signed, Unsigned. These will be handled by the type system.
  • Use *Add, *Subtract suffixes for accumulating forms.
  • Use modifiers without abbreviation:
    Absolute, Halving, Numeric, Extend, Polynomial, Saturating, Rounding, Doubling,
    High, Long, Wide, Narrow, Upper, Lower,
    RoundEven, RoundZero, RoundPos, RoundNeg, RoundAway
  • For example, SQDMULH (Signed saturating doubling multiply returning high half) would naturally become SaturatingDoublingMultiplyHigh, which would be the proposed intrinsic name.
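
To make the SQDMULH example concrete, here is a minimal scalar sketch of what one 16-bit lane of such an intrinsic would compute, with each word of the name mapped to one step. The class name NamingSketch and the scalar short signature are assumptions for illustration only; the proposed intrinsic would operate on vectors, and no such scalar helper is part of this proposal.

```csharp
using System;

// Hypothetical scalar model, not part of the proposed API.
public static class NamingSketch
{
    // Models one signed 16-bit lane of SQDMULH:
    //   Saturating -> clamp the result to the range of short
    //   Doubling   -> multiply the product by 2
    //   Multiply   -> left * right
    //   High       -> keep the upper 16 bits of the doubled product
    public static short SaturatingDoublingMultiplyHigh(short left, short right)
    {
        long doubled = 2L * left * right;  // widened so the doubling cannot overflow
        long high = doubled >> 16;         // upper half of the doubled product
        if (high > short.MaxValue) high = short.MaxValue;
        if (high < short.MinValue) high = short.MinValue;
        return (short)high;
    }
}
```

In this model, SaturatingDoublingMultiplyHigh(16384, 16384) returns 8192, and the only saturating input pair, (short.MinValue, short.MinValue), returns short.MaxValue. The Signed adjective does not appear in the name because the short argument type already carries it.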

Argument conventions

  • Binary operators will take left and right arguments.
  • Unary operators will take a value argument.
  • Instructions which insert into the high half will take a source operand which is the target register to be inserted into. This will typically be the first argument. The method name will typically have the suffix Upper.
  • Instructions with adding or subtracting accumulators will take a source operand which is the acc register. This will be the left operand in the add/subtract.
  • Argument order will typically be left to right, following ARM assembly conventions. Exceptions can and will occur, especially when copying an X86 C# API.
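
As a rough illustration of the accumulator and high-half conventions above, the following sketch models two forms on plain arrays. The names MultiplyAdd and NarrowUpper, the class ArgumentOrderSketch, and the array-based signatures are hypothetical; they only show where the acc and target operands would sit, assuming the semantics of MLA and XTN2 respectively.

```csharp
using System;

// Hypothetical array-based model; names and shapes are illustrative assumptions.
public static class ArgumentOrderSketch
{
    // Accumulating form (modelled on MLA): the accumulator comes first and is
    // the left operand of the add, i.e. acc + (left * right) per lane.
    public static int[] MultiplyAdd(int[] acc, int[] left, int[] right)
    {
        var result = new int[acc.Length];
        for (int i = 0; i < acc.Length; i++)
            result[i] = unchecked(acc[i] + left[i] * right[i]);
        return result;
    }

    // High-half insert form (modelled on XTN2): the register being inserted
    // into comes first. Its lower half is preserved and the narrowed value
    // fills the upper half.
    public static short[] NarrowUpper(short[] target, int[] value)
    {
        int half = target.Length / 2;
        var result = (short[])target.Clone();
        for (int i = 0; i < half; i++)
            result[half + i] = unchecked((short)value[i]);  // narrowing truncates
        return result;
    }
}
```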

Lowering/Containment

  • Whenever an intrinsic can easily be expressed through containment without loss, it should be dropped.
  • Intermediate truncation, rounding, or overflow issues rule out containment, because identical results cannot be guaranteed.
  • By-element forms will typically be exposed through containment.
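
The overflow case can be sketched with a scalar model of one unsigned 8-bit lane. This is an illustration under assumptions, not part of the proposal: the class ContainmentSketch and both method names are hypothetical, and the halving add is modelled on UHADD. It shows why an Add followed by a shift could not simply be folded into a halving add, since the two disagree whenever the intermediate sum overflows the lane.

```csharp
using System;

// Hypothetical scalar model of one unsigned 8-bit lane.
public static class ContainmentSketch
{
    // What a dedicated halving add (modelled on UHADD) computes: the sum is
    // formed at full precision before the shift, so no bits are lost.
    public static byte HalvingAdd(byte left, byte right)
    {
        return (byte)((left + right) >> 1);
    }

    // What an 8-bit Add followed by a shift computes: the intermediate sum
    // wraps to 8 bits before it is halved.
    public static byte AddThenShift(byte left, byte right)
    {
        byte sum = unchecked((byte)(left + right));
        return (byte)(sum >> 1);
    }
}
```

For the inputs 200 and 100, HalvingAdd returns 150 while AddThenShift returns 22; for 10 and 20 both return 15. Under the rule above, a form like this would keep its own intrinsic rather than rely on containment.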

Scope/state of instructions/intrinsics

The outline follows the ARMv8 ARM Reference Manual, section C3, A64 Instruction Set Overview, with a focus on SIMD.
If the intrinsic design looks straightforward, no comments are shown.
The outline is exhaustive to allow for discussion.

  • Load/Store scalar SIMD
    • Load/Store scalar SIMD API proposal: Arm64 Load & Store #24771
    • Load/Store scalar SIMD register pair
      Recommendation:
      ValueTuple<Vector64<A>, Vector64<B>> LoadVector64Pair<A,B>(void * address)
      void Store<A,B>(void * address, ValueTuple<Vector64<A>, Vector64<B>>)
    • Load/Store scalar SIMD register Non-temporal pair
      Recommendation:
      ValueTuple<Vector64<A>, Vector64<B>> LoadVector64NonTemporalPair<A,B>(void * address)
      void StoreNonTemporal<A,B>(void * address, ValueTuple<Vector64<A>, Vector64<B>>)
  • Load/Store Vector
    • Load/Store structures (multiple structures)
      Recommendation:
      Vector64<A> LoadVector64<A>(void * address, Vector64<A> target)
      ValueTuple<Vector64<A>, ... > LoadVector64Tuple<A,B,C,D>(void * address, ValueTuple<...> target)
      void Store<A>(void * address, Vector64<A> target)
      void Store<A,B,C,D>(void * address, ValueTuple<Vector...> target)
    • Load/Store structures (single structures)
      Recommendation:
      Vector64<A> LoadVector64<A>(void * address, Vector64<A> target, byte index)
      ValueTuple<Vector64<A>, ... > LoadVector64Tuple<A,B,C,D>(void * address, ValueTuple<...> target, byte index)
      void Store<A>(void * address, Vector64<A> target, byte index)
      void Store<A,B,C,D>(void * address, ValueTuple<Vector...> target, byte index)
    • Load single structure and replicate
      Recommendation:
      Vector64<A> LoadAllVector64<A>(void * address)
      ValueTuple<Vector64<A>, ... Vector64<D>> LoadAllVector64Tuple<A,B,C,D>(void * address)
  • Floating-point conversion
    • convert to floating-point
      Recommendation:
      Vector64<float> ConvertToVector64Single(Vector64<int> a)
      Vector128<double> ConvertToVector128Double(Vector128<ulong> a)
  • SIMD move
  • SIMD arithmetic
  • SIMD compare
  • SIMD widening and narrowing arithmetic
  • SIMD unary arithmetic
    Use ReverseElementBits for RBIT
    Use ReverseElementBytes for REV16, REV32, REV64. (Separate names would make the implementation slightly simpler.)
  • SIMD by element arithmetic
    Whenever possible, treat the element as the base type and contain the Extract element intrinsic
  • SIMD permute
  • SIMD immediate
    Handle these when feasible by containment/lowering
  • SIMD shift (immediate)
  • SIMD floating-point and integer conversion
    ConvertTo*, e.g. ConvertToSingleRoundNearest
  • SIMD reduce (across vector lanes)
    Use *Across per ARM convention (or Horizontal* per X86 convention.)
  • SIMD pairwise arithmetic
    *Pairwise
  • SIMD table lookup
