Most of the intrinsics with a clear and exact match to X86 have already been proposed and have open issues.
This is intended as a draft design/scoping exercise for the SIMD class, to help ease further API reviews.
Naming conventions
- Intrinsic names will roughly follow instruction descriptions in ARMv8 ARM tables from section C3 A64 Instruction Set Overview
- Drop adjectives Floating, Signed, Unsigned. These will be handled by the type system
- Use *Add, *Subtract postfix for accumulating forms
- Use modifiers w/o abbreviation
  Absolute, Halving, Numeric, Extend, Polynomial, Saturating, Rounding, Doubling,
  High, Long, Wide, Narrow, Upper, Lower,
  RoundEven, RoundZero, RoundPos, RoundNeg, RoundAway
- For example, SQDMULH (Signed saturating doubling multiply returning high half) would naturally become SaturatingDoublingMultiplyHigh, which would be the proposed intrinsic name
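The naming rules above can be sketched as a small string transform. This is only an illustration, not part of the proposal: the class name IntrinsicNaming, the method ToIntrinsicName, and the list of filler words ("returning", "half", and so on) are assumptions made here so that the SQDMULH example comes out as written above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: derives a proposed intrinsic name from an
// instruction description by applying the naming conventions.
public static class IntrinsicNaming
{
    // Adjectives dropped because the type system already carries them.
    private static readonly HashSet<string> Dropped =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "Floating", "Floating-point", "Signed", "Unsigned" };

    // Filler words; this list is an assumption of the sketch.
    private static readonly HashSet<string> Filler =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "returning", "half", "and", "the", "of" };

    public static string ToIntrinsicName(string description)
    {
        var words = description
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(w => !Dropped.Contains(w) && !Filler.Contains(w))
            // PascalCase each remaining modifier without abbreviating it.
            .Select(w => char.ToUpperInvariant(w[0]) + w.Substring(1).ToLowerInvariant());
        return string.Concat(words);
    }
}
```

Calling IntrinsicNaming.ToIntrinsicName("Signed saturating doubling multiply returning high half") yields SaturatingDoublingMultiplyHigh, matching the example above.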
Argument conventions
- Binary operators will take left and right arguments
- Unary operators will take a value argument
- Instructions which insert into the high half will take a source operand which is the target register to be inserted into. This will typically be the first argument. The method name will typically have a suffix Upper
- Instructions with adding or subtracting accumulators will take a source operand which is the acc register. This will be the left operand in the add/subtract.
- Argument order will typically be in left to right order following ARM assembly conventions. Exceptions can and will occur, especially when copying an X86 C# API.
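To make the argument conventions concrete, here is a scalar reference sketch in which each array element stands in for one vector lane. The class ArgumentConventionSketch and the method names MultiplyAdd and NarrowUpper are hypothetical, chosen only to show where left, right, acc and target would sit. None of this is the proposed API surface.

```csharp
using System;

// Scalar models of the conventions; one array element per vector lane.
// All names here are illustrative assumptions.
public static class ArgumentConventionSketch
{
    // Binary operator: takes left and right. Models SQDMULH on 16-bit lanes.
    public static short[] SaturatingDoublingMultiplyHigh(short[] left, short[] right)
    {
        var result = new short[left.Length];
        for (int i = 0; i < left.Length; i++)
        {
            long doubled = 2L * left[i] * right[i];  // exact in 64 bits
            long high = doubled >> 16;               // keep the high half
            result[i] = (short)Math.Max(short.MinValue, Math.Min(short.MaxValue, high));
        }
        return result;
    }

    // Accumulating form: acc comes first and is the left operand of the add.
    public static int[] MultiplyAdd(int[] acc, int[] left, int[] right)
    {
        var result = new int[acc.Length];
        for (int i = 0; i < acc.Length; i++)
            result[i] = unchecked(acc[i] + left[i] * right[i]);
        return result;
    }

    // Upper form: target supplies the lower half of the result and the
    // narrowed value is inserted into the upper half.
    public static short[] NarrowUpper(short[] target, int[] value)
    {
        var result = new short[target.Length + value.Length];
        Array.Copy(target, result, target.Length);
        for (int i = 0; i < value.Length; i++)
            result[target.Length + i] = unchecked((short)value[i]);
        return result;
    }
}
```

In each case the extra source operand (acc or target) is the first argument, followed by the operands in ARM assembly order.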
Lowering/Containment
- Whenever an intrinsic can easily be expressed through containment without loss it should be dropped
- If there are intermediate truncation/rounding/overflow issues, containment is rejected because identical results cannot be guaranteed.
- By-element forms will typically be exposed through containment.
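A small scalar example may help show why intermediate overflow rules out containment. The sketch below assumes an unsigned halving add on 8-bit lanes as the fused operation and compares it with a plain add followed by a shift. The names ContainmentSketch, HalvingAdd and AddThenShift are illustrative only.

```csharp
// One byte stands in for one 8-bit vector lane.
public static class ContainmentSketch
{
    // Fused form: the sum is formed with a wider intermediate, so no
    // carry is lost before the shift.
    public static byte HalvingAdd(byte left, byte right)
    {
        return (byte)((left + right) >> 1);
    }

    // Composed form: the add wraps at 8 bits first, then the shift runs.
    public static byte AddThenShift(byte left, byte right)
    {
        byte sum = unchecked((byte)(left + right));
        return (byte)(sum >> 1);
    }
}
```

For inputs 10 and 20 both forms give 15, but for 200 and 100 the fused form gives 150 while the composed form gives 22. Because the two can differ, the composed pattern cannot stand in for the fused intrinsic.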
Scope/state of instructions/intrinsics
The outline follows the ARMv8 ARM, section C3 (A64 Instruction Set Overview), with a focus on SIMD.
If the intrinsic design looks straightforward, no comments are shown.
The outline is exhaustive to allow for discussion.
- Load/Store scalar SIMD
  - Load/Store scalar SIMD: see API Proposal : Arm64 Load & Store #24771
  - Load/Store scalar SIMD register pair
    Recommendation:
    ValueTuple<Vector64<A>, Vector64<B>> LoadVector64Pair<A,B>(void * address)
    void Store<A,B>(void * address, ValueTuple<Vector64<A>, Vector64<B>>)
  - Load/Store scalar SIMD register Non-temporal pair
    Recommendation:
    ValueTuple<Vector64<A>, Vector64<B>> LoadVector64NonTemporalPair<A,B>(void * address)
    void StoreNonTemporal<A,B>(void * address, ValueTuple<Vector64<A>, Vector64<B>>)
- Load/Store Vector
  - Load/Store structures (multiple structures)
    Recommendation:
    Vector64<A> LoadVector64<A>(void * address, Vector64<A> target)
    ValueTuple<Vector64<A>, ... > LoadVector64Tuple<A,B,C,D>(void * address, ValueTuple<...> target)
    void Store<A>(void * address, Vector64<A> target)
    void Store<A,B,C,D>(void * address, ValueTuple<Vector...> target)
  - Load/Store structures (single structures)
    Recommendation:
    Vector64<A> LoadVector64<A>(void * address, Vector64<A> target, byte index)
    ValueTuple<Vector64<A>, ... > LoadVector64Tuple<A,B,C,D>(void * address, ValueTuple<...> target, byte index)
    void Store<A>(void * address, Vector64<A> target, byte index)
    void Store<A,B,C,D>(void * address, ValueTuple<Vector...> target, byte index)
  - Load single structure and replicate
    Recommendation:
    Vector64<A> LoadAllVector64<A>(void * address)
    ValueTuple<Vector64<A>, ... Vector64<D>> LoadAllVector64Tuple<A,B,C,D>(void * address)
- Floating-point conversion
  - convert to floating-point
    Recommendation:
    Vector64<float> ConvertToVector64Single(Vector64<int> a)
    Vector128<double> ConvertToVector128Double(Vector128<ulong> a)
- SIMD move
- SIMD arithmetic
  - Partial, see API Proposal : ARM64 Simd simple ops #24584 for basic ops
  - Rest mostly simple application of naming conventions above
- SIMD compare
- SIMD widening and narrowing arithmetic
- SIMD unary arithmetic
  Use ReverseElementBits for REV
  Use ReverseElementBytes for REV16, REV32, REV64 (separate names would make implementation slightly simpler.)
- SIMD by element arithmetic
  Whenever possible treat the element as the base type & contain the Extract element intrinsic
- SIMD permute
- SIMD immediate
  Handle these when feasible by containment/lowering
- SIMD shift (immediate)
- SIMD floating-point and integer conversion
  ConvertTo*, e.g. ConvertToSingleRoundNearest
- SIMD reduce (across vector lanes)
  Use *Across per ARM convention (or Horizontal* per X86 convention.)
- SIMD pairwise arithmetic
  *Pairwise
- SIMD table lookup
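The difference between the *Across and *Pairwise forms in the outline can be shown with a scalar reference model, again with one array element per lane. The names AddAcross and AddPairwise simply apply the suffixes suggested above to Add. They are assumptions of this sketch rather than agreed API names, and the code assumes both inputs have the same, even lane count.

```csharp
// Scalar models of a reduce-across-lanes add and a pairwise add on
// 32-bit lanes. Names are hypothetical.
public static class ReductionSketch
{
    // Across: every lane of one vector collapses into a single scalar.
    public static int AddAcross(int[] value)
    {
        int sum = 0;
        foreach (int lane in value)
            sum = unchecked(sum + lane);
        return sum;
    }

    // Pairwise: adjacent lanes are summed; the pairs from left fill the
    // lower half of the result and the pairs from right fill the upper half.
    public static int[] AddPairwise(int[] left, int[] right)
    {
        int half = left.Length / 2;
        var result = new int[left.Length];
        for (int i = 0; i < half; i++)
        {
            result[i] = unchecked(left[2 * i] + left[2 * i + 1]);
            result[half + i] = unchecked(right[2 * i] + right[2 * i + 1]);
        }
        return result;
    }
}
```

With lanes {1, 2, 3, 4}, AddAcross returns 10. AddPairwise of {1, 2, 3, 4} and {10, 20, 30, 40} returns {3, 7, 30, 70}.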