@amanasifkhalid
Contributor

Part of #94549. Adds the following encodings:

  • IF_SVE_GU_3A
  • IF_SVE_GU_3B
  • IF_SVE_GU_3C
  • IF_SVE_GX_3A
  • IF_SVE_GX_3B
  • IF_SVE_GX_3C
  • IF_SVE_FF_3A
  • IF_SVE_FF_3B
  • IF_SVE_FF_3C
  • IF_SVE_GY_3B

cstool output:

fmla  z0.s, z2.s, z1.s[0]
fmla  z4.s, z6.s, z3.s[1]
fmls  z8.s, z10.s, z5.s[2]
fmls  z12.s, z14.s, z7.s[3]
fmla  z1.d, z0.d, z0.d[0]
fmla  z3.d, z2.d, z5.d[1]
fmls  z5.d, z4.d, z10.d[0]
fmls  z7.d, z6.d, z15.d[1]
bfmla z1.h, z2.h, z0.h[0]
bfmla z3.h, z4.h, z2.h[2]
bfmls z5.h, z6.h, z4.h[5]
bfmls z7.h, z8.h, z7.h[7]
fmul  z0.s, z2.s, z1.s[0]
fmul  z4.s, z6.s, z3.s[1]
fmul  z8.s, z10.s, z5.s[2]
fmul  z12.s, z14.s, z7.s[3]
fmul  z1.d, z0.d, z0.d[0]
fmul  z3.d, z2.d, z5.d[1]
fmul  z5.d, z4.d, z10.d[0]
fmul  z7.d, z6.d, z15.d[1]
bfmul z1.h, z2.h, z0.h[0]
bfmul z3.h, z4.h, z2.h[2]
bfmul z5.h, z6.h, z4.h[5]
bfmul z7.h, z8.h, z7.h[7]
fdot  z0.s, z2.h, z1.h[0]
fdot  z4.s, z6.h, z3.h[1]
bfdot z8.s, z10.h, z5.h[2]
bfdot z12.s, z14.h, z7.h[3]
mla   z0.h, z1.h, z1.h[1]
mla   z2.h, z3.h, z3.h[3]
mls   z4.h, z5.h, z5.h[5]
mls   z6.h, z7.h, z7.h[7]
mla   z8.s, z9.s, z1.s[0]
mla   z10.s, z11.s, z3.s[1]
mls   z12.s, z13.s, z5.s[2]
mls   z14.s, z15.s, z7.s[3]
mla   z16.d, z17.d, z0.d[0]
mla   z18.d, z19.d, z5.d[1]
mls   z20.d, z21.d, z10.d[0]
mls   z22.d, z23.d, z15.d[1]
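
The operands in the listing above appear to follow a simple pattern: the element index tops out at 7, 3, or 1 for `.h`, `.s`, and `.d` destinations, and the indexed register stays within z0-z7 except for `.d` forms, which go up to z15. A minimal sketch of that check, assuming the limits are keyed on the destination element size (which is what the `fdot`/`bfdot` rows need, since they index `.h` pairs but stop at 3). The names `ElemSize`, `maxIndex`, `maxIndexedReg`, and `isValidIndexedOperand` are hypothetical and are not taken from the JIT sources.

```cpp
#include <cassert>

// Hypothetical tag for the destination element size; the JIT has its own option types.
enum class ElemSize { H, S, D };

// Largest element index seen per destination size in the listing (assumed encoding limit).
constexpr int maxIndex(ElemSize size)
{
    return (size == ElemSize::H) ? 7 : (size == ElemSize::S) ? 3 : 1;
}

// Largest Z register number usable as the indexed operand:
// z0-z7 for .h and .s destinations, z0-z15 for .d destinations.
constexpr int maxIndexedReg(ElemSize size)
{
    return (size == ElemSize::D) ? 15 : 7;
}

// True if register zm with the given element index fits the assumed limits.
constexpr bool isValidIndexedOperand(ElemSize size, int zm, int index)
{
    return zm >= 0 && zm <= maxIndexedReg(size) && index >= 0 && index <= maxIndex(size);
}

// Usage: spot-check a few rows from the listing.
inline void checkListingSamples()
{
    assert(isValidIndexedOperand(ElemSize::H, 7, 7));  // bfmls z7.h, z8.h, z7.h[7]
    assert(isValidIndexedOperand(ElemSize::S, 7, 3));  // bfdot z12.s, z14.h, z7.h[3]
    assert(isValidIndexedOperand(ElemSize::D, 15, 1)); // fmls  z7.d, z6.d, z15.d[1]
}
```

Every row in the listing sits at or below these bounds, so the test cases exercise the extremes of each form.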

JitDisasm output:

fmla    z0.s, z2.s, z1.s[0]
fmla    z4.s, z6.s, z3.s[1]
fmls    z8.s, z10.s, z5.s[2]
fmls    z12.s, z14.s, z7.s[3]
fmla    z1.d, z0.d, z0.d[0]
fmla    z3.d, z2.d, z5.d[1]
fmls    z5.d, z4.d, z10.d[0]
fmls    z7.d, z6.d, z15.d[1]
bfmla   z1.h, z2.h, z0.h[0]
bfmla   z3.h, z4.h, z2.h[2]
bfmls   z5.h, z6.h, z4.h[5]
bfmls   z7.h, z8.h, z7.h[7]
fmul    z0.s, z2.s, z1.s[0]
fmul    z4.s, z6.s, z3.s[1]
fmul    z8.s, z10.s, z5.s[2]
fmul    z12.s, z14.s, z7.s[3]
fmul    z1.d, z0.d, z0.d[0]
fmul    z3.d, z2.d, z5.d[1]
fmul    z5.d, z4.d, z10.d[0]
fmul    z7.d, z6.d, z15.d[1]
bfmul   z1.h, z2.h, z0.h[0]
bfmul   z3.h, z4.h, z2.h[2]
bfmul   z5.h, z6.h, z4.h[5]
bfmul   z7.h, z8.h, z7.h[7]
fdot    z0.s, z2.h, z1.h[0]
fdot    z4.s, z6.h, z3.h[1]
bfdot   z8.s, z10.h, z5.h[2]
bfdot   z12.s, z14.h, z7.h[3]
mla     z0.h, z1.h, z1.h[1]
mla     z2.h, z3.h, z3.h[3]
mls     z4.h, z5.h, z5.h[5]
mls     z6.h, z7.h, z7.h[7]
mla     z8.s, z9.s, z1.s[0]
mla     z10.s, z11.s, z3.s[1]
mls     z12.s, z13.s, z5.s[2]
mls     z14.s, z15.s, z7.s[3]
mla     z16.d, z17.d, z0.d[0]
mla     z18.d, z19.d, z5.d[1]
mls     z20.d, z21.d, z10.d[0]
mls     z22.d, z23.d, z15.d[1]
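
The two listings differ only in the padding after the mnemonic. A minimal sketch of how one might compare a cstool line against a JitDisasm line while ignoring that padding; `normalizeDisasmLine` and `sameInstruction` are hypothetical helpers written for illustration, not part of either tool.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Collapse every run of whitespace to a single space and trim both ends,
// so differently padded renderings of one instruction compare equal.
std::string normalizeDisasmLine(const std::string& line)
{
    std::istringstream in(line);
    std::string token;
    std::string out;
    while (in >> token)
    {
        if (!out.empty())
        {
            out += ' ';
        }
        out += token;
    }
    return out;
}

// True if both lines describe the same instruction text modulo whitespace.
bool sameInstruction(const std::string& cstoolLine, const std::string& jitLine)
{
    return normalizeDisasmLine(cstoolLine) == normalizeDisasmLine(jitLine);
}
```

For example, `sameInstruction("fmla  z0.s, z2.s, z1.s[0]", "fmla    z0.s, z2.s, z1.s[0]")` holds, while swapping `fmla` for `fmls` on one side does not.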

I'm not sure if IF_SVE_GY_3A and IF_SVE_GY_3B_D are valid encodings. I tried implementing them locally, but cstool wouldn't recognize them, and I don't see any other variants of FDOT (indexed) in the docs. Am I looking in the wrong place?

cc @dotnet/arm64-contrib

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 7, 2024
@ghost ghost assigned amanasifkhalid Feb 7, 2024
@ghost

ghost commented Feb 7, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Author: amanasifkhalid
Assignees: amanasifkhalid
Labels: area-CodeGen-coreclr
Milestone: -

@amanasifkhalid amanasifkhalid added the arm-sve Work related to arm64 SVE/SVE2 support label Feb 7, 2024
@amanasifkhalid amanasifkhalid added this to the 9.0.0 milestone Feb 7, 2024
Contributor

@TIHan TIHan left a comment

LGTM from my point of view

@ryujit-bot

Diff results for #98136

Throughput diffs

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%

Details here

Contributor

@a74nh a74nh left a comment

LGTM. Very happy to see lots of code reuse happening.

@a74nh
Contributor

a74nh commented Feb 8, 2024

> I'm not sure if IF_SVE_GY_3A and IF_SVE_GY_3B_D are valid encodings. I tried implementing them locally, but cstool wouldn't recognize them, and I don't see any other variants of FDOT (indexed) in the docs. Am I looking in the wrong place?

Try https://docsmirror.github.io/A64/2023-09/sveindex.html - note it's the 2023-09 release instead of 2023-06. There have been a few new instructions added for FEAT_FP8DOT2 and FEAT_FP8DOT4.

I'm not surprised these are not supported in cstool yet.

There were some other instructions not in cstool. For those, the encodings were fairly straightforward, so I added the code and then ifdefed out the tests using ALL_ARM64_EMITTER_UNIT_TESTS_SVE_UNSUPPORTED. Plus, I added an unreached() in the emitIns_R_R_*() function. When support is added to cstool, it'll be fairly quick to test (and it's easier to write the code now while it's in our heads).
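
A rough sketch of the pattern described above, for anyone following along. Only the macro name ALL_ARM64_EMITTER_UNIT_TESTS_SVE_UNSUPPORTED and the format names come from this thread; `unreachedSketch`, `FormatSketch`, `emitSketch`, and `runEmitterTestsSketch` are hypothetical stand-ins and do not mirror the real emitter code.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the JIT's unreached(); this sketch throws instead.
[[noreturn]] inline void unreachedSketch()
{
    throw std::logic_error("unreached: encoding not yet verified");
}

// Illustrative subset of formats; the enum itself is an assumption.
enum class FormatSketch
{
    IF_SVE_GY_3B,   // checked against cstool in this PR
    IF_SVE_GY_3A,   // not recognized by cstool yet
    IF_SVE_GY_3B_D, // not recognized by cstool yet
};

// Stand-in for an emitIns_R_R_*() entry point: returns only for verified formats.
void emitSketch(FormatSketch fmt)
{
    switch (fmt)
    {
        case FormatSketch::IF_SVE_GY_3B:
            return;
        case FormatSketch::IF_SVE_GY_3A:
        case FormatSketch::IF_SVE_GY_3B_D:
            // The encoding code can already exist; the guard stays until a
            // disassembler can confirm the output.
            unreachedSketch();
    }
}

// Returns the number of test emissions that ran.
int runEmitterTestsSketch()
{
    int ran = 0;
    emitSketch(FormatSketch::IF_SVE_GY_3B);
    ran++;
#ifdef ALL_ARM64_EMITTER_UNIT_TESTS_SVE_UNSUPPORTED
    // Compiled out by default; enable these (and drop the guard) once the
    // disassembler supports the encodings.
    emitSketch(FormatSketch::IF_SVE_GY_3A);
    ran++;
    emitSketch(FormatSketch::IF_SVE_GY_3B_D);
    ran++;
#endif
    return ran;
}
```

With the macro left undefined, only the verified format is exercised, and calling `emitSketch` directly on one of the guarded formats trips the guard.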

Interestingly, there is now a 2023-12 release. But there's nothing on docsmirror yet and the autogenerated stuff is based on 2023-09, so we'll stick with that and should aim to get all of 2023-09 support in. But, it'll be a few years before anything very recent gets into real hardware.

@amanasifkhalid
Contributor Author

@a74nh thanks for the updated docs link. I'll follow your lead with adding and disabling those encodings in a follow-up PR.

Thank you both for the reviews!

@amanasifkhalid
Contributor Author

Failures are known.

@amanasifkhalid amanasifkhalid merged commit 247f8cd into dotnet:main Feb 8, 2024
@amanasifkhalid amanasifkhalid deleted the sve-mul branch February 8, 2024 15:49
@github-actions github-actions bot locked and limited conversation to collaborators Mar 10, 2024