@amanasifkhalid
Contributor

Part of #94549. Implements the following encodings:

  • If_SVE_FE_3A
  • If_SVE_FE_3B
  • If_SVE_FG_3A
  • If_SVE_FG_3B
  • If_SVE_FH_3A
  • If_SVE_FH_3B
  • If_SVE_FI_3A
  • If_SVE_FI_3B
  • If_SVE_FI_3C
  • If_SVE_FJ_3A
  • If_SVE_FJ_3B

cstool output:

smullb        z0.s, z1.h, z0.h[0]
smullb        z2.s, z3.h, z1.h[1]
smullt        z4.s, z5.h, z2.h[2]
smullt        z6.s, z7.h, z3.h[3]
umullb        z8.s, z9.h, z4.h[4]
umullb        z10.s, z11.h, z5.h[5]
umullt        z12.s, z13.h, z6.h[6]
umullt        z14.s, z15.h, z7.h[7]
smullb        z0.d, z1.s, z0.s[0]
smullb        z2.d, z3.s, z2.s[1]
smullt        z4.d, z5.s, z4.s[2]
smullt        z6.d, z7.s, z6.s[3]
umullb        z8.d, z9.s, z8.s[0]
umullb        z10.d, z11.s, z10.s[1]
umullt        z12.d, z13.s, z12.s[2]
umullt        z14.d, z15.s, z14.s[3]
smlalb        z0.s, z1.h, z0.h[0]
smlalt        z2.s, z3.h, z1.h[1]
smlslb        z4.s, z5.h, z2.h[2]
smlslt        z6.s, z7.h, z3.h[3]
umlalb        z8.s, z9.h, z4.h[4]
umlalt        z10.s, z11.h, z5.h[5]
umlslb        z12.s, z13.h, z6.h[6]
umlslt        z14.s, z15.h, z7.h[7]
smlalb        z0.d, z1.s, z0.s[0]
smlalt        z2.d, z3.s, z2.s[1]
smlslb        z4.d, z5.s, z4.s[2]
smlslt        z6.d, z7.s, z6.s[3]
umlalb        z8.d, z9.s, z8.s[0]
umlalt        z10.d, z11.s, z10.s[1]
umlslb        z12.d, z13.s, z12.s[2]
umlslt        z14.d, z15.s, z14.s[3]
sqdmullb      z0.s, z2.h, z1.h[1]
sqdmullb      z4.s, z6.h, z3.h[3]
sqdmullt      z8.s, z10.h, z5.h[5]
sqdmullt      z12.s, z14.h, z7.h[7]
sqdmullb      z0.d, z2.s, z0.s[0]
sqdmullb      z4.d, z6.s, z5.s[1]
sqdmullt      z8.d, z10.s, z10.s[2]
sqdmullt      z12.d, z14.s, z15.s[3]
sqdmulh       z0.h, z1.h, z1.h[1]
sqdmulh       z2.h, z3.h, z3.h[3]
sqrdmulh      z4.h, z5.h, z5.h[5]
sqrdmulh      z6.h, z7.h, z7.h[7]
sqdmulh       z8.s, z9.s, z0.s[0]
sqdmulh       z10.s, z11.s, z2.s[1]
sqrdmulh      z12.s, z13.s, z4.s[2]
sqrdmulh      z14.s, z15.s, z6.s[3]
sqdmulh       z16.d, z17.d, z0.d[0]
sqdmulh       z18.d, z19.d, z5.d[1]
sqrdmulh      z20.d, z21.d, z10.d[0]
sqrdmulh      z22.d, z23.d, z15.d[1]
sqdmlalb      z0.s, z1.h, z1.h[1]
sqdmlalt      z2.s, z3.h, z3.h[3]
sqdmlslb      z4.s, z5.h, z5.h[5]
sqdmlslt      z6.s, z0.h, z7.h[7]
sqdmlalb      z8.d, z9.s, z0.s[0]
sqdmlalt      z10.d, z11.s, z5.s[1]
sqdmlslb      z12.d, z13.s, z10.s[2]
sqdmlslt      z14.d, z15.s, z15.s[3]
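
The operand pattern in the listing above can be sanity-checked mechanically. The following is a minimal sketch, not part of the JIT or of cstool: the helper names `parse` and `is_consistent` are hypothetical, and the rules it checks are only the ones visible in the listing (the `sqdmulh`/`sqrdmulh` forms keep the element size, every other mnemonic here widens to twice the source element size, and the index selects an element within a 128-bit segment). It does not model the per-encoding limits on which `Zm` registers are allowed.

```python
import re

# Element-size suffixes that appear in the listing, in bits.
ELEM_BITS = {"h": 16, "s": 32, "d": 64}

# Matches lines such as "smullb  z0.s, z1.h, z0.h[0]".
LINE_RE = re.compile(
    r"^(?P<mnemonic>\w+)\s+"
    r"z(?P<zd>\d+)\.(?P<td>[hsd]),\s*"
    r"z(?P<zn>\d+)\.(?P<tn>[hsd]),\s*"
    r"z(?P<zm>\d+)\.(?P<tm>[hsd])\[(?P<index>\d+)\]$"
)

# Mnemonics in this listing whose destination keeps the source element size.
NON_WIDENING = ("sqdmulh", "sqrdmulh")


def parse(line):
    """Split one disassembly line into its mnemonic and operand fields."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    return {
        "mnemonic": m["mnemonic"],
        "td": m["td"],
        "tn": m["tn"],
        "tm": m["tm"],
        "index": int(m["index"]),
    }


def is_consistent(line):
    """Check element sizes and the index range for one listing line."""
    f = parse(line)
    dst = ELEM_BITS[f["td"]]
    src = ELEM_BITS[f["tn"]]
    # Both source operands use the same element size.
    if ELEM_BITS[f["tm"]] != src:
        return False
    # Widening forms produce elements twice as wide as their sources.
    expected_dst = src if f["mnemonic"] in NON_WIDENING else 2 * src
    if dst != expected_dst:
        return False
    # The index picks one element out of a 128-bit segment.
    return f["index"] < 128 // src
```

For example, `is_consistent("smullb  z0.s, z1.h, z0.h[0]")` is true, while a line with an out-of-range index such as `z1.h[8]` is rejected.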

JitDisasm output:

smullb  z0.s, z1.h, z0.h[0]
smullb  z2.s, z3.h, z1.h[1]
smullt  z4.s, z5.h, z2.h[2]
smullt  z6.s, z7.h, z3.h[3]
umullb  z8.s, z9.h, z4.h[4]
umullb  z10.s, z11.h, z5.h[5]
umullt  z12.s, z13.h, z6.h[6]
umullt  z14.s, z15.h, z7.h[7]
smullb  z0.d, z1.s, z0.s[0]
smullb  z2.d, z3.s, z2.s[1]
smullt  z4.d, z5.s, z4.s[2]
smullt  z6.d, z7.s, z6.s[3]
umullb  z8.d, z9.s, z8.s[0]
umullb  z10.d, z11.s, z10.s[1]
umullt  z12.d, z13.s, z12.s[2]
umullt  z14.d, z15.s, z14.s[3]
smlalb  z0.s, z1.h, z0.h[0]
smlalt  z2.s, z3.h, z1.h[1]
smlslb  z4.s, z5.h, z2.h[2]
smlslt  z6.s, z7.h, z3.h[3]
umlalb  z8.s, z9.h, z4.h[4]
umlalt  z10.s, z11.h, z5.h[5]
umlslb  z12.s, z13.h, z6.h[6]
umlslt  z14.s, z15.h, z7.h[7]
smlalb  z0.d, z1.s, z0.s[0]
smlalt  z2.d, z3.s, z2.s[1]
smlslb  z4.d, z5.s, z4.s[2]
smlslt  z6.d, z7.s, z6.s[3]
umlalb  z8.d, z9.s, z8.s[0]
umlalt  z10.d, z11.s, z10.s[1]
umlslb  z12.d, z13.s, z12.s[2]
umlslt  z14.d, z15.s, z14.s[3]
sqdmullb z0.s, z2.h, z1.h[1]
sqdmullb z4.s, z6.h, z3.h[3]
sqdmullt z8.s, z10.h, z5.h[5]
sqdmullt z12.s, z14.h, z7.h[7]
sqdmullb z0.d, z2.s, z0.s[0]
sqdmullb z4.d, z6.s, z5.s[1]
sqdmullt z8.d, z10.s, z10.s[2]
sqdmullt z12.d, z14.s, z15.s[3]
sqdmulh z0.h, z1.h, z1.h[1]
sqdmulh z2.h, z3.h, z3.h[3]
sqrdmulh z4.h, z5.h, z5.h[5]
sqrdmulh z6.h, z7.h, z7.h[7]
sqdmulh z8.s, z9.s, z0.s[0]
sqdmulh z10.s, z11.s, z2.s[1]
sqrdmulh z12.s, z13.s, z4.s[2]
sqrdmulh z14.s, z15.s, z6.s[3]
sqdmulh z16.d, z17.d, z0.d[0]
sqdmulh z18.d, z19.d, z5.d[1]
sqrdmulh z20.d, z21.d, z10.d[0]
sqrdmulh z22.d, z23.d, z15.d[1]
sqdmlalb z0.s, z1.h, z1.h[1]
sqdmlalt z2.s, z3.h, z3.h[3]
sqdmlslb z4.s, z5.h, z5.h[5]
sqdmlslt z6.s, z0.h, z7.h[7]
sqdmlalb z8.d, z9.s, z0.s[0]
sqdmlalt z10.d, z11.s, z5.s[1]
sqdmlslb z12.d, z13.s, z10.s[2]
sqdmlslt z14.d, z15.s, z15.s[3]
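
The two listings above differ only in column alignment. A minimal sketch of how such a comparison could be automated follows; the helper names `normalize` and `first_mismatch` are hypothetical and not part of any dotnet/runtime tooling, and the sketch assumes both tools print one instruction per line in the same order.

```python
def normalize(listing):
    """Collapse whitespace runs so column alignment does not matter."""
    return [" ".join(line.split()) for line in listing.splitlines() if line.strip()]


def first_mismatch(expected, actual):
    """Return (line_number, expected_line, actual_line) for the first
    difference between two listings, or None if they match."""
    exp, act = normalize(expected), normalize(actual)
    for i, (e, a) in enumerate(zip(exp, act), start=1):
        if e != a:
            return (i, e, a)
    if len(exp) != len(act):
        # One listing is a strict prefix of the other.
        n = min(len(exp), len(act)) + 1
        e = exp[n - 1] if n <= len(exp) else None
        a = act[n - 1] if n <= len(act) else None
        return (n, e, a)
    return None


# Two lines taken from the listings above, with their original spacing.
cstool_out = "smullb        z0.s, z1.h, z0.h[0]\nsqdmullb      z0.s, z2.h, z1.h[1]\n"
jit_out = "smullb  z0.s, z1.h, z0.h[0]\nsqdmullb z0.s, z2.h, z1.h[1]\n"
assert first_mismatch(cstool_out, jit_out) is None
```

Reporting only the first mismatch keeps the output short when a single wrong encoding bit shifts every later line.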

cc @dotnet/arm64-contrib.

@ghost assigned amanasifkhalid Feb 8, 2024
@ghost added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Feb 8, 2024
@ghost
ghost commented Feb 8, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@amanasifkhalid added the arm-sve label (Work related to arm64 SVE/SVE2 support) Feb 8, 2024
@ryujit-bot

Diff results for #98142

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%

Contributor

@a74nh left a comment

LGTM

@ryujit-bot

Diff results for #98142

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch -0.01%

Contributor

@TIHan left a comment

LGTM as well

@amanasifkhalid merged commit 87c1431 into dotnet:main Feb 8, 2024
@amanasifkhalid deleted the sve-smullb branch February 8, 2024 23:27
@github-actions bot locked and limited conversation to collaborators Mar 10, 2024