Skip to content

Conversation

@MacDue
Copy link
Member

@MacDue MacDue commented Jul 23, 2024

Somewhat confusingly a SCEVMulExpr is a SCEVNAryExpr, so can have > 2 operands. Previously, the vscale immediate matching did not check the number of operands of the SCEVMulExpr, so would ignore any operands after the first two.

This led to incorrect codegen (and results) for ArmSME in IREE (https://github.com/iree-org/iree), which sometimes addresses things that are a vscale * vscale multiple away. The test added with this change shows an example reduced from IREE. The second write should be offset from the first 16 * vscale * vscale (* 4 bytes), however, previously LSR dropped the second vscale and instead offset the write by #4, mul vl, which is an offset of 16 * vscale (* 4 bytes).

@llvmbot
Copy link
Member

llvmbot commented Jul 23, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Benjamin Maxwell (MacDue)

Changes

Somewhat confusingly a SCEVMulExpr is a SCEVNAryExpr, so can have > 2 operands. Previously, the vscale immediate matching did not check the number of operands of the SCEVMulExpr, so would ignore any operands after the first two.

This led to incorrect codegen (and results) for ArmSME in IREE (https://github.com/iree-org/iree), which sometimes addresses things that are a vscale * vscale multiple away. The test added with this change shows an example reduced from IREE. The second write should be offset from the first 16 * vscale * vscale (* 4 bytes), however, previously LSR dropped the second vscale and instead offset the write by #<!-- -->4, mul vl, which is an offset of 16 * vscale (* 4 bytes).


Full diff: https://github.com/llvm/llvm-project/pull/100080.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp (+2-1)
  • (modified) llvm/test/Transforms/LoopStrengthReduce/AArch64/vscale-fixups.ll (+51)
diff --git a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
index 11f9f7822a15c..3ba56a1a3af9d 100644
--- a/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
@@ -947,7 +947,8 @@ static Immediate ExtractImmediate(const SCEV *&S, ScalarEvolution &SE) {
                            SCEV::FlagAnyWrap);
     return Result;
   } else if (EnableVScaleImmediates)
-    if (const SCEVMulExpr *M = dyn_cast<SCEVMulExpr>(S))
+    if (const SCEVMulExpr *M = dyn_cast<SCEVMulExpr>(S);
+        M && M->getNumOperands() == 2)
       if (const SCEVConstant *C = dyn_cast<SCEVConstant>(M->getOperand(0)))
         if (isa<SCEVVScale>(M->getOperand(1))) {
           S = SE.getConstant(M->getType(), 0);
diff --git a/llvm/test/Transforms/LoopStrengthReduce/AArch64/vscale-fixups.ll b/llvm/test/Transforms/LoopStrengthReduce/AArch64/vscale-fixups.ll
index 483955c1c57a0..588696d20227f 100644
--- a/llvm/test/Transforms/LoopStrengthReduce/AArch64/vscale-fixups.ll
+++ b/llvm/test/Transforms/LoopStrengthReduce/AArch64/vscale-fixups.ll
@@ -384,4 +384,55 @@ for.exit:
   ret void
 }
 
+;; Here are two writes that should be `16 * vscale * vscale` apart, so MUL VL
+;; addressing cannot be used to offset the second write, as for example,
+;; `#4, mul vl` would only be an offset of `16 * vscale` (dropping a vscale).
+define void @vscale_squared_offset(ptr %alloc) #0 {
+; COMMON-LABEL: vscale_squared_offset:
+; COMMON:       // %bb.0: // %entry
+; COMMON-NEXT:    rdvl x9, #1
+; COMMON-NEXT:    fmov z0.s, #4.00000000
+; COMMON-NEXT:    mov x8, xzr
+; COMMON-NEXT:    lsr x9, x9, #4
+; COMMON-NEXT:    fmov z1.s, #8.00000000
+; COMMON-NEXT:    cntw x10
+; COMMON-NEXT:    ptrue p0.s, vl1
+; COMMON-NEXT:    umull x9, w9, w9
+; COMMON-NEXT:    lsl x9, x9, #6
+; COMMON-NEXT:    cmp x8, x10
+; COMMON-NEXT:    b.ge .LBB6_2
+; COMMON-NEXT:  .LBB6_1: // %for.body
+; COMMON-NEXT:    // =>This Inner Loop Header: Depth=1
+; COMMON-NEXT:    add x11, x0, x9
+; COMMON-NEXT:    st1w { z0.s }, p0, [x0]
+; COMMON-NEXT:    add x8, x8, #1
+; COMMON-NEXT:    st1w { z1.s }, p0, [x11]
+; COMMON-NEXT:    addvl x0, x0, #1
+; COMMON-NEXT:    cmp x8, x10
+; COMMON-NEXT:    b.lt .LBB6_1
+; COMMON-NEXT:  .LBB6_2: // %for.exit
+; COMMON-NEXT:    ret
+entry:
+  %vscale = call i64 @llvm.vscale.i64()
+  %c4_vscale = mul i64 %vscale, 4
+  br label %for.check
+for.check:
+  %i = phi i64 [ %next_i, %for.body ], [ 0, %entry ]
+  %is_lt = icmp slt i64 %i, %c4_vscale
+  br i1 %is_lt, label %for.body, label %for.exit
+for.body:
+  %mask = call <vscale x 4 x i1> @llvm.aarch64.sve.whilelt.nxv4i1.i64(i64 0, i64 1)
+  %upper_offset = mul i64 %i, %c4_vscale
+  %upper_ptr = getelementptr float, ptr %alloc, i64 %upper_offset
+  call void @llvm.masked.store.nxv4f32.p0(<vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 4.000000e+00, i64 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer), ptr %upper_ptr, i32 4, <vscale x 4 x i1> %mask)
+  %lower_i = add i64 %i, %c4_vscale
+  %lower_offset = mul i64 %lower_i, %c4_vscale
+  %lower_ptr = getelementptr float, ptr %alloc, i64 %lower_offset
+  call void @llvm.masked.store.nxv4f32.p0(<vscale x 4 x float> shufflevector (<vscale x 4 x float> insertelement (<vscale x 4 x float> poison, float 8.000000e+00, i64 0), <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer), ptr %lower_ptr, i32 4, <vscale x 4 x i1> %mask)
+  %next_i = add i64 %i, 1
+  br label %for.check
+for.exit:
+  ret void
+}
+
 attributes #0 = { "target-features"="+sve2" vscale_range(1,16) }

@MacDue
Copy link
Member Author

MacDue commented Jul 23, 2024

Note: The pre-commit Precommit vscale-fixups.ll test shows the current (incorrect) codegen.

Copy link
Collaborator

@huntergr-arm huntergr-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

MacDue added a commit that referenced this pull request Jul 23, 2024
Somewhat confusingly a `SCEVMulExpr` is a `SCEVNAryExpr`, so can have
> 2 operands. Previously, the vscale immediate matching did not check
the number of operands of the `SCEVMulExpr`, so would ignore any
operands after the first two.

This led to incorrect codegen (and results) for ArmSME in IREE
(https://github.com/iree-org/iree), which sometimes addresses things
that are a `vscale * vscale` multiple away. The test added with this
change shows an example reduced from IREE. The second write should
be offset from the first `16 * vscale * vscale` (* 4 bytes), however,
previously LSR dropped the second vscale and instead offset the write by
`llvm#4, mul vl`, which is an offset of `16 * vscale` (* 4 bytes).
@MacDue MacDue merged commit 7fad04e into llvm:main Jul 24, 2024
@MacDue MacDue deleted the addressing_fix branch July 24, 2024 09:06
@nikic nikic added this to the LLVM 19.X Release milestone Jul 24, 2024
@nikic
Copy link
Contributor

nikic commented Jul 24, 2024

/cherry-pick c1b70fa 7fad04e

llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Jul 24, 2024
Precommit test for llvm#100080.

(cherry picked from commit c1b70fa)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Jul 24, 2024
Somewhat confusingly a `SCEVMulExpr` is a `SCEVNAryExpr`, so can have
> 2 operands. Previously, the vscale immediate matching did not check
the number of operands of the `SCEVMulExpr`, so would ignore any
operands after the first two.

This led to incorrect codegen (and results) for ArmSME in IREE
(https://github.com/iree-org/iree), which sometimes addresses things
that are a `vscale * vscale` multiple away. The test added with this
change shows an example reduced from IREE. The second write should
be offset from the first `16 * vscale * vscale` (* 4 bytes), however,
previously LSR dropped the second vscale and instead offset the write by
`#4, mul vl`, which is an offset of `16 * vscale` (* 4 bytes).

(cherry picked from commit 7fad04e)
@llvmbot
Copy link
Member

llvmbot commented Jul 24, 2024

/pull-request #100359

tru pushed a commit to llvmbot/llvm-project that referenced this pull request Jul 24, 2024
Precommit test for llvm#100080.

(cherry picked from commit c1b70fa)
tru pushed a commit to llvmbot/llvm-project that referenced this pull request Jul 24, 2024
Somewhat confusingly a `SCEVMulExpr` is a `SCEVNAryExpr`, so can have
> 2 operands. Previously, the vscale immediate matching did not check
the number of operands of the `SCEVMulExpr`, so would ignore any
operands after the first two.

This led to incorrect codegen (and results) for ArmSME in IREE
(https://github.com/iree-org/iree), which sometimes addresses things
that are a `vscale * vscale` multiple away. The test added with this
change shows an example reduced from IREE. The second write should
be offset from the first `16 * vscale * vscale` (* 4 bytes), however,
previously LSR dropped the second vscale and instead offset the write by
`#4, mul vl`, which is an offset of `16 * vscale` (* 4 bytes).

(cherry picked from commit 7fad04e)
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary: Precommit test for #100080.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251062
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Summary:
Somewhat confusingly a `SCEVMulExpr` is a `SCEVNAryExpr`, so can have
> 2 operands. Previously, the vscale immediate matching did not check
the number of operands of the `SCEVMulExpr`, so would ignore any
operands after the first two.

This led to incorrect codegen (and results) for ArmSME in IREE
(https://github.com/iree-org/iree), which sometimes addresses things
that are a `vscale * vscale` multiple away. The test added with this
change shows an example reduced from IREE. The second write should
be offset from the first `16 * vscale * vscale` (* 4 bytes), however,
previously LSR dropped the second vscale and instead offset the write by
`#4, mul vl`, which is an offset of `16 * vscale` (* 4 bytes).

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60250721
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Development

Successfully merging this pull request may close these issues.

5 participants