-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[AArch64] Return Invalid partial reduction cost for i128 accumulator. #162066
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[AArch64] Return Invalid partial reduction cost for i128 accumulator. #162066
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-aarch64 Author: Sander de Smalen (sdesmalen-arm) ChangesPR #158641 introduced an issue where i128 accumulator types resulted This fixes #162009 Full diff: https://github.com/llvm/llvm-project/pull/162066.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 50a8754d74075..bb3b9d5bf2c74 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -5666,18 +5666,21 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
VectorType *AccumVectorType =
VectorType::get(AccumType, VF.divideCoefficientBy(Ratio));
// We don't yet support all kinds of legalization.
- auto TA = TLI->getTypeAction(AccumVectorType->getContext(),
- EVT::getEVT(AccumVectorType));
- switch (TA) {
+ auto TC = TLI->getTypeConversion(AccumVectorType->getContext(),
+ EVT::getEVT(AccumVectorType));
+ switch (TC.first) {
default:
return Invalid;
case TargetLowering::TypeLegal:
case TargetLowering::TypePromoteInteger:
case TargetLowering::TypeSplitVector:
+ // The legalised type (e.g. after splitting) must be legal too.
+ if (TLI->getTypeAction(AccumVectorType->getContext(), TC.second) !=
+ TargetLowering::TypeLegal)
+ return Invalid;
break;
}
- // Check what kind of type-legalisation happens.
std::pair<InstructionCost, MVT> AccumLT =
getTypeLegalizationCost(AccumVectorType);
std::pair<InstructionCost, MVT> InputLT =
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/pr162009.ll b/llvm/test/Transforms/LoopVectorize/AArch64/pr162009.ll
new file mode 100644
index 0000000000000..6095b246ccdad
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/pr162009.ll
@@ -0,0 +1,79 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -enable-epilogue-vectorization=false -S < %s | FileCheck %s --check-prefixes=CHECK-NO-PARTIAL-REDUCTION
+
+target triple = "aarch64"
+
+define i128 @add_reduc_i32_i128_unsupported(ptr %a, ptr %b) "target-features"="+dotprod" {
+; CHECK-NO-PARTIAL-REDUCTION-LABEL: define i128 @add_reduc_i32_i128_unsupported(
+; CHECK-NO-PARTIAL-REDUCTION-SAME: ptr [[A:%.*]], ptr [[B:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[ENTRY:.*:]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: br label %[[VECTOR_PH:.*]]
+; CHECK-NO-PARTIAL-REDUCTION: [[VECTOR_PH]]:
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: br label %[[VECTOR_BODY:.*]]
+; CHECK-NO-PARTIAL-REDUCTION: [[VECTOR_BODY]]:
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[VEC_PHI:%.*]] = phi <4 x i128> [ zeroinitializer, %[[VECTOR_PH]] ], [ [[TMP7:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[TMP0:%.*]] = getelementptr i32, ptr [[A]], i64 [[INDEX]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP0]], align 1
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[TMP1:%.*]] = zext <4 x i32> [[WIDE_LOAD]] to <4 x i64>
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[TMP2:%.*]] = getelementptr i32, ptr [[B]], i64 [[INDEX]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x i32>, ptr [[TMP2]], align 1
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[TMP3:%.*]] = zext <4 x i32> [[WIDE_LOAD1]] to <4 x i64>
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[TMP4:%.*]] = mul nuw <4 x i64> [[TMP1]], [[TMP3]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[TMP5:%.*]] = zext <4 x i64> [[TMP4]] to <4 x i128>
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[TMP7]] = add <4 x i128> [[VEC_PHI]], [[TMP5]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4024
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: br i1 [[TMP6]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK-NO-PARTIAL-REDUCTION: [[MIDDLE_BLOCK]]:
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[TMP8:%.*]] = call i128 @llvm.vector.reduce.add.v4i128(<4 x i128> [[TMP7]])
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: br label %[[SCALAR_PH:.*]]
+; CHECK-NO-PARTIAL-REDUCTION: [[SCALAR_PH]]:
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: br label %[[FOR_BODY:.*]]
+; CHECK-NO-PARTIAL-REDUCTION: [[FOR_BODY]]:
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[IV:%.*]] = phi i64 [ 4024, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[ACCUM:%.*]] = phi i128 [ [[TMP8]], %[[SCALAR_PH]] ], [ [[ADD:%.*]], %[[FOR_BODY]] ]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[GEP_A:%.*]] = getelementptr i32, ptr [[A]], i64 [[IV]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[LOAD_A:%.*]] = load i32, ptr [[GEP_A]], align 1
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[EXT_A:%.*]] = zext i32 [[LOAD_A]] to i64
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[GEP_B:%.*]] = getelementptr i32, ptr [[B]], i64 [[IV]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[LOAD_B:%.*]] = load i32, ptr [[GEP_B]], align 1
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[EXT_B:%.*]] = zext i32 [[LOAD_B]] to i64
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[MUL:%.*]] = mul nuw i64 [[EXT_A]], [[EXT_B]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[MUL_ZEXT:%.*]] = zext i64 [[MUL]] to i128
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[ADD]] = add i128 [[ACCUM]], [[MUL_ZEXT]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 4025
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_EXIT:.*]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NO-PARTIAL-REDUCTION: [[FOR_EXIT]]:
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: [[ADD_LCSSA:%.*]] = phi i128 [ [[ADD]], %[[FOR_BODY]] ]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT: ret i128 [[ADD_LCSSA]]
+;
+entry:
+ br label %for.body
+
+for.body:
+ %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+ %accum = phi i128 [ 0, %entry ], [ %add, %for.body ]
+ %gep.a = getelementptr i32, ptr %a, i64 %iv
+ %load.a = load i32, ptr %gep.a, align 1
+ %ext.a = zext i32 %load.a to i64
+ %gep.b = getelementptr i32, ptr %b, i64 %iv
+ %load.b = load i32, ptr %gep.b, align 1
+ %ext.b = zext i32 %load.b to i64
+ %mul = mul nuw i64 %ext.a, %ext.b
+ %mul.zext = zext i64 %mul to i128
+ %add = add i128 %accum, %mul.zext
+ %iv.next = add i64 %iv, 1
+ %exitcond.not = icmp eq i64 %iv.next, 4025
+ br i1 %exitcond.not, label %for.exit, label %for.body
+
+for.exit:
+ ret i128 %add
+}
+;.
+; CHECK-NO-PARTIAL-REDUCTION: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK-NO-PARTIAL-REDUCTION: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK-NO-PARTIAL-REDUCTION: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK-NO-PARTIAL-REDUCTION: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
+;.
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
PR llvm#158641 introduced an issue where i128 accumulator types resulted in a valid cost, because for a <2 x i128> type the code that checks for unsupported type legalization would see a type action of 'TypeSplitVector' which is supported, even though the legalised type of <1 x i128> would require further scalarization.
76d4a50
to
6d1b7c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
PR #158641 introduced an issue where i128 accumulator types resulted
in a valid cost, because for a <2 x i128> type the code that
checks for unsupported type legalization would see a type action
of 'TypeSplitVector' which is supported, even though the legalised
type of <1 x i128> would require further scalarization.
This fixes #162009