Skip to content

Conversation

sdesmalen-arm
Copy link
Collaborator

PR #158641 introduced an issue where i128 accumulator types resulted
in a valid cost, because for a <2 x i128> type the code that
checks for unsupported type legalization would see a type action
of 'TypeSplitVector' which is supported, even though the legalised
type of <1 x i128> would require further scalarization.

This fixes #162009

@llvmbot
Copy link
Member

llvmbot commented Oct 6, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

PR #158641 introduced an issue where i128 accumulator types resulted
in a valid cost, because for a <2 x i128> type the code that
checks for unsupported type legalization would see a type action
of 'TypeSplitVector' which is supported, even though the legalised
type of <1 x i128> would require further scalarization.

This fixes #162009


Full diff: https://github.com/llvm/llvm-project/pull/162066.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+7-4)
  • (added) llvm/test/Transforms/LoopVectorize/AArch64/pr162009.ll (+79)
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 50a8754d74075..bb3b9d5bf2c74 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -5666,18 +5666,21 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
   VectorType *AccumVectorType =
       VectorType::get(AccumType, VF.divideCoefficientBy(Ratio));
   // We don't yet support all kinds of legalization.
-  auto TA = TLI->getTypeAction(AccumVectorType->getContext(),
-                               EVT::getEVT(AccumVectorType));
-  switch (TA) {
+  auto TC = TLI->getTypeConversion(AccumVectorType->getContext(),
+                             EVT::getEVT(AccumVectorType));
+  switch (TC.first) {
   default:
     return Invalid;
   case TargetLowering::TypeLegal:
   case TargetLowering::TypePromoteInteger:
   case TargetLowering::TypeSplitVector:
+    // The legalised type (e.g. after splitting) must be legal too.
+    if (TLI->getTypeAction(AccumVectorType->getContext(), TC.second) !=
+        TargetLowering::TypeLegal)
+      return Invalid;
     break;
   }
 
-  // Check what kind of type-legalisation happens.
   std::pair<InstructionCost, MVT> AccumLT =
       getTypeLegalizationCost(AccumVectorType);
   std::pair<InstructionCost, MVT> InputLT =
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/pr162009.ll b/llvm/test/Transforms/LoopVectorize/AArch64/pr162009.ll
new file mode 100644
index 0000000000000..6095b246ccdad
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/pr162009.ll
@@ -0,0 +1,79 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -enable-epilogue-vectorization=false -S < %s | FileCheck %s --check-prefixes=CHECK-NO-PARTIAL-REDUCTION
+
+target triple = "aarch64"
+
+define i128 @add_reduc_i32_i128_unsupported(ptr %a, ptr %b) "target-features"="+dotprod" {
+; CHECK-NO-PARTIAL-REDUCTION-LABEL: define i128 @add_reduc_i32_i128_unsupported(
+; CHECK-NO-PARTIAL-REDUCTION-SAME: ptr [[A:%.*]], ptr [[B:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:  [[ENTRY:.*:]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    br label %[[VECTOR_PH:.*]]
+; CHECK-NO-PARTIAL-REDUCTION:       [[VECTOR_PH]]:
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK-NO-PARTIAL-REDUCTION:       [[VECTOR_BODY]]:
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[VEC_PHI:%.*]] = phi <4 x i128> [ zeroinitializer, %[[VECTOR_PH]] ], [ [[TMP7:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[TMP0:%.*]] = getelementptr i32, ptr [[A]], i64 [[INDEX]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP0]], align 1
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[TMP1:%.*]] = zext <4 x i32> [[WIDE_LOAD]] to <4 x i64>
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[TMP2:%.*]] = getelementptr i32, ptr [[B]], i64 [[INDEX]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[WIDE_LOAD1:%.*]] = load <4 x i32>, ptr [[TMP2]], align 1
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[TMP3:%.*]] = zext <4 x i32> [[WIDE_LOAD1]] to <4 x i64>
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[TMP4:%.*]] = mul nuw <4 x i64> [[TMP1]], [[TMP3]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[TMP5:%.*]] = zext <4 x i64> [[TMP4]] to <4 x i128>
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[TMP7]] = add <4 x i128> [[VEC_PHI]], [[TMP5]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4024
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    br i1 [[TMP6]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK-NO-PARTIAL-REDUCTION:       [[MIDDLE_BLOCK]]:
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[TMP8:%.*]] = call i128 @llvm.vector.reduce.add.v4i128(<4 x i128> [[TMP7]])
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    br label %[[SCALAR_PH:.*]]
+; CHECK-NO-PARTIAL-REDUCTION:       [[SCALAR_PH]]:
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    br label %[[FOR_BODY:.*]]
+; CHECK-NO-PARTIAL-REDUCTION:       [[FOR_BODY]]:
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[IV:%.*]] = phi i64 [ 4024, %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[ACCUM:%.*]] = phi i128 [ [[TMP8]], %[[SCALAR_PH]] ], [ [[ADD:%.*]], %[[FOR_BODY]] ]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[GEP_A:%.*]] = getelementptr i32, ptr [[A]], i64 [[IV]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[LOAD_A:%.*]] = load i32, ptr [[GEP_A]], align 1
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[EXT_A:%.*]] = zext i32 [[LOAD_A]] to i64
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[GEP_B:%.*]] = getelementptr i32, ptr [[B]], i64 [[IV]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[LOAD_B:%.*]] = load i32, ptr [[GEP_B]], align 1
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[EXT_B:%.*]] = zext i32 [[LOAD_B]] to i64
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[MUL:%.*]] = mul nuw i64 [[EXT_A]], [[EXT_B]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[MUL_ZEXT:%.*]] = zext i64 [[MUL]] to i128
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[ADD]] = add i128 [[ACCUM]], [[MUL_ZEXT]]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[IV_NEXT]] = add i64 [[IV]], 1
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[IV_NEXT]], 4025
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_EXIT:.*]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NO-PARTIAL-REDUCTION:       [[FOR_EXIT]]:
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    [[ADD_LCSSA:%.*]] = phi i128 [ [[ADD]], %[[FOR_BODY]] ]
+; CHECK-NO-PARTIAL-REDUCTION-NEXT:    ret i128 [[ADD_LCSSA]]
+;
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %accum = phi i128 [ 0, %entry ], [ %add, %for.body ]
+  %gep.a = getelementptr i32, ptr %a, i64 %iv
+  %load.a = load i32, ptr %gep.a, align 1
+  %ext.a = zext i32 %load.a to i64
+  %gep.b = getelementptr i32, ptr %b, i64 %iv
+  %load.b = load i32, ptr %gep.b, align 1
+  %ext.b = zext i32 %load.b to i64
+  %mul = mul nuw i64 %ext.a, %ext.b
+  %mul.zext = zext i64 %mul to i128
+  %add = add i128 %accum, %mul.zext
+  %iv.next = add i64 %iv, 1
+  %exitcond.not = icmp eq i64 %iv.next, 4025
+  br i1 %exitcond.not, label %for.exit, label %for.body
+
+for.exit:
+  ret i128 %add
+}
+;.
+; CHECK-NO-PARTIAL-REDUCTION: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
+; CHECK-NO-PARTIAL-REDUCTION: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK-NO-PARTIAL-REDUCTION: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK-NO-PARTIAL-REDUCTION: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
+;.

Copy link

github-actions bot commented Oct 6, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

PR llvm#158641 introduced an issue where i128 accumulator types resulted
in a valid cost, because for a <2 x i128> type the code that
checks for unsupported type legalization would see a type action
of 'TypeSplitVector' which is supported, even though the legalised
type of <1 x i128> would require further scalarization.
@sdesmalen-arm sdesmalen-arm force-pushed the fix-tti-partial-reduce-cost-i128 branch from 76d4a50 to 6d1b7c1 Compare October 6, 2025 10:59
Copy link
Contributor

@david-arm david-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[AArch64] fatal error: error in backend: Do not know how to scalarize the result of this operator!
3 participants