Conversation

@nikic (Contributor) commented Aug 26, 2025

GEPs are often in the form `gep [N x %T], ptr %p, i64 0, i64 %idx`. Canonicalize these to `gep %T, ptr %p, i64 %idx`.

This enables transforms that only support one GEP index to work and improves CSE.

Various transforms were recently hardened to make sure they still work without the leading index. #155404 is the last problematic transform I'm aware of.
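
For illustration, a minimal sketch of why this helps CSE (hypothetical IR, not taken from the patch's tests): both GEPs below compute the same address, but before this change they are syntactically different instructions.

  ; before: the same address written two ways does not CSE
  %a = getelementptr inbounds [16 x i32], ptr %p, i64 0, i64 %idx
  %b = getelementptr inbounds i32, ptr %p, i64 %idx

  ; after stripping the leading zero index, %a becomes identical to %b
  %a = getelementptr inbounds i32, ptr %p, i64 %idx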

@nikic nikic requested review from davemgreen and dtcxzyw August 26, 2025 13:37
@llvmbot llvmbot added the clang, backend:AMDGPU, llvm:instcombine, and llvm:transforms labels Aug 26, 2025
@llvmbot (Member) commented Aug 26, 2025

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-clang

Author: Nikita Popov (nikic)

Changes

GEPs are often in the form `gep [N x %T], ptr %p, i64 0, i64 %idx`. Canonicalize these to `gep %T, ptr %p, i64 %idx`.

This enables transforms that only support one GEP index to work and improves CSE.

Various transforms were recently hardened to make sure they still work without the leading index. #155404 is the last problematic transform I'm aware of.


Patch is 213.30 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155415.diff

56 Files Affected:

  • (modified) clang/test/CodeGen/attr-counted-by.c (+94-106)
  • (modified) clang/test/CodeGen/union-tbaa1.c (+3-3)
  • (modified) llvm/lib/Transforms/InstCombine/InstructionCombining.cpp (+13)
  • (modified) llvm/test/CodeGen/AMDGPU/vector-alloca-addrspacecast.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/2006-12-15-Range-Test.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/AMDGPU/memcpy-from-constant.ll (+10-10)
  • (modified) llvm/test/Transforms/InstCombine/canonicalize-gep-constglob.ll (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/cast.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/cast_phi.ll (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/gep-addrspace.ll (+3-3)
  • (modified) llvm/test/Transforms/InstCombine/gep-alias.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/gep-vector.ll (+16-16)
  • (modified) llvm/test/Transforms/InstCombine/getelementptr.ll (+5-5)
  • (modified) llvm/test/Transforms/InstCombine/load-bitcast-select.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/load-cmp.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/loadstore-alignment.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/mem-gep-zidx.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/memchr-11.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/memcmp-8.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/memcpy-addrspace.ll (+6-6)
  • (modified) llvm/test/Transforms/InstCombine/memcpy-from-global.ll (+11-11)
  • (modified) llvm/test/Transforms/InstCombine/opaque-ptr.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/pr58901.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/ptrtoint-nullgep.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/remove-loop-phi-multiply-by-zero.ll (+6-6)
  • (modified) llvm/test/Transforms/InstCombine/strcmp-3.ll (+3-3)
  • (modified) llvm/test/Transforms/InstCombine/strlen-1.ll (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/strlen-4.ll (+8-8)
  • (modified) llvm/test/Transforms/InstCombine/strlen-7.ll (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/strlen-8.ll (+5-5)
  • (modified) llvm/test/Transforms/InstCombine/strlen-9.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/strnlen-3.ll (+9-9)
  • (modified) llvm/test/Transforms/InstCombine/strnlen-4.ll (+3-3)
  • (modified) llvm/test/Transforms/InstCombine/strnlen-5.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/sub-gep.ll (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/vec_demanded_elts-inseltpoison.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/vec_demanded_elts.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/wcslen-1.ll (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/wcslen-3.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/wcslen-5.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve2-histcnt.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+22-22)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/x86_fp80-vector-store.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/multiple-address-spaces.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/non-const-n.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+24-24)
  • (modified) llvm/test/Transforms/PhaseOrdering/AArch64/matrix-extract-insert.ll (+26-26)
  • (modified) llvm/test/Transforms/PhaseOrdering/X86/excessive-unrolling.ll (+6-6)
  • (modified) llvm/test/Transforms/PhaseOrdering/X86/merge-functions2.ll (+1-1)
  • (modified) llvm/test/Transforms/PhaseOrdering/X86/merge-functions3.ll (+1-1)
  • (modified) llvm/test/Transforms/PhaseOrdering/X86/simplifycfg-late.ll (+1-1)
  • (modified) llvm/test/Transforms/PhaseOrdering/scev-custom-dl.ll (+1-1)
  • (modified) llvm/test/Transforms/PhaseOrdering/single-iteration-loop-sroa.ll (+2-2)
  • (modified) llvm/test/Transforms/SimplifyCFG/Hexagon/switch-to-lookup-table.ll (+1-1)
diff --git a/clang/test/CodeGen/attr-counted-by.c b/clang/test/CodeGen/attr-counted-by.c
index 59e1b134850a9..cb23efdb8f263 100644
--- a/clang/test/CodeGen/attr-counted-by.c
+++ b/clang/test/CodeGen/attr-counted-by.c
@@ -72,7 +72,7 @@ struct anon_struct {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont3:
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // SANITIZE-WITH-ATTR-NEXT:    store i32 [[VAL]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA4:![0-9]+]]
 // SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -81,7 +81,7 @@ struct anon_struct {
 // NO-SANITIZE-WITH-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[VAL]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2:![0-9]+]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -90,7 +90,7 @@ struct anon_struct {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 [[VAL]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2:![0-9]+]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -99,7 +99,7 @@ struct anon_struct {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 [[VAL]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2:![0-9]+]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -120,7 +120,7 @@ void test1(struct annotated *p, int index, int val) {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont6:
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP2:%.*]] = tail call i32 @llvm.smax.i32(i32 [[COUNTED_BY_LOAD]], i32 0)
 // SANITIZE-WITH-ATTR-NEXT:    [[CONV:%.*]] = shl i32 [[TMP2]], 2
 // SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA4]]
@@ -134,7 +134,7 @@ void test1(struct annotated *p, int index, int val) {
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[COUNTED_BY_LOAD:%.*]] = load i32, ptr [[COUNTED_BY_GEP]], align 4
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = tail call i32 @llvm.smax.i32(i32 [[COUNTED_BY_LOAD]], i32 0)
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[CONV:%.*]] = shl i32 [[TMP0]], 2
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV]], ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -142,7 +142,7 @@ void test1(struct annotated *p, int index, int val) {
 // SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -150,7 +150,7 @@ void test1(struct annotated *p, int index, int val) {
 // NO-SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -243,7 +243,7 @@ size_t test2_bdos_cast(struct annotated *p) {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont3:
 // SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // SANITIZE-WITH-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA4]]
 // SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -251,7 +251,7 @@ size_t test2_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITH-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // NO-SANITIZE-WITH-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -259,7 +259,7 @@ size_t test2_bdos_cast(struct annotated *p) {
 // SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -267,7 +267,7 @@ size_t test2_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITHOUT-ATTR-SAME: ptr noundef [[P:%.*]], i64 noundef [[INDEX:%.*]]) local_unnamed_addr #[[ATTR0]] {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[INDEX]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[INDEX]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -350,7 +350,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // SANITIZE-WITH-ATTR-NEXT:    [[RESULT:%.*]] = add i32 [[FLEXIBLE_ARRAY_MEMBER_SIZE]], 244
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP4:%.*]] = and i32 [[RESULT]], 252
 // SANITIZE-WITH-ATTR-NEXT:    [[CONV2:%.*]] = select i1 [[TMP3]], i32 [[TMP4]], i32 0
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX10:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX10:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV2]], ptr [[ARRAYIDX10]], align 4, !tbaa [[TBAA4]]
 // SANITIZE-WITH-ATTR-NEXT:    [[DOTNOT81:%.*]] = icmp eq i32 [[DOTCOUNTED_BY_LOAD]], 3
 // SANITIZE-WITH-ATTR-NEXT:    br i1 [[DOTNOT81]], label [[HANDLER_OUT_OF_BOUNDS18:%.*]], label [[CONT19:%.*]], !prof [[PROF8:![0-9]+]], !nosanitize [[META2]]
@@ -370,7 +370,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // SANITIZE-WITH-ATTR-NEXT:    [[RESULT25:%.*]] = add i32 [[FLEXIBLE_ARRAY_MEMBER_SIZE]], 240
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP7:%.*]] = and i32 [[RESULT25]], 252
 // SANITIZE-WITH-ATTR-NEXT:    [[CONV27:%.*]] = select i1 [[TMP6]], i32 [[TMP7]], i32 0
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX36:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM31]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX36:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[IDXPROM31]]
 // SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV27]], ptr [[ARRAYIDX36]], align 4, !tbaa [[TBAA4]]
 // SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM42:%.*]] = sext i32 [[FAM_IDX]] to i64
 // SANITIZE-WITH-ATTR-NEXT:    [[DOTCOUNTED_BY_LOAD44:%.*]] = load i32, ptr [[DOTCOUNTED_BY_GEP]], align 4
@@ -389,7 +389,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // SANITIZE-WITH-ATTR-NEXT:    tail call void @__ubsan_handle_out_of_bounds_abort(ptr nonnull @[[GLOB10:[0-9]+]], i64 [[IDXPROM60]]) #[[ATTR8]], !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont67:
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX65:%.*]] = getelementptr inbounds nuw [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM60]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX65:%.*]] = getelementptr inbounds nuw i32, ptr [[ARRAY]], i64 [[IDXPROM60]]
 // SANITIZE-WITH-ATTR-NEXT:    [[COUNT50:%.*]] = sext i32 [[DOTCOUNTED_BY_LOAD44]] to i64
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP10:%.*]] = sub nsw i64 [[COUNT50]], [[IDXPROM42]]
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP11:%.*]] = tail call i64 @llvm.smax.i64(i64 [[TMP10]], i64 0)
@@ -411,7 +411,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = and i32 [[RESULT]], 252
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[CONV1:%.*]] = select i1 [[TMP0]], i32 [[TMP1]], i32 0
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV1]], ptr [[ARRAYIDX3]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[COUNTED_BY_LOAD7:%.*]] = load i32, ptr [[COUNTED_BY_GEP]], align 4
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[FLEXIBLE_ARRAY_MEMBER_SIZE9:%.*]] = shl i32 [[COUNTED_BY_LOAD7]], 2
@@ -419,9 +419,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP2:%.*]] = icmp sgt i32 [[COUNTED_BY_LOAD7]], 3
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP3:%.*]] = and i32 [[RESULT10]], 252
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[CONV12:%.*]] = select i1 [[TMP2]], i32 [[TMP3]], i32 0
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ADD:%.*]] = add nsw i32 [[INDEX]], 1
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM14:%.*]] = sext i32 [[ADD]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX15:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM14]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX15:%.*]] = getelementptr i8, ptr [[ARRAYIDX3]], i64 4
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV12]], ptr [[ARRAYIDX15]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM17:%.*]] = sext i32 [[FAM_IDX]] to i64
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[COUNTED_BY_LOAD20:%.*]] = load i32, ptr [[COUNTED_BY_GEP]], align 4
@@ -434,9 +432,7 @@ size_t test3_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP8:%.*]] = shl i32 [[DOTTR]], 2
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP9:%.*]] = and i32 [[TMP8]], 252
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[CONV26:%.*]] = select i1 [[TMP7]], i32 [[TMP9]], i32 0
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ADD28:%.*]] = add nsw i32 [[INDEX]], 2
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM29:%.*]] = sext i32 [[ADD28]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX30:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM29]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX30:%.*]] = getelementptr i8, ptr [[ARRAYIDX3]], i64 8
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 [[CONV26]], ptr [[ARRAYIDX30]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -445,15 +441,11 @@ size_t test3_bdos_cast(struct annotated *p) {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX5]], align 4, !tbaa [[TBAA2]]
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD:%.*]] = add nsw i32 [[INDEX]], 1
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM17:%.*]] = sext i32 [[ADD]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX18:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM17]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX18:%.*]] = getelementptr i8, ptr [[ARRAYIDX5]], i64 4
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX18]], align 4, !tbaa [[TBAA2]]
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD31:%.*]] = add nsw i32 [[INDEX]], 2
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM32:%.*]] = sext i32 [[ADD31]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX33:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM32]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX33:%.*]] = getelementptr i8, ptr [[ARRAYIDX5]], i64 8
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX33]], align 4, !tbaa [[TBAA2]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -462,15 +454,11 @@ size_t test3_bdos_cast(struct annotated *p) {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAY:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, ptr [[ARRAY]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX3]], align 4, !tbaa [[TBAA2]]
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD:%.*]] = add nsw i32 [[INDEX]], 1
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM9:%.*]] = sext i32 [[ADD]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX10:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM9]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX10:%.*]] = getelementptr i8, ptr [[ARRAYIDX3]], i64 4
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX10]], align 4, !tbaa [[TBAA2]]
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ADD17:%.*]] = add nsw i32 [[INDEX]], 2
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM18:%.*]] = sext i32 [[ADD17]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX19:%.*]] = getelementptr inbounds [0 x i32], ptr [[ARRAY]], i64 0, i64 [[IDXPROM18]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX19:%.*]] = getelementptr i8, ptr [[ARRAYIDX3]], i64 8
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 255, ptr [[ARRAYIDX19]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -632,7 +620,7 @@ size_t test4_bdos_cast2(struct annotated *p, int index) {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont3:
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 16
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[TMP1]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[IDXPROM]]
 // SANITIZE-WITH-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA4]]
 // SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -641,7 +629,7 @@ size_t test4_bdos_cast2(struct annotated *p, int index) {
 // NO-SANITIZE-WITH-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 16
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[TMP0]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITH-ATTR-NEXT:    ret void
 //
@@ -650,7 +638,7 @@ size_t test4_bdos_cast2(struct annotated *p, int index) {
 // SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 16
 // SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[TMP0]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i64 [[IDXPROM]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -659,7 +647,7 @@ size_t test4_bdos_cast2(struct annotated *p, int index) {
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 16
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[IDXPROM:%.*]] = sext i32 [[INDEX]] to i64
-// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[TMP0]], i64 0, i64 [[IDXPROM]]
+// NO-SANITIZE-WITHOUT-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i64 [[IDXPROM]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    store i32 -1, ptr [[ARRAYIDX]], align 4, !tbaa [[TBAA2]]
 // NO-SANITIZE-WITHOUT-ATTR-NEXT:    ret void
 //
@@ -704,7 +692,7 @@ size_t test5_bdos(struct anon_struct *p) {
 // SANITIZE-WITH-ATTR-NEXT:    unreachable, !nosanitize [[META2]]
 // SANITIZE-WITH-ATTR:       cont6:
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 16
-// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i32], ptr [[TMP1]], i64 0, i64 [[IDXPROM]]
+// SANITIZE-WITH-ATTR-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 [[IDXPROM]]
 // SANITIZE-WITH-ATTR-NEXT:    [[FLEXIBLE_ARRAY_MEMBER_SIZE:%.*]] = shl nuw i64 [[COUNTED_BY_LOAD]], 2
 // SANITIZE-WITH-ATTR-...
[truncated]

@davemgreen (Collaborator)

Oh - I was about to put up fd286bd. That was only going to try with array types, not with vectors too, and I believe the only other semi-useful part was some DA tests I had tried adding without array indices (which looked fine for all the ones I tested).

The differences I have seen have mostly been extra hoisting causing extra unrolling. I would be interested to see if @dtcxzyw finds anything more.

@nikic (Contributor, Author) commented Aug 26, 2025

@zyw-bot csmith-fuzz

@nikic (Contributor, Author) commented Aug 26, 2025

Based on llvm-opt-benchmark, the main impact this has is enabling the `gep + add -> gep + gep` fold in many more cases (there are a few other transforms that only support single-index GEPs, but they're less significant). This mostly has positive effects (many `gep + add`s become a constant-offset GEP whose base gets CSEd). The main issue I saw is that we now sometimes fail to eliminate null checks of the form `gep inbounds (p, x-1) == null`, because these are now converted to `gep (gep p, x), -1`, losing the flags.
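
A minimal sketch of that regression, with hypothetical values (the exact flag placement depends on how the split is emitted):

  ; before: one inbounds GEP carries the whole x-1 offset, and the
  ; null check can be folded away based on it
  %off = add nsw i64 %x, -1
  %q   = getelementptr inbounds i32, ptr %p, i64 %off
  %c   = icmp eq ptr %q, null

  ; after the gep + add -> gep + gep split: the constant part is peeled
  ; into a separate GEP that no longer carries inbounds, so the null
  ; check is no longer eliminated
  %q0  = getelementptr inbounds i32, ptr %p, i64 %x
  %q   = getelementptr i32, ptr %q0, i64 -1
  %c   = icmp eq ptr %q, null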

@nikic (Contributor, Author) commented Aug 27, 2025

The main issue I saw is that we now sometimes fail to eliminate null checks of the form gep inbounds (p, x-1) == null, which are converted to gep (gep p, x), -1 now, losing flags.

I checked whether we can mitigate this by folding null checks before doing this transform, but at least in the one case I looked at, the null check only gets inlined after this fold has already happened.

@nikic (Contributor, Author) commented Aug 28, 2025

The main issue I saw is that we now sometimes fail to eliminate null checks of the form gep inbounds (p, x-1) == null, which are converted to gep (gep p, x), -1 now, losing flags.

I checked whether we can mitigate this by folding null checks before doing this transform, but at least in the one case I looked at, the null check only gets inlined after this fold has already happened.

I also checked whether not doing this transform for constant negative offsets might make sense, but it turns out that it's pretty widely beneficial even in that case.

So I don't think there's anything we can really do about those regressions.

@dtcxzyw (Member) left a comment

LG

@nikic nikic merged commit 055bfc0 into llvm:main Sep 1, 2025
14 checks passed
@nikic nikic deleted the instcombine-gep-leading-zeros branch September 1, 2025 07:58
@jplehr (Contributor) commented Sep 2, 2025

It appears that this broke our libc on GPU buildbot: https://lab.llvm.org/buildbot/#/builders/10/builds/12560
I bisected the breakage locally down to this change and reverting fixes it.
@jhuber6 for awareness.

A failing configuration can be reproduced with `cmake -GNinja -C offload/cmake/caches/AMDGPULibcBot.cmake llvm`, though the failure shows up at runtime, AFAICT.

zmodem added a commit to zmodem/rust that referenced this pull request Sep 2, 2025
This updates tests/codegen-llvm/issues/issue-118306.rs to pass also
after llvm/llvm-project#155415
@nikic (Contributor, Author) commented Sep 2, 2025

@jplehr I don't have an AMDGPU environment to debug this. Is it possible to reduce this further?

@jplehr (Contributor) commented Sep 2, 2025

Yeah, I'm sure it is. Is an IR dump what's needed here, either the original IR or the IR before some specific pass?

@nikic (Contributor, Author) commented Sep 2, 2025

Not sure whether it's going to be sufficient, but the original IR dump would be a good starting point at least.

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Sep 2, 2025
Adjust issue-118306.rs test after LLVM change

This updates tests/codegen-llvm/issues/issue-118306.rs to pass also after llvm/llvm-project#155415
rust-timer added a commit to rust-lang/rust that referenced this pull request Sep 2, 2025
Rollup merge of #146116 - zmodem:issue_118306_fix, r=nikic

Adjust issue-118306.rs test after LLVM change

This updates tests/codegen-llvm/issues/issue-118306.rs to pass also after llvm/llvm-project#155415
@jplehr (Contributor) commented Sep 2, 2025

I finally got some IR and maybe some other info.
With -opt-bisect-limit it appears that the issue is coming from GVN on the function _ZN47LlvmLibcCharacterConverterUTF32To8Test_FourByte3RunEv. After that, the particular test fails at runtime.

I attached the IR after the LCSSA pass, which runs before GVN. Let me know how helpful that is, or what I can do to help further with this issue.
after-lcssa.txt

@jhuber6 (Contributor) commented Sep 2, 2025

I finally got some IR and maybe some other info. With -opt-bisect-limit it appears that the issue is coming from GVN on the function _ZN47LlvmLibcCharacterConverterUTF32To8Test_FourByte3RunEv. After that the particular tests fails at runtime.

I attached the IR after the pass of LCSSA which runs before GVN. Let me know how helpful that is or what I can do to help better with this issue. after-lcssa.txt

I don't see anything immediately problematic looking at that function, https://godbolt.org/z/34fq6Erz8. But I'm not sure overall, cc @arsenm.

@nikic (Contributor, Author) commented Sep 3, 2025

GVN doesn't seem to do anything particularly interesting there. I tried running this through llc to see how the codegen changes, but that gives:

LLVM ERROR: Cannot select: intrinsic %llvm.amdgcn.if

Does this need extra llc flags beyond the triple?

@arsenm (Contributor) commented Sep 3, 2025

GVN doesn't seem to do anything particularly interesting there. I tried running this through llc to see how the codegen changes, but that gives:

LLVM ERROR: Cannot select: intrinsic %llvm.amdgcn.if

Does this need extra llc flags beyond the triple?

This IR is not suitable for running through codegen again. We cannot deal with IR that has already had control flow intrinsics inserted. Reproducers should be IR at the start of codegen, not the end of the codegen IR pipeline.

@nikic (Contributor, Author) commented Sep 3, 2025

Oh, I see. I thought this was a GVN run in the middle-end, but it's one in the backend.

As a side note, I noticed this comment:

// EarlyCSE is not always strong enough to clean up what LSR produces. For
// example, GVN can combine
//
// %0 = add %a, %b
// %1 = add %b, %a
//
// and
//
// %0 = shl nsw %a, 2
// %1 = shl %a, 2
//
// but EarlyCSE can do neither of them.
if (isPassEnabled(EnableScalarIRPasses))
addEarlyCSEOrGVNPass();

This looks very outdated. EarlyCSE supports both commutative instructions and poison flag intersection. You may be able to save some compile-time by using EarlyCSE instead of GVN there...
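
For example (hypothetical IR, just restating the cases from the comment above): EarlyCSE hashes commutative operands in a canonical order and intersects poison-generating flags when it CSEs an instruction, so both pairs below collapse to a single instruction:

  %0 = add i32 %a, %b
  %1 = add i32 %b, %a      ; commuted operands, CSEd with %0

  %2 = shl nsw i32 %a, 2
  %3 = shl i32 %a, 2       ; CSEd with %2, with nsw dropped by flag intersection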

@jplehr (Contributor) commented Sep 3, 2025

If there is value in it, I can see about extracting the IR earlier in the pipeline and attaching it here.

@arsenm (Contributor) commented Sep 3, 2025

If there is value in it, I can see to extract the IR earlier in the pipeline and attach that here.

Yes, we need the IR at the start of codegen.

@jplehr (Contributor) commented Sep 3, 2025

Attached is the IR from just before codegen. Apologies if this is again not what is needed; I always find the clang-linker-wrapper flow a bit confusing.
I hope this helps, and thanks for your time.
libc-gpu-precodegen.txt

@jplehr (Contributor) commented Sep 4, 2025

I reverted this patch locally, which restores the tests to passing, and then retrieved the IR from the same point in the pipeline -- attached below.
There are two differences, and I am not sure how relevant they are. I'll quote one here; another similar one is found much further towards the end of the file.

Good case, line 189

 %arrayidx.i.i.i.i = getelementptr inbounds [64 x %"struct.__llvm_libc_22_0_0_git::AtExitUnit"], ptr addrspace(1) @_ZN22__llvm_libc_22_0_0_git16atexit_callbacksE, i64 0, i64 %1

Bad case, line 189

  %arrayidx.i.i.i.i = getelementptr inbounds %"struct.__llvm_libc_22_0_0_git::AtExitUnit", ptr addrspace(1) @_ZN22__llvm_libc_22_0_0_git16atexit_callbacksE, i64 %1

I do not know whether this is significant, though. It mostly caught my attention as a difference, and because the integer literal that went away (64) is the number of lanes on our GPU.
Maybe that info is a piece of this puzzle.

IR:
libc-gpu-precodegen-good.txt

@nikic (Contributor, Author) commented Sep 4, 2025

I looked at the diff between those two, and all the changes look correct to me. The only real difference is related to __cxa_thread_finalize, which is probably just a difference in baselines (it was added yesterday).

@nikic (Contributor, Author) commented Sep 4, 2025

There are more diffs in the llc output, but I find it hard to interpret those without more knowledge about AMDGPU.

@jayfoad (Contributor) commented Sep 4, 2025

I think the relevant part of the diff is probably:

--- libc-gpu-precodegen-good.txt        2025-09-04 14:13:56.751818563 +0100
+++ libc-gpu-precodegen.txt     2025-09-04 14:13:49.402694527 +0100
@@ -719 +720 @@
-  %call.i572 = tail call fastcc noundef zeroext i1 @_ZN22__llvm_libc_22_0_0_git7testing8internal4testIiEEbPNS1_10RunContextENS0_8TestCondET_S6_PKcS8_NS1_8LocationE(ptr noundef %2, i32 noundef 240, i32 noundef 240, ptr noundef addrspacecast (ptr addrspace(4) @.str.10 to ptr), ptr noundef addrspacecast (ptr addrspace(4) @.str.24 to ptr), i32 133) #28
+  %call.i572 = tail call fastcc noundef zeroext i1 @_ZN22__llvm_libc_22_0_0_git7testing8internal4testIiEEbPNS1_10RunContextENS0_8TestCondET_S6_PKcS8_NS1_8LocationE(ptr noundef %2, i32 noundef poison, i32 noundef 240, ptr noundef addrspacecast (ptr addrspace(4) @.str.10 to ptr), ptr noundef addrspacecast (ptr addrspace(4) @.str.24 to ptr), i32 133) #27

Note that in the good case the first three arguments to the call are "%2, 240, 240" while in the bad case they are "%2, poison, 240".

@nikic (Contributor, Author) commented Sep 4, 2025

Ah yes, good point. That means the issue was already introduced at an earlier point in the middle-end pipeline, not during codegen.

@jplehr (Contributor) commented Sep 5, 2025

I checked out main at 69d0c3e yesterday.
Then I reverted this commit and verified that the tests still pass; I produced "good" intermediate files with that version.
I then reverted the revert and verified that the tests fail; I produced "bad" intermediate files with that version.

I have attached all IR files (except precodegen), but for more insight into the exact points in the pipeline where they are produced, @jhuber6 will need to help me. They were produced with -Wl,-save-temps on the final step of the ninja --verbose invocation that builds that particular test, if that helps in figuring things out.
There are a few differences in the preopt IR files between good and bad. I cannot judge their significance though. Similar for the internalize files.
If there is something else that I can/should provide to help with this, please let me know. I will need guidance then.

01-preopt-bad.txt
01-preopt-good.txt
02-internalize-bad.txt
02-internalize-good.txt
03-opt-bad.txt
03-opt-good.txt

@jayfoad (Contributor) commented Sep 5, 2025

This diff looks suspicious to me in function _ZN22__llvm_libc_22_0_0_git8internal18CharacterConverter8pop_utf8Ev:

--- 01-preopt-good.txt  2025-09-05 10:39:57.226224451 +0100
+++ 01-preopt-bad.txt   2025-09-05 10:39:56.149207707 +0100
@@ -11112,10 +11113,10 @@
 if.then4:                                         ; preds = %if.end
-  %sub7 = add nsw i64 %conv, -1
-  %4 = extractelement <4 x i8> <i8 0, i8 -64, i8 -32, i8 -16>, i64 %sub7
-  %5 = load i32, ptr %0, align 4, !tbaa !58
+  %4 = add nuw nsw i64 %conv, 4294967295
+  %5 = extractelement <4 x i8> <i8 0, i8 -64, i8 -32, i8 -16>, i64 %4
+  %6 = load i32, ptr %0, align 4, !tbaa !58
   %sh_prom = trunc nuw nsw i64 %mul to i32
-  %shr = lshr i32 %5, %sh_prom
+  %shr = lshr i32 %6, %sh_prom
   %shr.tr = trunc i32 %shr to i8
-  %or.narrow = or i8 %4, %shr.tr
-  %6 = zext i8 %or.narrow to i32
+  %or.narrow = or i8 %5, %shr.tr
+  %7 = zext i8 %or.narrow to i32
   br label %if.end15

I am guessing that the add nsw i64 %conv, -1 vs add nuw nsw i64 %conv, 4294967295 are the result of transforming some getelementptr instructions, and it seems suspicious that the 64-bit offset is small and negative in the good case but large and positive in the bad case.

@jplehr (Contributor) commented Sep 5, 2025

Given this is in the preopt IR, the change would occur in an even earlier stage, right? I'll see if I can get some IR out of whatever stage that may be.

@nikic (Contributor, Author) commented Sep 8, 2025

Given this is in the preopt IR, the change would occur in an even earlier stage, right? I'll see if I can get some IR out of whatever stage that may be.

Yes, you're currently dumping the pre-LTO IR, but we'd want the full optimized one produced by the compiler. (Should be possible to get it with -S -emit-llvm -Xclang -disable-llvm-optzns.)


Does anyone know which transform is capable of converting a load from a constant global into a dynamic extract from a vector constant? I tried and failed to reproduce that behavior on simpler test cases.

@ritter-x2a (Member)

I think I might have found the culprit: AMDGPUPromoteAllocaToVectorPass
Attached is the IR before and after that pass, with interesting points marked with `; HERE:`.
A %arrayidx = getelementptr i8, ptr addrspace(5) %14, i64 -1 (AS 5 is the private/scratch address space, so pointers have 32 bits) seems to be translated into %15 = add i64 4294967295, %conv6, which adds a zero-extended 32-bit -1 in 64 bits. That probably should be sign-extended instead.

around-AMDGPUPromoteAllocaToVectorPass.ll.txt
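
To spell out the suspected issue (a sketch based on the attached IR; names are illustrative): with 32-bit pointers in address space 5, a byte offset of -1 ends up represented as a 32-bit value inside the pass; when it is widened to the 64-bit index arithmetic it must be sign-extended, otherwise -1 becomes 4294967295.

  ; input: constant offset -1 on a 32-bit private (AS 5) pointer
  %arrayidx = getelementptr i8, ptr addrspace(5) %base, i64 -1

  ; expected index computation after promotion (offset sign-extended)
  %idx.good = add i64 %conv, -1

  ; what the pass currently emits (32-bit -1 zero-extended to 64 bits)
  %idx.bad  = add i64 %conv, 4294967295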

@nikic (Contributor, Author) commented Sep 9, 2025

@ritter-x2a Nice find! I guess this code is the culprit:

static Value *GEPToVectorIndex(GetElementPtrInst *GEP, AllocaInst *Alloca,
It seems to treat ConstOffset as an unsigned number, while it is actually signed.

@ritter-x2a (Member)

@nikic yes, using SExt instead of ZExt and sdivrem instead of udivrem in GEPToVectorIndex fixes the problematic test for me, but it causes memory access faults in some other tests. I'll have to think a bit more about which bitwidths we need at the individual operations, I guess.

@ritter-x2a (Member)

PR to fix the AMDGPUPromoteAlloca issue: #157682
I'm still observing crashes in different tests, libc.test.src.math.smoke.ufromfpf_test and similar variants, but I see those with and without my fix, and with and without the patch from this PR reverted, so they are probably caused by a separate problem.

@nathanchance (Member)

For what it's worth, I bisected the introduction of an instance of -Wframe-larger-than when building the Linux kernel with CONFIG_KASAN to this change.

# bad: [0fb2f524f8bc7a945fb1795e70ebe80adc08e237] [LLD] Recognize 'mipspe' as a PE target name (#157305)
# good: [1ffc38ca4920e6fdf0e640d09f50f397d6831a09] [TableGen][DecoderEmitter] Remove unused variables (NFC) (#153262)
git bisect start '0fb2f524f8bc7a945fb1795e70ebe80adc08e237' '1ffc38ca4920e6fdf0e640d09f50f397d6831a09'
# good: [3d722f5ed9497cc6118c119dece7db6b694e093a] [gn build] Port 2ab4c2880db6
git bisect good 3d722f5ed9497cc6118c119dece7db6b694e093a
# bad: [c945022f2fd8321559d84e2272005487c5ced924] [MLIR][NVVM] Support packed registers in `inline_ptx` (#154904)
git bisect bad c945022f2fd8321559d84e2272005487c5ced924
# good: [5dfc4f79655d6e36088aaac5c702b5034731a754] [docs][OpenMP] Add docs section for OpenMP 6.1 implementation status (#155651)
git bisect good 5dfc4f79655d6e36088aaac5c702b5034731a754
# good: [257975fada8bb40f729976a9caa06d2bbb4e9f12] [mlir] Fix node numbering order in SliceMatchers example (#155684)
git bisect good 257975fada8bb40f729976a9caa06d2bbb4e9f12
# bad: [d7d870326202a3dce9cec7a12aa097eb48c559d2] [Dexter] Implement DexStepFunction and DexContinue (#152721)
git bisect bad d7d870326202a3dce9cec7a12aa097eb48c559d2
# good: [34fbf1f0b79750f69d391cf9f778d8bca2ca5f05] [memprof] Make the AllocSites and CallSites sections optional in YAML (#156226)
git bisect good 34fbf1f0b79750f69d391cf9f778d8bca2ca5f05
# good: [09350bd1c5ebba6cae96afcb70ef8ac097f7e1de] [NFC][PowerPC] adding the arguments for register names and VSR to VR (#155991)
git bisect good 09350bd1c5ebba6cae96afcb70ef8ac097f7e1de
# good: [a987022f33a27610732544b0c5f4475ce818c982] [MemoryBuiltins] Add getBaseObjectSize() (NFCI) (#155911)
git bisect good a987022f33a27610732544b0c5f4475ce818c982
# bad: [9ad8e12c573f248b3f0ca4cca01fee1e5ed22c7b] [AMDGPU] Expand scratch atomics to flat atomics if GAS is enabled (#154710)
git bisect bad 9ad8e12c573f248b3f0ca4cca01fee1e5ed22c7b
# good: [e82abde49454d0bd2c2d37b6209e1d21fbcbaa56] [SLP][NFC] Refactor `if`s into `&&` (#156278)
git bisect good e82abde49454d0bd2c2d37b6209e1d21fbcbaa56
# bad: [c128b8c46f2a3b750c9abcba1e303f92d6531e5f] [clang-reorder-fields] Fix unused private field warning (NFC)
git bisect bad c128b8c46f2a3b750c9abcba1e303f92d6531e5f
# bad: [055bfc027141bbfafd51fb43f5ab81ba3b480649] [InstCombine] Strip leading zero indices from GEP (#155415)
git bisect bad 055bfc027141bbfafd51fb43f5ab81ba3b480649
# first bad commit: [055bfc027141bbfafd51fb43f5ab81ba3b480649] [InstCombine] Strip leading zero indices from GEP (#155415)
$ make -skj"$(nproc)" ARCH=arm64 LLVM=1 clean defconfig

$ scripts/config -e COMPILE_TEST -e DRM_OMAP -e KASAN

$ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig drivers/gpu/drm/omapdrm/dss/dispc.o
...
drivers/gpu/drm/omapdrm/dss/dispc.c:4720:27: warning: stack frame size (2256) exceeds limit (2048) in 'dispc_runtime_suspend' [-Wframe-larger-than]
 4720 | static __maybe_unused int dispc_runtime_suspend(struct device *dev)
      |                           ^
...

cvise spits out

cvise reproducer
enum omap_plane_id {
  OMAP_DSS_GFX,
  OMAP_DSS_VIDEO1,
  OMAP_DSS_VIDEO2,
  OMAP_DSS_VIDEO3,
  OMAP_DSS_WB
};
struct dispc_device *dispc;
struct dispc_device {
  int ctx[];
};
enum omap_plane_id DISPC_PIC_SIZE_OFFSET_plane, DISPC_FIR_COEF_HV2_OFFSET_plane,
    DISPC_FIR_COEF_V_OFFSET_plane;
void *base;
int dispc_save_context___trans_tmp_21, dispc_save_context___trans_tmp_17,
    dispc_save_context___trans_tmp_40, dispc_save_context___trans_tmp_38,
    dispc_save_context___trans_tmp_37, dispc_save_context___trans_tmp_37,
    dispc_save_context___trans_tmp_35;
int __raw_readl(void *addr) {
  int val;
  asm volatile(".if "
               "1"
               " == 1\n"
               ".endif\n"
               : "=r"(val)
               : "r"(addr));
  return val;
}
short DISPC_OVL_BASE(enum omap_plane_id plane) {
  switch (plane) {
  case OMAP_DSS_GFX:
    return 0;
  case OMAP_DSS_VIDEO1:
    return 188;
  case OMAP_DSS_VIDEO2:
    return 332;
  case OMAP_DSS_VIDEO3:
    return 768;
  case OMAP_DSS_WB:
    return 1280;
  default:
    __builtin_unreachable();
  }
}
short DISPC_BA0_UV_OFFSET(enum omap_plane_id plane) {
  switch (plane) {
  case OMAP_DSS_GFX:
  case OMAP_DSS_VIDEO1:
    return 44;
  case OMAP_DSS_VIDEO2:
    return 2;
  case OMAP_DSS_VIDEO3:
    return 0;
  case OMAP_DSS_WB:
    return 18;
  }
}
short DISPC_PIX_INC_OFFSET(enum omap_plane_id plane) {
  switch (plane) {
  case OMAP_DSS_GFX:
  case OMAP_DSS_VIDEO1:
  case OMAP_DSS_VIDEO2:
    return 20;
  case OMAP_DSS_VIDEO3:
  case OMAP_DSS_WB:
    return 8;
  }
}
short DISPC_FIR2_OFFSET(enum omap_plane_id plane) {
  switch (plane) {
  case OMAP_DSS_GFX:
  case OMAP_DSS_VIDEO1:
    return 408;
  case OMAP_DSS_VIDEO2:
    return 2;
  case OMAP_DSS_VIDEO3:
    return 4;
  case OMAP_DSS_WB:
    return 90;
  }
}
short DISPC_ACCU2_0_OFFSET(enum omap_plane_id plane) {
  switch (plane) {
  case OMAP_DSS_GFX:
  case OMAP_DSS_VIDEO1:
    return 1412;
  case OMAP_DSS_VIDEO2:
    return 60;
  case OMAP_DSS_VIDEO3:
    return 8;
  case OMAP_DSS_WB:
    return 4;
  }
}
short DISPC_FIR_COEF_H_OFFSET(enum omap_plane_id plane, short i) {
  switch (plane) {
  case OMAP_DSS_VIDEO1:
  case OMAP_DSS_VIDEO2:
    return 52 + i * 8;
  case OMAP_DSS_WB:
    return 16 + i * 8;
  default:
    __builtin_unreachable();
  }
}
short DISPC_FIR_COEF_H2_OFFSET(short i) {
  enum omap_plane_id plane = 0;
  switch (plane) {
  case OMAP_DSS_GFX:
  case OMAP_DSS_VIDEO1:
    return 1420 + i * 8;
  case OMAP_DSS_VIDEO2:
  case OMAP_DSS_VIDEO3:
  case OMAP_DSS_WB:
    return 8;
  }
}
short DISPC_FIR_COEF_HV_OFFSET(enum omap_plane_id plane, short i) {
  switch (plane) {
  case OMAP_DSS_VIDEO1:
  case OMAP_DSS_VIDEO2:
    return 56 + i * 8;
  case OMAP_DSS_WB:
    return 14 + i * 8;
  default:
    __builtin_unreachable();
  }
}
short DISPC_FIR_COEF_HV2_OFFSET(short i) {
  switch (DISPC_FIR_COEF_HV2_OFFSET_plane)
  case OMAP_DSS_GFX:
  case OMAP_DSS_VIDEO1:
  case OMAP_DSS_VIDEO2:
  case OMAP_DSS_VIDEO3:
  case OMAP_DSS_WB:
    return 676 + i;
}
short DISPC_FIR_COEF_V_OFFSET(short i) {
  switch (DISPC_FIR_COEF_V_OFFSET_plane) {
  case OMAP_DSS_VIDEO1:
    return 292 + i * 4;
  default:
    __builtin_unreachable();
  }
}
short DISPC_FIR_COEF_V2_OFFSET(short i) {
  enum omap_plane_id plane = 0;
  switch (plane) {
  case OMAP_DSS_GFX:
  case OMAP_DSS_VIDEO1:
    return 1484 + i;
  case OMAP_DSS_VIDEO2:
  case OMAP_DSS_VIDEO3:
  case OMAP_DSS_WB:
    return 4;
  }
}
int dispc_read_reg(short idx) { return __raw_readl(base + idx); }
int dispc_get_num_ovls();
static void dispc_save_context(struct dispc_device *dispc) {
  short __trans_tmp_36, __trans_tmp_7, __trans_tmp_6, __trans_tmp_4,
      __trans_tmp_3, __trans_tmp_2, __trans_tmp_1;
  int __trans_tmp_19, __trans_tmp_13, i, j;
  i = 0;
  for (; dispc_get_num_ovls(); i++) {
    dispc->ctx[(DISPC_OVL_BASE(i) + DISPC_PIX_INC_OFFSET(i)) / sizeof(int)] = 0;
    if (i == OMAP_DSS_GFX)
      continue;
    {
      short idx = 0;
      void *addr = base;
      int val;
      asm volatile(".if "
                   "1"
                   " == 1\n"
                   ".endif\n"
                   : "=r"(val)
                   : "r"(addr));
      dispc_save_context___trans_tmp_35 = val;
      __trans_tmp_13 = dispc_save_context___trans_tmp_35 + idx;
      switch (DISPC_PIC_SIZE_OFFSET_plane)
      case OMAP_DSS_GFX:
      case OMAP_DSS_VIDEO1:
      case OMAP_DSS_VIDEO2:
      case OMAP_DSS_VIDEO3:
      case OMAP_DSS_WB:
        __trans_tmp_36 = 94;
    }
    dispc->ctx[DISPC_OVL_BASE(i) + __trans_tmp_36] = __trans_tmp_13;
    dispc->ctx[DISPC_OVL_BASE(i) / sizeof(int)] =
        dispc->ctx[(DISPC_OVL_BASE(i) + 4) / sizeof(int)];
    j = 0;
    for (; j < 8; j++) {
      __trans_tmp_1 = DISPC_OVL_BASE(i);
      short idx = __trans_tmp_1;
      __raw_readl(base + idx);
      dispc->ctx[(DISPC_OVL_BASE(i) + DISPC_FIR_COEF_H_OFFSET(i, j)) /
                 sizeof(int)] = dispc_save_context___trans_tmp_37;
    }
    j = 0;
    for (; j < 8; j++) {
      __trans_tmp_2 = DISPC_OVL_BASE(i);
      short idx = __trans_tmp_2;
      __raw_readl(base + idx);
      dispc->ctx[(DISPC_OVL_BASE(i) + DISPC_FIR_COEF_HV_OFFSET(i, j)) /
                 sizeof(int)] = dispc_save_context___trans_tmp_38;
    }
    j = 0;
    for (; j < 5; j++) {
      void *addr = base;
      int val;
      asm volatile(".if "
                   "1"
                   " == 1\n"
                   ".endif\n"
                   : "=r"(val)
                   : "r"(addr));
      dispc->ctx[DISPC_OVL_BASE(i) + 116 + j * 4 / sizeof(int)] =
          dispc_save_context___trans_tmp_17;
    }
    j = 0;
    for (; j < 8; j++) {
      __trans_tmp_3 = DISPC_OVL_BASE(i);
      short idx = __trans_tmp_3;
      __raw_readl(base + idx);
      dispc->ctx[(DISPC_OVL_BASE(i) + DISPC_FIR_COEF_V_OFFSET(j)) /
                 sizeof(int)] = dispc_save_context___trans_tmp_40;
    }
    dispc->ctx[500 + DISPC_BA0_UV_OFFSET(i)] =
        dispc->ctx[DISPC_OVL_BASE(i) + 788 / sizeof(int)] =
            dispc->ctx[DISPC_FIR2_OFFSET(i)];
    short idx = 0;
    __trans_tmp_19 = idx;
    dispc->ctx[500 + DISPC_ACCU2_0_OFFSET(i)] = __trans_tmp_19;
    dispc->ctx[8] = j = 0;
    for (; j < 8; j++) {
      __trans_tmp_4 = DISPC_OVL_BASE(i);
      {
        idx = __trans_tmp_4;
        dispc_save_context___trans_tmp_37 = __raw_readl(base + idx);
      }
      dispc->ctx[DISPC_OVL_BASE(i) + DISPC_FIR_COEF_H2_OFFSET(j)] =
          dispc_save_context___trans_tmp_37;
    }
    j = 0;
    for (; j < 8; j++)
      dispc->ctx[DISPC_OVL_BASE(i) + DISPC_FIR_COEF_HV2_OFFSET(j)] =
          dispc_save_context___trans_tmp_21;
    j = 0;
    for (; j < 8; j++) {
      __trans_tmp_6 = DISPC_OVL_BASE(i);
      __trans_tmp_7 = 0;
      dispc->ctx[DISPC_OVL_BASE(i) + DISPC_FIR_COEF_V2_OFFSET(j)] =
          dispc_read_reg(__trans_tmp_6 + __trans_tmp_7);
    }
  }
}
void dispc_runtime_suspend() { dispc_save_context(dispc); }

which shows the stack usage tripling with -fsanitize=kernel-address:

GCC 15.2.0:

$ aarch64-linux-gcc -O2 -Wframe-larger-than=1 -fsanitize=kernel-address -c -o /dev/null dispc.i
dispc.i: In function 'DISPC_BA0_UV_OFFSET':
dispc.i:57:1: warning: the frame size of 16 bytes is larger than 1 bytes [-Wframe-larger-than=]
   57 | }
      | ^
dispc.i: In function 'DISPC_FIR2_OFFSET':
dispc.i:81:1: warning: the frame size of 16 bytes is larger than 1 bytes [-Wframe-larger-than=]
   81 | }
      | ^
dispc.i: In function 'DISPC_ACCU2_0_OFFSET':
dispc.i:94:1: warning: the frame size of 16 bytes is larger than 1 bytes [-Wframe-larger-than=]
   94 | }
      | ^
dispc.i: In function 'dispc_runtime_suspend':
dispc.i:258:59: warning: the frame size of 64 bytes is larger than 1 bytes [-Wframe-larger-than=]
  258 | void dispc_runtime_suspend() { dispc_save_context(dispc); }
      |                                                           ^

$ x86_64-linux-gcc -O2 -Wframe-larger-than=1 -fsanitize=kernel-address -c -o /dev/null dispc.i
dispc.i: In function 'DISPC_BA0_UV_OFFSET':
dispc.i:57:1: warning: the frame size of 16 bytes is larger than 1 bytes [-Wframe-larger-than=]
   57 | }
      | ^
dispc.i: In function 'DISPC_FIR2_OFFSET':
dispc.i:81:1: warning: the frame size of 16 bytes is larger than 1 bytes [-Wframe-larger-than=]
   81 | }
      | ^
dispc.i: In function 'DISPC_ACCU2_0_OFFSET':
dispc.i:94:1: warning: the frame size of 16 bytes is larger than 1 bytes [-Wframe-larger-than=]
   94 | }
      | ^
dispc.i: In function 'dispc_runtime_suspend':
dispc.i:258:59: warning: the frame size of 48 bytes is larger than 1 bytes [-Wframe-larger-than=]
  258 | void dispc_runtime_suspend() { dispc_save_context(dispc); }
      |                                                           ^

clang @ e82abde:

$ clang --target=aarch64-linux-gnu -O2 -Wframe-larger-than=1 -c -o /dev/null dispc.i
dispc.i:258:6: warning: stack frame size (128) exceeds limit (1) in 'dispc_runtime_suspend' [-Wframe-larger-than]
  258 | void dispc_runtime_suspend() { dispc_save_context(dispc); }
      |      ^

$ clang --target=aarch64-linux-gnu -O2 -Wframe-larger-than=1 -fsanitize=kernel-address -c -o /dev/null dispc.i
dispc.i:29:7: warning: stack frame size (32) exceeds limit (1) in 'DISPC_OVL_BASE' [-Wframe-larger-than]
   29 | short DISPC_OVL_BASE(enum omap_plane_id plane) {
      |       ^
dispc.i:258:6: warning: stack frame size (704) exceeds limit (1) in 'dispc_runtime_suspend' [-Wframe-larger-than]
  258 | void dispc_runtime_suspend() { dispc_save_context(dispc); }
      |      ^

$ clang --target=x86_64-linux-gnu -O2 -Wframe-larger-than=1 -c -o /dev/null dispc.i
dispc.i:258:6: warning: stack frame size (88) exceeds limit (1) in 'dispc_runtime_suspend' [-Wframe-larger-than]
  258 | void dispc_runtime_suspend() { dispc_save_context(dispc); }
      |      ^

$ clang --target=x86_64-linux-gnu -O2 -Wframe-larger-than=1 -fsanitize=kernel-address -c -o /dev/null dispc.i
dispc.i:29:7: warning: stack frame size (8) exceeds limit (1) in 'DISPC_OVL_BASE' [-Wframe-larger-than]
   29 | short DISPC_OVL_BASE(enum omap_plane_id plane) {
      |       ^
dispc.i:258:6: warning: stack frame size (488) exceeds limit (1) in 'dispc_runtime_suspend' [-Wframe-larger-than]
  258 | void dispc_runtime_suspend() { dispc_save_context(dispc); }
      |      ^

clang @ 055bfc0:

$ clang --target=aarch64-linux-gnu -O2 -Wframe-larger-than=1 -c -o /dev/null dispc.i
dispc.i:258:6: warning: stack frame size (96) exceeds limit (1) in 'dispc_runtime_suspend' [-Wframe-larger-than]
  258 | void dispc_runtime_suspend() { dispc_save_context(dispc); }
      |      ^

$ clang --target=aarch64-linux-gnu -O2 -Wframe-larger-than=1 -fsanitize=kernel-address -c -o /dev/null dispc.i
dispc.i:29:7: warning: stack frame size (32) exceeds limit (1) in 'DISPC_OVL_BASE' [-Wframe-larger-than]
   29 | short DISPC_OVL_BASE(enum omap_plane_id plane) {
      |       ^
dispc.i:258:6: warning: stack frame size (2064) exceeds limit (1) in 'dispc_runtime_suspend' [-Wframe-larger-than]
  258 | void dispc_runtime_suspend() { dispc_save_context(dispc); }
      |      ^

$ clang --target=x86_64-linux-gnu -O2 -Wframe-larger-than=1 -c -o /dev/null dispc.i
dispc.i:258:6: warning: stack frame size (104) exceeds limit (1) in 'dispc_runtime_suspend' [-Wframe-larger-than]
  258 | void dispc_runtime_suspend() { dispc_save_context(dispc); }
      |      ^

$ clang --target=x86_64-linux-gnu -O2 -Wframe-larger-than=1 -fsanitize=kernel-address -c -o /dev/null dispc.i
dispc.i:29:7: warning: stack frame size (8) exceeds limit (1) in 'DISPC_OVL_BASE' [-Wframe-larger-than]
   29 | short DISPC_OVL_BASE(enum omap_plane_id plane) {
      |       ^
dispc.i:258:6: warning: stack frame size (1464) exceeds limit (1) in 'dispc_runtime_suspend' [-Wframe-larger-than]
  258 | void dispc_runtime_suspend() { dispc_save_context(dispc); }
      |      ^

This is the same code that triggered #143908 but that reproducer does not change with or without this patch.

@nikic (Contributor, Author) commented Sep 11, 2025

This is the same code that triggered #143908 but that reproducer does not change with or without this patch.

The root cause here is the same. This change just ends up unrolling additional cases and triggers the same underlying problem with address calculations being unprofitably hoisted out of a loop and spilled. I left a comment on that issue.

intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Sep 20, 2025
With latest llvm22, I got the following verification failure:

  ...
  ; int big_alloc2(void *ctx) @ verifier_arena_large.c:207
  0: (b4) w6 = 1                        ; R6_w=1
  ...
  ; if (err) @ verifier_arena_large.c:233
  53: (56) if w6 != 0x0 goto pc+62      ; R6=0
  54: (b7) r7 = -4                      ; R7_w=-4
  55: (18) r8 = 0x7f4000000000          ; R8_w=scalar()
  57: (bf) r9 = addr_space_cast(r8, 0, 1)       ; R8_w=scalar() R9_w=arena
  58: (b4) w6 = 5                       ; R6_w=5
  ; pg = page[i]; @ verifier_arena_large.c:238
  59: (bf) r1 = r7                      ; R1_w=-4 R7_w=-4
  60: (07) r1 += 4                      ; R1_w=0
  61: (79) r2 = *(u64 *)(r9 +0)         ; R2_w=scalar() R9_w=arena
  ; if (*pg != i) @ verifier_arena_large.c:239
  62: (bf) r3 = addr_space_cast(r2, 0, 1)       ; R2_w=scalar() R3_w=arena
  63: (71) r3 = *(u8 *)(r3 +0)          ; R3_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff))
  64: (5d) if r1 != r3 goto pc+51       ; R1_w=0 R3_w=0
  ; bpf_arena_free_pages(&arena, (void __arena *)pg, 2); @ verifier_arena_large.c:241
  65: (18) r1 = 0xff11000114548000      ; R1_w=map_ptr(map=arena,ks=0,vs=0)
  67: (b4) w3 = 2                       ; R3_w=2
  68: (85) call bpf_arena_free_pages#72675      ;
  69: (b7) r1 = 0                       ; R1_w=0
  ; page[i + 1] = NULL; @ verifier_arena_large.c:243
  70: (7b) *(u64 *)(r8 +8) = r1
  R8 invalid mem access 'scalar'
  processed 61 insns (limit 1000000) max_states_per_insn 0 total_states 6 peak_states 6 mark_read 2
  =============
  torvalds#489/5   verifier_arena_large/big_alloc2:FAIL

The main reason is that 'r8' in insn '70' is not an arena pointer.
Further debugging on the llvm side shows that llvm commit ([1]) caused
the failure. For the original code:
  page[i] = NULL;
  page[i + 1] = NULL;
llvm transformed it into something like the following at the source level:
  __builtin_memset(&page[i], 0, 16)
Such a transformation prevents the llvm BPFCheckAndAdjustIR pass from
generating proper addr_space_cast insns ([2]).

Adding support in the llvm BPFCheckAndAdjustIR pass should work, but
it is not clear whether such a pattern exists in real applications.
At the same time, simply adding a memory barrier between the two 'page'
assignments fixes the issue.

  [1] llvm/llvm-project#155415
  [2] llvm/llvm-project#84410

Cc: Eduard Zingerman <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
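
For reference, a minimal C sketch of the workaround described above, assuming the kernel's usual compiler barrier (an empty asm with a memory clobber); the function name is illustrative and not the actual selftest code:

```c
/* A compiler barrier between the two stores keeps the compiler from
 * merging them into a single memset-style write, so each store is
 * emitted individually against the arena pointer. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static void clear_page_pair(void **page, int i)
{
    page[i] = NULL;
    barrier();          /* prevents coalescing with the next store */
    page[i + 1] = NULL;
}
```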
kernel-patches-daemon-bpf bot pushed a commit to kernel-patches/bpf that referenced this pull request Sep 20, 2025
kernel-patches-daemon-bpf-rc bot pushed a commit to kernel-patches/bpf-rc that referenced this pull request Sep 20, 2025
rust-cloud-vms bot pushed a commit to makai410/rustc_public that referenced this pull request Oct 12, 2025
Adjust issue-118306.rs test after LLVM change

This updates tests/codegen-llvm/issues/issue-118306.rs so that it also passes after llvm/llvm-project#155415.


10 participants