Skip to content

Conversation

@vg0204
Copy link
Contributor

@vg0204 vg0204 commented Jun 17, 2024

The __builtin_alloca was returning a flat pointer with no address space when compiled using openCL1.2 or below but worked fine with openCL2.0 and above. This accounts to the fact that later uses the concept of generic address space which supports cast to other address space(i.e to private address space which is used for stack allocation) .

But, in actuality, as it returns pointer to the stack, it should be pointing to private address space irrespective of openCL version becuase builtin_alloca allocates stack memory used for current function in which it is called. Thus,it requires redefintion of the builtin function with appropraite return pointer to private address space.

@vg0204 vg0204 added clang:frontend Language frontend issues, e.g. anything involving "Sema" clang-cl `clang-cl` driver. Don't use for other compiler parts labels Jun 17, 2024
@vg0204 vg0204 self-assigned this Jun 17, 2024
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Jun 17, 2024
@vg0204 vg0204 requested a review from arsenm June 17, 2024 08:23
@llvmbot
Copy link
Member

llvmbot commented Jun 17, 2024

@llvm/pr-subscribers-clang

Author: Vikash Gupta (vg0204)

Changes

The __builtin_alloca was returning a flat pointer with no address space when compiled using openCL1.2 or below but worked fine with openCL2.0 and above. This accounts to the fact that later uses the concept of generic address space which supports cast to other address space(i.e to private address space which is used for stack allocation) .

So, in case of openCL1.2 and below __built_alloca is supposed to return pointer to private address space to eliminate the need of casting as not supported here. Thus,it requires redefintion of the builtin function with appropraite return pointer to appropriate address space.


Full diff: https://github.com/llvm/llvm-project/pull/95750.diff

3 Files Affected:

  • (modified) clang/lib/Sema/SemaExpr.cpp (+20-3)
  • (added) clang/test/CodeGenOpenCL/builtins-alloca.cl (+86)
  • (modified) clang/test/CodeGenOpenCL/memcpy.cl ()
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 99a8704298314..e12c2a9209706 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -6231,7 +6231,10 @@ bool Sema::CheckArgsForPlaceholders(MultiExprArg args) {
 ///                  it does not contain any pointer arguments without
 ///                  an address space qualifer.  Otherwise the rewritten
 ///                  FunctionDecl is returned.
-/// TODO: Handle pointer return types.
+///
+/// Pointer return type with no explicit address space is assigned the
+/// default address space where pointer points to based on the language
+/// option used to compile it.
 static FunctionDecl *rewriteBuiltinFunctionDecl(Sema *Sema, ASTContext &Context,
                                                 FunctionDecl *FDecl,
                                                 MultiExprArg ArgExprs) {
@@ -6275,13 +6278,27 @@ static FunctionDecl *rewriteBuiltinFunctionDecl(Sema *Sema, ASTContext &Context,
     OverloadParams.push_back(Context.getPointerType(PointeeType));
   }
 
+  QualType ReturnTy = FT->getReturnType();
+  QualType OverloadReturnTy = ReturnTy;
+  if (ReturnTy->isPointerType() &&
+      !ReturnTy->getPointeeType().hasAddressSpace()) {
+    if (Sema->getLangOpts().OpenCL) {
+      NeedsNewDecl = true;
+
+      QualType ReturnPtTy = ReturnTy->getPointeeType();
+      LangAS defClAS = Context.getDefaultOpenCLPointeeAddrSpace();
+      ReturnPtTy = Context.getAddrSpaceQualType(ReturnPtTy, defClAS);
+      OverloadReturnTy = Context.getPointerType(ReturnPtTy);
+    }
+  }
+
   if (!NeedsNewDecl)
     return nullptr;
 
   FunctionProtoType::ExtProtoInfo EPI;
   EPI.Variadic = FT->isVariadic();
-  QualType OverloadTy = Context.getFunctionType(FT->getReturnType(),
-                                                OverloadParams, EPI);
+  QualType OverloadTy =
+      Context.getFunctionType(OverloadReturnTy, OverloadParams, EPI);
   DeclContext *Parent = FDecl->getParent();
   FunctionDecl *OverloadDecl = FunctionDecl::Create(
       Context, Parent, FDecl->getLocation(), FDecl->getLocation(),
diff --git a/clang/test/CodeGenOpenCL/builtins-alloca.cl b/clang/test/CodeGenOpenCL/builtins-alloca.cl
new file mode 100644
index 0000000000000..74a86955f2e4f
--- /dev/null
+++ b/clang/test/CodeGenOpenCL/builtins-alloca.cl
@@ -0,0 +1,86 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 -emit-llvm -o - | FileCheck --check-prefix=OPENCL12 %s
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL2.0 -emit-llvm -o - | FileCheck --check-prefix=OPENCL20 %s
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL3.0 -emit-llvm -o - | FileCheck --check-prefix=OPENCL30 %s
+// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL3.0 -cl-ext=+__opencl_c_generic_address_space -emit-llvm -o - | FileCheck --check-prefix=OPENCL30-EXT %s
+
+// OPENCL12-LABEL: define dso_local ptr addrspace(5) @test1(
+// OPENCL12-SAME: ) #[[ATTR0:[0-9]+]] {
+// OPENCL12-NEXT:  [[ENTRY:.*:]]
+// OPENCL12-NEXT:    [[ALLOC_PTR:%.*]] = alloca ptr addrspace(5), align 4, addrspace(5)
+// OPENCL12-NEXT:    [[TMP0:%.*]] = alloca i8, i64 128, align 8, addrspace(5)
+// OPENCL12-NEXT:    store ptr addrspace(5) [[TMP0]], ptr addrspace(5) [[ALLOC_PTR]], align 4
+// OPENCL12-NEXT:    [[TMP1:%.*]] = load ptr addrspace(5), ptr addrspace(5) [[ALLOC_PTR]], align 4
+// OPENCL12-NEXT:    ret ptr addrspace(5) [[TMP1]]
+//
+// OPENCL20-LABEL: define dso_local ptr @test1(
+// OPENCL20-SAME: ) #[[ATTR0:[0-9]+]] {
+// OPENCL20-NEXT:  [[ENTRY:.*:]]
+// OPENCL20-NEXT:    [[ALLOC_PTR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OPENCL20-NEXT:    [[TMP0:%.*]] = alloca i8, i64 128, align 8, addrspace(5)
+// OPENCL20-NEXT:    [[TMP1:%.*]] = addrspacecast ptr addrspace(5) [[TMP0]] to ptr
+// OPENCL20-NEXT:    store ptr [[TMP1]], ptr addrspace(5) [[ALLOC_PTR]], align 8
+// OPENCL20-NEXT:    [[TMP2:%.*]] = load ptr, ptr addrspace(5) [[ALLOC_PTR]], align 8
+// OPENCL20-NEXT:    ret ptr [[TMP2]]
+//
+// OPENCL30-LABEL: define dso_local ptr addrspace(5) @test1(
+// OPENCL30-SAME: ) #[[ATTR0:[0-9]+]] {
+// OPENCL30-NEXT:  [[ENTRY:.*:]]
+// OPENCL30-NEXT:    [[ALLOC_PTR:%.*]] = alloca ptr addrspace(5), align 4, addrspace(5)
+// OPENCL30-NEXT:    [[TMP0:%.*]] = alloca i8, i64 128, align 8, addrspace(5)
+// OPENCL30-NEXT:    store ptr addrspace(5) [[TMP0]], ptr addrspace(5) [[ALLOC_PTR]], align 4
+// OPENCL30-NEXT:    [[TMP1:%.*]] = load ptr addrspace(5), ptr addrspace(5) [[ALLOC_PTR]], align 4
+// OPENCL30-NEXT:    ret ptr addrspace(5) [[TMP1]]
+//
+// OPENCL30-EXT-LABEL: define dso_local ptr @test1(
+// OPENCL30-EXT-SAME: ) #[[ATTR0:[0-9]+]] {
+// OPENCL30-EXT-NEXT:  [[ENTRY:.*:]]
+// OPENCL30-EXT-NEXT:    [[ALLOC_PTR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OPENCL30-EXT-NEXT:    [[TMP0:%.*]] = alloca i8, i64 128, align 8, addrspace(5)
+// OPENCL30-EXT-NEXT:    [[TMP1:%.*]] = addrspacecast ptr addrspace(5) [[TMP0]] to ptr
+// OPENCL30-EXT-NEXT:    store ptr [[TMP1]], ptr addrspace(5) [[ALLOC_PTR]], align 8
+// OPENCL30-EXT-NEXT:    [[TMP2:%.*]] = load ptr, ptr addrspace(5) [[ALLOC_PTR]], align 8
+// OPENCL30-EXT-NEXT:    ret ptr [[TMP2]]
+//
+float* test1() {
+    float* alloc_ptr = (float*)__builtin_alloca(32 * sizeof(int));
+    return alloc_ptr;
+}
+
+// OPENCL12-LABEL: define dso_local void @test2(
+// OPENCL12-SAME: ) #[[ATTR0]] {
+// OPENCL12-NEXT:  [[ENTRY:.*:]]
+// OPENCL12-NEXT:    [[ALLOC_PTR:%.*]] = alloca ptr addrspace(5), align 4, addrspace(5)
+// OPENCL12-NEXT:    [[TMP0:%.*]] = alloca i8, i64 28, align 8, addrspace(5)
+// OPENCL12-NEXT:    store ptr addrspace(5) [[TMP0]], ptr addrspace(5) [[ALLOC_PTR]], align 4
+// OPENCL12-NEXT:    ret void
+//
+// OPENCL20-LABEL: define dso_local void @test2(
+// OPENCL20-SAME: ) #[[ATTR0]] {
+// OPENCL20-NEXT:  [[ENTRY:.*:]]
+// OPENCL20-NEXT:    [[ALLOC_PTR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OPENCL20-NEXT:    [[TMP0:%.*]] = alloca i8, i64 28, align 8, addrspace(5)
+// OPENCL20-NEXT:    [[TMP1:%.*]] = addrspacecast ptr addrspace(5) [[TMP0]] to ptr
+// OPENCL20-NEXT:    store ptr [[TMP1]], ptr addrspace(5) [[ALLOC_PTR]], align 8
+// OPENCL20-NEXT:    ret void
+//
+// OPENCL30-LABEL: define dso_local void @test2(
+// OPENCL30-SAME: ) #[[ATTR0]] {
+// OPENCL30-NEXT:  [[ENTRY:.*:]]
+// OPENCL30-NEXT:    [[ALLOC_PTR:%.*]] = alloca ptr addrspace(5), align 4, addrspace(5)
+// OPENCL30-NEXT:    [[TMP0:%.*]] = alloca i8, i64 28, align 8, addrspace(5)
+// OPENCL30-NEXT:    store ptr addrspace(5) [[TMP0]], ptr addrspace(5) [[ALLOC_PTR]], align 4
+// OPENCL30-NEXT:    ret void
+//
+// OPENCL30-EXT-LABEL: define dso_local void @test2(
+// OPENCL30-EXT-SAME: ) #[[ATTR0]] {
+// OPENCL30-EXT-NEXT:  [[ENTRY:.*:]]
+// OPENCL30-EXT-NEXT:    [[ALLOC_PTR:%.*]] = alloca ptr, align 8, addrspace(5)
+// OPENCL30-EXT-NEXT:    [[TMP0:%.*]] = alloca i8, i64 28, align 8, addrspace(5)
+// OPENCL30-EXT-NEXT:    [[TMP1:%.*]] = addrspacecast ptr addrspace(5) [[TMP0]] to ptr
+// OPENCL30-EXT-NEXT:    store ptr [[TMP1]], ptr addrspace(5) [[ALLOC_PTR]], align 8
+// OPENCL30-EXT-NEXT:    ret void
+//
+void test2() {
+    void *alloc_ptr = __builtin_alloca(28);
+}
diff --git a/clang/test/CodeGenOpenCL/memcpy.cl b/clang/test/CodeGenOpenCL/memcpy.cl
old mode 100644
new mode 100755

@arsenm arsenm added OpenCL and removed clang-cl `clang-cl` driver. Don't use for other compiler parts labels Jun 17, 2024
@vg0204 vg0204 force-pushed the vg0204/__builtin_alloca-addrSpace-issue-in-openCL1.2 branch 4 times, most recently from 3814833 to 1dbb2d5 Compare June 18, 2024 15:31
@vg0204 vg0204 requested a review from arsenm June 21, 2024 07:07
@vg0204
Copy link
Contributor Author

vg0204 commented Jul 2, 2024

Ping!!

// OPENCL30-EXT-NEXT: [[ALLOC_PTR_UNINITIALIZED:%.*]] = alloca ptr addrspace(5), align 4, addrspace(5)
// OPENCL30-EXT-NEXT: [[ALLOC_PTR_ALIGN:%.*]] = alloca ptr addrspace(5), align 4, addrspace(5)
// OPENCL30-EXT-NEXT: [[ALLOC_PTR_ALIGN_UNINITIALIZED:%.*]] = alloca ptr addrspace(5), align 4, addrspace(5)
// OPENCL30-EXT-NEXT: store i32 [[N]], ptr addrspace(5) [[N_ADDR]], align 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check all these extra lines?

@AnastasiaStulova AnastasiaStulova requested a review from svenvh July 2, 2024 13:59
@vg0204 vg0204 changed the title [Clang] [WIP] Added builtin_alloca support for OpenCL1.2 and below [Clang] [WIP] Added builtin_alloca right Address Space for OpenCL Jul 3, 2024
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit, since this function also has non-OpenCL specific code, it would make more sense to remove OpenCL from the name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rename wasn't done. Also should start with lowercase letter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or I could place this non-OpenCL specific code just before the invocation of this function. My idea to do so was to wrap everything in a function, so just a single function call comes into picture.

@vg0204 vg0204 force-pushed the vg0204/__builtin_alloca-addrSpace-issue-in-openCL1.2 branch from f571a02 to 28b8bc1 Compare July 10, 2024 05:54
@vg0204
Copy link
Contributor Author

vg0204 commented Jul 19, 2024

PING!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rename wasn't done. Also should start with lowercase letter

Comment on lines 1988 to 1989
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be an assert since you're only calling for the specific set of builtins

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead could only conditionally call the function if S.getLangOpts().OpenCL

@vg0204
Copy link
Contributor Author

vg0204 commented Jul 22, 2024

@arsenm, updated with needed reviewed changes!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return value is unused

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cannot fail

@arsenm arsenm changed the title [Clang] [WIP] Added builtin_alloca right Address Space for OpenCL [Clang] Use private address space for builtin_alloca return type for OpenCL Jul 23, 2024
Comment on lines 2 to 8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can drop the unused version specific check prefixes

@vg0204 vg0204 force-pushed the vg0204/__builtin_alloca-addrSpace-issue-in-openCL1.2 branch from fa1efeb to f500da9 Compare July 24, 2024 07:34
The __builtin_alloca was returning a flat pointer with no address
space when compiled using openCL1.2 or below but worked fine with
openCL2.0 and above. This accounts to the fact that later uses the
concept of generic address space which supports cast to other address
space(i.e to private address space which is used for stack allocation)
.

So, in general for OpenCL, built_alloca should always return pointer
to private address space, thus eliminating need of use of address
space cast. Thus,it requires redefintion of the builtin function with
return pointer type to private address space.
@vg0204 vg0204 force-pushed the vg0204/__builtin_alloca-addrSpace-issue-in-openCL1.2 branch from 87c1153 to c4145d2 Compare July 26, 2024 07:15
@vg0204 vg0204 merged commit d65f037 into llvm:main Jul 26, 2024
Comment on lines +2225 to +2227
if (getLangOpts().OpenCL) {
builtinAllocaAddrSpace(*this, TheCall);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit late tot this review, but right now we have getASTAllocaAddressSpace() in CodeGen/, could this be used moved so we can make use of it in Sema instead?

@vg0204 vg0204 deleted the vg0204/__builtin_alloca-addrSpace-issue-in-openCL1.2 branch December 16, 2024 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category OpenCL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants