-
Notifications
You must be signed in to change notification settings - Fork 0
[Clang][SYCL] Introduce clang-sycl-link-wrapper to link SYCL offloading device code #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
sarnex
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great, thanks! just some comments below.
clang/docs/ClangSYCLLinkWrapper.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Purpose of this wrapper is to provide an interface to link SYCL device bitcode | |
| The purpose of this wrapper is to provide an interface to link SYCL device bitcode |
clang/docs/ClangSYCLLinkWrapper.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Device code linking for SYCL offloading kind has a number of known quirks that | |
| Device code linking for SYCL offloading has a number of known quirks that |
clang/docs/ClangSYCLLinkWrapper.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| IR bitcode to gaurantee conformance to SYCL standards. This step is unique to | |
| IR bitcode to guarantee conformance to SYCL standards. This step is unique to |
clang/docs/ClangSYCLLinkWrapper.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| SYCL offloading compilation flow. | |
| the SYCL offloading compilation flow. |
clang/docs/ClangSYCLLinkWrapper.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| 2. SPIR-V LLVM Translator tool is an extenal tool and hence SPIR-V IR code | |
| 2. SPIR-V LLVM Translator tool is an external tool and hence SPIR-V IR code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i forget, are there plans to have the driver own this list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's reasonable to hard-code the list here as this is a SYCL specific tool.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well with the old offload model the driver also owns a copy, so once we have the new model enabled and remove support for the old model, if this is the single list then that's fine, i just want to avoid two copies of the same thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is nullopt return valid? should it be an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If user does not specify a valid device library location, then we simply do not link any device lib files. It need not be an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
user here is the driver/clang-linker-wrapper right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the llvm::sys::path namespace has a filename function to extract the filename from the path, can we use that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to note in a comment or something that we will need to add processing of the linked module (sycl-post-link) later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will add a PR comment about upcoming PRs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there's an error, i think we won't remove the temp files even if --save-temps is not passed, right? reportError is noreturn, so we probably need something that will always run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reportError is being used in other tools as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reportError is fine but checking for save_temps after a reportError won't work I think. If the same thing happens in other tools then it's fine for this PR
I think it might be easier to communicate SYCL runtime requirements by adding SYCL operating system type or environment type to the target triple. E.g. CUDA uses One concern is that Codeplay is already using |
clang/docs/ClangSYCLLinkWrapper.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ==================== | |
| Clang SYCL link Wrapper | |
| ==================== | |
| ======================= | |
| Clang SYCL link Wrapper | |
| ======================= |
clang/docs/ClangSYCLLinkWrapper.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to my understanding, the purpose of the tool is to provide standard linker interface and be able to link any object file formats 'compile' step is able to produce for SYCL offload mode. So, I would expect this tool to handle linking of fat objects with device code in other IR like SPIR-V as well as native binary format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @bader
Thanks for the feedback. I agree with you. This tool is expected to handle linking of fat objects and SPIR-V objects. However, in its current state, it can support only bitcode files. We can extend this tool easily to support linking of other objects. I was planning to update the document as the tool evolves. I will add more details to the doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Let's clearly state the final goal for the tool, current status of the implementation, and plans for the future work.
clang/docs/ClangSYCLLinkWrapper.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| be passed down to downstrea tools like 'llvm-link', 'llvm-spirv', etc. | |
| be passed down to downstrea tools like 'clang', 'llvm-spirv', etc. |
We should use clang instead of llvm-link to link llvm bitcode files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add a TODO. I do have a tracker to replace llvm-link with clang. '-only-needed' linking support seems to be missing in clang.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think llvm-link is not a part of clang distribution, so SYCL linking won't work. We need to resolve that ASAP.
clang/docs/ClangSYCLLinkWrapper.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about AOT targets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was planning to add it as the implementation evolves in near future. But I will add it now and provide a PR comment about upcoming changes.
Thanks
clang/docs/ClangSYCLLinkWrapper.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, specify that --sycl-link is the clang driver option, not clang-sycl-link-wrapper option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy-paste from another source? I expect this comment reference spir-v instead of nvptx and amdgpu.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Sorry. I thought I had cleaned it up. We do not really need this comment. Will remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should either change sycl to spirv in the option name of vice versa in the description. Otherwise, it looks a bit messy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Special option to pass in llvm-spirv options | |
| // Options to pass to llvm-spirv tool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// Ssave intermediary results. | |
| /// Save intermediary results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| SmallString<128> SPIRVDumpDir; | |
| static SmallString<128> SPIRVDumpDir; |
I think it would be best to avoid that TBH. I briefly looked at the patch, and I don't understand (yet) why we would need extra information. I'll make a review. If we have to, environment would be best IMO so as to not compete with existing TC logic that we can reuse. But the point of the device compiler is to lower to a specific environment (OpenCL, Level 0, HIP, CUDA), so I don't see why the middle or backend should need to know that to function properly and so I would personally prefer an extra flag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I have a better understanding now than when I wrote #1 (comment)
You shouldn't need that flag... if the driver was there. If it was, the condition should be if (JA.isDeviceOffloading(Action::OFK_SYCL)) rather than this global modifier. IMHO, the driver change shouldn't be part of this patch:
- there is a better way to do it but can't implement this because it lacks the required sycl offloading driver logic.
- looking at the test, you don't need this to test
clang-sycl-linker-wrapper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback @Naghasan
Here, the clang driver is expected to be called from inside clang-linker-wrapper to link device images and we pass --sycl-link (and no -fsycl) to invoke the 'link' stage.
Adding @mdtoguchi for more details here.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of JA.isDeviceOffloading() is useful when we know we are offloading and determining what kind we are offloading to. From what I understand, the clang call from within the wrapper is not to be considered as an offloading enabling invocation, but rather a target specific call to produce a specific type of binary.
clang/docs/ClangSYCLLinkWrapper.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| The wrapper will also support the Ahead-Of-Time compilation flow. AOT | |
| The wrapper will also support the Ahead-Of-Time (AOT) compilation flow. AOT |
clang/docs/ClangSYCLLinkWrapper.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| generation is deferred until application runtime time. | |
| generation is deferred until application runtime. |
clang/docs/ClangSYCLLinkWrapper.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| makes it difficult to use in a unified offloading setting. Two of the primary | |
| make it difficult to use in a unified offloading setting. Two of the primary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'makes' goes well with 'Device code linking'. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It reads interesting, as the association can also be with 'quirks'. It is the quirks that make it difficult - it's not the device code linking that makes it difficult.
clang/docs/ClangSYCLLinkWrapper.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need 'sycl' here? The tool is a SYCL specific tool - seems redundant.
sarnex
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no major flags from me assuming my comments are addressed in a seperate pr
…ng device code This PR is one of the many PRs in the SYCL upstreaming effort focusing on device code linking during the SYCL offload compilation process. RFC: https://discourse.llvm.org/t/rfc-offloading-design-for-sycl-offload-kind-and-spir-targets/74088 In this PR, we introduce a new tool that will be used to perform device code linking for SYCL offload kind. It accepts SYCL device objects in LLVM IR bitcode format and will generate a fully linked device object that can then be wrapped and linked into the host object. A primary use case for this tool is to perform device code linking for objects with SYCL offload kind inside the clang-linker-wrapper. It can also be invoked via clang driver as follows: `clang --target=spirv64 --sycl-link input.bc` Device code linking for SYCL offloading kind has a number of known quirks that makes it difficult to use in a unified offloading setting. Two of the primary issues are: 1. Several finalization steps are required to be run on the fully-linked LLVM IR bitcode to gaurantee conformance to SYCL standards. This step is unique to SYCL offloading compilation flow. 2. SPIR-V LLVM Translator tool is an extenal tool and hence SPIR-V IR code generation cannot be done as part of LTO. This limitation will be lifted once SPIR-V backend is available as a viable LLVM backend. Hence, we introduce this new tool to provide a clean wrapper to perform SYCL device linking. Thanks Signed-off-by: Arvind Sudarsanam <[email protected]>
856c205 to
eff4a03
Compare
Signed-off-by: Arvind Sudarsanam <[email protected]>
Signed-off-by: Arvind Sudarsanam <[email protected]>
Signed-off-by: Arvind Sudarsanam <[email protected]>
Signed-off-by: Arvind Sudarsanam <[email protected]>
…of changes missed out from earlier commit. Signed-off-by: Arvind Sudarsanam <[email protected]>
Signed-off-by: Arvind Sudarsanam <[email protected]>
Signed-off-by: Arvind Sudarsanam <[email protected]>
Signed-off-by: Arvind Sudarsanam <[email protected]>
Signed-off-by: Arvind Sudarsanam <[email protected]>
… when only one input is present Signed-off-by: Arvind Sudarsanam <[email protected]>
… some comments Signed-off-by: Arvind Sudarsanam <[email protected]>
When compiling VLS SVE, the compiler often replaces VL-based offsets
with immediate-based ones. This leads to a mismatch in the allowed
addressing modes due to SVE loads/stores generally expecting immediate
offsets relative to VL. For example, given:
```c
svfloat64_t foo(const double *x) {
svbool_t pg = svptrue_b64();
return svld1_f64(pg, x+svcntd());
}
```
When compiled with `-msve-vector-bits=128`, we currently generate:
```gas
foo:
ptrue p0.d
mov x8, #2
ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
ret
```
Instead, we could be generating:
```gas
foo:
ldr z0, [x0, #1, mul vl]
ret
```
Likewise for other types, stores, and other VLS lengths.
This patch achieves the above by extending `SelectAddrModeIndexedSVE`
to let constants through when `vscale` is known.
`TestReportData.py` is failing on the macOS CI with:
```
Traceback (most recent call last):
File "/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 1784, in test_method
return attrvalue(self)
File "/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/packages/Python/lldbsuite/test/decorators.py", line 148, in wrapper
return func(*args, **kwargs)
File "/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/test/API/functionalities/asan/TestReportData.py", line 28, in test_libsanitizers_asan
self.asan_tests(libsanitizers=True)
File "/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/test/API/functionalities/asan/TestReportData.py", line 60, in asan_tests
self.expect(
File "/Users/ec2-user/jenkins/workspace/llvm.org/lldb-cmake/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2490, in expect
self.fail(log_msg)
AssertionError: Ran command:
"thread list"
Got output:
Process 3474 stopped
* thread #1: tid = 0x38b5e9, 0x00007ff80f563b52 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
Expecting sub string: "stopped" (was found)
Expecting sub string: "stop reason = Use of deallocated memory" (was not found)
Process should be stopped due to ASan report
```
There isn't much to go off of in the log, so adding more to help us debug this.
These are macOS tests only and are currently failing on the x86_64 CI and on arm64 on recent versions of macOS/Xcode. The tests are failing because we're stopping in: ``` Process 17458 stopped * thread #1: tid = 0xbda69a, 0x00000002735bd000 libsystem_malloc.dylib`purgeable_print_self.cold.1, stop reason = EXC_BREAKPOINT (code=1, subcode=0x2735bd000) ``` instead of the libsanitizers library. This seems to be related to `-fsanitize-trivial-abi` support Skip these for now until we figure out the root cause.
…d A520 (llvm#132246) Inefficient SVE codegen occurs on at least two in-order cores, those being Cortex-A510 and Cortex-A520. For example a simple vector add ``` void foo(float a, float b, float dst, unsigned n) { for (unsigned i = 0; i < n; ++i) dst[i] = a[i] + b[i]; } ``` Vectorizes the inner loop into the following interleaved sequence of instructions. ``` add x12, x1, x10 ld1b { z0.b }, p0/z, [x1, x10] add x13, x2, x10 ld1b { z1.b }, p0/z, [x2, x10] ldr z2, [x12, #1, mul vl] ldr z3, [x13, #1, mul vl] dech x11 add x12, x0, x10 fadd z0.s, z1.s, z0.s fadd z1.s, z3.s, z2.s st1b { z0.b }, p0, [x0, x10] addvl x10, x10, #2 str z1, [x12, #1, mul vl] ``` By adjusting the target features to prefer fixed over scalable if the cost is equal we get the following vectorized loop. ``` ldp q0, q3, [x11, #-16] subs x13, x13, llvm#8 ldp q1, q2, [x10, #-16] add x10, x10, llvm#32 add x11, x11, llvm#32 fadd v0.4s, v1.4s, v0.4s fadd v1.4s, v2.4s, v3.4s stp q0, q1, [x12, #-16] add x12, x12, llvm#32 ``` Which is more efficient.
… A510/A520 (llvm#134606) Recommit. This work was done by llvm#132246 but failed buildbots due to the test introduced needing updates Inefficient SVE codegen occurs on at least two in-order cores, those being Cortex-A510 and Cortex-A520. For example a simple vector add ``` void foo(float a, float b, float dst, unsigned n) { for (unsigned i = 0; i < n; ++i) dst[i] = a[i] + b[i]; } ``` Vectorizes the inner loop into the following interleaved sequence of instructions. ``` add x12, x1, x10 ld1b { z0.b }, p0/z, [x1, x10] add x13, x2, x10 ld1b { z1.b }, p0/z, [x2, x10] ldr z2, [x12, #1, mul vl] ldr z3, [x13, #1, mul vl] dech x11 add x12, x0, x10 fadd z0.s, z1.s, z0.s fadd z1.s, z3.s, z2.s st1b { z0.b }, p0, [x0, x10] addvl x10, x10, #2 str z1, [x12, #1, mul vl] ``` By adjusting the target features to prefer fixed over scalable if the cost is equal we get the following vectorized loop. ``` ldp q0, q3, [x11, #-16] subs x13, x13, llvm#8 ldp q1, q2, [x10, #-16] add x10, x10, llvm#32 add x11, x11, llvm#32 fadd v0.4s, v1.4s, v0.4s fadd v1.4s, v2.4s, v3.4s stp q0, q1, [x12, #-16] add x12, x12, llvm#32 ``` Which is more efficient.
…s=128. (llvm#134068) When compiling with -msve-vector-bits=128 or vscale_range(1, 1) and when the offsets allow it, we can pair SVE LDR/STR instructions into Neon LDP/STP. For example, given: ```cpp #include <arm_sve.h> void foo(double const *ldp, double *stp) { svbool_t pg = svptrue_b64(); svfloat64_t ld1 = svld1_f64(pg, ldp); svfloat64_t ld2 = svld1_f64(pg, ldp+svcntd()); svst1_f64(pg, stp, ld1); svst1_f64(pg, stp+svcntd(), ld2); } ``` When compiled with `-msve-vector-bits=128`, we currently generate: ```gas foo: ldr z0, [x0] ldr z1, [x0, #1, mul vl] str z0, [x1] str z1, [x1, #1, mul vl] ret ``` With this patch, we instead generate: ```gas foo: ldp q0, q1, [x0] stp q0, q1, [x1] ret ``` This is an alternative, more targetted approach to llvm#127500.
…ctor-bits=128." (llvm#134997) Reverts llvm#134068 Caused a stage 2 build failure: https://lab.llvm.org/buildbot/#/builders/41/builds/6016 ``` FAILED: lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/lib/Support -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/include -mcpu=neoverse-512tvb -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Werror=global-constructors -O3 -DNDEBUG -std=c++17 -UNDEBUG -fno-exceptions -funwind-tables -fno-rtti -MD -MT lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o -MF lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o.d -o lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support/Caching.cpp Opcode has unknown scale! UNREACHABLE executed at ../llvm/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:4530! PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. Program arguments: /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/lib/Support -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/include -mcpu=neoverse-512tvb -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Werror=global-constructors -O3 -DNDEBUG -std=c++17 -UNDEBUG -fno-exceptions -funwind-tables -fno-rtti -MD -MT lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o -MF lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o.d -o lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support/Caching.cpp 1. <eof> parser at end of file 2. Code generation 3. Running pass 'Function Pass Manager' on module '/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support/Caching.cpp'. 4. Running pass 'AArch64 load / store optimization pass' on function '@"_ZNSt17_Function_handlerIFN4llvm8ExpectedISt8functionIFNS1_ISt10unique_ptrINS0_16CachedFileStreamESt14default_deleteIS4_EEEEjRKNS0_5TwineEEEEEjNS0_9StringRefESB_EZNS0_10localCacheESB_SB_SB_S2_IFvjSB_S3_INS0_12MemoryBufferES5_ISH_EEEEE3$_0E9_M_invokeERKSt9_Any_dataOjOSF_SB_"' #0 0x0000b6eae9b67bf0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang+++0x81c7bf0) #1 0x0000b6eae9b65aec llvm::sys::RunSignalHandlers() (/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang+++0x81c5aec) #2 0x0000b6eae9acd5f4 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0 #3 0x0000f16c1aff28f8 (linux-vdso.so.1+0x8f8) #4 0x0000f16c1aacf1f0 __pthread_kill_implementation ./nptl/pthread_kill.c:44:76 #5 0x0000f16c1aa8a67c gsignal ./signal/../sysdeps/posix/raise.c:27:6 #6 0x0000f16c1aa77130 abort ./stdlib/abort.c:81:7 llvm#7 0x0000b6eae9ad6628 (/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang+++0x8136628) llvm#8 0x0000b6eae72e95a8 (/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang+++0x59495a8) llvm#9 0x0000b6eae74ca9a8 (anonymous namespace)::AArch64LoadStoreOpt::findMatchingInsn(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, (anonymous namespace)::LdStPairFlags&, unsigned int, bool) AArch64LoadStoreOptimizer.cpp:0:0 llvm#10 0x0000b6eae74c85a8 (anonymous namespace)::AArch64LoadStoreOpt::tryToPairLdStInst(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>&) AArch64LoadStoreOptimizer.cpp:0:0 llvm#11 0x0000b6eae74c624c (anonymous namespace)::AArch64LoadStoreOpt::optimizeBlock(llvm::MachineBasicBlock&, bool) AArch64LoadStoreOptimizer.cpp:0:0 llvm#12 0x0000b6eae74c429c (anonymous namespace)::AArch64LoadStoreOpt::runOnMachineFunction(llvm::MachineFunction&) AArch64LoadStoreOptimizer.cpp:0:0 ```
…vailable (llvm#135343) When a frame is inlined, LLDB will display its name in backtraces as follows: ``` * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.3 * frame #0: 0x0000000100000398 a.out`func() [inlined] baz(x=10) at inline.cpp:1:42 frame #1: 0x0000000100000398 a.out`func() [inlined] bar() at inline.cpp:2:37 frame #2: 0x0000000100000398 a.out`func() at inline.cpp:4:15 frame #3: 0x00000001000003c0 a.out`main at inline.cpp:7:5 frame #4: 0x000000026eb29ab8 dyld`start + 6812 ``` The longer the names get the more confusing this gets because the first function name that appears is the parent frame. My assumption (which may need some more surveying) is that for the majority of cases we only care about the actual frame name (not the parent). So this patch removes all the special logic that prints the parent frame. Another quirk of the current format is that the inlined frame name does not abide by the `${function.name-XXX}` format variables. We always just print the raw demangled name. With this patch, we would format the inlined frame name according to the `frame-format` setting (see the test-cases). If we really want to have the `parentFrame [inlined] inlinedFrame` format, we could expose it through a new `frame-format` variable (e..g., `${function.inlined-at-name}` and let the user decide where to place things.
Currently, given:
```cpp
uint64_t incb(uint64_t x) {
return x+svcntb();
}
```
LLVM generates:
```gas
incb:
addvl x0, x0, #1
ret
```
Which is equivalent to:
```gas
incb:
incb x0
ret
```
However, on microarchitectures like the Neoverse V2 and Neoverse V3,
the second form (with INCB) can have significantly better latency and
throughput (according to their SWOG). On the Neoverse V2, for example,
ADDVL has a latency and throughput of 2, whereas some forms of INCB
have a latency of 1 and a throughput of 4. The same applies to DECB.
This patch adds patterns to prefer the cheaper INCB/DECB forms over
ADDVL where applicable.
Fixes llvm#123300 What is seen ``` clang-repl> int x = 42; clang-repl> auto capture = [&]() { return x * 2; }; In file included from <<< inputs >>>:1: input_line_4:1:17: error: non-local lambda expression cannot have a capture-default 1 | auto capture = [&]() { return x * 2; }; | ^ zsh: segmentation fault clang-repl --Xcc="-v" (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8) * frame #0: 0x0000000107b4f8b8 libclang-cpp.19.1.dylib`clang::IncrementalParser::CleanUpPTU(clang::PartialTranslationUnit&) + 988 frame #1: 0x0000000107b4f1b4 libclang-cpp.19.1.dylib`clang::IncrementalParser::ParseOrWrapTopLevelDecl() + 416 frame #2: 0x0000000107b4fb94 libclang-cpp.19.1.dylib`clang::IncrementalParser::Parse(llvm::StringRef) + 612 frame #3: 0x0000000107b52fec libclang-cpp.19.1.dylib`clang::Interpreter::ParseAndExecute(llvm::StringRef, clang::Value*) + 180 frame #4: 0x0000000100003498 clang-repl`main + 3560 frame #5: 0x000000018d39a0e0 dyld`start + 2360 ``` Though the error is justified, we shouldn't be interested in exiting through a segfault in such cases. The issue is that empty named decls weren't being taken care of resulting into this assert https://github.com/llvm/llvm-project/blob/c1a229252617ed58f943bf3f4698bd8204ee0f04/clang/include/clang/AST/DeclarationName.h#L503 Can also be seen when the example is attempted through xeus-cpp-lite. 
# Symptom We have seen SIGSEGV like this: ``` * thread #1, name = 'lldb-server', stop reason = SIGSEGV frame #0: 0x00007f39e529c993 libc.so.6`__pthread_kill_internal(signo=11, threadid=<unavailable>) at pthread_kill.c:46:37 ... * frame #5: 0x000056027c94fe48 lldb-server`lldb_private::process_linux::GetPtraceScope() + 72 frame #6: 0x000056027c92f94f lldb-server`lldb_private::process_linux::NativeProcessLinux::Attach(int) + 1087 ... ``` See [full stack trace](https://pastebin.com/X0d6QhYj). This happens on Linux where LLDB doesn't have access to `/proc/sys/kernel/yama/ptrace_scope`. A similar error (an unchecked `Error`) can be reproduced by running the newly added unit test without the fix. See the "Test" section below. # Root cause `GetPtraceScope()` ([code](https://github.com/llvm/llvm-project/blob/328f40f408c218f25695ea42c844e43bef38660b/lldb/source/Plugins/Process/Linux/Procfs.cpp#L77)) has the following `if` statement: ``` llvm::Expected<int> lldb_private::process_linux::GetPtraceScope() { ErrorOr<std::unique_ptr<MemoryBuffer>> ptrace_scope_file = getProcFile("sys/kernel/yama/ptrace_scope"); if (!*ptrace_scope_file) return errorCodeToError(ptrace_scope_file.getError()); ... } ``` The intention of the `if` statement is to check whether the `ptrace_scope_file` is an `Error` or not, and return the error if it is. However, the `operator*` of `ErrorOr` returns the value that is stored (which is a `std::unique_ptr<MemoryBuffer>`), so what the `if` condition actually do is to check if the unique pointer is non-null. Note that the method `ErrorOr::getStorage()` ([called by](https://github.com/llvm/llvm-project/blob/328f40f408c218f25695ea42c844e43bef38660b/llvm/include/llvm/Support/ErrorOr.h#L162-L164) `ErrorOr::operator *`) **does** assert on whether or not `HasError` has been set (see [ErrorOr.h](https://github.com/llvm/llvm-project/blob/328f40f408c218f25695ea42c844e43bef38660b/llvm/include/llvm/Support/ErrorOr.h#L235-L243)). However, it seems this wasn't executed, probably because the LLDB was a release build. # Fix The fix is simply remove the `*` in the said `if` statement.
…142952) This was removed in llvm#135343 in favour of making it a format variable, which we do here. This follows the precedent of the `[opt]` and `[artificial]` markers. Before: ``` thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2 * frame #0: 0x000000010000037c a.out`inlined1() at inline.cpp:4:3 frame #1: 0x000000010000037c a.out`regular() at inline.cpp:6:17 frame #2: 0x00000001000003b8 a.out`inlined2() at inline.cpp:7:43 frame #3: 0x00000001000003b4 a.out`main at inline.cpp:10:3 frame #4: 0x0000000186345be4 dyld`start + 7040 ``` After (note the `[inlined]` markers): ``` thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2 * frame #0: 0x000000010000037c a.out`inlined1() at inline.cpp:4:3 [inlined] frame #1: 0x000000010000037c a.out`regular() at inline.cpp:6:17 frame #2: 0x00000001000003b8 a.out`inlined2() at inline.cpp:7:43 [inlined] frame #3: 0x00000001000003b4 a.out`main at inline.cpp:10:3 frame #4: 0x0000000186345be4 dyld`start + 7040 ``` rdar://152642178
This PR is one of the many PRs in the SYCL upstreaming effort focusing on device code linking during the SYCL offload compilation process. RFC: https://discourse.llvm.org/t/rfc-offloading-design-for-sycl-offload-kind-and-spir-targets/74088
In this PR, we introduce a new tool that will be used to perform device code linking for SYCL offload kind. It accepts SYCL device objects in LLVM IR bitcode format and will generate a fully linked device object that can then be wrapped and linked into the host object.
A primary use case for this tool is to perform device code linking for objects with SYCL offload kind inside the clang-linker-wrapper. It can also be invoked via clang driver as follows:
clang --target=spirv64 --sycl-link input.bcDevice code linking for SYCL offloading kind has a number of known quirks that makes it difficult to use in a unified offloading setting. Two of the primary issues are:
Hence, we introduce this new tool to provide a clean wrapper to perform SYCL device linking.
List of items supported in this PR:
List of upcoming PRs:
Thanks