Skip to content

Conversation

Michael137
Copy link
Member

Part of #149827

This patch adds special handling for AsmLabels created by LLDB. LLDB uses AsmLabels to encode information about a function declaration to make it easier to locate function symbols when JITing C++ expressions. For constructors/destructors LLDB doesn't know at the time of creating the AsmLabelAttr which structor variant the expression evaluator will need to call (this is decided when compiling the expression). So we make the Clang mangler inject this information into our custom label when we're JITting the expression.

…for LLDB JIT expressions

Part of llvm#149827

This patch adds special handling for `AsmLabel`s created by LLDB. LLDB
uses `AsmLabel`s to encode information about a function declaration to
make it easier to locate function symbols when JITing C++ expressions.
For constructors/destructors LLDB doesn't know at the time of creating
the `AsmLabelAttr` which structor variant the expression evaluator will
need to call (this is decided when compiling the expression). So we make
the Clang mangler inject this information into our custom label when
we're JITting the expression.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Aug 26, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 26, 2025

@llvm/pr-subscribers-lldb

@llvm/pr-subscribers-clang

Author: Michael Buch (Michael137)

Changes

Part of #149827

This patch adds special handling for AsmLabels created by LLDB. LLDB uses AsmLabels to encode information about a function declaration to make it easier to locate function symbols when JITing C++ expressions. For constructors/destructors LLDB doesn't know at the time of creating the AsmLabelAttr which structor variant the expression evaluator will need to call (this is decided when compiling the expression). So we make the Clang mangler inject this information into our custom label when we're JITting the expression.


Full diff: https://github.com/llvm/llvm-project/pull/155485.diff

2 Files Affected:

  • (modified) clang/lib/AST/Mangle.cpp (+32-1)
  • (modified) clang/unittests/AST/DeclTest.cpp (+75)
diff --git a/clang/lib/AST/Mangle.cpp b/clang/lib/AST/Mangle.cpp
index 0bfb51c11f0a5..1131477fa7200 100644
--- a/clang/lib/AST/Mangle.cpp
+++ b/clang/lib/AST/Mangle.cpp
@@ -152,6 +152,33 @@ bool MangleContext::shouldMangleDeclName(const NamedDecl *D) {
   return shouldMangleCXXName(D);
 }
 
+static llvm::StringRef g_lldb_func_call_label_prefix = "$__lldb_func:";
+
+/// Given an LLDB function call label, this function prints the label
+/// into \c Out, together with the structor type of \c GD (if the
+/// decl is a constructor/destructor). LLDB knows how to handle mangled
+/// names with this encoding.
+///
+/// Example input label:
+///   $__lldb_func::123:456:~Foo
+///
+/// Example output:
+///   $__lldb_func:D1:123:456:~Foo
+///
+static void emitLLDBAsmLabel(llvm::StringRef label, GlobalDecl GD,
+                             llvm::raw_ostream &Out) {
+  assert(label.starts_with(g_lldb_func_call_label_prefix));
+
+  Out << g_lldb_func_call_label_prefix;
+
+  if (llvm::isa<clang::CXXConstructorDecl>(GD.getDecl()))
+    Out << "C" << GD.getCtorType();
+  else if (llvm::isa<clang::CXXDestructorDecl>(GD.getDecl()))
+    Out << "D" << GD.getDtorType();
+
+  Out << label.substr(g_lldb_func_call_label_prefix.size());
+}
+
 void MangleContext::mangleName(GlobalDecl GD, raw_ostream &Out) {
   const ASTContext &ASTContext = getASTContext();
   const NamedDecl *D = cast<NamedDecl>(GD.getDecl());
@@ -185,7 +212,11 @@ void MangleContext::mangleName(GlobalDecl GD, raw_ostream &Out) {
     if (!UserLabelPrefix.empty())
       Out << '\01'; // LLVM IR Marker for __asm("foo")
 
-    Out << ALA->getLabel();
+    if (ALA->getLabel().starts_with(g_lldb_func_call_label_prefix))
+      emitLLDBAsmLabel(ALA->getLabel(), GD, Out);
+    else
+      Out << ALA->getLabel();
+
     return;
   }
 
diff --git a/clang/unittests/AST/DeclTest.cpp b/clang/unittests/AST/DeclTest.cpp
index 6b443918ec137..4bd7886ef9b35 100644
--- a/clang/unittests/AST/DeclTest.cpp
+++ b/clang/unittests/AST/DeclTest.cpp
@@ -16,6 +16,7 @@
 #include "clang/AST/Mangle.h"
 #include "clang/ASTMatchers/ASTMatchFinder.h"
 #include "clang/ASTMatchers/ASTMatchers.h"
+#include "clang/Basic/ABI.h"
 #include "clang/Basic/Diagnostic.h"
 #include "clang/Basic/LLVM.h"
 #include "clang/Basic/TargetInfo.h"
@@ -102,6 +103,80 @@ TEST(Decl, AsmLabelAttr) {
                      "foo");
 }
 
+TEST(Decl, AsmLabelAttr_LLDB) {
+  StringRef Code = R"(
+    struct S {
+      void f() {}
+      S() = default;
+      ~S() = default;
+    };
+  )";
+  auto AST =
+      tooling::buildASTFromCodeWithArgs(Code, {"-target", "i386-apple-darwin"});
+  ASTContext &Ctx = AST->getASTContext();
+  assert(Ctx.getTargetInfo().getUserLabelPrefix() == StringRef("_") &&
+         "Expected target to have a global prefix");
+  DiagnosticsEngine &Diags = AST->getDiagnostics();
+
+  const auto *DeclS =
+      selectFirst<CXXRecordDecl>("d", match(cxxRecordDecl().bind("d"), Ctx));
+
+  auto *DeclF = *DeclS->method_begin();
+  auto *Ctor = *DeclS->ctor_begin();
+  auto *Dtor = DeclS->getDestructor();
+
+  ASSERT_TRUE(DeclF);
+  ASSERT_TRUE(Ctor);
+  ASSERT_TRUE(Dtor);
+
+  DeclF->addAttr(AsmLabelAttr::Create(Ctx, "$__lldb_func::123:123:_Z1fv"));
+  Ctor->addAttr(AsmLabelAttr::Create(Ctx, "$__lldb_func::123:123:S"));
+  Dtor->addAttr(AsmLabelAttr::Create(Ctx, "$__lldb_func::123:123:~S"));
+
+  std::unique_ptr<ItaniumMangleContext> MC(
+      ItaniumMangleContext::create(Ctx, Diags));
+
+  {
+    std::string Mangled;
+    llvm::raw_string_ostream OS_Mangled(Mangled);
+    MC->mangleName(DeclF, OS_Mangled);
+
+    ASSERT_EQ(Mangled, "\x01$__lldb_func::123:123:_Z1fv");
+  };
+
+  {
+    std::string Mangled;
+    llvm::raw_string_ostream OS_Mangled(Mangled);
+    MC->mangleName(GlobalDecl(Ctor, CXXCtorType::Ctor_Complete), OS_Mangled);
+
+    ASSERT_EQ(Mangled, "\x01$__lldb_func:C0:123:123:S");
+  };
+
+  {
+    std::string Mangled;
+    llvm::raw_string_ostream OS_Mangled(Mangled);
+    MC->mangleName(GlobalDecl(Ctor, CXXCtorType::Ctor_Base), OS_Mangled);
+
+    ASSERT_EQ(Mangled, "\x01$__lldb_func:C1:123:123:S");
+  };
+
+  {
+    std::string Mangled;
+    llvm::raw_string_ostream OS_Mangled(Mangled);
+    MC->mangleName(GlobalDecl(Dtor, CXXDtorType::Dtor_Deleting), OS_Mangled);
+
+    ASSERT_EQ(Mangled, "\x01$__lldb_func:D0:123:123:~S");
+  };
+
+  {
+    std::string Mangled;
+    llvm::raw_string_ostream OS_Mangled(Mangled);
+    MC->mangleName(GlobalDecl(Dtor, CXXDtorType::Dtor_Base), OS_Mangled);
+
+    ASSERT_EQ(Mangled, "\x01$__lldb_func:D2:123:123:~S");
+  };
+}
+
 TEST(Decl, MangleDependentSizedArray) {
   StringRef Code = R"(
     template <int ...N>

@Michael137
Copy link
Member Author

Michael137 commented Aug 26, 2025

@AaronBallman I think you looked at this already in #149827, but thought it's best I split this out since it's sufficiently niche behaviour to warrant its own commit.

@Michael137
Copy link
Member Author

LLDB doesn't currently support inheriting ctors in the expression evaluator but I added this capability here for completeness.

@Michael137 Michael137 requested a review from cor3ntin September 3, 2025 15:01
@Michael137
Copy link
Member Author

friendly ping

@Michael137 Michael137 merged commit db8cad0 into llvm:main Sep 9, 2025
9 checks passed
@Michael137 Michael137 deleted the clang/lldb-asmlabel branch September 9, 2025 08:08
Michael137 added a commit that referenced this pull request Sep 9, 2025
#149827)

Depends on
* #148877
* #155483
* #155485
* #154137
* #154142

This patch is an implementation of [this
discussion](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816/7)
about handling ABI-tagged structors during expression evaluation.

**Motivation**

LLDB encodes the mangled name of a `DW_TAG_subprogram` into `AsmLabel`s
on function and method Clang AST nodes. This means that when calls to
these functions get lowered into IR (when running JITted expressions),
the address resolver can locate the appropriate symbol by mangled name
(and it is guaranteed to find the symbol because we got the mangled name
from debug-info, instead of letting Clang mangle it based on AST
structure). However, we don't do this for
`CXXConstructorDecl`s/`CXXDestructorDecl`s because these structor
declarations in DWARF don't have a linkage name. This is because there
can be multiple variants of a structor, each with a distinct mangling in
the Itanium ABI. Each structor variant has its own definition
`DW_TAG_subprogram`. So LLDB doesn't know which mangled name to put into
the `AsmLabel`.

Currently this means using ABI-tagged structors in LLDB expressions
won't work (see [this
RFC](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816)
for concrete examples).

**Proposed Solution**

The `FunctionCallLabel` encoding that we put into `AsmLabel`s already
supports stuffing more info about a DIE into it. So this patch extends
the `FunctionCallLabel` to contain an optional discriminator (a sequence
of bytes) which the `SymbolFileDWARF` plugin interprets as the
constructor/destructor variant of that DIE. So when searching for the
definition DIE, LLDB will include the structor variant in its heuristic
for determining a match.

There's a few subtleties here:
1. At the point at which LLDB first constructs the label, it has no way
of knowing (just by looking at the debug-info declaration), which
structor variant the expression evaluator is supposed to call. That's
something that gets decided when compiling the expression. So we let the
Clang mangler inject the correct structor variant into the `AsmLabel`
during JITing. I adjusted the `AsmLabelAttr` mangling for this in
#155485. An option would've
been to create a new Clang attribute which behaved like an `AsmLabel`
but with these special semantics for LLDB. My main concern there is that
we'd have to adjust all the `AsmLabelAttr` checks around Clang to also
now account for this new attribute.
2. The compiler is free to omit the `C1` variant of a constructor if the
`C2` variant is sufficient. In that case it may alias `C1` to `C2`,
leaving us with only the `C2` `DW_TAG_subprogram` in the object file.
Linux is one of the platforms where this occurs. For those cases I added
a heuristic in `SymbolFileDWARF` where we pick `C2` if we asked for `C1`
but it doesn't exist. This may not always be correct (e.g., if the
compiler decided to drop `C1` for other reasons).
3. In #154142 Clang will emit
`C4`/`D4` variants of ctors/dtors on declarations. When resolving the
`FunctionCallLabel` we will now substitute the actual variant that Clang
told us we need to call into the mangled name. We do this using LLDB's
`ManglingSubstitutor`. That way we find the definition DIE exactly the
same way we do for regular function calls.
4. In cases where declarations and definitions live in separate modules,
the DIE ID encoded in the function call label may not be enough to find
the definition DIE in the encoded module ID. For those cases we fall
back to how LLDB used to work: look up in all images of the target. To
make sure we don't use the unified mangled name for the fallback lookup,
we change the lookup name to whatever mangled name the FunctionCallLabel
resolved to.

rdar://104968288
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Sep 9, 2025
… call labels (#149827)

Depends on
* llvm/llvm-project#148877
* llvm/llvm-project#155483
* llvm/llvm-project#155485
* llvm/llvm-project#154137
* llvm/llvm-project#154142

This patch is an implementation of [this
discussion](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816/7)
about handling ABI-tagged structors during expression evaluation.

**Motivation**

LLDB encodes the mangled name of a `DW_TAG_subprogram` into `AsmLabel`s
on function and method Clang AST nodes. This means that when calls to
these functions get lowered into IR (when running JITted expressions),
the address resolver can locate the appropriate symbol by mangled name
(and it is guaranteed to find the symbol because we got the mangled name
from debug-info, instead of letting Clang mangle it based on AST
structure). However, we don't do this for
`CXXConstructorDecl`s/`CXXDestructorDecl`s because these structor
declarations in DWARF don't have a linkage name. This is because there
can be multiple variants of a structor, each with a distinct mangling in
the Itanium ABI. Each structor variant has its own definition
`DW_TAG_subprogram`. So LLDB doesn't know which mangled name to put into
the `AsmLabel`.

Currently this means using ABI-tagged structors in LLDB expressions
won't work (see [this
RFC](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816)
for concrete examples).

**Proposed Solution**

The `FunctionCallLabel` encoding that we put into `AsmLabel`s already
supports stuffing more info about a DIE into it. So this patch extends
the `FunctionCallLabel` to contain an optional discriminator (a sequence
of bytes) which the `SymbolFileDWARF` plugin interprets as the
constructor/destructor variant of that DIE. So when searching for the
definition DIE, LLDB will include the structor variant in its heuristic
for determining a match.

There's a few subtleties here:
1. At the point at which LLDB first constructs the label, it has no way
of knowing (just by looking at the debug-info declaration), which
structor variant the expression evaluator is supposed to call. That's
something that gets decided when compiling the expression. So we let the
Clang mangler inject the correct structor variant into the `AsmLabel`
during JITing. I adjusted the `AsmLabelAttr` mangling for this in
llvm/llvm-project#155485. An option would've
been to create a new Clang attribute which behaved like an `AsmLabel`
but with these special semantics for LLDB. My main concern there is that
we'd have to adjust all the `AsmLabelAttr` checks around Clang to also
now account for this new attribute.
2. The compiler is free to omit the `C1` variant of a constructor if the
`C2` variant is sufficient. In that case it may alias `C1` to `C2`,
leaving us with only the `C2` `DW_TAG_subprogram` in the object file.
Linux is one of the platforms where this occurs. For those cases I added
a heuristic in `SymbolFileDWARF` where we pick `C2` if we asked for `C1`
but it doesn't exist. This may not always be correct (e.g., if the
compiler decided to drop `C1` for other reasons).
3. In llvm/llvm-project#154142 Clang will emit
`C4`/`D4` variants of ctors/dtors on declarations. When resolving the
`FunctionCallLabel` we will now substitute the actual variant that Clang
told us we need to call into the mangled name. We do this using LLDB's
`ManglingSubstitutor`. That way we find the definition DIE exactly the
same way we do for regular function calls.
4. In cases where declarations and definitions live in separate modules,
the DIE ID encoded in the function call label may not be enough to find
the definition DIE in the encoded module ID. For those cases we fall
back to how LLDB used to work: look up in all images of the target. To
make sure we don't use the unified mangled name for the fallback lookup,
we change the lookup name to whatever mangled name the FunctionCallLabel
resolved to.

rdar://104968288
Michael137 added a commit to swiftlang/llvm-project that referenced this pull request Sep 9, 2025
…for LLDB JIT expressions (llvm#155485)

Part of llvm#149827

This patch adds special handling for `AsmLabel`s created by LLDB. LLDB
uses `AsmLabel`s to encode information about a function declaration to
make it easier to locate function symbols when JITing C++ expressions.
For constructors/destructors LLDB doesn't know at the time of creating
the `AsmLabelAttr` which structor variant the expression evaluator will
need to call (this is decided when compiling the expression). So we make
the Clang mangler inject this information into our custom label when
we're JITting the expression.

(cherry picked from commit db8cad0)
Michael137 added a commit to swiftlang/llvm-project that referenced this pull request Sep 9, 2025
llvm#149827)

Depends on
* llvm#148877
* llvm#155483
* llvm#155485
* llvm#154137
* llvm#154142

This patch is an implementation of [this
discussion](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816/7)
about handling ABI-tagged structors during expression evaluation.

**Motivation**

LLDB encodes the mangled name of a `DW_TAG_subprogram` into `AsmLabel`s
on function and method Clang AST nodes. This means that when calls to
these functions get lowered into IR (when running JITted expressions),
the address resolver can locate the appropriate symbol by mangled name
(and it is guaranteed to find the symbol because we got the mangled name
from debug-info, instead of letting Clang mangle it based on AST
structure). However, we don't do this for
`CXXConstructorDecl`s/`CXXDestructorDecl`s because these structor
declarations in DWARF don't have a linkage name. This is because there
can be multiple variants of a structor, each with a distinct mangling in
the Itanium ABI. Each structor variant has its own definition
`DW_TAG_subprogram`. So LLDB doesn't know which mangled name to put into
the `AsmLabel`.

Currently this means using ABI-tagged structors in LLDB expressions
won't work (see [this
RFC](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816)
for concrete examples).

**Proposed Solution**

The `FunctionCallLabel` encoding that we put into `AsmLabel`s already
supports stuffing more info about a DIE into it. So this patch extends
the `FunctionCallLabel` to contain an optional discriminator (a sequence
of bytes) which the `SymbolFileDWARF` plugin interprets as the
constructor/destructor variant of that DIE. So when searching for the
definition DIE, LLDB will include the structor variant in its heuristic
for determining a match.

There's a few subtleties here:
1. At the point at which LLDB first constructs the label, it has no way
of knowing (just by looking at the debug-info declaration), which
structor variant the expression evaluator is supposed to call. That's
something that gets decided when compiling the expression. So we let the
Clang mangler inject the correct structor variant into the `AsmLabel`
during JITing. I adjusted the `AsmLabelAttr` mangling for this in
llvm#155485. An option would've
been to create a new Clang attribute which behaved like an `AsmLabel`
but with these special semantics for LLDB. My main concern there is that
we'd have to adjust all the `AsmLabelAttr` checks around Clang to also
now account for this new attribute.
2. The compiler is free to omit the `C1` variant of a constructor if the
`C2` variant is sufficient. In that case it may alias `C1` to `C2`,
leaving us with only the `C2` `DW_TAG_subprogram` in the object file.
Linux is one of the platforms where this occurs. For those cases I added
a heuristic in `SymbolFileDWARF` where we pick `C2` if we asked for `C1`
but it doesn't exist. This may not always be correct (e.g., if the
compiler decided to drop `C1` for other reasons).
3. In llvm#154142 Clang will emit
`C4`/`D4` variants of ctors/dtors on declarations. When resolving the
`FunctionCallLabel` we will now substitute the actual variant that Clang
told us we need to call into the mangled name. We do this using LLDB's
`ManglingSubstitutor`. That way we find the definition DIE exactly the
same way we do for regular function calls.
4. In cases where declarations and definitions live in separate modules,
the DIE ID encoded in the function call label may not be enough to find
the definition DIE in the encoded module ID. For those cases we fall
back to how LLDB used to work: look up in all images of the target. To
make sure we don't use the unified mangled name for the fallback lookup,
we change the lookup name to whatever mangled name the FunctionCallLabel
resolved to.

rdar://104968288
(cherry picked from commit 57a7907)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category lldb

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants