Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NVPTX] mangle symbols in debug info to conform to PTX restrictions. #113216

Merged
merged 1 commit into from
Oct 22, 2024

Conversation

Artem-B
Copy link
Member

@Artem-B Artem-B commented Oct 21, 2024

Until now debug info was printing the symbols names as-is and that resulted in invalid PTX when the symbols contained characters that are invalid for PTX. E.g. __PRETTY_FUNCTION.something

Debug info is somewhat disconnected from the symbols themselves, so the regular "NVPTXAssignValidGlobalNames" pass can't easily fix them.

As the "plan B" this patch catches printout of debug symbols and fixes them, as needed. One gotcha is that the same code path is used to print the names of debug info sections. Those section names do start with a '.debug'. The dot in those names is nominally illegal in PTX, but the debug section names with a dot are accepted as a special case. The downside of this change is that if someone ever has a .debug* symbol that needs to be referred to from the debug info, that label will be passed through as-is, and will still produce broken PTX output. If/when we run into a case where we need it to work, we could consider only passing through specific debug section names, or add a mechanist allowing us to tell section names apart from regular symbols.

Fixes #58491

@llvmbot
Copy link
Member

llvmbot commented Oct 21, 2024

@llvm/pr-subscribers-backend-nvptx

@llvm/pr-subscribers-debuginfo

Author: Artem Belevich (Artem-B)

Changes

Until now debug info was printing the symbols names as-is and that resulted in invalid PTX when the symbols contained characters that are incalid for PTX. E.g. __PRETTY_FUNCTION.something

Debug info is somewhat disconnected from the symbols themselves, so the regular "NVPTXAssignValidGlobalNames" pass can't easily fix them.

As the "plan B" this patch catches printout of debug symbols and fixes them, as needed. One gotcha is that the same code path is used to print the names of debug info sections. Those section names do start with a '.debug'. The dot in those names is nominally illegal in PTX, but the debug section names with a dot are accepted as a special case. The downside of this change is that if someone ever has a .debug* symbol that needs to be referred to from the debug info, that label will be passed through as-is, and will still produce broken PTX output. If/when we run into a case where we need it to work, we could consider only passing through specific debug section names, or add a mechanist allowing us to tell section names apart from regular symbols.

Fixes #58491


Full diff: https://github.com/llvm/llvm-project/pull/113216.diff

6 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.cpp (+17)
  • (modified) llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.h (+2)
  • (modified) llvm/lib/Target/NVPTX/NVPTXAssignValidGlobalNames.cpp (+4-25)
  • (modified) llvm/lib/Target/NVPTX/NVPTXUtilities.cpp (+2-1)
  • (modified) llvm/lib/Target/NVPTX/NVPTXUtilities.h (+15)
  • (added) llvm/test/DebugInfo/NVPTX/debug-ptx-symbols.ll (+42)
diff --git a/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.cpp b/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.cpp
index fc207b1a8871af..9f911436ff4aa8 100644
--- a/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.cpp
+++ b/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.cpp
@@ -11,9 +11,13 @@
 //===----------------------------------------------------------------------===//
 
 #include "NVPTXTargetStreamer.h"
+#include "NVPTXUtilities.h"
 #include "llvm/MC/MCAsmInfo.h"
 #include "llvm/MC/MCContext.h"
+#include "llvm/MC/MCExpr.h"
 #include "llvm/MC/MCObjectFileInfo.h"
+#include "llvm/MC/MCSymbol.h"
+#include "llvm/Support/Casting.h"
 
 using namespace llvm;
 
@@ -135,3 +139,16 @@ void NVPTXTargetStreamer::emitRawBytes(StringRef Data) {
 #endif
 }
 
+void NVPTXTargetStreamer::emitValue(const MCExpr *Value) {
+  if (Value->getKind() == MCExpr::SymbolRef) {
+    const MCSymbolRefExpr &SRE = cast<MCSymbolRefExpr>(*Value);
+    StringRef SymName = SRE.getSymbol().getName();
+    if (!SymName.starts_with(".debug")) {
+      Streamer.emitRawText(NVPTX::getValidPTXIdentifier(SymName));
+      return;
+    }
+    // Fall through to the normal printing.
+  }
+  // Otherwise, print the Value normally.
+  MCTargetStreamer::emitValue(Value);
+}
diff --git a/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.h b/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.h
index ca0d84ee2079ae..36db76065e0c07 100644
--- a/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.h
+++ b/llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXTargetStreamer.h
@@ -48,6 +48,8 @@ class NVPTXTargetStreamer : public MCTargetStreamer {
   ///
   /// This is used to emit bytes in \p Data as sequence of .byte directives.
   void emitRawBytes(StringRef Data) override;
+  /// Makes sure that labels are mangled the same way as the actual symbols.
+  void emitValue(const MCExpr *Value) override;
 };
 
 class NVPTXAsmTargetStreamer : public NVPTXTargetStreamer {
diff --git a/llvm/lib/Target/NVPTX/NVPTXAssignValidGlobalNames.cpp b/llvm/lib/Target/NVPTX/NVPTXAssignValidGlobalNames.cpp
index 9f5a217795183b..724ef7fe98322e 100644
--- a/llvm/lib/Target/NVPTX/NVPTXAssignValidGlobalNames.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXAssignValidGlobalNames.cpp
@@ -17,13 +17,11 @@
 //===----------------------------------------------------------------------===//
 
 #include "NVPTX.h"
-#include "llvm/ADT/StringExtras.h"
+#include "NVPTXUtilities.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/GlobalVariable.h"
 #include "llvm/IR/LegacyPassManager.h"
 #include "llvm/IR/Module.h"
-#include "llvm/Support/raw_ostream.h"
-#include <string>
 
 using namespace llvm;
 
@@ -35,11 +33,8 @@ class NVPTXAssignValidGlobalNames : public ModulePass {
   NVPTXAssignValidGlobalNames() : ModulePass(ID) {}
 
   bool runOnModule(Module &M) override;
-
-  /// Clean up the name to remove symbols invalid in PTX.
-  std::string cleanUpName(StringRef Name);
 };
-}
+} // namespace
 
 char NVPTXAssignValidGlobalNames::ID = 0;
 
@@ -58,34 +53,18 @@ bool NVPTXAssignValidGlobalNames::runOnModule(Module &M) {
       // Note: this does not create collisions - if setName is asked to set the
       // name to something that already exists, it adds a proper postfix to
       // avoid collisions.
-      GV.setName(cleanUpName(GV.getName()));
+      GV.setName(NVPTX::getValidPTXIdentifier(GV.getName()));
     }
   }
 
   // Do the same for local functions.
   for (Function &F : M.functions())
     if (F.hasLocalLinkage())
-      F.setName(cleanUpName(F.getName()));
+      F.setName(NVPTX::getValidPTXIdentifier(F.getName()));
 
   return true;
 }
 
-std::string NVPTXAssignValidGlobalNames::cleanUpName(StringRef Name) {
-  std::string ValidName;
-  raw_string_ostream ValidNameStream(ValidName);
-  for (char C : Name) {
-    // While PTX also allows '%' at the start of identifiers, LLVM will throw a
-    // fatal error for '%' in symbol names in MCSymbol::print. Exclude for now.
-    if (isAlnum(C) || C == '_' || C == '$') {
-      ValidNameStream << C;
-    } else {
-      ValidNameStream << "_$_";
-    }
-  }
-
-  return ValidNameStream.str();
-}
-
 ModulePass *llvm::createNVPTXAssignValidGlobalNamesPass() {
   return new NVPTXAssignValidGlobalNames();
 }
diff --git a/llvm/lib/Target/NVPTX/NVPTXUtilities.cpp b/llvm/lib/Target/NVPTX/NVPTXUtilities.cpp
index 44c50827764b08..4beef4b4ea48db 100644
--- a/llvm/lib/Target/NVPTX/NVPTXUtilities.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXUtilities.cpp
@@ -13,6 +13,7 @@
 #include "NVPTXUtilities.h"
 #include "NVPTX.h"
 #include "NVPTXTargetMachine.h"
+#include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/Function.h"
@@ -33,7 +34,7 @@
 namespace llvm {
 
 namespace {
-typedef std::map<std::string, std::vector<unsigned> > key_val_pair_t;
+typedef std::map<std::string, std::vector<unsigned>> key_val_pair_t;
 typedef std::map<const GlobalValue *, key_val_pair_t> global_val_annot_t;
 
 struct AnnotationCache {
diff --git a/llvm/lib/Target/NVPTX/NVPTXUtilities.h b/llvm/lib/Target/NVPTX/NVPTXUtilities.h
index 36fc0e49153531..0cf1f77f647d6e 100644
--- a/llvm/lib/Target/NVPTX/NVPTXUtilities.h
+++ b/llvm/lib/Target/NVPTX/NVPTXUtilities.h
@@ -14,6 +14,7 @@
 #define LLVM_LIB_TARGET_NVPTX_NVPTXUTILITIES_H
 
 #include "NVPTX.h"
+#include "llvm/ADT/StringExtras.h"
 #include "llvm/CodeGen/ValueTypes.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/GlobalVariable.h"
@@ -84,6 +85,20 @@ bool shouldEmitPTXNoReturn(const Value *V, const TargetMachine &TM);
 bool Isv2x16VT(EVT VT);
 
 namespace NVPTX {
+inline std::string getValidPTXIdentifier(StringRef Name) {
+  std::string ValidName;
+  ValidName.reserve(Name.size() + 4);
+  for (char C : Name)
+    // While PTX also allows '%' at the start of identifiers, LLVM will throw a
+    // fatal error for '%' in symbol names in MCSymbol::print. Exclude for now.
+    if (isAlnum(C) || C == '_' || C == '$')
+      ValidName.push_back(C);
+    else
+      ValidName.append({'_', '$', '_'});
+
+  return ValidName;
+}
+
 
 inline std::string OrderingToString(Ordering Order) {
   switch (Order) {
diff --git a/llvm/test/DebugInfo/NVPTX/debug-ptx-symbols.ll b/llvm/test/DebugInfo/NVPTX/debug-ptx-symbols.ll
new file mode 100644
index 00000000000000..99c9773e27f4e0
--- /dev/null
+++ b/llvm/test/DebugInfo/NVPTX/debug-ptx-symbols.ll
@@ -0,0 +1,42 @@
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_60 | FileCheck %s
+; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_60 | %ptxas-verify %}
+target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
+target triple = "nvptx64-nvidia-cuda"
+
+; Verify that the symbols with the characters illegal in PTX get appropriately mangled.
+@__PRETTY_FUNCTION__._Z3foov = private unnamed_addr constant [11 x i8] c"void foo()\00", align 1, !dbg !0
+; '.' gets replaced with `_$_`.
+; CHECK: .global .align 1 .b8 __PRETTY_FUNCTION___$__Z3foov[11] = {118, 111, 105, 100, 32, 102, 111, 111, 40, 41};
+
+; .debug* section names are special and are allowed to have the leading dot.
+; CHECK-DAG: .section        .debug_abbrev
+; CHECK-DAG: .section        .debug_info
+; CHECK-DAG: .b32 .debug_abbrev
+; CHECK-DAG: .b32 .debug_line
+; CHECK-DAG: .section        .debug_macinfo
+
+; .. but the symbol name must be mangled the same way here as it was at the definition point.
+; CHECK-DAG: .b64 __PRETTY_FUNCTION___$__Z3foov
+;
+
+!llvm.dbg.cu = !{!8}
+!llvm.module.flags = !{!10, !11, !12}
+
+!0 = !DIGlobalVariableExpression(var: !1, expr: !DIExpression())
+!1 = distinct !DIGlobalVariable(scope: null, file: !2, line: 1, type: !3, isLocal: true, isDefinition: true)
+!2 = !DIFile(filename: "<stdin>", directory: "/usr/local/google/home/tra/work/llvm/build/release+assert+zapcc/dbg-dot")
+!3 = !DICompositeType(tag: DW_TAG_array_type, baseType: !4, size: 88, elements: !6)
+!4 = !DIDerivedType(tag: DW_TAG_const_type, baseType: !5)
+!5 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char)
+!6 = !{!7}
+!7 = !DISubrange(count: 11)
+!8 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !2, producer: "clang version 20.0.0git", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, globals: !9, splitDebugInlining: false, nameTableKind: None)
+!9 = !{!0}
+!10 = !{i32 2, !"Debug Info Version", i32 3}
+!11 = !{i32 1, !"wchar_size", i32 4}
+!12 = !{i32 4, !"nvvm-reflect-ftz", i32 0}
+!13 = !{!"clang version 20.0.0git"}
+!14 = distinct !DISubprogram(name: "foo", linkageName: "_Z3foov", scope: !2, file: !2, line: 1, type: !15, scopeLine: 1, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !8)
+!15 = !DISubroutineType(types: !16)
+!16 = !{null}
+!17 = !DILocation(line: 1, column: 56, scope: !14)

Copy link

github-actions bot commented Oct 21, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Until now debug info was printing the symbols names as-is and that resulted
in invalid PTX when the symbols contained characters that are incalid for PTX.
E.g. `__PRETTY_FUNCTION.something`

Debug info is somewhat disconnected from the symbols themselves, so the regular
"NVPTXAssignValidGlobalNames" pass can't easily fix them.

As the "plan B" this patch catches printout of debug symbols and fixes them, as needed.
One gotcha is that the same code path is used to print the names of debug info sections.
Those section names do start with a '.debug'. The dot in those names is nominally illegal
in PTX, but the debug section names with a dot are accepted as a special case.
The downside of this change is that if someone ever has a `.debug*` symbol that needs to
be referred to from the debug info, that label will be passed through as-is, and will
still produce broken PTX output. If/when we run into a case where we need it to work, we
could consider only passing through specific debug section names, or add a mechanist allowing
us to tell section names apart from regular symbols.

Fixes llvm#58491
!10 = !{i32 2, !"Debug Info Version", i32 3}
!11 = !{i32 1, !"wchar_size", i32 4}
!12 = !{i32 4, !"nvvm-reflect-ftz", i32 0}
!13 = !{!"clang version 20.0.0git"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused, but !8's producer could refer to this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I do not know enough about dwarf generation internals, to tell whether some of the metadata w/o explicit references is needed or not, so I left clang-generated IR as is.

@tambry
Copy link
Contributor

tambry commented Oct 22, 2024

Looks simple enough. Thanks for fixing this! 👍

@Artem-B
Copy link
Member Author

Artem-B commented Oct 22, 2024

@dwblaikie Do you have any concerns about this approach?

@dwblaikie
Copy link
Collaborator

@dwblaikie Do you have any concerns about this approach?

Had a quick glance, nothing stands out & I don't think I have enough NVPTX understanding to really validate the design/approach or implementation. But no obvious red flags, at least...

@Artem-B Artem-B merged commit 020fa86 into llvm:main Oct 22, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ptx assembly aborted due to errors
4 participants