Ident mangling and unicode. #7539
Related to #2253, I think?
Tools like GDB and c++filt should be considered, but a cursory googling doesn't turn up anything about how they demangle or understand Unicode idents. cc @michaelwoerister for the GDB aspect.
Triage: still an issue, but Unicode identifiers are behind a feature gate. I looked into what clang does, which seems to be nothing:

```c
int pörk() {
    return 1;
}
```

I'll look into what it does on ARM and Android once the toolchains are installed...
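For comparison, the Itanium ABI mangles a free function `int pork()` as `_Z4porkv`: the `_Z` prefix, the identifier as a length-prefixed `<source-name>`, and `v` for the empty parameter list. The sketch below (the helper name is hypothetical) shows what a compiler that simply passes UTF-8 through would produce for `pörk`, assuming the length prefix counts bytes. It is an illustration of the problem, not a claim about clang's exact output.

```rust
/// Hypothetical helper: Itanium-style symbol for a free function taking no
/// parameters, i.e. `_Z <length> <identifier> v`.
/// Assumption: non-ASCII identifiers are emitted as raw UTF-8 and the length
/// prefix counts bytes; the ABI itself does not define this case.
fn itanium_nullary_fn_symbol(ident: &str) -> String {
    // `str::len` is the UTF-8 byte length, not the number of characters.
    format!("_Z{}{}v", ident.len(), ident)
}
```

`itanium_nullary_fn_symbol("pörk")` yields `_Z5pörkv`: four characters but a length prefix of 5, and a symbol containing raw non-ASCII bytes, which is exactly what an assembler or demangler without Unicode support has to cope with.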
Yeah, on arm-linux-gnueabi, you get:
Seems it's just plain unsupported on ARM. Perhaps we should mangle per-target?
Triage bump: I believe a single mangling would be better, to keep consistency across targets.
On Linux, the C++ name-mangling scheme is defined by the so-called Itanium C++ ABI. It doesn't define an encoding for identifiers outside of ASCII. See http://mentorembedded.github.io/cxx-abi/abi.html#mangling-structure and search for "A-Z":
On most platforms, GCC sets the source charset to UTF-8. I think the source charset is used when emitting symbols, so that is what will be used. However, on a system where the host charset is EBCDIC (lol), the source charset will be UTF-EBCDIC. You probably don't need to worry about this.

GDB and BFD naively read symbol names as bytes and hope the right thing happens. The demangler doesn't have any real support for any sort of Unicode mangling. It does have a Unicode mode that was added for gcj, but it is only enabled when the Java demangling option is given, which isn't desirable for Rust. In this mode, a non-ASCII character is encoded as "__U[...hex digit...]+_" (e.g., "_U3FE"). (Though, hilariously, the decoding also bails on any result > 256... not sure how that makes any sense at all.) IIRC we added this sort of encoding precisely to work around assembler issues.

I think the best thing to do on Linux would be not to share mangling with C++ at all: define a sensible Rust mangling (it could share concepts, but ideally it would create distinct names; basically, don't start with "_Z"), then add a Rust mode to the libiberty demangler and basic Rust demangling support to GDB. The GDB parts of this are not a large project and were done recently for D, so there are even recent patches that can be cribbed from.
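A minimal sketch of the gcj-style escape described above, in which each non-ASCII character becomes `__U`, hex digits, and a closing `_`. This is not libiberty's or gcj's code: the function names are hypothetical, uppercase hex is an assumption taken from the example, and a real scheme would also need to escape identifiers that already contain a literal `__U`, which this sketch does not.

```rust
/// Hypothetical encoder: every non-ASCII char becomes `__U<HEX>_`.
/// Assumption: ASCII characters pass through unchanged, with no escaping of
/// literal `__U` sequences (so the encoding is not fully unambiguous).
fn gcj_style_encode(ident: &str) -> String {
    let mut out = String::with_capacity(ident.len());
    for c in ident.chars() {
        if c.is_ascii() {
            out.push(c);
        } else {
            // e.g. 'ö' (U+00F6) becomes "__UF6_"
            out.push_str(&format!("__U{:X}_", c as u32));
        }
    }
    out
}

/// Hypothetical decoder reversing `gcj_style_encode`; returns `None` on a
/// malformed escape (missing `_`, bad hex, or an invalid code point).
fn gcj_style_decode(mangled: &str) -> Option<String> {
    let mut out = String::new();
    let mut rest = mangled;
    while let Some(pos) = rest.find("__U") {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 3..];
        // The hex digits run up to the terminating '_'.
        let end = after.find('_')?;
        let code = u32::from_str_radix(&after[..end], 16).ok()?;
        out.push(char::from_u32(code)?);
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Some(out)
}
```

`gcj_style_encode("pörk")` gives `p__UF6_rk`, and `gcj_style_decode` turns it back into `pörk`. Because literal `__U` sequences are not escaped here, a plain ASCII identifier such as `a__U41_` would wrongly decode to `aA`, which is why a real scheme needs an escaping rule for its own marker.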
Triage: not aware of any changes here.
Triage: cc #55467.
The recent pre-RFC for more principled symbol mangling would address this: https://internals.rust-lang.org/t/pre-rfc-a-new-symbol-mangling-scheme/8501
Yes, it would.
I believe this can be closed now as we're eventually looking to switch to the new mangling scheme, and in particular this is already listed as an unresolved problem on the tracking issue (#60705); we can open a new bug if there's extensive discussion on the subject in the future. |
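For reference, the v0 scheme specified in RFC 2603 (the RFC that grew out of that pre-RFC) gives Rust symbols their own `_R` prefix and encodes a non-ASCII identifier as `u`, a decimal length, and the identifier's Punycode (RFC 3492) with the `-` delimiter replaced by `_`; a `_` separator follows the length when the bytes start with a digit or `_`. The sketch below implements that identifier rule under those assumptions. Helper names are hypothetical, overflow checks are omitted, and it is not rustc's implementation.

```rust
// Punycode parameters from RFC 3492, section 5.
const BASE: u32 = 36;
const TMIN: u32 = 1;
const TMAX: u32 = 26;
const SKEW: u32 = 38;
const DAMP: u32 = 700;
const INITIAL_BIAS: u32 = 72;
const INITIAL_N: u32 = 128;

/// Bias adaptation function (RFC 3492, section 6.1).
fn adapt(mut delta: u32, num_points: u32, first_time: bool) -> u32 {
    delta = if first_time { delta / DAMP } else { delta / 2 };
    delta += delta / num_points;
    let mut k = 0;
    while delta > ((BASE - TMIN) * TMAX) / 2 {
        delta /= BASE - TMIN;
        k += BASE;
    }
    k + (BASE - TMIN + 1) * delta / (delta + SKEW)
}

/// Maps 0..=25 to 'a'..='z' and 26..=35 to '0'..='9'.
fn digit(d: u32) -> char {
    if d < 26 { (b'a' + d as u8) as char } else { (b'0' + (d - 26) as u8) as char }
}

/// Punycode-encodes `input`; no overflow checks, so only for short identifiers.
fn punycode_encode(input: &str) -> String {
    let code_points: Vec<u32> = input.chars().map(|c| c as u32).collect();
    // Basic (ASCII) code points are copied first, in order.
    let mut output: String = input.chars().filter(char::is_ascii).collect();
    let basic_len = output.len() as u32;
    let mut handled = basic_len;
    if basic_len > 0 {
        output.push('-');
    }
    let (mut n, mut delta, mut bias) = (INITIAL_N, 0u32, INITIAL_BIAS);
    while (handled as usize) < code_points.len() {
        // Smallest code point not yet handled.
        let m = *code_points.iter().filter(|&&c| c >= n).min().unwrap();
        delta += (m - n) * (handled + 1);
        n = m;
        for &c in &code_points {
            if c < n {
                delta += 1;
            }
            if c == n {
                // Emit delta as a generalized variable-length integer.
                let mut q = delta;
                let mut k = BASE;
                loop {
                    let t = if k <= bias { TMIN } else if k >= bias + TMAX { TMAX } else { k - bias };
                    if q < t {
                        break;
                    }
                    output.push(digit(t + (q - t) % (BASE - t)));
                    q = (q - t) / (BASE - t);
                    k += BASE;
                }
                output.push(digit(q));
                bias = adapt(delta, handled + 1, handled == basic_len);
                delta = 0;
                handled += 1;
            }
        }
        delta += 1;
        n += 1;
    }
    output
}

/// Hypothetical helper: v0-style `<undisambiguated-identifier>`.
fn v0_identifier(ident: &str) -> String {
    let (prefix, bytes) = if ident.is_ascii() {
        ("", ident.to_string())
    } else {
        ("u", punycode_encode(ident).replace('-', "_"))
    };
    let sep = match bytes.chars().next() {
        Some(c) if c.is_ascii_digit() || c == '_' => "_",
        _ => "",
    };
    format!("{}{}{}{}", prefix, bytes.len(), sep, bytes)
}
```

`v0_identifier("gödel")` produces `u8gdel_5qa`, while a plain ASCII identifier keeps the simple `<length><bytes>` form (`pork` becomes `4pork`). Punycode keeps the symbol ASCII-only, sidestepping the assembler problem, at the cost of requiring a Punycode decoder in every demangler.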
#7488 added a fix to mangle Unicode identifiers because the Android assembler can't handle them. It is essentially the easiest mangling possible (using `char::escape_unicode` and replacing the leading `\` with a `$`), with no reference to any other compilers that perform Unicode mangling. Presumably, matching any precedent (if there is one) would be best, in terms of tool support etc.
This bug represents the task of researching this and fixing it.
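A minimal sketch of the #7488-style mangling described above, assuming only non-ASCII characters are escaped and ASCII passes through unchanged (the function name is hypothetical). It uses today's `char::escape_unicode`, which renders `ö` as `\u{f6}`.

```rust
/// Hypothetical helper: replace each non-ASCII char with its
/// `escape_unicode` form, swapping the leading '\' for '$'.
/// Assumption: ASCII characters are left as-is.
fn escape_unicode_mangle(ident: &str) -> String {
    let mut out = String::with_capacity(ident.len());
    for c in ident.chars() {
        if c.is_ascii() {
            out.push(c);
        } else {
            let escaped = c.escape_unicode().to_string(); // e.g. "\u{f6}"
            out.push('$');
            out.push_str(&escaped[1..]); // drop the leading '\'
        }
    }
    out
}
```

`escape_unicode_mangle("pörk")` yields `p$u{f6}rk`. No demangler mentioned in this thread knows to turn `$u{…}` back into a character, which is the tool-support gap this issue is about.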