asm! should automatically rewrite labels in assembly code #81088

Amanieu · 2021-01-16T17:04:38Z

Defining labels in inline assembly is broken because inlining, function cloning and other compiler optimizations may result in an asm! block being emitted more than once in the final binary. This can cause duplicate symbol errors or even compiler crashes (#74262). For this reason it is recommended to only use GAS local labels when using asm! (#76704).

To improve user experience when using asm!, rustc should automatically rewrite label names into a unique name. For example:

asm!(
    "popcnt {popcnt}, {v}",
    "loop:",
    "blsi rax, {v}",
    "jz exit",
    "xor {v}, rax",
    "tzcnt rax, rax",
    "stosb",
    "jmp loop",
    "exit:",
    v = inout(reg) value => _,
    popcnt = out(reg) popcnt,
    out("rax") _, // scratch
    inout("rdi") bits.as_mut_ptr() => _,
);

rustc should search for lines beginning with <ident>: to find label names and then replace all instances of <ident> in the assembly code with .Lasm_label_<ident><unique id>. The .L instructs the assembler that the label is local to the current compilation unit and should never be exported, even as a debug symbol. The unique ID can be generated by LLVM using ${:uid} and is guaranteed to be unique within the current compilation unit.

The text was updated successfully, but these errors were encountered:

nagisa · 2021-01-17T13:45:10Z

Any reason to bother with .L and ${:uid} instead of just replacing them with numbered labels?

Amanieu · 2021-01-17T19:30:23Z

Internally the assembler will turn numbered labels into .L local symbols. Doing it directly makes the implementation slightly simpler since we don't need to track whether a label is before or after a particular instruction.

asquared31415 · 2021-01-28T17:27:11Z

@rustbot claim

Amanieu · 2021-01-28T17:35:30Z

@asquared31415 Thanks for taking up this issue!

I would suggest doing the rewriting in the backend in compiler/rustc_codegen_llvm/src/asm.rs.

asquared31415 · 2021-01-28T17:37:00Z

I already have an implementation working in the simple cases, and was about to submit a pr when I realized it is going to be a lot more work to fix the edge cases and that I should claim these issues so someone else doesn't try to work on them too

joshtriplett · 2021-02-03T21:53:37Z

Reposting a comment that I left on #81570:

Forcing all labels within an asm block to be local would certainly improve usability for the common case (so that people don't have to use local labels).

However, it is legal in clang and gcc to define a label in an asm block and make that label a global symbol. And I've seen that used, on occasion, in real projects, including the Linux kernel. (For instance, suppose you want a global symbol pointing to a location that can be overwritten as a "probe point".) Rewriting would cause such code to silently "work" while making a global symbol with the wrong name, rather than catching the issue.

I agree that such code is not especially safe, if you're not certain that the asm code will appear in the binary exactly once. (In C that's a little easier to guarantee.) It's safer in global_asm, and I do think that we should support it there eventually.

In the meantime, I personally would prefer to make a best-effort attempt to catch the use of non-local labels in asm! and error on them, rather than attempting to rewrite them (which requires more careful lexing of inline assembly). Alternatively, if the thought is that we might allow global labels in the future, perhaps we could attempt to lex a little more of the inline assembly, catch attempts to use .globl or .global, and emit an error when applying those directives to a label.

I would prefer to just reject non-local labels.

asquared31415 · 2021-02-03T21:57:57Z

That approach would work, and perhaps be better for allowing future choices. We can always remove an error and make something work later, but if this causes weird behavior, it's a little harder to revert, especially if people rely on it. As discussion in #81570 progressed I have become more concerned about enforcing things that are wrong, or preventing the use of things that should work, so I support making them a compiler error for now and having a proper discussion about what asm! should do in regards to labels.

Perhaps it would be a good idea to catch .global along with the syntax flags as in #79869 and also look for non-local labels and suggest a fix if possible for the labels too. I had already been glancing at #79869 and would be willing to try making it error instead. I would need a little direction here, as I am not certain where the error should be detected and how to emit compiler errors.

I think that if it is decided that we should error on global labels rather than rewrite, #81570 should be closed no matter the decision on the detection of .global, as that pr's goal was to remove the errors and crashes, whereas changing to erroring on labels would only remove a crash and change the error.

Amanieu · 2021-02-03T22:10:23Z

There is a precedent for label rewriting: Clang does this for labels in MS-style inline asm (source), which is what inspired me to suggest this.

Clang is a bit smarter than us since it uses LLVM MC to actually parse the asm as it is rewriting, but it also does a lot more than us (e.g. inferring operand & constraints). Also this LLVM support only exists for x86, so it wouldn't be suitable for us.

I still think that automatic label rewriting is the best thing we can do at the moment. We should certainly lint against the use of .global/.globl in the same way as #79869. This will help with people accidentally expecting exported label names to work.

comex · 2021-02-03T23:21:45Z

Sure, but MS-style inline asm is overall quite a bit more magic than GNU-style inline asm.

GNU-style inline asm has always been passed straight through to the assembler, except for making substitutions where substitutions are requested. In the case of GCC, or Clang with -fno-integrated-as, this is done without even validating it in any way: you can take a C file containing asm("do a barrel roll");, pass it to gcc -S, and get an "assembly" file with do a barrel roll pasted in the middle of a function. This is a desirable property, since it allows inline asm to use instructions which are known to an external assembler but not the compiler.

While rewriting labels doesn't directly affect that use case, it still violates the principle of transparency. Users expect assembly inside asm! to have the same syntax as standalone assembly, not "the same syntax except for this one magic feature". And this will become more confusing if global_asm! is added.

Also, if naked functions are stabilized, defining a global symbol within asm! might be a reasonable thing to do in those (though I think some guarantees about compilation model would need to be worked out before it could really have well-defined semantics).

There's also the "cowboy code" concern mentioned by @joshtriplett. He said that ensuring asm code would appear only once is "a little easier to guarantee", but I don't think that's quite right. In both Rust and GNU C, it is impossible to guarantee inline assembly will be instantiated exactly once (not counting global inline assembly and perhaps naked functions), because the compiler is always allowed to duplicate code within a function. But Linux does it anyway, because that's their attitude toward a lot of compiler things.

People might attempt the same in Rust. Personally, I'd prefer if rustc didn't go out of its way to break it; after all, if the compiler does decide to duplicate the asm, the result will be an assembler error, not UB. I suspect others might strongly prefer if rustc did go out of its way to break it. Regardless, it's a problem if it seems to work but does something unexpected due to the rewriting.

Finally, this kind of semi-blind scanning of assembly code is impossible to make robust. For example, it's perfectly legal for a label to have the same name as an instruction (jmp: jmp jmp).

asquared31415 · 2021-02-04T01:53:03Z

I agree mostly with the assessment that the compiler shouldn't explicitly forbid things that may be legal. Especially with such a raw construct like inline assembly, outright forbidding thing is probably bad unless it's actually an issue. I definitely think that the compiler should emit a warning for labels that could be bad, and require the user to explicitly disable the warning if that's really what they want. This would still allow the compiler to have issues, but I think that warning the user is desirable.

I think that in the case of #79869 and .global, the compiler should error because those things are expressed using a different syntax.

Amanieu · 2021-02-04T19:39:38Z

Wild idea that I just thought of, what if we added extra syntax to asm! for labels:

asm!(foo: "jmp {foo}");

This avoids the need to parse the assembly code since labels are defined outside the asm string and references to labels use the placeholder syntax. We could then have a lint which recommends people to use this syntax if we find labels inside the assembly code.

asquared31415 · 2021-02-04T22:51:43Z

That solution then prevents the use of multiline strings that could have multiple labels. I do kinda like it, but it also seems bad for the reasons mentioned by comex regarding transparency. Inline assembly is not a rust construct, and I don't know how much we should extract from the raw inline assembly.

bstrie · 2021-04-07T00:01:55Z

after all, if the compiler does decide to duplicate the asm, the result will be an assembler error, not UB.

If an instance of this can't result in anything other than a compilation error then I think it's reasonable to merely document this and leave it as-is. Or is there some worse failure case that I'm not seeing?

I'm not necessarily opposed to a warning here, but I would say that warnings must be actionable somehow. The user isn't capable of controlling whether or not the block gets duplicated, right? Therefore a warning here would really just mean "either silence this warning or don't use loop labels at all". And parsing/validating the assembly strings is something that the RFC seems to go to great lengths to avoid, which would make this a bit of an odd case.

Amanieu · 2021-04-07T04:00:25Z

LLVM has been know to crash in some cases: #74262.

The tricky part is that you only get a compilation error if the function containing the asm! gets inlined into another function multiple times. So your code could appear to work fine until the optimizer eventually makes a different choice and then your code breaks.

The reason I would like to have this issue resolved before stabilization is that the choice we make here will have a significant effect on the way people write inline asm.

At the moment I'm still tending towards the solution that I came up with in #81088 (comment):

asm!(foo: "jmp {foo}");

Labels in inline assembly are fundamentally different from labels in normal assembly because labels are local to the current asm block rather than global.

bstrie · 2021-04-07T04:08:32Z

asm!(foo: "jmp {foo}");

Would this be paired with completely banning ordinary labels? After all, if you're going to parse the strings just to warn the user that they shouldn't be using labels, then perhaps an outright ban is better.

Amanieu · 2021-04-07T04:22:35Z

Quick recap of 3 different options:

Option 1: do nothing

The only correct way to write inline assembly is to use GAS-style numbered local labels.

asm!("0: jmp 0b");

This can cause issues on x86 with intel syntax: LLVM will sometimes parse 0b and 1b as binary literals.
Perhaps we can attempt to scan the assembly code to look for things resembling labels and lint against them, but make no attempt at automatic rewriting.
Assembly code is generally less readable due to poor label names.

Option 2: automatic rewriting of labels

Named labels can be used inside inline assembly and get automatically rewritten to a unique name by the compiler.

asm!("foo: jmp foo");

This makes assembly work the way most people would expect it to.
This is a big change from the rest of asm! which specifically avoids having to do any parsing of the assembly code itself. There is the possibility of making mistakes since we only do token-level replacement (e.g. label with the same name as an instruction).
This behavior may surprise users who (incorrectly) expect a label defined in asm! to be globally visible.

Option 3: special syntax for labels

We introduce special syntax to asm! to represent labels.

asm!(foo: "jmp {foo}");

This uses the same type of placeholders as named register arguments, so there is no need to parse the asm code.
This makes it clear that asm! labels behave differently from normal labels.
We can still lint against named labels in the asm code like in option 1 and recommend using the new syntax instead.

However there are several downsides:

Previously you could re-use snippets of assembly code by using a macro_rules! that contained a string literal. This would now require special syntax since assembly code is no longer just a string literal. See this Zulip thread for a possible solution using a new asm_snippet! builtin macro.
For consistency, we should also support this in global_asm!. However global_asm! is allowed to define global labels (since it can't get inlined anywhere). It may be confusing to have two different label styles in global_asm!.

asquared31415 · 2021-04-08T23:38:36Z

I would much rather be able to use more descriptive labels in inline asm, however, I don't think there's a way to do this without a lot of parsing of inline assembly, which I think should be avoided, and even if that were done, I don't think that it can be done perfectly.

With this in mind, I think I am in favor of doing nothing, specifying in documentation that local labels are required and doing no parsing/rewriting. Declare use of other labels to be undefined behavior, because it depends on the exact way the compiler decides to emit things, and ideally get the crash on windows with -g fixed in LLVM (though IDK how feasible that is).

comex · 2021-04-09T22:04:33Z

For Option 3, why not do it like this?

asm!("{foo}: jmp {foo}");

Then you still have a single string literal.

Amanieu · 2021-04-10T10:00:26Z

Keep in mind that {ident} is also used for named operands which get substituted with register names. I don't like the idea of simply treating unknown identifiers as a label since that prevents us from emitting an error for typos.

joshtriplett · 2021-04-14T17:46:03Z

FWIW, I think it's worth considering a hybrid approach: we could go with approach 1 for now, stabilize inline assembly with that approach, and consider whether we also want to support approach 3 in the future along with things like asm goto.

asquared31415 · 2021-04-15T06:13:27Z

Doing nothing is the option that breaks the least, and doesn't lock into any irreversible design decisions. Could possibly introduce a Rust lint or warning rather than an error for labels that can currently cause label duplications.

comex · 2021-04-15T16:30:48Z

Doing nothing is the option that breaks the least, and doesn't lock into any irreversible design decisions.

I'm not sure that's true. People will try to use asm! to define global symbols, and it will work in practice under common circumstances (where the code containing the asm! is referenced and not duplicated). Then in the future, either making it an error, or automatically rewriting the symbols, would break that code.

amluto · 2021-04-18T23:28:26Z

As a possible solution that might not involve parsing asm code, binutils' gas does sort of have a solution:

.macro some_macro
        jmp .Lmy_label_\@
.Lmy_label_\@:
        nop
.endm

This isn't directly applicable to inline asm, but inline asm could plausibly be instantiated as a macro and an invocation of the macro. A macro argument or some other such trick could be used to pass the name of the macro or a per-macro unique id in.

Option 3 could also IMO be improved a bit to be less magic: instantiate {foo} as a unique label if and only iff foo is actually declared as a label by some syntax outside of the asm string. Or make {label:foo} turn into a unique label.

nbdd0121 · 2021-07-06T21:24:36Z

Keep in mind that {ident} is also used for named operands which get substituted with register names. I don't like the idea of simply treating unknown identifiers as a label since that prevents us from emitting an error for typos.

How about

asm!("{foo}: jmp {foo}", foo = label);

label just generates a unique label and is substituted similar to an allocated register operand.

Amanieu · 2021-07-07T12:10:42Z

This is how I planned for asm!(foo: "jmp {foo}") to be represented internally in the compiler, but I don't like the fact that you have to declare the label name twice.

…nieu Lint against named asm labels This adds a deny-by-default lint to prevent the use of named labels in inline `asm!`. Without a solution to rust-lang#81088 about whether the compiler should rewrite named labels or a special syntax for labels, a lint against them should prevent users from writing assembly that could break for internal compiler reasons, such as inlining or anything else that could change the number of actual inline assembly blocks emitted. This does **not** resolve the issue with rewriting labels, that still needs a decision if the compiler should do any more work to try to make them work.

ghost · 2021-08-22T11:41:33Z

How about asm!("call {fn foo}") for functions and asm!("mov eax, {static BAR}"} for statics?

Amanieu · 2021-08-22T23:19:34Z

We already have this with the sym operand type. This issue is about labels defined in an asm code snippet.

bradjc · 2021-09-10T14:48:37Z

I don't know if there is interest in community votes, but I would like to see any option other than Option 1 (use numeric labels). Assembly code is hard enough to read without being able to use descriptive labels.

Amanieu added C-enhancement Category: An issue proposing an enhancement or a PR with one. A-inline-assembly Area: Inline assembly (`asm!(…)`) F-asm `#![feature(asm)]` (not `llvm_asm`) labels Jan 16, 2021

Amanieu mentioned this issue Jan 16, 2021

Tracking Issue for inline assembly (asm!) #72016

Closed

4 tasks

rustbot assigned asquared31415 Jan 28, 2021

asquared31415 mentioned this issue Jan 30, 2021

Rewrite local labels in inline assembly #81570

Closed

Amanieu mentioned this issue Apr 8, 2021

unstable book: explaining how to properly use labels in inline asm #76704

Closed

asquared31415 mentioned this issue Jul 20, 2021

Lint against named asm labels #87324

Merged

Amanieu mentioned this issue Aug 20, 2021

Exempt #[naked] fns from named_asm_labels lint #88169

Closed

bradjc mentioned this issue Sep 10, 2021

Update Nightly - Sep 2021 tock/tock#2816

Merged

2 tasks

asquared31415 mentioned this issue Oct 21, 2021

Compiler null dereference when there are conflicting asm labels #74262

Closed

bstrie mentioned this issue Feb 16, 2022

Tracking Issue for RFC #2972: Constrained Naked Functions #90957

Open

6 tasks

asquared31415 removed their assignment Mar 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

asm! should automatically rewrite labels in assembly code #81088

asm! should automatically rewrite labels in assembly code #81088

Amanieu commented Jan 16, 2021 •

edited by rustbot

Loading

nagisa commented Jan 17, 2021

Amanieu commented Jan 17, 2021

asquared31415 commented Jan 28, 2021

Amanieu commented Jan 28, 2021

asquared31415 commented Jan 28, 2021

joshtriplett commented Feb 3, 2021

asquared31415 commented Feb 3, 2021

Amanieu commented Feb 3, 2021

comex commented Feb 3, 2021

asquared31415 commented Feb 4, 2021

Amanieu commented Feb 4, 2021

asquared31415 commented Feb 4, 2021

bstrie commented Apr 7, 2021

Amanieu commented Apr 7, 2021

bstrie commented Apr 7, 2021 •

edited

Loading

Amanieu commented Apr 7, 2021

asquared31415 commented Apr 8, 2021

comex commented Apr 9, 2021

Amanieu commented Apr 10, 2021

joshtriplett commented Apr 14, 2021

asquared31415 commented Apr 15, 2021

comex commented Apr 15, 2021

amluto commented Apr 18, 2021

nbdd0121 commented Jul 6, 2021

Amanieu commented Jul 7, 2021

ghost commented Aug 22, 2021 •

edited by ghost

Loading

Amanieu commented Aug 22, 2021

bradjc commented Sep 10, 2021

asm! should automatically rewrite labels in assembly code #81088

asm! should automatically rewrite labels in assembly code #81088

Comments

Amanieu commented Jan 16, 2021 • edited by rustbot Loading

nagisa commented Jan 17, 2021

Amanieu commented Jan 17, 2021

asquared31415 commented Jan 28, 2021

Amanieu commented Jan 28, 2021

asquared31415 commented Jan 28, 2021

joshtriplett commented Feb 3, 2021

asquared31415 commented Feb 3, 2021

Amanieu commented Feb 3, 2021

comex commented Feb 3, 2021

asquared31415 commented Feb 4, 2021

Amanieu commented Feb 4, 2021

asquared31415 commented Feb 4, 2021

bstrie commented Apr 7, 2021

Amanieu commented Apr 7, 2021

bstrie commented Apr 7, 2021 • edited Loading

Amanieu commented Apr 7, 2021

Option 1: do nothing

Option 2: automatic rewriting of labels

Option 3: special syntax for labels

asquared31415 commented Apr 8, 2021

comex commented Apr 9, 2021

Amanieu commented Apr 10, 2021

joshtriplett commented Apr 14, 2021

asquared31415 commented Apr 15, 2021

comex commented Apr 15, 2021

amluto commented Apr 18, 2021

nbdd0121 commented Jul 6, 2021

Amanieu commented Jul 7, 2021

ghost commented Aug 22, 2021 • edited by ghost Loading

Amanieu commented Aug 22, 2021

bradjc commented Sep 10, 2021

Amanieu commented Jan 16, 2021 •

edited by rustbot

Loading

bstrie commented Apr 7, 2021 •

edited

Loading

ghost commented Aug 22, 2021 •

edited by ghost

Loading