[RISC-V] codegen seems not ideal #361

japaric · 2019-06-19T09:21:41Z

Disclaimer: I don't know much of the RISC-V ISA so this may be just my poor
understanding.

Some things I observed while working with the hifive1 board.

NOTE: wherever the (compilation) target is not specified,
riscv32imac-unknown-none-elf was used

`jalr` instead of `jal`

All direct function calls seems to use some sort of relocatable form. Example
below:

#[no_mangle]
unsafe fn _start() -> ! {
    foo();

    loop {}
}

#[inline(never)]
#[no_mangle]
unsafe fn foo() {
    asm!("" : : : "memory" : "volatile")
}

$ cargo objdump --bin app --release -- -C -d -no-show-raw-insn
0000000000011000 _start:
   11000:       addi    sp, sp, -16
   11002:       sw      ra, 12(sp)
   11004:       sw      s0, 8(sp)
   11006:       addi    s0, sp, 16
   11008:       auipc   ra, 0
   1100c:       jalr    ra, ra, 10
   11010:       j       0

0000000000011012 foo:
   11012:       addi    sp, sp, -16
   11014:       sw      ra, 12(sp)
   11016:       sw      s0, 8(sp)
   11018:       addi    s0, sp, 16
   1101a:       lw      s0, 8(sp)
   1101c:       lw      ra, 12(sp)
   1101e:       addi    sp, sp, 16
   11020:       ret

That auipc, jalr combination looks like relocatable code to me (not that I
know much about relocatable code). I was expecting to see jal function calls
of this form:

global_asm!(r#"
  .global _start
_start:
  addi    sp, sp, -16
  sw      ra, 12(sp)
  sw      s0, 8(sp)
  addi    s0, sp, 16
  jal     foo
  j       0
"#);

#[inline(never)]
#[no_mangle]
unsafe fn foo() {
    asm!("" : : : "memory" : "volatile")
}

$ cargo objdump --bin app --release -- -C -d -no-show-raw-insn
0000000000011000 _start:
   11000:       addi    sp, sp, -16
   11002:       sw      ra, 12(sp)
   11004:       sw      s0, 8(sp)
   11006:       addi    s0, sp, 16
   11008:       jal     6
   1100c:       j       0

000000000001100e foo:
   1100e:       addi    sp, sp, -16
   11010:       sw      ra, 12(sp)
   11012:       sw      s0, 8(sp)
   11014:       addi    s0, sp, 16
   11016:       lw      s0, 8(sp)
   11018:       lw      ra, 12(sp)
   1101a:       addi    sp, sp, 16
   1101c:       ret

Since the target specification says relocation-model: "static":

$ rustc -Z unstable-options --print target-spec-json --target riscv32imac-unknown-none-elf
(..)
  "relocation-model": "static",
(..)

That would save 4 bytes of .text per function call.

`atomic::compiler_fence` produces an instruction

This program shows that atomic::compiler_fence produces a fence instruction

#[no_mangle]
unsafe fn _start() -> ! {
    atomic::compiler_fence(Ordering::SeqCst);

    loop {}
}

$ cargo objdump --bin app --release -- -C -d -no-show-raw-insn
0000000000011000 _start:
   11000:       addi    sp, sp, -16
   11002:       sw      ra, 12(sp)
   11004:       sw      s0, 8(sp)
   11006:       addi    s0, sp, 16
   11008:       fence   rw, rw
   1100c:       j       0

Even though the description of the function states that "compiler_fence does
not emit any machine code".

As it is, both atomic::fence and atomic::compiler_fence generate the same
machine code.

#[no_mangle]
unsafe fn _start() -> ! {
    atomic::fence(Ordering::SeqCst);

    loop {}
}

$ cargo objdump --bin app --release -- -C -d -no-show-raw-insn
0000000000011000 _start:
   11000:       addi    sp, sp, -16
   11002:       sw      ra, 12(sp)
   11004:       sw      s0, 8(sp)
   11006:       addi    s0, sp, 16
   11008:       fence   rw, rw
   1100c:       j       0

You don't see this behavior with the ARM Cortex-M backend:

#[no_mangle]
unsafe fn _start() -> ! {
    atomic::compiler_fence(Ordering::SeqCst);

    loop {}
}

$ cargo objdump --target thumbv7m-none-eabi --bin app --release -- -C -d -no-show-raw-insn
0000000000011000 _start:
   11000:       b       #-4 <_start>

`intrinsics::abort` != `UNIMP`

Calling intrinsics::abort produces a function call to the abort symbol even
though the UNIMP instruction exists.

#[no_mangle]
unsafe fn _start() -> ! {
    core::intrinsics::abort()
}

global_asm!(r#"
  .global abort
abort:
  UNIMP
"#);

$ cargo objdump --bin app --release -- -C -d -no-show-raw-insn
0000000000011000 abort:
   11000:       unimp

0000000000011002 _start:
   11002:       addi    sp, sp, -16
   11004:       sw      ra, 12(sp)
   11006:       sw      s0, 8(sp)
   11008:       addi    s0, sp, 16
   1100a:       auipc   ra, 0
   1100e:       jalr    ra, ra, -10
   11012:       auipc   ra, 0
   11016:       jalr    ra, ra, -18

I was actually expecting something like this to be generated:

global_asm!(r#"
  .global _start
_start:
  UNIMP
"#);

$ cargo objdump --bin app --release -- -C -d -no-show-raw-insn
0000000000011000 _start:
   11000:       unimp

`noreturn nounwind` & divergent functions

The riscv32imac-unknown-none-elf target uses panic-strategy: "abort"; that
tells the backend that functions never unwind so divergent functions fn() -> !
, which never return and are marked as noreturn nounwind in LLVM IR, should
not preserve the caller "saved registers". However, one observes register
stacking in the following program:

#[no_mangle]
unsafe fn _start() -> ! {
    // (RISC-V has so many registers)
    asm!("" : :
         "r"(0) "r"(1) "r"(2) "r"(3) "r"(4) "r"(5) "r"(6) "r"(7) "r"(8) "r"(9)
         "r"(10) "r"(11) "r"(12) "r"(13) "r"(14) "r"(15) "r"(16)
         : : "volatile");

    loop {}
}

$ cargo objdump --bin app --release -- -C -d -no-show-raw-insn
0000000000011000 _start:
   11000:       addi    sp, sp, -16
   11002:       sw      ra, 12(sp)
   11004:       sw      s0, 8(sp)
   11006:       sw      s1, 4(sp)
   11008:       addi    s0, sp, 16
   1100a:       addi    a6, zero, 1
   1100c:       addi    a7, zero, 2
   1100e:       addi    t0, zero, 3
   11010:       addi    t1, zero, 4
   11012:       addi    t2, zero, 5
   11014:       addi    t3, zero, 6
   11016:       addi    t4, zero, 7
   11018:       addi    t5, zero, 8
   1101a:       addi    t6, zero, 9
   1101c:       addi    a3, zero, 10
   1101e:       addi    a4, zero, 11
   11020:       addi    a5, zero, 12
   11022:       addi    a0, zero, 13
   11024:       addi    a1, zero, 14
   11026:       addi    a2, zero, 15
   11028:       addi    s1, zero, 16
   1102a:       j       0

s0 and s1 are pushed onto the stack but never popped. Is that required by
the ISA / C ABI?

(off-topic: why is ra also being pushed onto the stack when _start performs
no function call?)

Compare the previous program to this ARM Cortex-M program:

#[no_mangle]
unsafe fn _start() -> ! {
    asm!("" : :
         "r"(0) "r"(1) "r"(2) "r"(3) "r"(4) "r"(5) "r"(6) "r"(7) "r"(8) "r"(9)
         "r"(10) "r"(11) "r"(12) "r"(13)
         : : "volatile");

    loop {}
}

$ cargo objdump --target thumbv7m-none-eabi --bin app --release -- -C -d -no-show-raw-insn
0000000000011000 _start:
   11000:       mov.w   r12, #0
   11004:       mov.w   lr, #1
   11008:       mov.w   r8, #2
   1100c:       mov.w   r9, #3
   11010:       mov.w   r10, #4
   11014:       mov.w   r11, #5
   11018:       movs    r4, #6
   1101a:       movs    r5, #7
   1101c:       movs    r6, #8
   1101e:       movs    r7, #9
   11020:       movs    r2, #10
   11022:       movs    r3, #11
   11024:       movs    r0, #12
   11026:       movs    r1, #13
   11028:       b       #-4 <_start+0x28>

Registers are never pushed onto the stack.

Compare that machine code to the machine code generated for a non-divergent
function:

#[no_mangle]
unsafe fn _start() {
    asm!("" : :
         "r"(0) "r"(1) "r"(2) "r"(3) "r"(4) "r"(5) "r"(6) "r"(7) "r"(8) "r"(9)
         "r"(10) "r"(11) "r"(12) "r"(13)
         : : "volatile");
}

$ cargo objdump --target thumbv7m-none-eabi --bin app --release -- -C -d -no-show-raw-insn
0000000000011000 _start:
   11000:       push.w  {r4, r5, r6, r7, r8, r9, r10, r11, lr}
   11004:       mov.w   r12, #0
   11008:       mov.w   lr, #1
   1100c:       mov.w   r8, #2
   11010:       mov.w   r9, #3
   11014:       mov.w   r10, #4
   11018:       mov.w   r11, #5
   1101c:       movs    r4, #6
   1101e:       movs    r5, #7
   11020:       movs    r6, #8
   11022:       movs    r7, #9
   11024:       movs    r2, #10
   11026:       movs    r3, #11
   11028:       movs    r0, #12
   1102a:       movs    r1, #13
   1102c:       pop.w   {r4, r5, r6, r7, r8, r9, r10, r11, pc}

Registers are pushed in the prologue and then popped in the epilogue of the
function.

It would be good to check if these backend issues have been fixed in the latest
version of LLVM (using llc) because rustc is using a several months old
LLVM. If a recent commit shows the same issues then we may want to submit bug
reports to the LLVM project.

Metadata

$ rustc -Vv
rustc 1.37.0-nightly (04a3dd8a8 2019-06-18)
binary: rustc
commit-hash: 04a3dd8a872633ca1e4c217d11f741cc35cb19a5
commit-date: 2019-06-18
host: x86_64-unknown-linux-gnu
release: 1.37.0-nightly
LLVM version: 8.0

cc @rust-embedded/riscv @Disasm

The text was updated successfully, but these errors were encountered:

Disasm · 2019-06-20T22:12:18Z

Oh, someday I'll have to become an LLVM developer. I'm 99% sure these strange things have nothing to do with Rust itself, but I need to check anyway.
Thanks for the observations!

Disasm · 2019-06-24T19:26:35Z

`atomic::compiler_fence` produces an instruction

Even though the description of the function states that "compiler_fence does
not emit any machine code".

The description is wrong. This compiler_fence lowers to atomic_fence(ordering, SingleThread) and this atomic_fence can be lowered to any architecture-dependent code, not only to empty code.
LLVM lowers atomic_fence(ordering, SingleThread) to CompilerBarrier for AArch64 and ARM targets, but not for AVR, PowerPC, RISC-V and Spark. Of cause, this "SingleThread means CompilerBarrier" hack could be added for RISC-V too, but I do not see any adequate reason for this. Something is wrong on the Rust side.

`intrinsics::abort` != `UNIMP`

Are there any guidelines on abort implementation? I think that UNIMP instruction can cause "Illegal instruction" exception and further stack overrun. Infinite loop inside abort seems like a better idea to me.

`noreturn` `nounwind` & divergent functions

The same for clang: https://godbolt.org/z/YPa98Z
However, this register save code elimination is quite complex thing because you need to proof that it's useless. At the same time we do not see register restores because this code was removed during the dead code elimination process, which is easier. Also you will get the same "optimizations" with gcc.

`jalr` instead of `jal`

I do not have any answer yet, trying to investigate...

Disasm · 2019-06-30T11:59:53Z

`jalr` instead of `jal`

Compiler generates call <something> instruction, which translates into auipc+jalr pair (The RISC-V Instruction Set Manual Volume I: User-Level ISA v2.2, p.110, table 20.2). That what you can see in *.o files because at this point it's impossible to determine whether you can call your function with jal. However, linker could translate this to jal (or even j) instruction if it's instructed to do so with linker relaxation attribute. By default, LLVM do not use this attribute for some reason, but it's possible to turn it on.

Unfortunately in Rust this doesn't work: the attribute starts to appear in object files, but rust-lld ignores it and leaves instructions as is. I need to figure out why.

lenary · 2019-08-01T09:48:48Z

Hi! I'm a developer at lowRISC, working with Alex Bradbury on the RISC-V llvm backend. I've just started (yesterday!) looking at where we can improve the RISC-V Rust backend, in addition to my main LLVM work.

The LLVM backend only recently became non-experimental (around the time of the 9.0 branch, with releases imminent). With that, the aim is that we can compile most RISC-V programs in rv{32,64}imacfd, supporting the main ABIs. We know there are still lots of opportunities to produce "better" code. I'm happy to note that none of these issues make your program incorrect, they just make it use a slightly

Looking at these issues:

jalr instead of jal. You are correct that this is an optimisation commonly performed by the linker, during linker relaxation. I'm not sure LLD does linker relaxation yet, nor am I sure there are any outstanding patches so it may do so. GNU ld will definitely perform this optimisation (the correct relocations should be being emitted).
intrinsics::abort I think we haven't really addressed many custom/intrinsic lowerings in the RISC-V llvm backend yet. I shall take a closer look at this.
atomic::compiler_fence I will take a look at how this is lowered. It may be a case of doing a different lowering in the RISC-V backend, but somehow this feels like a higher-level issue than that. I'm going to have to look at some more resources, and see if this should instead be a change somewhere in the target-independent part of LLVM.
Stack usage in noreturn, nounwind functions: Right now it seems like the compiler is naive, and saves registers it doesn't need to. We can look at this not happening (subject to the ABI), but it's not a top priority.

I hope this clears up what's going on!

SUPERCILEX · 2023-05-04T05:56:59Z

For anyone working on branch prediction, this was just recently fixed: rust-lang/rust#109426. Enable like this: SUPERCILEX/hardcaml_riscv@d1125a6#diff-f34d15ec63ba1edb3a7a25b6a3e3b7d3c1f02cbef0b66a6a73e8cf5c959a8576

rmsyn · 2024-08-17T20:36:50Z

`jalr` instead of `jal`

This has been resolved since this issue was opened: https://godbolt.org/z/zqET14TEs

`noreturn` `nounwind` and divergent functions

This has been resolved since this issue was opened: https://godbolt.org/z/5bsEss6M3

intrinsics::abort != UNIMP

This has been resolved since this issue was opened: https://godbolt.org/z/qv7GndhEc

`atomic::compiler_fence` produces an instruction

This issue seems to still exist in the current stable and nightly compilers: https://godbolt.org/z/ao3464vbe

Out of the original issues noted, the only remaining issue seems to be atomic::compiler_fence and atomic::fence emitting code.

Not sure if these should actually lower into a fence and/or fence.i RISC-V instruction, or what the desired behavior is.

jamesmunns · 2024-08-18T00:29:32Z

The compiler fence seems to disappear when -C opt-level="..." is passed. It's likely the compiler fence is making a function call but doing nothing, but once optimized the actual call is inlined and the side effect is removed.

rmsyn · 2024-08-18T01:13:03Z

The compiler fence seems to disappear when -C opt-level="..." is passed.

Confirmed that this works, atomic::compiler_fence disappears, where atomic::fence emits a fence instruction.

So, I guess this would only cause an issue for users trying to run debug builds, expecting either of those calls to do something?

This could be an annoying "gotcha" for the atomic::fence call.

BartMassey · 2024-08-18T19:30:32Z

Everything sounds like it's working more-or-less as expected now. Anyone object to closing this issue?

adamgreig · 2024-08-27T18:19:02Z

Thanks for triaging!

japaric added the T-riscv label Jun 19, 2019

Disasm mentioned this issue Jun 30, 2019

compiler_fence may emit machine code rust-lang/rust#62256

Open

adamgreig closed this as completed Aug 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RISC-V] codegen seems not ideal #361

[RISC-V] codegen seems not ideal #361

japaric commented Jun 19, 2019

Disasm commented Jun 20, 2019

Disasm commented Jun 24, 2019 •

edited

Loading

Disasm commented Jun 30, 2019

lenary commented Aug 1, 2019 •

edited

Loading

SUPERCILEX commented May 4, 2023

rmsyn commented Aug 17, 2024

jamesmunns commented Aug 18, 2024

rmsyn commented Aug 18, 2024

BartMassey commented Aug 18, 2024

adamgreig commented Aug 27, 2024

[RISC-V] codegen seems not ideal #361

[RISC-V] codegen seems not ideal #361

Comments

japaric commented Jun 19, 2019

jalr instead of jal

atomic::compiler_fence produces an instruction

intrinsics::abort != UNIMP

noreturn nounwind & divergent functions

Metadata

Disasm commented Jun 20, 2019

Disasm commented Jun 24, 2019 • edited Loading

atomic::compiler_fence produces an instruction

intrinsics::abort != UNIMP

noreturn nounwind & divergent functions

jalr instead of jal

Disasm commented Jun 30, 2019

jalr instead of jal

lenary commented Aug 1, 2019 • edited Loading

SUPERCILEX commented May 4, 2023

rmsyn commented Aug 17, 2024

jalr instead of jal

noreturn nounwind and divergent functions

intrinsics::abort != UNIMP

atomic::compiler_fence produces an instruction

jamesmunns commented Aug 18, 2024

rmsyn commented Aug 18, 2024

BartMassey commented Aug 18, 2024

adamgreig commented Aug 27, 2024

`jalr` instead of `jal`

`atomic::compiler_fence` produces an instruction

`intrinsics::abort` != `UNIMP`

`noreturn nounwind` & divergent functions

Disasm commented Jun 24, 2019 •

edited

Loading

`atomic::compiler_fence` produces an instruction

`intrinsics::abort` != `UNIMP`

`noreturn` `nounwind` & divergent functions

`jalr` instead of `jal`

`jalr` instead of `jal`

lenary commented Aug 1, 2019 •

edited

Loading

`jalr` instead of `jal`

`noreturn` `nounwind` and divergent functions

`atomic::compiler_fence` produces an instruction