Skip to content
This repository has been archived by the owner on Apr 13, 2019. It is now read-only.

qemu-system-riscv32 store doesn't modify the memory during a sw ra, 12(sp) #137

Open
vsiles opened this issue Apr 27, 2018 · 18 comments
Open

Comments

@vsiles
Copy link

vsiles commented Apr 27, 2018

Hi !
I'm trying to test a simple bare metal C code with qemu-system-riscv32 [1], and I think the jump instructions are off by 4 bytes. You can simply test it by running the "ROM" code of the sifive_e machine in gdb:

$ qemu-system-riscv32 -nographic -machine sifive_e -kernel foo.elf -s -S
$ # in another terminal
$ riscv64-unknown-elf-gdb
(gdb) target remote :1234
(gdb) layout asm   /* should display 0x1000 lui t0, 0x20400 | 0x1004  jr t0 */
(gdb) p /x $pc
$1 = 0x1000          /* lui t0, 0x20400 */
(gdb) stepi
(gdb) p /x $pc       /* jr t0 */
$2 = 0x1004
(gdb) p /x $t0
$3 = 0x2040000
(gdb) stepi
(gdb) p /x $pc
$4 = 0x20400004 /* should be 0x20400000 */

This as a nasty side effect with call: when we get back from a function call, the instruction restoring ra is skipped, so if you have several chained function calls, the first ret will return and the second ret will trigger an illegal access to address 0 (most of the time).

For the record, my actual test code is a bare metal application printing "hello world" on the uart:

  • my gcc compiler has been built from the https://github.com/riscv/riscv-tools repository
  • my qemu is the one from this repository
  • my application is compiled with the following flags:
    CFLAGS := -ffreestanding -nostdlib -nostartfiles -march=rv32imafdc -mabi=ilp32 -mcmodel=medany -g -O1

[1] actual revision:

$ git log -1
commit 4d37030aa03e07ae927174c8fce0227648552e51
Author: Michael Clark <[email protected]>
Date:   Tue Apr 24 10:12:27 2018 +1200
@vsiles
Copy link
Author

vsiles commented Apr 27, 2018

It seems there is two separate issues in fact:

  • on both risc32 and risc64, the "jump" instructions seem to be performed correctly but gdb seems to do two steps at once, so it looks like the return address is wrong, but in fact its ok
  • I tested my code on sifive_e, sifive_u and spike_v1.10 machines, and only the sifive_e has an issue (only one in 32 bits), where my stack is full of 0, so the ra is "correctly" set to 0, but it shouldn't be 0 in the first place on the stack.

You'll fine attached my current project (set to compile the sifive_e version)
core.tar.gz

@vsiles
Copy link
Author

vsiles commented Apr 27, 2018

The issue seems related to sw on the risc32 machines: on function entry, ra is stored on the stack using sw ra, 12(sp) but the memory doesn't seem correctly modified, so the future lw will only recover a 0.

I only had the issue on 32 bit RISCV.

@vsiles vsiles changed the title qemu-system-riscv32 seems to add a +4 to jump instructions qemu-system-riscv32 store doesn't modify the memory during a sw ra, 12(sd) Apr 27, 2018
@vsiles vsiles changed the title qemu-system-riscv32 store doesn't modify the memory during a sw ra, 12(sd) qemu-system-riscv32 store doesn't modify the memory during a sw ra, 12(sp) Apr 27, 2018
@michaeljclark
Copy link
Collaborator

Thanks for the report, test case and GDB testing.

There are different code paths in target/riscv/translate.c for when a debugger is attached. I wonder if the issue is only present when GDB is attached.

@vsiles
Copy link
Author

vsiles commented May 9, 2018

Now that I understand a bit better riscv, I'll try to reproduce the issue without gdb to confirm your idea. I'll let you know

@vsiles
Copy link
Author

vsiles commented May 9, 2018

I wrote a minimal code that basically just dump info from Machine CSR registers. I works in 32 bit mode (sifive_u) but not in 32 bit mode (sifive_e) with the following compilation / qemu options:

CFLAGS64		+= -march=rv64imafdc -mabi=lp64  -mcmodel=medany
CFLAGS32		+= -march=rv32imafdc -mabi=ilp32  -mcmodel=medany

qemu-system-riscv64 -machine sifive_u -nographic -kernel build/core.elf
qemu-system-riscv32 -machine sifive_e -kernel build/core.elf -nographic

Here are the two main files:

# start.S
.global _start
_start:
    /* global pointer stuff */
.option push
.option norelax
    la      gp, __global_pointer$
.option pop

    /* Clear bss section */
    la      a0, __bss_start
    la      a1, _end
    bgeu    a0, a1, 2f
1:
    sw      zero, 0(a0)
    addi    a0, a0, 4
    bltu    a0, a1, 1b
2:

    la      sp, machine_stack_start     /* also correctly initialize sp for M mode */

    /* Save M stack current address */
    mv      s0, sp
    /*
     * Allocate room on the kernel stack for the
     * supervisor regs
     */
    addi    sp, sp, -(REGS_SIZE)

    /* M main function */
    mv      a0, sp
    call    machine_main
/* main.c */
#define UART_BASE_PA    UINT32_C(0x10013000)

static void uart_init(void)
{
    /* Monitor Mode */
    console_init(UART_BASE_PA);
}

static void entry(uint64_t mode);

void machine_main(regs *supervisor_regs)
{
    uart_init(); /* It usually never exits this call because of the "ra == 0" issue */
    entry(0);

    while (1) ;
}

@michaeljclark
Copy link
Collaborator

BTW this might help:

riscv-probe works on riscv32 and riscv64 in the spike_v1.9.1 and spike_v1.10 machines. e.g.

$ qemu-system-riscv32 -nographic -machine spike_v1.10 -kernel bin/riscv32/probe-htif
$ qemu-system-riscv64 -nographic -machine spike_v1.10 -kernel bin/riscv64/probe-htif

It would be possible to add the sifive uart to riscv-probe by updating the Makefile and adding mbi_uart.c and a new compile targets and linker scripts:

  • sifive_e entry is 0x20400000
  • sifive_u entry is 0x80000000 (same as spike)

Note sifive_e and sifive_u can be run in either riscv32 or riscv64 modes:

  • e31 processor for sifive_e on riscv32
  • e51 processor for sifive_e on riscv64
  • u34 processor for sifive_u on riscv32
  • u54 processor for sifive_u on riscv64

Something like this should work for the SiFive UART (borrowed from bbl):

#include <stdint.h>

enum {
    UART_REG_TXFIFO = 0,
    UART_REG_RXFIFO = 1,
    UART_REG_TXCTRL = 2,
    UART_REG_RXCTRL = 3,
    UART_REG_DIV = 4
};

uint32_t *uart = (uint32_t *)0x10013000;

int mbi_console_getchar()
{
  int32_t ch = uart[UART_REG_RXFIFO];
  if (ch < 0) return -1;
  return ch;
}

void mbi_console_putchar(uint8_t ch)
{
    volatile uint32_t *tx = &uart[UART_REG_TXFIFO];
    while ((int32_t)(*tx) < 0);
    *tx = ch;
}

I might abstract riscv-probe and build binaries for sifive_e and sifive_u as well as spike by implementing mbi_uart.c and adding linker scripts...

@vsiles
Copy link
Author

vsiles commented May 9, 2018

I'll have a look thanks ! It looks like a lot like the small code I wrote to understand all the machine registers :)

@michaeljclark
Copy link
Collaborator

Yes. I used it for basic compatibility testing for reading and comparing machine CSRs between spike and QEMU in 32-bit mode and 64-bit mode. You'll get different results if you run it on spike_v1.9.1 compared to spike_v1.10. It's possible to add more CSRs. It uses the CSR numbers in hex rather than the literal strings so that it can read CSRs that are not implemented in binutils.

@michaeljclark
Copy link
Collaborator

There is also https://github.com/riscv/riscv-tests which can be run in QEMU. The 'v' tests run in the spike_v1.10 machine.

@michaeljclark
Copy link
Collaborator

Okay, I updated riscv-probe (https://github.com/michaeljclark/riscv-probe) with support for both HTIF and UART, added invocation examples to the README and moved the linker script into a conf directory (defaults to 0x80000000 for spike and sifive_u).

@diodesign
Copy link

I wrote a minimal code that basically just dump info from Machine CSR registers. I works in 32 bit mode (sifive_u) but not in 32 bit mode (sifive_e)

Were you able to resolve this @vsiles? Was it your code at fault, or is there an issue with sw ra? I'm hitting the same problem with rv32 code built from Rust in qemu-system-riscv32 when using sifive_e. The following code, at the end of a Rust function, loads 0 into ra and then jumps to it, crashing the emulator:

0x204006a8:  lw              ra,124(sp)
0x204006ac:  addi            sp,sp,128
0x204006b0:  ret

Which shouldn't be happening. I'd like to know if this is something I need to fix my end, or in qemu. Thanks!

@vsiles
Copy link
Author

vsiles commented Sep 18, 2018

I didn't try to resolve it, since we were only working with 64 bit archs. In such case, gdb display is funky (it might skip steps) but the behavior was correct. I did not look into the 32 bit case more in depth, sorry.

@diodesign
Copy link

diodesign commented Sep 18, 2018

Thanks for the reply. Something funky is going on. The code enters this function:

IN: _ZN4core3fmt5write17h728046e032081804E
0x20400378:  addi            sp,sp,-128
0x2040037c:  sw              ra,124(sp)

At 0x20400378, ra = 0x20400080 and sp = 0x8001ffd0, according to Qemu's monitor, and both valid. sp is not touched again until we reach...

0x204006a8:  lw              ra,124(sp)
0x204006ac:  addi            sp,sp,128
0x204006b0:  ret             

So ra should be restored correctly to 0x20400080 by the lw. Instead, it's set to 0, causing a crash. This is using QEMU emulator version 2.12.0 (Debian 1:2.12+dfsg-3) running:

qemu-system-riscv32 -machine sifive_e -kernel target/riscv32imac-unknown-none-elf/release/kernel -nographic -d in_asm,cpu

I'll build from the latest source, but I can't see anything that's changed that would fix this. Is this something I could help with or patch if there's a problem with rv32 sw ra @michaeljclark please?

Edit: It's still overwriting ra with 0 using the latest riscv-all branch (git log -1 --> commit 6e532472173162d479cdfd5f74d2db9f327967ca)

@michaeljclark
Copy link
Collaborator

I am wondering if it might be decompression related thus in target/riscv/translate.c. I couldn't see from your objdump output whether you had RVC or not.

BTW The disassembler is distinct from the decoder in target/riscv/translate.c so if you run with -d in_asm,op,op_ind,op_opt,out_asm then you might see if something is mis-translated.

I have not encountered any issues like this on riscv32 yet so I'm curious what is going on... I'm surprised that riscv32 linux boots oif we can't sw ra

I can try your reproducer. Do you have the linker script?

@vsiles
Copy link
Author

vsiles commented Oct 2, 2018

Sorry, I no longer have access to these files (former job)... But I didn't have anything clever in it anyway

@jim-wilson
Copy link
Contributor

Perhaps there is a buffer overflow that is clobbering values on the stack. Since the problem only happens for 32-bit code, maybe the buffer overflow doesn't happen for 64-bit code. This could be a compiler or library bug where the size of something is computed wrong when you have a 32-bit target.

@diodesign
Copy link

diodesign commented Oct 9, 2018

OK. It's my fault. I could have sworn there was 128KB of DRAM in the E300 series. It's 16KB. That'll be why ra wasn't written to memory: sp was set too high in memory, in non-existent DRAM. I'm so sorry. I suspect @vsiles was also hitting this same issue.

(PS: thanks for the port, it's appreciated!)

@vsiles
Copy link
Author

vsiles commented Oct 9, 2018

Might have been that indeed !

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants