Easy way to transfer code from flash to RAM? #394

Open
therealprof opened this issue Nov 7, 2017 · 18 comments

@therealprof
Contributor

I have an interrupt handler which is called very frequently. Because the CPU is running at high speed I have to use wait states, which seems to slow down processing quite a bit. Is there any simple way to declare that the IRQ handler should be copied to RAM during initialisation instead of being run from flash?

@japaric
Member

japaric commented Nov 7, 2017

Easy way to transfer code from flash to RAM?

There's no easy way but you can place functions in RAM using the latest stable release of cortex-m-rt. Example below:

#![no_std]

extern crate blue_pill; // or some device crate
extern crate cortex_m_rt;

use core::ptr;

#[inline(never)]
fn main() {
    let x = unsafe { ptr::read_volatile(0x2000_0000 as *const u8) };
    // call function placed in RAM
    let y = foo(x);
    unsafe { ptr::write_volatile(0x2000_0001 as *mut u8, y) };
}

// plain function placed in RAM
#[link_section = ".data"] // <- the important part
#[inline(never)] // you don't want this to be inlined, right?
fn foo(x: u8) -> u8 {
    x + 1
}

// interrupt handler placed in RAM
// we can't compose `link_section` with `interrupt!` so we have to write the expansion of
// `interrupt!` here (bummer)
#[allow(non_snake_case)]
#[link_section = ".data"] // <- the important part
#[no_mangle]
pub fn EXTI0() {
    let x = unsafe { ptr::read_volatile(0x2000_0000 as *const u8) };
    // call function placed in RAM
    let y = foo(x);
    unsafe { ptr::write_volatile(0x2000_0001 as *mut u8, y) };
}

I can't test this right now but the disassembly looks correct:

$ arm-none-eabi-objdump -CD target/thumbv7m-none-eabi/release/cortex-m-quickstart
080001a4 <cortex_m_quickstart::main>:
 80001a4:       b580            push    {r7, lr}
 80001a6:       466f            mov     r7, sp
 80001a8:       f04f 5000       mov.w   r0, #536870912  ; 0x20000000
 80001ac:       7800            ldrb    r0, [r0, #0]
 80001ae:       f000 f80b       bl      80001c8 <___ZN19cortex_m_quickstart3foo17h20bca7cdf6dc430dE_veneer>
 80001b2:       2101            movs    r1, #1
 80001b4:       f2c2 0100       movt    r1, #8192       ; 0x2000
 80001b8:       7008            strb    r0, [r1, #0]
 80001ba:       bd80            pop     {r7, pc}

(..)

080001c8 <___ZN19cortex_m_quickstart3foo17h20bca7cdf6dc430dE_veneer>:
 80001c8:       f85f f000       ldr.w   pc, [pc]        ; 80001cc <___ZN19cortex_m_quickstart3foo17h20bca7cdf6dc430dE_veneer+0x4>
 80001cc:       20000001        andcs   r0, r0, r1

Disassembly of section .data:

20000000 <cortex_m_quickstart::foo>:
20000000:       3001            adds    r0, #1
20000002:       4770            bx      lr

20000004 <EXTI0>:
20000004:       b580            push    {r7, lr}
20000006:       466f            mov     r7, sp
20000008:       f04f 5000       mov.w   r0, #536870912  ; 0x20000000
2000000c:       7800            ldrb    r0, [r0, #0]
2000000e:       f7ff fff7       bl      20000000 <cortex_m_quickstart::foo>
20000012:       2101            movs    r1, #1
20000014:       f2c2 0100       movt    r1, #8192       ; 0x2000
20000018:       7008            strb    r0, [r1, #0]
2000001a:       bd80            pop     {r7, pc}

Both foo and the EXTI0 interrupt handler should end up in RAM (the .data section is initialized before main starts). main calls into foo using a veneer (TIL: a mechanism for working around the relative address restriction of branch instructions).
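For anyone wondering how code placed in .data actually ends up in RAM: the startup code copies the section's initial contents from Flash before main runs. A minimal sketch of such a copy loop is below. The name `init_data` and its three pointer parameters (start and end of the RAM section, plus the load address in Flash) are assumptions made for illustration, not necessarily what cortex-m-rt does internally.

```rust
use core::ptr;

/// Copies words from the load address `sidata` (in Flash) into the
/// run-time range `[sdata, edata)` (in RAM), one word at a time.
///
/// # Safety
/// `sdata..edata` must be a valid, writable, word-aligned range and
/// `sidata` must point to at least as many readable words.
unsafe fn init_data(mut sdata: *mut u32, edata: *mut u32, mut sidata: *const u32) {
    while sdata < edata {
        unsafe {
            // Volatile accesses keep the compiler from replacing the loop
            // with a call to a routine that might itself live elsewhere.
            ptr::write_volatile(sdata, ptr::read_volatile(sidata));
            sdata = sdata.add(1);
            sidata = sidata.add(1);
        }
    }
}
```

On a real target the three pointers would come from linker-script symbols; on a host you can exercise the loop with two plain arrays standing in for Flash and RAM.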

There seems to be no problem in calling RAM functions from a RAM function or from a Flash function but calling a Flash function from a RAM function gave me this linker error (I moved foo to Flash):

  = note: /home/japaric/tmp/cortex-m-quickstart/target/thumbv7m-none-eabi/release/deps/cortex_m_quickstart-893c31d38a914189.cortex_m_quickstart0.rcgu.o: In function `cortex_m_quickstart::EXTI0':
          /home/japaric/tmp/cortex-m-quickstart/src/main.rs:32:(.data+0xa): relocation truncated to fit: R_ARM_THM_CALL against `cortex_m_quickstart::foo'

which kind of looks familiar.

One thing I should note is that if you call e.g. FOO.bar() inside an interrupt handler then there's no guarantee that the bar subroutine will end in RAM if you haven't defined bar yourself (and used the link_section trick). LLVM may inline bar into the interrupt handler and things will just work, or it may not and you'll probably get the "relocation truncated" linker error.

I think it would make sense to have some macro / attribute to make placing functions in RAM easier but this should get more testing first. Again, I have not tested anything myself :-).

@therealprof
Contributor Author

Sweet! Thank you so much @japaric. That works an absolute treat and has reduced the time to handle an interrupt (including instrumentation so I can measure it but excluding the veneer overhead which consists of quite a few instructions) from 6.2µs to 5.4µs.

@japaric
Member

japaric commented Nov 7, 2017

Nice wins! Glad to hear it worked. Let's keep this issue open to track adding this as a proper feature.

@therealprof
Contributor Author

It would be even nicer if we could directly jump into the function in RAM via the interrupt/exception vector rather than have an entry point to an exception handler then jump into the veneer and then jump into RAM and all the way back.

@japaric
Member

japaric commented Nov 22, 2017

Things that need to be decided before implementing this:

  • What should be the syntax of this feature for plain functions, interrupt!, exception! and RTFM?

    • We could go for #[link_section = "$section"] but that seems error prone because the user can write anything for the $section.
  • Should we support a single linker section or several? If the latter, how would that affect the syntax? Think of these scenarios:

    • You have a single RAM region -- this is the most common case. Only a single linker section needs to be supported; that linker section will always be in the only RAM region.
    • You have two RAM regions: plain RAM and core-coupled RAM (CCRAM) (*), and you want to put all the program code in one of these RAM regions.
    • You have two RAM regions and you want to place some code in one region and some other code in the other region. (Does this scenario even occur in practice?)

We could start by simply supporting the first scenario and postpone supporting the other scenarios. In any case, we don't have a great story for placing linker sections in this or that memory region -- everything is kind of hard coded right now.

(*) Only the processor can access the CCRAM so CCRAM has better performance than plain RAM because the processor doesn't have to share the CCRAM bus with the DMA.

@pftbest
Contributor

pftbest commented Nov 22, 2017

I think a common scenario for this feature might be self-programming. When you erase the FLASH you can't read or execute code from it, so you need to place the necessary code and data in RAM. The interrupt table should also be in RAM in this case.

@therealprof
Contributor Author

Interrupt table should also be in RAM in this case.

🤔

@japaric
Member

japaric commented Nov 24, 2017

@pftbest you mean putting the whole program in RAM, right? That sounds like a different feature to me. We could have a Cargo feature that, when enabled, changes the Flash reset handler (boot code) to copy the program from Flash to RAM and then jump to the RAM reset handler.

What @therealprof asked for was a way to place certain functions and interrupt handlers in RAM. That is also a wanted feature since you may not always be able to fit the whole program in RAM -- in that case you can place only the most performance critical sections of your program in RAM.

@therealprof you can put the vector table in RAM but you need to adjust the VTOR register accordingly. After reset VTOR will always be 0x0 (Flash), I think, so you need to change VTOR in the boot code.
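One detail to keep in mind when relocating the table: VTOR only accepts suitably aligned addresses. A minimal sketch of the check follows, assuming the rule given in Arm's Cortex-M3/M4 documentation (table size in words rounded up to the next power of two, with a minimum of 32 words, i.e. 128 bytes). The helper names `vtor_alignment_bytes` and `is_aligned_for_vtor` are hypothetical, and the rule should be confirmed against the reference manual of the specific core.

```rust
/// Number of system exception slots that precede the device interrupts
/// in a Cortex-M vector table (initial SP, Reset, NMI, ...).
const SYSTEM_VECTORS: u32 = 16;

/// Required alignment (in bytes) of a vector table that serves
/// `num_irqs` device interrupts, under the rule assumed above.
fn vtor_alignment_bytes(num_irqs: u32) -> u32 {
    let table_bytes = (SYSTEM_VECTORS + num_irqs) * 4;
    table_bytes.next_power_of_two().max(128)
}

/// Returns true if `addr` could be written to VTOR for a table of this size.
fn is_aligned_for_vtor(addr: u32, num_irqs: u32) -> bool {
    addr % vtor_alignment_bytes(num_irqs) == 0
}
```

With 21 interrupts the table is 37 words, so it has to sit on a 64-word (256-byte) boundary; a table at 0x2000_0000 qualifies, one at 0x2000_0080 does not.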

@therealprof
Contributor Author

@therealprof you can put the vector table in RAM but you need to adjust the VTOR register accordingly.

I'm not working too much with Cortex-M4... ;)

@japaric
Member

japaric commented Nov 25, 2017

I'm not working too much with Cortex-M4

I linked the M4 documentation because that was the first google hit :P. But it's also present on the M3 and seems to be optional on the M0+. No mention of it for the M0 so I guess it doesn't exist there. In any case, it's not necessary to put the vector table in RAM -- unless you want to be able to modify it at runtime.

@therealprof
Contributor Author

I haven't seen an M0/M0+ device that had it implemented, at least. I do have plenty of M3 and higher devices too, but I haven't looked too deeply into those architecture details. ;) But I agree that this is mostly a feature one would need for dynamic interrupt routine changes, and I haven't seen a flashing implementation that used interrupts so far...

japaric referenced this issue in rust-embedded/cortex-m-rt Aug 31, 2018
this attribute lets you place functions in RAM

closes #42
@achan1989

* You have _two_ RAM regions and you want to place some code in one region and some other code in the other region. (Does this scenario even occur in practice?)

We're about to do this in my day job. Useful when you want the performance of CCRAM, but not all program code fits into it.

@hannobraun
Member

I had the need to run some code from RAM recently and I figured I'd leave my experiences here. Maybe it will inform any future work on this feature, maybe it will help someone else who's having the same problems, or maybe someone will show up and tell me I'm doing it wrong (hopefully it's the last one :-) ).

So, I created a Rust function with #[link_section = ".data"] and #[inline(never)], as proposed by @japaric. This worked, as in it placed the function into the .data section. However, I found the result to be completely useless.

First, I got linker errors whenever I wanted to do anything in the function:

relocation R_ARM_THM_CALL out of range: -402628917 is not in [-16777216, 16777215]

Switching to the GNU linker gave me this instead:

relocation truncated to fit: R_ARM_THM_CALL against symbol `core::panicking::panic'

It's been a long time since I wrote ARM assembler, but I remember that there are various ways to call a function, and that at least one way involves jumping to a relative address. I understand these error messages to mean that the compiler generated such a relative jump, but that the symbol it wanted to jump to turned out to be too far away. That makes sense to me, since this is a RAM function calling a Flash function.
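To make the range problem concrete, here is a minimal sketch of the check the linker is effectively doing. The limits are taken from the lld message above; the addresses in the usage note come from the disassembly earlier in this thread. The function names are hypothetical, and the "instruction address + 4" base is an assumption about how Thumb computes the branch origin.

```rust
/// Displacement limits of a Thumb-2 `bl`, as quoted in the lld error.
const BL_MIN: i64 = -16_777_216;
const BL_MAX: i64 = 16_777_215;

/// Signed distance the `bl` at `insn_addr` would have to encode to reach
/// `target` (measured from the instruction address plus 4).
fn bl_displacement(insn_addr: u32, target: u32) -> i64 {
    i64::from(target) - (i64::from(insn_addr) + 4)
}

/// Returns true if a direct `bl` can reach `target` without a veneer.
fn bl_in_range(insn_addr: u32, target: u32) -> bool {
    (BL_MIN..=BL_MAX).contains(&bl_displacement(insn_addr, target))
}
```

The RAM-to-RAM call at 0x2000_000e targeting 0x2000_0000 needs a displacement of only -18, so it fits. A call from that same RAM address to Flash at 0x0800_01c8 is roughly 384 MiB away, far outside the ±16 MiB window, which is why the linker either inserts a veneer or gives up with the errors shown.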

I didn't spend any time trying to get this to work, as calling any Flash function is unacceptable for my use case. I'm trying to program a page of Flash memory, and any access to Flash while this is ongoing will interrupt the process.

Problem is, you can't write a lot of Rust code without running into a call to some function that I don't control and thus can't slap a link_section onto. Iterating over a slice, pointer arithmetic, everything involves function calls. In debug mode, you can't even write 1 + 1 without getting a reference to panic.

I wasn't able to write anything useful in release mode either, and I didn't dive too deeply into that. I figured that any solution I can find would be dependent on compiler implementation details and could break any time.

I then considered inline assembler, but the crate I'm working on currently compiles on stable, and I don't think changing that would be acceptable. I ended up writing my function in C and linking that into the Rust code.

(Aside: Putting link_section on an extern fn will do nothing, without any warning or error. Do your linker magic in C code, for example with __attribute__((section(".data"))), if you're using GCC.)

To summarize, writing Rust code that is guaranteed to run in RAM seems to be pretty much impossible at this point. The solution proposed here (and presumably the one implemented in rust-embedded/cortex-m-rt#100) may help with some performance optimization, but it is useless if executing from Flash is not acceptable.

@therealprof
Contributor Author

Problem is, you can't write a lot of Rust code without running into a call to some function that I don't control and thus can't slap a link_section onto. Iterating over a slice, pointer arithmetic, everything involves function calls. In debug mode, you can't even write 1 + 1 without getting a reference to panic.

Not sure what you expect us to do about that. That's not what this approach is supposed to address.

Indeed if you cannot allow any flash access at all you're screwed. Usually I would expect that you write to one flash region while you execute from the other. It would be great to have detailed control over the inlining of crate code per crate, and I've mentioned that before in various places, but if you need to have your code in RAM and you cannot use any flash, your only chance is to do what flashers are doing: Loading a pre-compiled binary blob into RAM and executing that.

@hannobraun
Member

Not sure what you expect us to do about that.

I don't expect you to do anything, and I'm sorry if it came off like that.

Indeed if you cannot allow any flash access at all you're screwed.

Yes, but that wasn't obvious to me after reading the comments in this issue, or in rust-embedded/cortex-m-rt#100. At the very least, my comment may prevent someone with the same problem from wasting their time.

Usually I would expect that you write to one flash region while you execute from the other.

Yes. The STM32L0 I'm working with does have two flash banks, and erasing/writing one while executing from the other is possible. However, to write a half-page, I first need to write all 16 words to the memory controller, and any other Flash access (regardless of bank) will interrupt the process.

your only chance is to do what flashers are doing: Loading a pre-compiled binary blob into RAM and executing that.

That's what I ended up doing.

@eldruin
Member

eldruin commented Nov 28, 2019

@hannobraun I think it would be a good idea to document these limitations directly in the rust-embedded/cortex-m-rt#100 RFC so that it is clear to people in the future what is actually executed from RAM and what is not when using #[ramfunc].

@hannobraun
Member

@eldruin I posted there and linked to my comment. Not sure what the status of that PR is though.

@eldruin
Member

eldruin commented Nov 28, 2019

@hannobraun Ah, I noticed now that the RFC does not include a separate RFC document, as is otherwise usual. I am not sure where to add it either; maybe directly in the docs of an example or so.

@adamgreig adamgreig transferred this issue from rust-embedded/cortex-m-rt Jan 23, 2022