Allow multitasking on M4 core #302

Open
LnnrtS opened this issue Jul 12, 2024 · 2 comments

@LnnrtS
Collaborator

LnnrtS commented Jul 12, 2024

Generally, there seems to be an issue on the M4 side: some processes can block for a long time (like writing to the filesystem) while other tasks need to run within a certain time window (like the wifi heartbeat or, more importantly, reading the wifi UART RX buffer). When there are conflicts, we either try to relax the requirements (increase buffer sizes, increase timeouts) or try to break up long-running tasks into smaller chunks (at the cost of additional complexity for maintaining state between those chunks).
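To make the "smaller chunks" strategy concrete, here is a minimal host-side sketch. `FakeStorage`, `ChunkedWriter` and `run_superloop` are hypothetical names for illustration only, not code from this project; the point is just that the write offset has to be carried between loop iterations so that a time-critical poll can run in between.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a slow storage device: it just collects bytes.
struct FakeStorage {
    std::vector<std::uint8_t> bytes;
    void write(const std::uint8_t* data, std::size_t len) {
        bytes.insert(bytes.end(), data, data + len);
    }
};

// Writes a buffer in bounded chunks. The offset kept between calls is the
// extra state that chunking forces us to maintain.
class ChunkedWriter {
public:
    ChunkedWriter(FakeStorage& storage, std::size_t chunk_size)
        : storage_(storage), chunk_size_(chunk_size) {
        assert(chunk_size_ > 0);
    }

    void start(const std::vector<std::uint8_t>& data) {
        data_ = data;
        offset_ = 0;
    }

    // Performs at most one chunk of work; returns true while work remains.
    bool step() {
        if (offset_ >= data_.size()) return false;
        const std::size_t len = std::min(chunk_size_, data_.size() - offset_);
        storage_.write(data_.data() + offset_, len);
        offset_ += len;
        return offset_ < data_.size();
    }

    bool done() const { return offset_ >= data_.size(); }

private:
    FakeStorage& storage_;
    std::size_t chunk_size_;
    std::vector<std::uint8_t> data_;
    std::size_t offset_ = 0;
};

// Superloop sketch: one time-critical poll per chunk of slow work.
// Returns how often the time-critical poll ran before the write finished.
inline int run_superloop(ChunkedWriter& writer) {
    int polls = 0;
    bool more = true;
    while (more) {
        ++polls;               // placeholder for e.g. draining a UART RX buffer
        more = writer.step();  // bounded amount of slow work
    }
    return polls;
}
```

The chunk size bounds how long the time-critical work can be delayed, but every long-running operation needs its own state object like this one.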

Those strategies obviously have their limits and come with downsides, and I think we have reached a point where introducing some kind of multitasking (framework) could be the better solution overall. It's probably a larger change, and I cannot fully judge whether it's good to do now, or even ever, but I just wanted to bring it up.

The two main approaches I see are RTOS or coroutines.

For an RTOS, the least intrusive way would be to create multiple threads with different priorities and distribute the pieces that are now called in the superloop into those tasks. Then we need to check where there is communication and shared data between those tasks and make that thread-safe.

I haven't worked with coroutines in C++, but https://github.com/lewissbaker/cppcoro is something I would look at first. In contrast to an RTOS, you have to worry less about race conditions and the like, but you need to break down long-running tasks manually since they won't be interrupted by the scheduler.

@danngreen
Member

Generally, yes, an RTOS would be nice, but I'd be concerned with the code size. We are using about 230k of 320k max. There is about 162k of rodata that I had to relocate to DDR, at a performance loss since the bus from M4 to DDR is very slow. Also, the heap and bss are in DDR to make room for code.

Right now we are using an interrupt-based task system for a few things (reading the controls being the highest priority).
In the main loop we're doing everything else: USB IO, SD Card IO, expander module I2C IO, wifi comm parsing, and ICC message handling.
We have plenty of timer ISRs available, so we could put some of these into timer tasks and give them priorities, or use their native interrupts to handle requests.
For instance I think we can have an interrupt fire when an ICC message comes in, so we could group USB and SDIO with that. Expander module IO could be high priority but infrequent (1ms). Wifi comm could happen with the USART2 IRQ or else be on a timer at some configurable priority.

Alternatively, if there was some simple scheduler (bare-bones RTOS) compatible with this chip, we could try using that to have tasks.
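As a rough picture of what such a bare-bones scheduler could look like, here is a host-side sketch of a run-to-completion scheduler with priorities. The class name, the task IDs and the priority values are assumptions made up for illustration. It is cooperative rather than preemptive, so a long task still delays everything else until it returns.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class RunToCompletionScheduler {
    struct Task {
        int priority;
        bool pending;
        std::function<void()> fn;
    };
    std::vector<Task> tasks_;

public:
    // Lower number = more urgent. Register all tasks before running any.
    int add_task(int priority, std::function<void()> fn) {
        tasks_.push_back(Task{priority, false, std::move(fn)});
        return static_cast<int>(tasks_.size()) - 1;
    }

    // On the target this would be called from an ISR, and `pending` would
    // have to be an atomic or otherwise interrupt-safe flag.
    void pend(int id) { tasks_.at(static_cast<std::size_t>(id)).pending = true; }

    // Runs the most urgent pending task to completion.
    // Returns false when nothing is pending.
    bool run_one() {
        Task* best = nullptr;
        for (auto& t : tasks_) {
            if (t.pending && (best == nullptr || t.priority < best->priority)) {
                best = &t;
            }
        }
        if (best == nullptr) return false;
        best->pending = false;
        best->fn();
        return true;
    }
};
```

In the main loop this would be driven by something like `while (scheduler.run_one()) {}`, with interrupts (ICC message received, USART2 RX, a timer tick) only calling `pend()`.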

Coroutines are interesting. It's such a different approach that I'd want to try them out on other projects before converting a large existing project like this.

@LnnrtS
Collaborator Author

LnnrtS commented Jul 18, 2024

The code size of an RTOS is not that much actually. FreeRTOS claims 10kB, which might be very optimistic but shows the general range. It's basically a simple scheduler plus synchronization primitives where you pay for what you use.

What will increase, though, is RAM usage, because each thread will need its own stack. I usually go with 4kB, but that of course depends on what is running in these threads. (The M4 also has code in RAM, right?)

Doesn't the interrupt-based system you describe have the same limitation? If you allow some level of nesting, you will need extra stack space. The only difference is that the stacks are not fixed; instead, the individual stacks are dynamically stacked on top of each other.
Also, if there is some kind of communication between the 'threads', you need to deal with race conditions, just that you have nothing from an RTOS to help with that. And the RTOS has the advantage that you can wait for stuff instead of having to poll. That makes for nicer code and saves some compute.
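The stack comparison can be put into numbers with a small sketch. All task sizes and priorities used with it are made-up assumptions, and it ignores exception-frame overhead, the main-loop stack and RTOS kernel overhead. It assumes that tasks on the same interrupt priority level cannot preempt each other, so only the deepest task per level counts toward the shared stack.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

struct TaskInfo {
    int priority;                  // lower number = more urgent
    std::size_t worst_case_stack;  // bytes, hypothetical figures
};

// RTOS threads: every task gets its own fixed stack, sized for its worst case.
inline std::size_t rtos_stack_bytes(const std::vector<TaskInfo>& tasks) {
    std::size_t total = 0;
    for (const auto& t : tasks) total += t.worst_case_stack;
    return total;
}

// Nested interrupts on one shared stack: in the worst case one task per
// priority level is active at once, each stacked on top of the lower ones.
inline std::size_t nested_irq_stack_bytes(const std::vector<TaskInfo>& tasks) {
    std::map<int, std::size_t> deepest_per_level;
    for (const auto& t : tasks) {
        std::size_t& deepest = deepest_per_level[t.priority];
        if (t.worst_case_stack > deepest) deepest = t.worst_case_stack;
    }
    std::size_t total = 0;
    for (const auto& level : deepest_per_level) total += level.second;
    return total;
}
```

With four hypothetical tasks of 1 kB, 4 kB, 2 kB and 4 kB, where the two middle ones share a priority level, the fixed stacks add up to 11264 bytes and the nested worst case to 9216 bytes. The nested system saves something, but it still has to reserve most of the same space.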

Maybe there is still some low-hanging fruit with code size. When I compile the whole M4 binary with -Os, the code shrinks by almost 100kB (I just compared the ELF files with arm-none-eabi-size). Maybe there is even more; I haven't looked yet.
