
Add concept of priority to events and EventQueue #10263

Closed
0Grit opened this issue Mar 28, 2019 · 28 comments

Comments

@0Grit

0Grit commented Mar 28, 2019

Description

I've been rattling this one around for a while.

Some people tend toward async APIs and programming.
Prioritized events would make that easier than having to use multiple event queues and extra logic.

It might also reduce RAM/flash usage for applications that have no need for threading beyond the natural thread and interrupt contexts provided by the core.
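
To make the request concrete, here is a purely hypothetical sketch of what it could look like - the `call_prio` name and its priority argument do not exist in EventQueue today and are invented only to illustrate the idea:

```cpp
#include "mbed.h"

EventQueue queue;

void handle_radio_event() { /* wants to jump ahead of pending bulk work */ }
void log_statistics()     { /* happy to wait */ }

int main()
{
    // Hypothetical API (not in EventQueue today): among events that are due,
    // lower numbers dispatch first.
    queue.call_prio(/* priority = */ 0, handle_radio_event);
    queue.call_prio(/* priority = */ 7, log_statistics);
    queue.dispatch_forever();
}
```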

Related Issues/Comments:
PelionIoT/simple-mbed-cloud-client#81 (comment)
#10071
#9800 (comment)
#9077
ARMmbed/mbed-os-5-docs#1004 (comment)

Issue request type

[ ] Question
[x] Enhancement
[ ] Bug
@cmonr
Contributor

cmonr commented Mar 28, 2019

@geky ?

@ciarmcom
Member

Internal Jira reference: https://jira.arm.com/browse/MBOCUSTRIA-1087

@40Grit

40Grit commented Mar 29, 2019

@janjongboom

@0Grit
Author

0Grit commented Mar 29, 2019

@kjbracey-arm Trying to determine "philosophically" how this applies to the CMSIS-RTOS, RTX, and Mbed OS RTOS APIs.

@0Grit
Author

0Grit commented Mar 29, 2019

#10256

@kjbracey
Contributor

kjbracey commented Apr 1, 2019

Possibly useful - Nanostack's event loop does have a priority scheme, as do many others. EventQueue seems to be in the minority here.

It doesn't remove the need for separate global and high-priority queues though - those two have incompatible notions of runtime and latency. People on the global queue routinely take 10s of milliseconds in one event, whereas people on the high priority queue are expecting sub-millisecond latency.

I think it's mainly useful to distinguish "bulk" work from more urgent work - e.g. a long-running crypto verification versus network packet handling - and that's what Nanostack historically needed (especially when running on much slower hardware where ECC could take minutes).
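
For reference, a minimal sketch of the two shared queues being described, using the shared-queue accessors as I remember them (mbed_event_queue / mbed_highprio_event_queue - treat the exact names as an assumption):

```cpp
#include "mbed.h"
#include "mbed_events.h"

// Bulk work: tens of milliseconds per event is tolerated here.
void verify_certificate_chunk() { /* ... */ }

// Urgent work: callers expect sub-millisecond latency here.
void process_rx_packet() { /* ... */ }

int main()
{
    EventQueue *normal = mbed_event_queue();           // shared global queue
    EventQueue *urgent = mbed_highprio_event_queue();  // shared high-priority queue

    normal->call(verify_certificate_chunk);
    urgent->call(process_rx_packet);

    // Both shared queues dispatch on their own threads, so main just idles.
    while (true) {
        ThisThread::sleep_for(1000);
    }
}
```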

Trying to determine "philosophically" how this applies to the CMSIS-RTOS, RTX, and Mbed OS RTOS APIs.

Seems orthogonal. EventQueue is a standalone API that works in any environment (like the Nanostack event queue). The ability is probably equally useful in all, I think. CMSIS-RTOS 2 does have its own MessageQueue which does have a priority.
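
For comparison, the CMSIS-RTOS 2 message priority looks roughly like this (a sketch from memory of the osMessageQueue calls, not a recommendation to use it here):

```cpp
#include "cmsis_os2.h"

// Messages put with a higher msg_prio are delivered ahead of lower-priority
// messages already waiting in the queue.
void message_priority_example()
{
    osMessageQueueId_t mq = osMessageQueueNew(8, sizeof(uint32_t), NULL);

    uint32_t bulk = 1, urgent = 2;
    osMessageQueuePut(mq, &bulk,   /* msg_prio */ 0, 0);
    osMessageQueuePut(mq, &urgent, /* msg_prio */ 8, 0);

    uint32_t msg;
    osMessageQueueGet(mq, &msg, NULL, osWaitForever);  // receives 'urgent' first
}
```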

@janjongboom
Contributor

@kjbracey-arm @geky Wouldn't the next major Mbed release be a good moment to re-evaluate having two event queues in the same OS? Why can't we get rid of one? E.g. in Pelion Client we're using both (!), which is wasteful.

@kjbracey
Contributor

kjbracey commented Apr 1, 2019

As per the discussion on #10256, they can be integrated via a JSON option so they're running on the same thread, which eliminates almost all the RAM overhead. What remains is the code overhead of multiple stacked-up abstraction layers, but the Nanostack abstraction layer is pretty thin - it's far from the biggest offender in the system. The main thing would be to make it not have its own timing queue.
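
For anyone hunting for that option: if I remember right it's a nanostack-hal config value along these lines in mbed_app.json - the exact key name here is from memory, so check nanostack-hal's mbed_lib.json before relying on it:

```json
{
    "target_overrides": {
        "*": {
            "nanostack-hal.event-loop-use-mbed-events": true
        }
    }
}
```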

Pelion Client and Nanostack both need to run on non-Mbed OS platforms, which means they need an event queue API they can take with them (and is usable from C) - we're not currently maintaining EventQueue/equeue as a separate portable library for that purpose, although I'm sure it would be doable. The NS event queue API is far more integral to that code base than EventQueue is to Mbed OS, so migrating to direct use of a different API is not a small job.

equeue would also need a couple of functional upgrades - Nanostack does require the priority handling suggested here, as well as the ability to do static allocation as per #9172.

I think there are probably other easier savings to be made in other parts of the system - for example lwIP always makes its own thread, and can't share an event queue at all - that's much worse than Nanostack/Pelion client sharing an event queue through an abstraction layer.

@40Grit

40Grit commented Apr 1, 2019

Regarding the need for high/low latency queues.

My mind wandered to interruptible events, but then the natural synchronization of data would be lost.

Though I suppose that would already be the case if a high/low queue approach is used with two threads?

@kjbracey
Contributor

kjbracey commented Apr 1, 2019

If you want an event to be interrupted for higher-priority work, then you can do that with a higher-priority thread or an interrupt. The higher-priority thread can be running its own event queue. That's the model for the pair of shared normal and high-priority event queues. Having two queues of different priority is more useful than one queue with events of different priorities.

As soon as things are interruptible, you will need more stack space, regardless of how you do it - threads, or multiple-level interrupts on one stack. Stack space needs to be the sum of the maximum stack usage of the interruptee and the interrupter.
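
A minimal sketch of that model - one queue dispatched at normal priority, plus a second queue dispatched by a higher-priority thread (plain EventQueue/Thread APIs, nothing here is specific to this proposal):

```cpp
#include "mbed.h"

EventQueue normal_queue;
EventQueue urgent_queue;

// Dispatching the urgent queue from a higher-priority thread means its events
// preempt whatever event the normal queue is in the middle of.
Thread urgent_thread(osPriorityHigh);

void long_running_work() { /* tens of milliseconds is fine here */ }
void low_latency_work()  { /* expected to finish quickly */ }

int main()
{
    urgent_thread.start(callback(&urgent_queue, &EventQueue::dispatch_forever));

    normal_queue.call(long_running_work);
    urgent_queue.call(low_latency_work);

    // Dispatch the normal queue on the main thread.
    normal_queue.dispatch_forever();
}
```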

@40Grit

40Grit commented Apr 1, 2019

And I assume that even if all work were broken down into the minimum time base required to have one queue, there would be overhead in the constant switching of events?

@kjbracey
Contributor

kjbracey commented Apr 1, 2019

Yes. But it's something you can try to do.

Breaking up big things so they fit on the normal event queue is doable - for example breaking up a big multi-second crypto operation into 10s-of-millisecond chunks, and then queuing it as a series of low-priority events, rather than having a separate low-priority thread/queue. It does involve a lot of context-save/restore work, though.
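
A sketch of that chunking pattern - each slice does a bounded amount of work and then re-posts itself, so other pending events get dispatched in between (the job structure and block counts are made up for illustration):

```cpp
#include "mbed.h"

EventQueue queue;

// The "context" that has to be saved and restored between slices.
struct CryptoJob {
    size_t next_block   = 0;
    size_t total_blocks = 1000;
} job;

// Do one slice (aim for tens of milliseconds at most), then yield by re-queuing.
void crypto_step()
{
    // ... process block job.next_block ...
    if (++job.next_block < job.total_blocks) {
        queue.call(crypto_step);
    }
}

int main()
{
    queue.call(crypto_step);
    queue.dispatch_forever();
}
```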

But breaking up normal-priority stuff into a series of small bits that voluntarily yield to real-time-speed stuff is not feasible in a conventional system. A single serial "printf" or an SD card access would be too long.

In practice the minimum viable 1-queue case is to have one normal-priority thread, and use interrupts directly for drivers instead of the high-priority one. Then you have to make sure all your interrupt work is small enough to not cause interrupt latency problems, but that's more viable than making all your normal work small enough to not cause real-time event latency problems.

Event-driven programming is hard enough without being forced to keep your runtime down to under a millisecond. Just doing event queues at all was found to be too complicated for most users, hence the abandonment of Mbed OS 3 and MINAR. Maybe we will end up in a middle ground where all the system stuff is event-driven, and the RTOS is just there for users, but I can't help thinking it might have been easier to get there by just bolting an RTOS onto Mbed OS 3.

The dual shared queue I advocated for Mbed OS 5 is a compromise where you're not all the way down at the 2-level world of bare metal or MINAR, but you have 3 levels - normal work, high-priority driver work, and really hard realtime interrupt work. We get to reserve the interrupts for serious stuff like the UART, by using the RTOS to give us 1 extra high-priority thread. So using the RTOS and its priority scheme, but minimally, to avoid a recurring problem with bare metal and MINAR of network drivers causing loss of UART traffic.

@40Grit

40Grit commented Apr 1, 2019

So

  • The event system provided by cmsis rtos V2 / RTX is not used by Mbed OS?

  • Nanostack has its own event and queuing system which supports priority?

  • Pelion client uses Nanostack's queuing system?

  • PAL abstracts queuing for Pelion client, or do bits of Nanostack always get built into Pelion client?

  • Providing loose coupling or APIs to both Pelion client and Nanostack might provide a better developer experience, but will add some amount of code-space overhead?

@kjbracey
Contributor

kjbracey commented Apr 1, 2019

The event system provided by cmsis rtos V2 / RTX is not used by Mbed OS?

The "message queue"? No. We do have a C++ Queue wrapper for it, at least. I'm not really familiar with the higher-level APIs of CMSIS-RTOS like that, so not sure how applicable they are.

Nanostack has its own event and queuing system which supports priority? Pelion client uses Nanostack's queuing system?

Yes, and yes. They share the same bare-metal heritage, so were designed to share 1 event queue.

PAL abstracts queuing for Pelion client, or do bits of Nanostack always get built into Pelion client?

PAL covers a bunch of higher-level stuff like sockets and threads, but doesn't further abstract the event queue - Pelion uses the Nanostack event queue implementation and API as-is. There needs to be the relevant bit of portability glue for the Nanostack event queue, so there is a nanostack-hal-pal providing that glue - running the Nanostack event queue on top of the PAL's thread API. That is used when porting Pelion Client to other systems, but in Mbed OS we use the direct nanostack-hal-mbed-cmsis-rtos, which offers the "integrate with Mbed OS EventQueue" option, alongside the extra bits needed by Nanostack itself that Pelion Client doesn't use. PAL is Pelion Client only - Nanostack just uses the event loop and the "Nanostack HAL" (which is very minimal compared to PAL).

In Mbed OS 3 there was more integration - we went beyond the "nanostack-hal" type integration and had a complete separate reimplementation of the Nanostack event API using MINAR. That was tighter, but hasn't been done for Mbed OS 5 EventQueue. It would need the priority, for starters.

Tighter event loop integration could in principle reduce code size, but I think there are easier pickings in PAL.

@40Grit

40Grit commented Apr 18, 2019

@cmonr please re-open

Still awaiting @geky's input

@40Grit

40Grit commented Apr 24, 2019

@0xc0170 @linlingao

@linlingao
Contributor

@geky Can you comment so that we can determine what action we should take next on this request?

@40Grit

40Grit commented May 29, 2019

The dual shared queue I advocated for Mbed OS 5 is a compromise where you're not all the way down at the 2-level world of bare metal or MINAR, but you have 3 levels - normal work, high-priority driver work, and really hard realtime interrupt work. We get to reserve the interrupts for serious stuff like the UART, by using the RTOS to give us 1 extra high-priority thread. So using the RTOS and its priority scheme, but minimally, to avoid a recurring problem with bare metal and MINAR of network drivers causing loss of UART traffic.

@kjbracey-arm I read through this again.
Why would UART traffic ever be lost? I'd think DMA should enable zero copy buffers and eliminate any issues that an ISR popping RX data into a queue doesn't already solve.

@kjbracey
Contributor

I'd think DMA should enable zero copy buffers

You would think, yes. But in practice, as you'll find if you Google "UART + DMA + pretty much any device Mbed OS supports", there are a bunch of issues with the DMA. Often reprogramming the DMA to advance to the next buffer can induce character loss, or needs to be done with the same 1-character IRQ latency as if you had no DMA. :(

The DMA on various devices can probably be coerced to handle various special cases, maybe framed data if you know there's zero line errors, but for an arbitrary data stream, it's proved problematic.

@40Grit

40Grit commented May 29, 2019

@kjbracey-arm I'm still having trouble making sense of it. Why would my ISR ever be so long that it causes a UART receive miss?

@kjbracey
Contributor

It only takes about 80 microseconds of interrupt latency to lose a character at 115200 baud. The UART is one of the most common devices to suffer loss from interrupt latency, because these devices tend not to have hardware FIFOs, and the DMA is either not used or not as good as a FIFO.
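
Rough arithmetic behind that figure, assuming standard 8N1 framing and a single-byte receive register:

```cpp
// 8N1 framing: 1 start + 8 data + 1 stop = 10 bit times per character.
// 10 bits / 115200 baud ~= 86.8 us per character, so if the RX interrupt is
// held off for roughly that long while a character is already sitting in the
// receive register, the next character overwrites it (overrun).
constexpr float bits_per_char = 10.0f;
constexpr float baud          = 115200.0f;
constexpr float char_time_us  = bits_per_char / baud * 1e6f;  // ~86.8 us
```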

@40Grit

40Grit commented May 29, 2019

So silicon vendors' ISRs are not helping the situation, I presume? I still feel like any MCU that has been specced for a 115200-baud application should consider 80 us an eternity.
What would the tradeoff be of increasing the UART interrupt priority?

@kjbracey
Contributor

kjbracey commented May 29, 2019

Using multiple interrupt priorities increases stack space, because interrupts can nest on the IRQ stack.

The more serious problem is API complication - things make themselves IRQ-safe to all interrupts by disabling all IRQs. That's what enter/exit critical does.

And those routines can often add up to be the things that disable IRQs for 80us - it's enter critical for a long time, rather than an actual IRQ handler. They'd need to be modified to only mask certain levels of interrupts, which in turn would mean they wouldn't just be "IRQ-safe", but "IRQ level XXX safe". They'd be callable from some IRQ routines, but not serial IRQ routines. Gets complicated to explain.

And ARMv6-M (and ARMv8-M Baseline) don't permit level-based masking, IIRC.
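
Roughly what that distinction looks like in code. core_util_critical_section_enter/exit is the existing Mbed primitive; the BASEPRI variant is only a sketch of the "IRQ level XXX safe" alternative, and only works on cores that actually have BASEPRI (so not ARMv6-M or ARMv8-M Baseline):

```cpp
#include "mbed.h"
#include "platform/mbed_critical.h"

volatile uint32_t shared_counter;

// Today's model: safe against *all* interrupts, but while we're inside,
// even the UART RX interrupt is held off.
void update_counter_irq_safe()
{
    core_util_critical_section_enter();
    shared_counter++;
    core_util_critical_section_exit();
}

// Sketch of level-based masking: only interrupts with priority value >= 0x40
// are held off, so a more urgent (numerically lower) UART IRQ still runs.
// This function is now "safe up to level 0x40", not plain "IRQ-safe".
void update_counter_level_safe()
{
    uint32_t old_basepri = __get_BASEPRI();
    __set_BASEPRI(0x40);
    shared_counter++;
    __set_BASEPRI(old_basepri);
}
```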

@40Grit

40Grit commented May 29, 2019

That makes sense - it's why I tend to create my own critical-section functions which selectively disable interrupts. That relies on my application-specific knowledge, however.

What about the feasibility of giving a tick high priority and using it to time out critical sections that take longer than x?

@geky
Contributor

geky commented Jul 23, 2019

Hi, sorry about not responding to this for so long. I've been backlogged for a while and only recently been able to more or less get on top of things. Better late than never (hopefully)?

Also sorry if I don't have the full context, I'm really behind in all of this.

Technology-wise, adding priorities to equeue would be easy. The events are sorted by a simple comparison function, and it'd be easy to extend this to include, say, an 8-bit priority. It would cost RAM, a byte for each event, but there are unaligned members we could abuse.
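
Not equeue's actual code, just an illustration of the kind of comparison change that means - events ordered by their due time, with the extra priority byte breaking ties between events that are due at the same time:

```cpp
#include <cstdint>

// Field names are illustrative and do not match equeue's internals.
struct PendingEvent {
    uint32_t target_tick;  // when the event becomes due
    uint8_t  priority;     // 0 = most urgent; the extra byte mentioned above
};

// "Should a dispatch before b?" - earlier deadline first; among events due at
// the same tick, the higher-priority (lower-numbered) one goes first.
bool dispatch_before(const PendingEvent &a, const PendingEvent &b)
{
    if (a.target_tick != b.target_tick) {
        // Signed difference copes with tick-counter wraparound.
        return static_cast<int32_t>(a.target_tick - b.target_tick) < 0;
    }
    return a.priority < b.priority;
}
```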

The reason I've been against adding priority is that it's not "true" priority. As @kjbracey-arm mentions, "high-priority" tasks are still blocked by "low-priority" tasks unless you have preemption.

It seems easy for users to end up with a flawed system where high-priority events don't get a chance to run, without that being obvious to the user.

From my (limited) experience, the correct way to create priorities is with separate threads, or on bare metal, with separate interrupts. Each "thread" gets its own event queue, and high-priority events are executed immediately, preventing the risk of low-priority-induced jitter.

Relatedly, this is what equeue_background is for. equeue_background is a sort of inverted equeue_dispatch that lets you run an event queue on a timer without an explicit thread.

Outside of mbed-os, the most common setup I've used equeue in is simple embedded applications with 1. a main thread, and 2. high-priority events running on a hardware timer.

This is mirrored in mbed-os with the default queue and the high-priority queue; maybe we should provide a high-priority queue in bare metal?
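
A rough sketch of that bare-metal shape using plain Mbed APIs - the normal queue dispatched from the main loop, with a hardware timer standing in for the "high-priority" level (an approximation of the pattern, not equeue_background itself):

```cpp
#include "mbed.h"

EventQueue normal_queue;   // bulk work, dispatched from the main loop
Ticker     urgent_timer;   // hardware timer standing in for the urgent level

// Short, ISR-safe work only: this runs in the timer interrupt and preempts
// whatever normal-priority event happens to be running.
void urgent_poll()
{
    // e.g. poll a radio FIFO or sample a flag
}

void bulk_work()
{
    // tens of milliseconds of processing is fine here
}

int main()
{
    urgent_timer.attach_us(&urgent_poll, 1000);  // every 1 ms, in IRQ context

    normal_queue.call_every(100, bulk_work);
    normal_queue.dispatch_forever();
}
```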


Anyway, my question becomes: if users can get true priorities via hardware or preemption (which is basically hardware too), what is the reason for providing priorities in the event queue?

Is it ease of use? Resources?

I've been against priorities for so long now as a part of trying to prevent feature creep, but I'm starting to falter. If it makes things easier for users because it's their first idea, maybe we should just add priorities?

@0Grit
Author

0Grit commented Jul 23, 2019


I spent a very short amount of time reading into whether armv8-baseline (M23) supports interrupt priorities.
@kjbracey-arm correct me if I am wrong, but it looked like there are a couple of priority levels.

My initial dislike of threading has always been based on having to synchronize threads, which shows my lack of formal and informal CS experience. Obviously preemption brings back the need to synchronize and adds special considerations of its own, so this is a bit of a personal thought experiment.

In a preemption scenario, the high-priority queue would execute in interrupt-priority-x context, correct?
In the end, this is what a scheduler and context switching are doing anyway - interrupting, changing the stack pointer, etc.?

@0Grit
Author

0Grit commented Jul 23, 2019

And yes, thinking further: unless equeue started interacting directly with interrupt priority, saving the context off and loading the new context, there would be no point in implementing priority.

Not to mention that the scenario of equeue controlling interrupt priorities starts to smell like threading anyway.

@ciarmcom
Member

Thank you for raising this issue. Please note we have updated our policies and now only defects should be raised directly in GitHub. Going forward, questions and enhancements will be considered in our forums, https://forums.mbed.com/ . If this issue is still relevant please re-raise it there.
This GitHub issue will now be closed.
