-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ticker - kl25z bugfix for handling events in the past #3982
Conversation
ticker_insert_event() can crash on KLXX (and probably other platforms) if an event is inserted with a timestamp before the current real time. The problem is easy to trigger: you just need to set up a Ticker object, and then disable interrupts for slightly longer than the Ticker object's interval. It's generally bad practice to disable interrupts for too long, but there are some cases where it's unavoidable, and anyway it would be better for the core library function not to crash. The case where I had an unavoidably long interrupts-off interval was writing flash with the FTFA. The FTFA hardware prohibits flash reads while an FTFA command is in progress, so interrupts must be disabled for the whole duration of each command to ensure that there are no instruction fetches from flash-resident ISRs in the course of the execution. An FTFA "erase sector" command takes a fairly long time (milliseconds), and I have a fairly high frequency Ticker (1ms). The problem and the fix are pretty straightforward. ticker_insert_event() searches the linked list to figure out where to insert the new event, looking for a spot earlier than any event currently queued. If the event is in the past, it'll usually end up at the head of the list. When the routine sees that the new event belongs at the head of the list, it calls data->interface->set_interrupt() to schedule the interrupt for the event, since it's the new soonest event. The KLXX version of us_ticker_set_interrupt() then looks to see if the event is in the past, which we've stipulated that it is, so rather than actually setting the interrupt, it simply calls the handler directly. The first thing the Ticker interrupt handler does is re-schedule itself, so we re-enter ticker_insert_event() at this point. This is where the problem comes in: we didn't finish updating the linked list before we called set_interrupt() and thus before we recursed back into ticker_insert_event(). We set the head of the list to the new event but we didn't set the new event's 'next' pointer. The fix is simply to finish updating the list before we call set_interrupt(), which we can do by moving the obj->next initialization ahead of the head pointer update.
@@ -74,15 +74,17 @@ void ticker_insert_event(const ticker_data_t *const data, ticker_event_t *obj, t | |||
prev = p; | |||
p = p->next; | |||
} | |||
|
|||
/* if we're at the end p will be NULL, which is correct */ | |||
obj->next = p; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you checked the git history? I recall this was already moved from this location down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been the same since at least rev 107. If it was moved down, whoever did it didn't think it through clearly enough, because it's clearly an error: it leaves the list in an inconsistent state before making a call that can end up back in a function here that traverses or manipulates the list. This change looks plainly correct to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is not needed if us_ticker_set_interrupt
does nothing more than set the interrupt.
Anyway, it shouldn't hurt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Observe that this is in the generic HAL and us_ticker_set_interrupt is KL25Z specific. So "if us_ticker_set_interrupt does nothing more than set the interrupt" is only true for KL25Z and only after this change. And in any case, it's still not a good idea to leave a global resource in a corrupted state before calling an external function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's still not a good idea to leave a global resource in a corrupted state before calling an external function.
100% agreed.
Observe that this is in the generic HAL and us_ticker_set_interrupt is KL25Z specific. So "if us_ticker_set_interrupt does nothing more than set the interrupt" is only true for KL25Z and only after this change
It is partially true: us_ticker_set_interrupt
is indeed not supposed to call us_ticker_irq_handler
but it is difficult to write compliant function because the description of hal functions doesn't cover/define details such as side effect allowed by implementation or expected behaviours for corner cases.
return; | ||
} | ||
|
||
// This event was in the past. Force it into the very near |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't be better to set timer peripheral flag (pending IRQ, if it is available?) ? However, calling irq handle should not happen here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is undecidable to know if an event if in the past or in the future because the counter wrap.
Given that information int delta = (int)((uint32_t)timestamp - us_ticker_read());
can be negative for a lot of valid value (whenever timestamp >= us_ticker_read() + 0x80000000
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a limitation of the API, and it's not related to the change in question, as that's the test the existing code uses. (But for what it's worth, it's not valid in the API to create events more than 2^31-1 us in the future in the first place - this is mentioned in at least a few of the time-related APIs.)
cc @pan- |
@mjrgh Have you signed https://developer.mbed.org/contributor_agreement/ ? I can't locate you on mbed developer page |
I did sign the agreement a couple of years ago - my username on mbed is mjr. Mike |
@mjrgh @0xc0170 There is no events in the past. timestamp_t wrap, so all events set are in the future.
I don't think If the event should be handled immediately, then the interrupt can be manually triggered in |
The question that we should answer with an update to tickers .
@mjrgh is it possible for this target's ticker to set pending IRQ flag ? Or it's not, therefore you changed it to setting delta to 1? To summarize: Anything left for this PR to get in @pan- ? |
/morph test |
Result: SUCCESSYour command has finished executing! Here's what you wrote!
OutputAll builds and test passed! |
@0xc0170 This PR can go in, it is in line with the existing code. |
Ports for Upcoming Targets 3841: Add nRf52840 target ARMmbed/mbed-os#3841 3992: Introducing UBLOX_C030 platform. ARMmbed/mbed-os#3992 Fixes and Changes 3951: [NUCLEO_F303ZE] Correct ARDUINO pin ARMmbed/mbed-os#3951 4021: Fixing a macro to detect when RTOS was in use for the NRF52840_DK ARMmbed/mbed-os#4021 3979: KW24D: Add missing SPI defines and Arduino connector definitions ARMmbed/mbed-os#3979 3990: UBLOX_C027: construct a ticker-based wait, rather than calling wait_ms(), in the ARMmbed/mbed-os#3990 4003: Fixed OBOE in async serial tx for NRF52 target, fixes #4002 ARMmbed/mbed-os#4003 4012: STM32: Correct I2C master error handling ARMmbed/mbed-os#4012 4020: NUCLEO_L011K4 remove unsupported tool chain files ARMmbed/mbed-os#4020 4065: K66F: Move bss section to m_data_2 Section ARMmbed/mbed-os#4065 4014: Issue 3763: Reduce heap allocation in the GCC linker file ARMmbed/mbed-os#4014 4030: [STM32L0] reduce IAR heap and stack size for small targets ARMmbed/mbed-os#4030 4109: NUCLEO_L476RG : minor serial pin update ARMmbed/mbed-os#4109 3982: Ticker - kl25z bugfix for handling events in the past ARMmbed/mbed-os#3982
Description
Ticker can cause a crash if you disable interrupts for longer than the Ticker object's interval, if you have at least one other timed event in the system.
This problem has two causes.
The fix is to complete all manipulations to the list pointers BEFORE calling set_interrupt().
The fix here is to force a past timestamp slightly (1us) into the future. This makes the routine handle past and future events uniformly; in both cases, we schedule an interrupt and call the handler on the interrupt. This resolves any recurring event backlogs iteratively rather than recursively, eliminating the risk of unbounded stack usage. It also calls the callback uniformly in ISR context. The runtime timing is about the same as the original.
Migrations
No API changes
Related PRs
No other changes required
Todos
No to-do items
Deploy notes
It's just an internal bug fix; no deployment changes required
Steps to test or reproduce
On KL25Z: