Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ticker - kl25z bugfix for handling events in the past #3982

Merged
merged 3 commits into from
Apr 6, 2017

Conversation

mjrgh
Copy link
Contributor

@mjrgh mjrgh commented Mar 21, 2017

Description

Ticker can cause a crash if you disable interrupts for longer than the Ticker object's interval, if you have at least one other timed event in the system.

This problem has two causes.

  1. ticker_insert_event() leaves the event linked list in an inconsistent state before calling set_interrupt(), which can recurse back into ticker_insert_event(), which will again modify the list. This can cause crashes in different ways depending on the particular path through the code, but with the Ticker setup described above, it will usually create a cycle in the list (head->A->B->A->B...) that causes an infinite loop the next time someone tries to traverse the list.

The fix is to complete all manipulations to the list pointers BEFORE calling set_interrupt().

  1. TARGET_KLXX/us_ticker.c has a separate problem that isn't quite as much an outright error, but is still problematic. In us_ticker_set_interrupt(), the routine calls the interrupt handler recursively if the event is in the past. This is the actual thing that triggered problem Build USB libs with GCC_ARM #1 above - ticker_insert_event() calls us_ticker_set_interrupt() to set the interrupt, and for an event in the past, that calls the handler, which for a Ticker schedules the next instance of the event by calling ticker_insert_event(). But the recursion is a separate problem if you have a Ticker that gets behind real-time. The problem is that the whole backlog of past calls gets piled on the stack through recursive calls, so if you have more than a few, it can blow the stack. A separate potential problem is that the interrupt handler might be written to assume that it's running in interrupt context, so it might not properly protect statics when running from thread context.

The fix here is to force a past timestamp slightly (1us) into the future. This makes the routine handle past and future events uniformly; in both cases, we schedule an interrupt and call the handler on the interrupt. This resolves any recurring event backlogs iteratively rather than recursively, eliminating the risk of unbounded stack usage. It also calls the callback uniformly in ISR context. The runtime timing is about the same as the original.

Migrations

No API changes

Related PRs

No other changes required

Todos

No to-do items

Deploy notes

It's just an internal bug fix; no deployment changes required

Steps to test or reproduce

On KL25Z:

  • Create a couple of Ticker objects, one with a 1ms interval
  • Disable interrupts (__disable_irq())
  • Spin for about 5ms
  • Re-enable interrupts (__enable_irq())
  • Program will crash by way of an infinite loop in ticker_insert_event()

ticker_insert_event() can crash on KLXX (and probably other platforms) if an event is inserted with a timestamp before the current real time.  

The problem is easy to trigger:  you just need to set up a Ticker object, and then disable interrupts for slightly longer than the Ticker object's interval.  It's generally bad practice to disable interrupts for too long, but there are some cases where it's unavoidable, and anyway it would be better for the core library function not to crash.  The case where I had an unavoidably long interrupts-off interval was writing flash with the FTFA.  The FTFA hardware prohibits flash reads while an FTFA command is in progress, so interrupts must be disabled for the whole duration of each command to ensure that there are no instruction fetches from flash-resident ISRs in the course of the execution.  An FTFA "erase sector" command takes a fairly long time (milliseconds), and I have a fairly high frequency Ticker (1ms).

The problem and the fix are pretty straightforward.  ticker_insert_event() searches the linked list to figure out where to insert the new event, looking for a spot earlier than any event currently queued.  If the event is in the past, it'll usually end up at the head of the list.  When the routine sees that the new event belongs at the head of the list, it calls data->interface->set_interrupt() to schedule the interrupt for the event, since it's the new soonest event.  The KLXX version of us_ticker_set_interrupt() then looks to see if the event is in the past, which we've stipulated that it is, so rather than actually setting the interrupt, it simply calls the handler directly.  The first thing the Ticker interrupt handler does is re-schedule itself, so we re-enter ticker_insert_event() at this point.  This is where the problem comes in:  we didn't finish updating the linked list before we called set_interrupt() and thus before we recursed back into ticker_insert_event().  We set the head of the list to the new event but we didn't set the new event's 'next' pointer.

The fix is simply to finish updating the list before we call set_interrupt(), which we can do by moving the obj->next initialization ahead of the head pointer update.
@@ -74,15 +74,17 @@ void ticker_insert_event(const ticker_data_t *const data, ticker_event_t *obj, t
prev = p;
p = p->next;
}

/* if we're at the end p will be NULL, which is correct */
obj->next = p;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked the git history? I recall this was already moved from this location down?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been the same since at least rev 107. If it was moved down, whoever did it didn't think it through clearly enough, because it's clearly an error: it leaves the list in an inconsistent state before making a call that can end up back in a function here that traverses or manipulates the list. This change looks plainly correct to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not needed if us_ticker_set_interrupt does nothing more than set the interrupt.
Anyway, it shouldn't hurt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observe that this is in the generic HAL and us_ticker_set_interrupt is KL25Z specific. So "if us_ticker_set_interrupt does nothing more than set the interrupt" is only true for KL25Z and only after this change. And in any case, it's still not a good idea to leave a global resource in a corrupted state before calling an external function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's still not a good idea to leave a global resource in a corrupted state before calling an external function.

100% agreed.

Observe that this is in the generic HAL and us_ticker_set_interrupt is KL25Z specific. So "if us_ticker_set_interrupt does nothing more than set the interrupt" is only true for KL25Z and only after this change

It is partially true: us_ticker_set_interrupt is indeed not supposed to call us_ticker_irq_handler but it is difficult to write compliant function because the description of hal functions doesn't cover/define details such as side effect allowed by implementation or expected behaviours for corner cases.

return;
}

// This event was in the past. Force it into the very near
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't be better to set timer peripheral flag (pending IRQ, if it is available?) ? However, calling irq handle should not happen here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is undecidable to know if an event if in the past or in the future because the counter wrap.
Given that information int delta = (int)((uint32_t)timestamp - us_ticker_read()); can be negative for a lot of valid value (whenever timestamp >= us_ticker_read() + 0x80000000 ).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a limitation of the API, and it's not related to the change in question, as that's the test the existing code uses. (But for what it's worth, it's not valid in the API to create events more than 2^31-1 us in the future in the first place - this is mentioned in at least a few of the time-related APIs.)

@0xc0170 0xc0170 changed the title Patch 3 Ticker - kl25z bugfix for handling events in the past Mar 22, 2017
@0xc0170
Copy link
Contributor

0xc0170 commented Mar 22, 2017

cc @pan-

@0xc0170
Copy link
Contributor

0xc0170 commented Mar 22, 2017

@mjrgh Have you signed https://developer.mbed.org/contributor_agreement/ ? I can't locate you on mbed developer page

@mjrgh
Copy link
Contributor Author

mjrgh commented Mar 22, 2017

I did sign the agreement a couple of years ago - my username on mbed is mjr.

Mike

@pan-
Copy link
Member

pan- commented Mar 27, 2017

@mjrgh @0xc0170 There is no events in the past. timestamp_t wrap, so all events set are in the future.
The question is what to do with events supposed to be scheduled immediately and how we can differentiate them from events happening in a far future.

ticker_insert_event() leaves the event linked list in an inconsistent state before calling set_interrupt(), which can recurse back into ticker_insert_event(), which will again modify the list. This can cause crashes in different ways depending on the particular path through the code, but with the Ticker setup described above, it will usually create a cycle in the list (head->A->B->A->B...) that causes an infinite loop the next time someone tries to traverse the list.

I don't think us_ticker_set_interrupt should directly call the interrupt handler which will call the event handler. I believe it should schedule the interrupt, nothing more.

If the event should be handled immediately, then the interrupt can be manually triggered in us_ticker_set_interrupt. The IRQ will be executed once IRQ are enabled again, at the end of ticker_insert_event.

@0xc0170
Copy link
Contributor

0xc0170 commented Mar 29, 2017

@mjrgh @0xc0170 There is no events in the past. timestamp_t wrap, so all events set are in the future.
The question is what to do with events supposed to be scheduled immediately and how we can differentiate them from events happening in a far future.

The question that we should answer with an update to tickers .

I don't think us_ticker_set_interrupt should directly call the interrupt handler which will call the event handler. I believe it should schedule the interrupt, nothing mor

@mjrgh is it possible for this target's ticker to set pending IRQ flag ? Or it's not, therefore you changed it to setting delta to 1?

To summarize: Anything left for this PR to get in @pan- ?

@0xc0170
Copy link
Contributor

0xc0170 commented Mar 29, 2017

/morph test

@mbed-bot
Copy link

Result: SUCCESS

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 1783

All builds and test passed!

@pan-
Copy link
Member

pan- commented Apr 6, 2017

@0xc0170 This PR can go in, it is in line with the existing code.

@sg- sg- merged commit 1069dfc into ARMmbed:master Apr 6, 2017
aisair pushed a commit to aisair/mbed that referenced this pull request Apr 30, 2024
Ports for Upcoming Targets

3841: Add nRf52840 target ARMmbed/mbed-os#3841
3992: Introducing UBLOX_C030 platform. ARMmbed/mbed-os#3992

Fixes and Changes

3951: [NUCLEO_F303ZE] Correct ARDUINO pin ARMmbed/mbed-os#3951
4021: Fixing a macro to detect when RTOS was in use for the NRF52840_DK ARMmbed/mbed-os#4021
3979: KW24D: Add missing SPI defines and Arduino connector definitions ARMmbed/mbed-os#3979
3990: UBLOX_C027: construct a ticker-based wait, rather than calling wait_ms(), in the  ARMmbed/mbed-os#3990
4003: Fixed OBOE in async serial tx for NRF52 target, fixes #4002 ARMmbed/mbed-os#4003
4012: STM32: Correct I2C master error handling ARMmbed/mbed-os#4012
4020: NUCLEO_L011K4 remove unsupported tool chain files ARMmbed/mbed-os#4020
4065: K66F: Move bss section to m_data_2 Section ARMmbed/mbed-os#4065
4014: Issue 3763: Reduce heap allocation in the GCC linker file ARMmbed/mbed-os#4014
4030: [STM32L0] reduce IAR heap and stack size for small targets ARMmbed/mbed-os#4030
4109: NUCLEO_L476RG : minor serial pin update ARMmbed/mbed-os#4109
3982: Ticker - kl25z bugfix for handling events in the past ARMmbed/mbed-os#3982
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants