Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lp_ticker problem #11373

Closed
Tharazi97 opened this issue Aug 29, 2019 · 7 comments · Fixed by #11385
Closed

Lp_ticker problem #11373

Tharazi97 opened this issue Aug 29, 2019 · 7 comments · Fixed by #11385

Comments

@Tharazi97
Copy link
Contributor

Tharazi97 commented Aug 29, 2019

Description

DISCO_L475VG_IOT01A failed mbed_drivers-lp_ticker on nightly CI so I have looked into its lp_ticker implementation.

static void LPTIM1_IRQHandler(void)
{
core_util_critical_section_enter();
if (lp_Fired) {
lp_Fired = 0;
/* We're already in handler and interrupt might be pending,
* so clear the flag, to avoid calling irq_handler twice */
__HAL_LPTIM_CLEAR_FLAG(&LptimHandle, LPTIM_FLAG_CMPM);
lp_ticker_irq_handler();
}
/* Compare match interrupt */
if (__HAL_LPTIM_GET_FLAG(&LptimHandle, LPTIM_FLAG_CMPM) != RESET) {
if (__HAL_LPTIM_GET_IT_SOURCE(&LptimHandle, LPTIM_IT_CMPM) != RESET) {
/* Clear Compare match flag */
__HAL_LPTIM_CLEAR_FLAG(&LptimHandle, LPTIM_FLAG_CMPM);
lp_ticker_irq_handler();
}
}
if (__HAL_LPTIM_GET_FLAG(&LptimHandle, LPTIM_FLAG_CMPOK) != RESET) {
if (__HAL_LPTIM_GET_IT_SOURCE(&LptimHandle, LPTIM_IT_CMPOK) != RESET) {
__HAL_LPTIM_CLEAR_FLAG(&LptimHandle, LPTIM_FLAG_CMPOK);
lp_cmpok = true;
if(sleep_manager_locked) {
sleep_manager_unlock_deep_sleep();
sleep_manager_locked = false;
}
if(lp_delayed_prog) {
lp_ticker_set_interrupt(lp_delayed_counter);
lp_delayed_prog = false;
}
}
}
#if defined (__HAL_LPTIM_WAKEUPTIMER_EXTI_CLEAR_FLAG)
/* EXTI lines are not configured by default */
__HAL_LPTIM_WAKEUPTIMER_EXTI_CLEAR_FLAG();
#endif
core_util_critical_section_exit();
}
uint32_t lp_ticker_read(void)
{
uint32_t lp_time = LPTIM1->CNT;
/* Reading the LPTIM_CNT register may return unreliable values.
It is necessary to perform two consecutive read accesses and verify that the two returned values are identical */
while (lp_time != LPTIM1->CNT) {
lp_time = LPTIM1->CNT;
}
return lp_time;
}
/* This function should always be called from critical section */
void lp_ticker_set_interrupt(timestamp_t timestamp)
{
core_util_critical_section_enter();
/* Always store the last requested timestamp */
lp_delayed_counter = timestamp;
NVIC_EnableIRQ(LPTIM1_IRQn);
/* CMPOK is set by hardware to inform application that the APB bus write operation to the
* LPTIM_CMP register has been successfully completed.
* Any successive write before the CMPOK flag be set, will lead to unpredictable results
* We need to prevent to set a new comparator value before CMPOK flag is set by HW */
if (lp_cmpok == false) {
/* if this is not safe to write, then delay the programing to the
* time when CMPOK interrupt will trigger */
lp_delayed_prog = true;
} else {
timestamp_t last_read_counter = lp_ticker_read();
lp_ticker_clear_interrupt();
/* HW is not able to trig a very short term interrupt, that is
* not less than few ticks away (LP_TIMER_SAFE_GUARD). So let's make sure it'
* s at least current tick + LP_TIMER_SAFE_GUARD */
for(uint8_t i = 0; i < LP_TIMER_SAFE_GUARD; i++) {
if (LP_TIMER_WRAP((last_read_counter + i)) == timestamp) {
timestamp = LP_TIMER_WRAP((timestamp + LP_TIMER_SAFE_GUARD));
}
}
/* Then check if this target timestamp is not in the past, or close to wrap-around
* Let's assume last_read_counter = 0xFFFC, and we want to program timestamp = 0x100
* The interrupt will not fire before the CMPOK flag is OK, so there are 2 cases:
* in case CMPOK flag is set by HW after or at wrap-around, then this will fire only @0x100
* in case CMPOK flag is set before, it will indeed fire early, as for the wrap-around case.
* But that will take at least 3 cycles and the interrupt fires at the end of a cycle.
* In our case 0xFFFC + 3 => at the transition between 0xFFFF and 0.
* If last_read_counter was 0xFFFB, it should be at the transition between 0xFFFE and 0xFFFF.
* There might be crossing cases where it would also fire @ 0xFFFE, but by the time we read the counter,
* it may already have moved to the next one, so for now we've taken this as margin of error.
*/
if((timestamp < last_read_counter) && (last_read_counter <= (0xFFFF - LP_TIMER_SAFE_GUARD))) {
/* Workaround, because limitation */
__HAL_LPTIM_COMPARE_SET(&LptimHandle, ~0);
} else {
/* It is safe to write */
__HAL_LPTIM_COMPARE_SET(&LptimHandle, timestamp);
}
/* We just programed the CMP so we'll need to wait for cmpok before
* next programing */
lp_cmpok = false;
/* Prevent from sleeping after compare register was set as we need CMPOK
* interrupt to fire (in ~3x30us cycles) before we can safely enter deep sleep mode */
if(!sleep_manager_locked) {
sleep_manager_lock_deep_sleep();
sleep_manager_locked = true;
}
}
core_util_critical_section_exit();
}

It is not protected for the situation where lp_cmpok is set to false, so the lp_delayed_prog is set and the timestamp saved in lp_delayed_counter will be sent to lp_ticker_set_interrupt function via LPTIM1_IRQHandler. This timestamp could get out of date while waiting for CMPOK flag. (I mean the value of timestamp would be less than a value got by lp_ticker_read()) In that situation lp_ticker_set_interrupt function would see this timestamp as an event that has to be handled after turning the clock counter and will set next interrupt to 0xFFFF rather than handling the event right then, or setting an interrupt for after LP_TIMER_SAFE_GUARD time. This could lead to some critical errors.

This implementation is also vulnerable to timestamps close to ticker roll over.

To reproduce I've changed the code of test_multi_ticker to this:

void test_multi_ticker(void)
{
    for(uint64_t k = 0; k<10000; k++)
    {
        ThisThread::sleep_for(750);

        LowPowerTicker ticker[TICKER_COUNT];
        const uint32_t extra_wait = 2; // extra 2ms wait time

        multi_counter = 0;

        for (int i = 0; i < TICKER_COUNT; i++) {
            ticker[i].attach_us(callback(increment_multi_counter), MULTI_TICKER_TIME_MS * 1000);
        }

        ThisThread::sleep_for(MULTI_TICKER_TIME_MS + extra_wait);

        TEST_ASSERT_EQUAL(TICKER_COUNT, multi_counter);

        for (int i = 0; i < TICKER_COUNT; i++) {
            ticker[i].detach();
        }
    }
}

and changed MULTI_TICKER_TIME_MS to 10 to reduce testing time. It should fail like 1 in 1000 times.

I'll probably raise a PR with a proposition of fix soon.

We should consider creating something like "golden stress tests" for crusial low level peripherals which would be run on targets from golden list over night or even manualy after changing implementation.

@maciejbocianski @mprse @fkjagodzinski @jamesbeyond @LMESTM

Issue request type

[ ] Question
[ ] Enhancement
[x] Bug
@mprse
Copy link
Contributor

mprse commented Aug 29, 2019

It would be good to post here debug failure logs with the description what has happened.

@Tharazi97
Copy link
Contributor Author

This is the data collected from lp_ticker_set_interrupt function. timestamp is the value sent to it, current tick is current tick measured by lp_ticker_read(), and third column is the value sent by __HAL_LPTIM_COMPARE_SET or stored in lp_delayed_counter

[1567155957.69][CONN][RXD] timestamp[0]:45062    current_tick:44744   IRQ_set_for_tick:45062
[1567155957.75][CONN][RXD] timestamp[1]:45065    current_tick:45064   IRQ_set_for_tick:45070
[1567155957.81][CONN][RXD] timestamp[2]:45066    current_tick:45065   timestamp_set_in_buffor:45066
[1567155957.88][CONN][RXD] timestamp[3]:45066    current_tick:45067   IRQ_set_for_tick:65535
[1567155957.94][CONN][RXD] timestamp[4]:498      current_tick:272     IRQ_set_for_tick:498
[1567155957.99][CONN][RXD] timestamp[5]:295      current_tick:273     timestamp_set_in_buffor:295

In row [0] at tick 44744 next interrupt is set to tick 45062 ✅
In row [1] at tick 45064 next interrupt is set to tick 45070 because timestamp (45065) is too close to current tick ✅
in row [2] at tick 45065 it wants to set next interrupt to 45066, but lp_cmpok is currenty set to false so it stores the value in buffor and sets lp_delayed_prog flag to true, so it can be set later ✅
in row [3] at tick 45067 it received timestamp 45066 via LPTIM1_IRQHandler so it read it as time in past and set the next interrupt to the end of ticker counter (0xFFFF) ❌

The timestamp got outdated while waiting for lp_cmpok and was not handled correctly.

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 30, 2019

cc @ARMmbed/team-st-mcd

@LMESTM
Copy link
Contributor

LMESTM commented Aug 30, 2019

hi - the description is clear and I acknowledge that the problem is probably there.
Thanks for the detailed analysis ...
@Tharazi97 you mentioned a possible upcoming PR - do you confirm ?
ready for review

@Tharazi97
Copy link
Contributor Author

@LMESTM I confirm. I'll test my solution on other targets and after that I will rise a PR

@mprse
Copy link
Contributor

mprse commented Aug 30, 2019

The problem is when we have many tickers set to close values. In the test scenario, we have 16 tickers which are set in the loop to fire at time NOW + 100ms. The common ticker layer operates on us and low-level driver on lp ticker ticks (lp ticker operates at 32 kHz). To simplify the problem I will focus on lp ticker ticks.
Lest assume that we have set 5 tickers to fire at NOW + 100 ticks. The ordered lp ticker event list may look like this (note that time is passing while further ticker events are being added to the list):

Ticker       Lp ticker count [tick]        Scheduled interrupt [tick]
1              0                                      100
2              0                                      100
3              1                                      101
4              1                                      101
5              2                                      102

Let's assume that 100 ticks have elapsed and Ticker 1 interrupt fired. The ticker common layer interrupt handler will go through the list and serve all expired events from the list. This means that the first two events are served and the next interrupt is scheduled to tick 101 (note that at this moment current tick is 100).
The low-level ticker wrapper on STM platforms should add 5 ticks (because next interrupt is too close - hardware limitation), so the interrupt will be scheduled to 106th tick.
But while setting the interrupt the Lp ticker counter increased to 101 and this was caught by the common layer, so interrupt has been fired immediately (by means of fire_interrupt() call). The next 2 tickers were served (3 and 4) by the interrupt handler and the next one is now scheduled to 102nd tick (current tick is 101).
Now we have already set the interrupt at 100th tick, so at 101 tick we are waiting for CMPOK and the 102 value to be set is buffered (note that last scheduled tick is now 106). Lest assume that CMPOK interrupt fires 3 ticks after writing to CMP reg. This means we have CMPOK interrupt at 103 tick and we want to set next interrupt to 102 (in the past). So the low-level ticker wrapper will write 0xFFFF to the compare reg and we're done.

@ciarmcom
Copy link
Member

ciarmcom commented Sep 3, 2019

Internal Jira reference: https://jira.arm.com/browse/MBOCUSTRIA-1690

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants