Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recent LP Ticker Changes Break Tickless #7858

Closed
mattbrown015 opened this issue Aug 22, 2018 · 22 comments · Fixed by #8242
Closed

Recent LP Ticker Changes Break Tickless #7858

mattbrown015 opened this issue Aug 22, 2018 · 22 comments · Fixed by #8242

Comments

@mattbrown015
Copy link
Contributor

mattbrown015 commented Aug 22, 2018

Description

I've just tried to update to today's master and it has broken my software. :-(

I'm building for STM32L4 and using tickless in RTC mode.

I suspect the problem is something to do with recent LP ticker changes.

My software runs, does its initialisation and then just appears to stop. I think it is stuck in a ticker handler.

My current theory is the problem comes from the changes made by this PR, Fixes for tickless and LPTICKER_DELAY_TICKS #7524 and, in particular, this commit Update low power ticker wrapper.

Sorry about being vague but this is a big problem for me and I wanted to get an issue created. I shall continue to investigate and try and add more detail.

Issue request type

[ ] Question
[ ] Enhancement
[X] Bug

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 22, 2018

My current theory is the problem comes from the changes made by this PR, Fixes for tickless and LPTICKER_DELAY_TICKS #7524 and, in particular, this commit Fixes for tickless and LPTICKER_DELAY_TICKS #7524.

If you revert that commit or PR, works as expected?

Sorry about being vague but this is a big problem for me and I wanted to get an issue created. I shall continue to investigate and try and add more detail.

👍
[Mirrored to Jira]

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 22, 2018

cc @c1728p9
[Mirrored to Jira]

@mattbrown015
Copy link
Contributor Author

mattbrown015 commented Aug 22, 2018

If you revert that commit or PR, works as expected?

I believe so but it is all a bit complicated and I went around a few circles. I'm currently trying to retrace my steps methodically in order to be sure.
[Mirrored to Jira]

@mattbrown015
Copy link
Contributor Author

mattbrown015 commented Aug 22, 2018

I deployed https://github.com/ARMmbed/mbed-os/#7c22dca302e76d9fa1cb41efff31925187a4b876, LoRaWAN: Restoring default RX2 data rate #7831, and built and tested my software without any problems.

Then I deployed https://github.com/ARMmbed/mbed-os/#e02466a77a6a7db6e043b6b87cefe8aa6088f7ec, Fixes for tickless and LPTICKER_DELAY_TICKS #7524, and did a complete rebuild. My software hangs.

Above, I pointed my finger at a particular commit within Fixes for tickless and LPTICKER_DELAY_TICKS #7524 but I haven't double-checked this yet.

I have a further suspicion that the hang occurs when Mbed OS deepsleeps for the first time. I.e. all of the initialisation gets done and then it hangs.
[Mirrored to Jira]

@c1728p9
Copy link
Contributor

c1728p9 commented Aug 22, 2018

Hi @mattbrown015 can you provide steps or an example project I can use to reproduce this failure? Also what board variant of the STM32L4 are you using?
[Mirrored to Jira]

@mattbrown015
Copy link
Contributor Author

mattbrown015 commented Aug 22, 2018

Hi @c1728p9,

I think the most important nugget of info is that I'm using the RTC for my lp ticker i.e. "target.lpticker_lptim": 0. Obviously this isn't the default and hence feels like a likely source of problems.

We've created a custom target which is inherited from NUCLEO_L432KC. The MCU is actually a STM32L442KC but I'm pretty sure the precise MCU isn't important.

I don't have a cut down example for you yet but here's what I've just done:

  1. Deploy 7c22dca and check everything is OK.
  2. git cherry-pick adc64cccacfd93f04f17dd2f01f278c020b8831d i.e. 'Update low power ticker wrapper'
  3. Rebuild and watch it hang
  4. Select LPTIM for the lp ticker i.e. change to "target.lpticker_lptim": 1
  5. This did not appear to make any difference
  6. Power cycle and now it works

So there was some inconsistent behaviour which is making it difficult for me to be certain of anything. I.e. why did it not work until I turned it off and on again? Once possible explanation is that flashing and resetting doesn't actually reset the RTC.

I shall keep investigating for another 30 minutes and then I'm off home! :-)
[Mirrored to Jira]

@mattbrown015
Copy link
Contributor Author

mattbrown015 commented Aug 23, 2018

I didn't put STM32L4 in the title because I don't know that only STM32L4s are affected, but it is quite possibly a STM32L4 problem. Hence I wonder if @jeromecoutant has any comment?

I notice that @jeromecoutant has made some RTC changes since adc64cc but I tried being right up to date and that doesn't work either.

I've got meetings this morning but my plan is to get my NUCLEO-L433RC-P and try mbed-os-example-blinky in tickless mode using the RTC as the lp ticker.
[Mirrored to Jira]

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 23, 2018

cc @ARMmbed/team-st-mcd
[Mirrored to Jira]

@mattbrown015
Copy link
Contributor Author

mattbrown015 commented Aug 23, 2018

Oh, LPTICKER_DELAY_TICKS!

I added "target.lpticker_delay_ticks": 0 to mbed_app.json and my software no longer hangs. :-)

Is this the correct thing to do?

There's plenty of discussion about LPTICKER_DELAY_TICKS in Fixes for tickless and LPTICKER_DELAY_TICKS #7524 but I'm afraid I don't completely understand.
[Mirrored to Jira]

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 23, 2018

@mattbrown015 What's your target ? What is the current value now?
[Mirrored to Jira]

@mattbrown015
Copy link
Contributor Author

mattbrown015 commented Aug 23, 2018

My target is a custom target inherited from NUCLEO_L432KC.

The default value of LPTICKER_DELAY_TICKS on the NUCLEO_L432KC is 3, in common with most of the other STM32 devices.

But, I am also changing MBED_CONF_TARGET_LPTICKER_LPTIM to 0; its default is 1.

So I think my question is...

What is the correct LPTICKER_DELAY_TICKS value when using the RTC as the lp ticker on STM32?
[Mirrored to Jira]

@c1728p9
Copy link
Contributor

c1728p9 commented Aug 25, 2018

Hi @mattbrown015 I think I was able to reproduce your issue on a NUCLEO_L432KC. By adding the macro MBED_TICKLESS and setting lpticker_lptim to 0 I was able to get the program below to lockup randomly. (Note - I built with the default profile and the ARM toolchain) Is this similar to the failure you were seeing?

To investigate this further I instrumented the ticker wrapper and looked at the signals with a logic analyzer. I found when the program locked up it was actually stuck in sleep mode. I couldn't find anything wrong with the ticker wrapper layer - the 3 cycle delay between back-to-back calls to attach_us was being met. The last ticker related call before entering sleep mode was to lp_ticker_set_interrupt. This makes it seem like this function failed to fire the match interrupt. @jeromecoutant @LMESTM do you have any thoughts on this or what could be going wrong?

Program which reproduces the failure:

#include "mbed.h"

int main() {
    uint32_t count = 0;
    while (true) {
        led1 = !led1;
        wait(0.1);
        count++;
        printf("Count: %i\r\n", count);
    }
}

[Mirrored to Jira]

@jeromecoutant
Copy link
Collaborator

jeromecoutant commented Sep 6, 2018

Hi

I have checked Russ program:

  • default configuration ==> OK
  • add MBED_TICKLESS ==> OK
  • set lpticker_lptim to 0 ==> crash
  • set lpticker_delay_ticks to 0 ==> OK

Note that for other targets where lpticker_lptim is 0 by default,
I already set lpticker_delay_ticks to 0 for M4 targets.

It seems that LPTICKER_DELAY_TICKS is still not compatible with TICKLESS ?

[Mirrored to Jira]

@jeromecoutant
Copy link
Collaborator

jeromecoutant commented Sep 7, 2018

Hi
It seems that if I remove the printf, program is working perfectly even with LPTICKER_DELAY_TICKS
[Mirrored to Jira]

@c1728p9
Copy link
Contributor

c1728p9 commented Sep 12, 2018

Hi @jeromecoutant I looked into this again, and I suspect the reason the removing the printf or setting LPTICKER_DELAY_TICKS to 0 makes this go away is because its a race condition that is highly timing dependent. By modifying the code and varying the delay time I was able to get this to fail both with LPTICKER_DELAY_TICKS=0 and with LPTICKER_DELAY_TICKS=3 when lpticker_lptim is set to 0.

One thing that we saw a while back, and I noticed with this as well, is if I phsyically touch the 32KHz crystal it stops oscillating while I'm touching it. Do STM32 devices support software controlled capacitors on the clock lines? If so could the rtc backend be configuring the capacitor values differently than the lpt and thus less reliable? When you see the crash you mention is the 32KHz still oscillating?

Code to reproduce the failure in both configurations and without printf:

#include "mbed.h"


DigitalOut led1(LED1);

static void busy_delay(int cycles)
{
    volatile int count = cycles;
    while (count--);
}


int main() {
    uint32_t count = 0;
    while (true) {
        led1 = !led1;
        wait(0.1);
        count = (count + 1) % 128;
        busy_delay(count * 10000);
    }
}

[Mirrored to Jira]

@c1728p9
Copy link
Contributor

c1728p9 commented Sep 13, 2018

Its not the 32KHz stopping which is the problem. I configured the STM32L432 to output the 32KHz clock on MCO, and I don't see it stop when this failure occurs. Something else must be the cause. For reference the code I used to output the 32KHz clock on D9 is:

    RCC->CFGR = (RCC->CFGR & ~RCC_CFGR_MCOPRE_Msk);
    RCC->CFGR = (RCC->CFGR & ~RCC_CFGR_MCOSEL_Msk) | (7 << RCC_CFGR_MCOSEL_Pos);
    GPIOA->MODER = (GPIOA->MODER & ~GPIO_MODER_MODE8_Msk) | (2 << GPIO_MODER_MODE8_Pos);

[Mirrored to Jira]

@c1728p9
Copy link
Contributor

c1728p9 commented Sep 14, 2018

I think I found the source of this problem. The rtc implementation of the low power ticker on ST devices can drop match events if they are set at the right time. This causes the device to get stuck sleeping without a wakeup source. This is more pronounced when tickless is enabled because there is no systick interrupt to wake the system if it gets stuck. The setting LPTICKER_DELAY_TICKS make this even more likely to occur, as it filters out spurious low power ticker interrupts (match interrupts which fire too early) which would normally wake the system. I created #8129 to fix this and a tests to catch regressions like this in the future.
[Mirrored to Jira]

@mattbrown015
Copy link
Contributor Author

mattbrown015 commented Sep 14, 2018

Perhaps I'm getting ahead of the game, the PR isn't merged yet, but...

Does this mean the correct value for LPTICKER_DELAY_TICKS on STM32L4 is 3?

And, when the PR is merged I should put my LPTICKER_DELAY_TICKS back to 3?
[Mirrored to Jira]

@c1728p9
Copy link
Contributor

c1728p9 commented Sep 14, 2018

Hi @mattbrown015, when you are using the RTC implementation (lpticker_lptim = 0) I would recommend setting LPTICKER_DELAY_TICKS to 0 so this feature is disabled as it doesn't provide any benefits.

The goal of LPTICKER_DELAY_TICKS is to keep interrupt latency low by prevent blocking from back-to-back calls to lp_ticker_set_interrupt. The RTC implementation does not block from back-to-back calls so there is no reason to use LPTICKER_DELAY_TICKS.

After #8129 is merged the RTC implementation will work reliably with or without LPTICKER_DELAY_TICKS. I would still recommend that you keep it off for the RTC implementation.

[Mirrored to Jira]

@jeromecoutant
Copy link
Collaborator

jeromecoutant commented Sep 25, 2018

Hi @mattbrown015

Please check #8242

As explained there, I think TICKLESS mode could be enabled now with targets with lpticker_lptim set to 1.
If you choose lpticker_lptim=0, it should work as long as you don't change RTC time in your application.

[Mirrored to Jira]

@mattbrown015
Copy link
Contributor Author

mattbrown015 commented Sep 26, 2018

Hi @jeromecoutant,

If you choose lpticker_lptim=0, it should work as long as you don't change RTC time in your application.

Oh...

I need to lpticker_lptim=0 as we sleep for minutes/hours and setting the time accurately is an important part of our app.

I've got other things to worry about at the moment and I'm waiting for various PRs to merge, so it's not my most urgent concern but it will become an issue!
[Mirrored to Jira]

@ARMmbed ARMmbed deleted a comment from ciarmcom Oct 2, 2018
@adbridge
Copy link
Contributor

adbridge commented Oct 4, 2018

Internal Jira reference: https://jira.arm.com/browse/IOTHAL-294

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants