Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for tickless and LPTICKER_DELAY_TICKS #7524

Merged
merged 7 commits into from
Aug 21, 2018
Merged

Conversation

c1728p9
Copy link
Contributor

@c1728p9 c1728p9 commented Jul 16, 2018

Description

This PR fixes problems that occur when both tickless and LPTICKER_DELAY_TICKS are turned on. The two primary problems are:

  1. Tests which use the microsecond ticker directly interfere with the lp ticker wrapper
  2. The Timeout object used by the lp ticker wrapper locks deep sleep at various times causing tests which assert that deep sleep is allowed to fail

The tests susceptible to problem 1 are:

  • mbed_hal/common_tickers
  • mbed_hal/sleep

The tests susceptible to problem 2 are:

  • mbed_drivers/sleep_lock
  • mbed_drivers/timeout
  • mbed_hal/lp_ticker
  • mbed_hal/rtc
  • mbed_hal/sleep
  • mbed_hal/sleep_manager
  • mbed_hal/sleep_manager_racecondition

This PR fixes the problems mentioned above. Unfortunately, there are still additional problems which need to be addressed. Further fixes require:

  • Common ticker handling should be suspended before calling us/lp ticker directly. This is important for the below test, but is not currently causing failures.
    • mbed_hal/common_tickers
    • mbed_hal/common_tickers_freq
    • mbed_hal/lp_ticker
    • mbed_hal/sleep
  • The Serial driver must unlock deep sleep only after all bytes have been sent to prevent bytes from being dropped by entering deep sleep. This requires a HAL extension. I am seeing test failures due to this on the NUCLEO_F401RE when tickless is enabled.

Pull request type

[x] Fix
[ ] Refactor
[ ] New target
[ ] Feature
[ ] Breaking change

@0xc0170 0xc0170 requested a review from a team July 17, 2018 08:33
@0xc0170 0xc0170 requested review from mprse and a team and removed request for a team July 17, 2018 08:35
@jeromecoutant
Copy link
Collaborator

Hi Russ

I have started a full non regression for each STM32 family.
I will provide this status later.

In parallel, I quickly checked TICKLESS mode with F401. I agree status is much better.

For the moment, I only added some DeepSleepLock in tests-mbed_drivers-lp_ticker.

Copy link
Contributor

@mprse mprse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential inconsistency between result returned by sleep_manager_can_deep_sleep() and action performed by sleep_manager_sleep_auto().

@@ -196,7 +203,11 @@ void sleep_manager_sleep_auto(void)
#ifdef MBED_DEBUG
hal_sleep();
#else
if (sleep_manager_can_deep_sleep()) {
if (sleep_manager_can_deep_sleep()
#if DEVICE_LPTICKER && (LPTICKER_DELAY_TICKS > 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If lp ticker's Timeout object is in use (i.e. lp_ticker_get_timeout_pending() returns true) and deep-sleep is not locked by other features (i.e. lock_count==1), then:

  • sleep_manager_can_deep_sleep() returns true - indicates that we can put board into deep-sleep mode.
  • sleep_manager_sleep_auto() puts the board into the sleep mode (not deep-sleep).

It looks like a design issue since we can not relay on sleep_manager_can_deep_sleep() function.
From the user perspective it can be very misleading. From the testing perspective sleep_manager_can_deep_sleep() function is used in tests to check if current environment state allows to go into deep-sleep mode which is to be tested. So in some cases the test may test sleep mode instead of required deep-sleep mode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a design issue since we can not relay on sleep_manager_can_deep_sleep() function.

where this comes from?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @mprse, sleep_manager_can_deep_sleep() shouldn't be updated this way. I do understand the purpose, but think this change should be test specific, not general.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look at these changes c85856b.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some possible alternatives I can think of are:

  • Modify tests to suspend the OS before running the tests - this may be intrusive as you can no longer use most OS primitives
  • Assert in tests using a different function - something like sleep_manager_can_deep_sleep_test()
  • Assert something different in the tests, such as time actually spent in deep sleep

Do you have any thoughts or preferences on this @mprse and @fkjagodzinski?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion first bullet is ok, but only for low level tests (HAL) where target specific drivers are tested, where the OS can be omitted. For driver layer tests I think that OS should be working normally and tests should take into consideration possible influence of the OS like systick handling, task scheduling, ticker interrupts, etc. So I suggest to use second bullet for upper layer tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since one of the problems is:

The Timeout object used by the lp ticker wrapper locks deep sleep at various times causing tests which assert that deep sleep is allowed to fail

we could modify the tests to wait for the deep sleep to be unlocked before calling sleep(). But even this would require using critical section, so would also introduce more complexity to tests.

Adding sleep_manager_can_deep_sleep_test() seems straight forward, but such change would mean we accept that tests are occasionally run in an environment different from what they were supposed to. Simply put, the test result can be meaningless sometimes.

In my opinion the most honest solution would be adding

#if DEVICE_LPTICKER && (LPTICKER_DELAY_TICKS > 0)
#error [NOT_SUPPORTED] Test not supported for this target
#endif

or executing affected test cases conditionally based on LPTICKER_DELAY_TICKS > 0.

return deep_sleep_lock == 0 ? true : false;
uint32_t lock_count = deep_sleep_lock;
#if DEVICE_LPTICKER && (LPTICKER_DELAY_TICKS > 0)
if (lp_ticker_get_timeout_pending() && (lock_count > 0)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(lock_count > 0) - is this condition needed?
In case when lp ticker's Timeout object is in use deep-sleep is locked (i.e. lock_count > 0).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check is to prevent underflow. If this function is wrapped in a critical section then the check wouldn't be needed, as lock_count would always be at least 1 if lp_ticker_get_timeout_pending() returned true.

@jeromecoutant
Copy link
Collaborator

More detailed status with NUCLEO_F401RE:

I applied few patches:
c4f0eaf

And I still got issues with:

  • tests-mbed_drivers-lp_timeout
    [1531832056.30][CONN][RXD] >>> Running case Add debug-info option #10: 'Zero delay (attach_us)'...
    [1531832056.37][CONN][RXD] :213::FAIL: Expected 1 Was 0

  • tests-mbedmicro-rtos-mbed-systimer

[1531833600.45][CONN][RXD] >>> Running case #5: 'Schedule zero ticks'...
[1531833600.53][CONN][RXD] :229::FAIL: Expected 1 Was 0

@mprse
Copy link
Contributor

mprse commented Jul 18, 2018

Regarding lp ticker wrapper I have doubts if this is a correct approach. According to the description it is designed for cases when:

  • Interrupt may not fire if set earlier than LPTICKER_DELAY_TICKS low power clock cycles.
  • Setting the interrupt back-to-back will block.

So the first bullet is the case when we have some hardware limitations like on NRF51_DK board:

If the COUNTER is N, writing N or N+1 to a CC register may not trigger a COMPARE event.

In this case the nearest interrupt which can be set is current tick + 2. This can be handled using lp ticker wrapper and LPTICKER_DELAY_TICKS = 2, but currently is handled by ensuring that value written to CC is greater than or equal to current tick + 2 in the Nordic NRF5x lp ticker driver:

const uint32_t now = nrf_rtc_counter_get(COMMON_RTC_INSTANCE);
if (now == ticks_count ||
RTC_WRAP(now + 1) == ticks_count) {
ticks_count = RTC_WRAP(ticks_count + 2);
}
nrf_rtc_cc_set(COMMON_RTC_INSTANCE, cc_channel, ticks_count);

Personally I prefer current option since usage of lp ticker wrapper which utilizes Timeout object in the HAL layer as we can see has too many side effects. Additionally since this is hardware specific limitation it should be handled in the target layer.

Regarding second bullet I'm not sure if I correctly understand the problem. Can you provide some use case for this?

@c1728p9
Copy link
Contributor Author

c1728p9 commented Jul 18, 2018

Hi @mprse, the primary problem is the second point - some low power ticker hardware cannot be written to twice within a certain number of low power clock cycles (LPTICKER_DELAY_TICKS ticks).

  1. The OS scheduler enters tickless mode and sets the low power ticker to wake 500ms in the future
  2. An interrupt fires right after and wakes the OS early
  3. The OS uses the low power ticker to schedule the next tick (1ms in the future)

If the next tick time is written to the low power ticker immediately the new match time will be dropped, as the hardware is still trying to set the first match value. Options to work around this are:

  1. Block in lp_ticker_set_interrupt until the new value is written
  2. Don't allow low power tickers which require a delay
  3. Use a Timeout object to write the second match value after LPTICKER_DELAY_TICKS ticks has passed

Option 1. is what was done prior to the ticker HAL specification. The problem with this is interrupt latency is impacted. An example in practice is the NUCLEO_F401RE. It has a low power ticker clock speed of 8KHz and requires 3 clock cycles between writes. This means that back-to-back writes will block for 366us in a critical section.

Option 2. was considered but it was found that a large number of targets have this low power ticker hardware behavior. Vendors with effected targets include ST, Nuvoton and OnSemi to name a few. Because this would rule out tickless for many targets this option was less than ideal as well. Note - the large number of vendors/targets impacted was also one of the main reason this was done in a common layer.

Option 3. both preserves interrupt latency and allows targets with this low power ticker behavior still implement the low power ticker API. Compared to Option 1, power savings is better since the CPU can go into sleep mode rather than busy waiting. Unfortunately it does add complexity and interferes with deep sleep tests.

@mprse do you know of any alternatives which could simplify this? These were the only options that I could think of, but if there is a better way to do it would be great.

@mprse
Copy link
Contributor

mprse commented Jul 19, 2018

@c1728p9 Thanks Russ for detailed explanation of the problem. It looks like option number 3 despite its complexity is the best solution for now.

We have disused with @fkjagodzinski possible solutions for the deep-sleep/sleep issue. Please consider the following option:

  • sleep_manager_can_deep_sleep(): returns true in case when only Timeout used by the lp ticker wrapper stops us from going into deep-sleep mode (as it is done now).
  • sleep_manager_sleep_auto(): when we can go into deep-sleep (sleep_manager_can_deep_sleep() returns true), but Timeout object of the lp ticker wrapper is in use (lp_ticker_get_timeout_pending() returns true), then we are waiting in busy loop outside the critical section until timeout is not pending (lp_ticker_get_timeout_pending() returns false) and then go into deep-sleep.

Since sleep_manager_sleep_auto() function shell automatically decide which mode should be selected its a good place to handle this specific case. Additionally it would solve the problem with the sleep tests.
I'm not sure about possible side effects of this solutions, but still if we were just about to enter into deep-sleep mode we can waste ~366 us (there are currently no other things to do).
We will lost time which we could spend in sleep mode, but for the rest of planed time we will be in deep-sleep which saves more power.

@c1728p9
Copy link
Contributor Author

c1728p9 commented Jul 26, 2018

@mprse @fkjagodzinski I updated this PR to address the feedback you gave me. In specific I added and updated the test to use the function sleep_manager_can_deep_sleep_test_check() which checks to see if deep sleep is allowed within 2ms. This way if deep sleep is blocked temporarily for any reason - low power ticker wrapper, buffered uart bytes, or some other unforeseen reason then tests will not start failing again. Let me know what you think.

Copy link
Contributor

@mprse mprse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@c1728p9 Looks good to me.

What bothers me is that test for sleep_manager_can_deep_sleep_test_check() is missing (even though this function is intended for use in testing). I think it should be added or task for this should be created.
I suggest also to add information in the description of sleep_manager_can_deep_sleep_test_check() that timeout is set to 2 ms.

@fkjagodzinski
Copy link
Member

@mprse @c1728p9 I've got an open PR with sleep manager tests update (#7582), I can add a few test cases for that new function.

@cmonr
Copy link
Contributor

cmonr commented Jul 31, 2018

/morph build

/**
* Wrapper around lp_ticker_set_interrupt to prevent blocking
*
* Problems this function is solving:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@mbed-ci
Copy link

mbed-ci commented Jul 31, 2018

Build : SUCCESS

Build number : 2710
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/7524/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

@mbed-ci
Copy link

mbed-ci commented Jul 31, 2018

@mbed-ci
Copy link

mbed-ci commented Aug 1, 2018

When the define LPTICKER_DELAY_TICKS is set deep sleep can be randomly
disallowed when using the low power ticker. This is because a Timer
object, which locks deep sleep, is used to protect from back-to-back
writes to lp tickers which can't support that. This causes tests which
assert that deep sleep is allowed to intermittently fail.

To fix this intermittent failure this patch adds the function
sleep_manager_can_deep_sleep_test_check() which checks if deep sleep
is allowed over a duration. It updates all the tests to use
sleep_manager_can_deep_sleep_test_check() rather
than sleep_manager_can_deep_sleep() so the tests work even if deep
sleep is spuriously blocked.
Update the low power ticker wrapper code so it does not violate any
properties of the ticker specification. In specific this patch fixes
the following:
- Prevent spurious interrupts
- Fire interrupt only when the ticker times increments to or past the
    value set by ticker_set_interrupt
- Disable interrupts when ticker_init is called
@cmonr
Copy link
Contributor

cmonr commented Aug 17, 2018

/morph build

@mbed-ci
Copy link

mbed-ci commented Aug 17, 2018

Build : FAILURE

Build number : 2820
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/7524/

Fix the HAL common_tickers and sleep tests so they work correctly when
the define LPTICKER_DELAY_TICKS is set.
To handle timer rollovers the test tests-mbed_hal-common_tickers_freq
calls intf->set_interrupt(0). For this to work correctly the ticker
implementation must fire an interrupt on every rollover event though
intf->set_interrupt(0) was called only once. Whether an interrupt will
fire only once or multiple times is undefined behavior which
cannot be relied upon.

To avoid this undefined behavior this patch continually schedules an
interrupt and performs overflow detection on every read. This also
removes the possibility of race conditions due to overflowCounter
incrementing at the wrong time.
Only reschedule the Timeout object in the low power ticker wrapper
if it is not already pending.
@cmonr
Copy link
Contributor

cmonr commented Aug 17, 2018

/morph build

@mbed-ci
Copy link

mbed-ci commented Aug 18, 2018

Build : SUCCESS

Build number : 2835
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/7524/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

@mbed-ci
Copy link

mbed-ci commented Aug 18, 2018

@mbed-ci
Copy link

mbed-ci commented Aug 19, 2018

@cmonr
Copy link
Contributor

cmonr commented Aug 20, 2018

/morph test

@NirSonnenschein
Copy link
Contributor

/morph uvisor-test

@mbed-ci
Copy link

mbed-ci commented Aug 21, 2018

@cmonr cmonr merged commit e02466a into ARMmbed:master Aug 21, 2018
pan- pushed a commit to pan-/mbed that referenced this pull request Aug 22, 2018
Fixes for tickless and LPTICKER_DELAY_TICKS
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants