-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NCS36510: Fix the sporadic semaphore timing issue #3779
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @geky!
Tests the timer/semaphores at a lower level than the event queue, which removes a layer of concerns from issues in the rtos timing.
The NCS36510 is limited to 16bit timers. Construction of larger intervals is performed in software by counting the number of 16bit intervals that pass. Either this counting takes a bit of time, or there is a math error somewhere (maybe a long critical section?), because there is a roughly ~1us delay between when the interrupt occurs and the ticker progresses onto the next 16bit interval. This is normally a completely reasonable error, except that the error accumulates. After a while, the equeue tests find themselves with tens of milliseconds of error. To make matters worse, this error is random because of other interrupts occuring in the system, making the exact issue quite a bit difficult to track down. This fix drops the software counter in favor of just recalculating the next delay interval from the target time and value of the running timer. The running timer used to calculate the current tick is left to overflow in hardware and doesn't have this drift.
An odd bug (c0951c9) in the NCS36510 ticker code caused the timer/ticker classes to become desynchronized. Updated the handling to not assume the timers are perfectly in synch. This will increase the event's tolerance of less robust platforms.
43968e5
to
0949164
Compare
/morph test-nightly |
Result: FAILUREYour command has finished executing! Here's what you wrote!
OutputTest failed! |
Dammit, the test is working. Currently fails on NRF52:
|
4a07edf
to
6e920fd
Compare
/morph test |
Result: SUCCESSYour command has finished executing! Here's what you wrote!
OutputAll builds and test passed! |
A non-nightly won't cover the event tests, restarting. /morph test-nightly |
Ah thanks 👍 |
Result: SUCCESSYour command has finished executing! Here's what you wrote!
OutputAll builds and test passed! |
It was a legitimate 2.1 millisecond error. These tests are for the layer between the event queue's porting layer and hardware, so it indicates that either the NRF52's hardware timer or ticker was off by 2.1 milliseconds. I couldn't replicate this issue locally, the test is structured to be deterministic and repeatable, so my guess is that differences on the host side timing and USB communication caused some hardware events on the device side to line up and cause a 2 millisecond error. I'm not sure what tolerance we should be expecting, but given that this is one of the least powerful platforms, it seemed reasonable to increase the tolerance to 5 milliseconds. |
@@ -0,0 +1,122 @@ | |||
#include "mbed_events.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
license missing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker, we should definitely add licenses to our tests (more files are without as I noticed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, not sure how I ended up with a lot of licenseless test files.
Ports for Upcoming Targets Fixes and Changes 3590: [NUC472/M453] Export IAR project and other bugfixes ARMmbed/mbed-os#3590 3740: STM32L4 HAL update for RTC Wake Up Timer ARMmbed/mbed-os#3740 3739: STM32F7 : remove multiple HSE_VALUE define value ARMmbed/mbed-os#3739 3759: STM32: spi_frequency table index fix ARMmbed/mbed-os#3759 3779: NCS36510: Fix the sporadic semaphore timing issue ARMmbed/mbed-os#3779 3806: NXP KL43Z/KL27Z: fix spi format bits check ARMmbed/mbed-os#3806 3814: NCS36510: I2C idle delay of 1us ARMmbed/mbed-os#3814 3803: Bug fix of initial value of interrupt edge in "gpio_irq_init" again ARMmbed/mbed-os#3803
Whew, this was a long bug. Sorry for the delay.
Here's the history of the bug: #3404, it's been causing small problems on the CI, but takes a while to pop up, proved difficult to nail down.
This patch has three parts:
As for exactly what the timing bug was:
should resolve #3404
cc @maclobdell, @bridadan, @pradeep-gr, @radhika-raghavendran