Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests/kernel/sched/schedule_api fails on nrf5* boards #11721

Closed
spoorthik opened this issue Nov 29, 2018 · 10 comments
Closed

tests/kernel/sched/schedule_api fails on nrf5* boards #11721

spoorthik opened this issue Nov 29, 2018 · 10 comments
Assignees
Labels
bug The issue is a bug, or the PR is fixing a bug platform: nRF Nordic nRFx priority: medium Medium impact/importance bug

Comments

@spoorthik
Copy link
Contributor

spoorthik commented Nov 29, 2018

Describe the bug
The test case test_slice_reset in tests/kernel/sched/schedule_api failed on CI run but couldn't reproduce locally.

To Reproduce
Steps to reproduce the behavior:

  1. mkdir build; cd build
  2. cmake -DBOARD=nrf52_pca10040
  3. make flash
  4. See error

Expected behavior
All test cases should pass without any assert failure.

Screenshots or console output

***** Booting Zephyr OS zephyr-v1.13.0-2220-ga0b9e70e10 (delayed boot 1000ms) *****
Running test suite threads_scheduling
===================================================================
starting test - test_priority_cooperative
PASS - test_priority_cooperative
===================================================================
starting test - test_priority_preemptible
PASS - test_priority_preemptible
===================================================================
starting test - test_yield_cooperative
PASS - test_yield_cooperative
===================================================================
starting test - test_sleep_cooperative
PASS - test_sleep_cooperative
===================================================================
starting test - test_sleep_wakeup_preemptible
PASS - test_sleep_wakeup_preemptible
===================================================================
starting test - test_pending_thread_wakeup
PASS - test_pending_thread_wakeup
===================================================================
starting test - test_time_slicing_preemptible
PASS - test_time_slicing_preemptible
===================================================================
starting test - test_time_slicing_disable_preemptible
PASS - test_time_slicing_disable_preemptible
===================================================================
starting test - test_lock_preemptible
PASS - test_lock_preemptible
===================================================================
starting test - test_unlock_preemptible
PASS - test_unlock_preemptible
===================================================================
starting test - test_sched_is_preempt_thread
PASS - test_sched_is_preempt_thread
===================================================================
starting test - test_slice_reset

    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_sched_timeslice_reset.c:46: thread_tslice: t <= expected_slice_max is false


    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_sched_timeslice_reset.c:45: thread_tslice: t >= expected_slice_min is false


    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_sched_timeslice_reset.c:45: thread_tslice: t >= expected_slice_min is false

FAIL - test_slice_reset
===================================================================
starting test - test_slice_scheduling
PASS - test_slice_scheduling
===================================================================
starting test - test_priority_scheduling
ABCDEFGHI
ABCDEFGHI
ABCDEFGHI
ABCDEFGHI
ABCDEFGHI
PASS - test_priority_scheduling
===================================================================
starting test - test_wakeup_expired_timer_thread
PASS - test_wakeup_expired_timer_thread
===================================================================
Test suite threads_scheduling failed.
===================================================================
PROJECT EXECUTION FAILED

Environment (please complete the following information):

  • OS: Linux
  • Toolchain: Zephyr SDK 0.9.5
  • Commit SHA or Version used: ga0b9e70e10 (on which the failure was seen, may not be the first hit)

Since the failure is not seen locally, I couldn't git bisect to find on what commit the failure was first seen.

@spoorthik
Copy link
Contributor Author

@inakypg @nashif @kumarvikash1: JFYI

@galak galak added priority: low Low impact/importance bug platform: nRF Nordic nRFx labels Dec 4, 2018
@galak galak added the bug The issue is a bug, or the PR is fixing a bug label Dec 4, 2018
@carlescufi
Copy link
Member

@pizi-nordic can you take a look?

@pizi-nordic
Copy link
Collaborator

It seems that this failure is caused by the same issue that triggered the #11722.

I was not able to reproduce it on unmodified code, however after altering the nrf_rtc_timer.c which enforced late execution of the z_clock_announce() (few ticks after expected time) the test failed in the same way as described in the bug report.

FYI: @carlescufi, @pdunaj, @andyross

@spoorthik
Copy link
Contributor Author

spoorthik commented Jan 14, 2019

On recent commit: a7afdc3 the test case: test_slice_scheduling is failing with assertion failure.
Please let me know if a separate bug has to be raised.

Execution log:

\***** Booting Zephyr OS zephyr-v1.13.0-3172-g5b9d5a217c (delayed boot 1000ms) *****
Running test suite threads_scheduling
===================================================================
starting test - test_priority_cooperative
PASS - test_priority_cooperative
===================================================================
starting test - test_priority_preemptible
PASS - test_priority_preemptible
===================================================================
starting test - test_yield_cooperative
PASS - test_yield_cooperative
===================================================================
starting test - test_sleep_cooperative
PASS - test_sleep_cooperative
===================================================================
starting test - test_sleep_wakeup_preemptible
PASS - test_sleep_wakeup_preemptible
===================================================================
starting test - test_pending_thread_wakeup
PASS - test_pending_thread_wakeup
===================================================================
starting test - test_time_slicing_preemptible
PASS - test_time_slicing_preemptible
===================================================================
starting test - test_time_slicing_disable_preemptible
PASS - test_time_slicing_disable_preemptible
===================================================================
starting test - test_lock_preemptible
PASS - test_lock_preemptible
===================================================================
starting test - test_unlock_preemptible
PASS - test_unlock_preemptible
===================================================================
starting test - test_sched_is_preempt_thread
PASS - test_sched_is_preempt_thread
===================================================================
starting test - test_slice_reset
PASS - test_slice_reset
===================================================================
starting test - test_slice_scheduling
ABCDEFGHI
A
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

B
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

C
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

D
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

E
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

F
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

G
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

H
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

I
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false



    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

FAIL - test_slice_scheduling
===================================================================
starting test - test_priority_scheduling
PASS - test_priority_scheduling
===================================================================
starting test - test_wakeup_expired_timer_thread
PASS - test_wakeup_expired_timer_thread
===================================================================
Test suite threads_scheduling failed.
===================================================================
PROJECT EXECUTION FAILED

@andyross
Copy link
Contributor

This was working as of the most recent timer work. This is still on nrf52840_pca100xx only? Any bisection info for when it was last working in the validation tree?

@spoorthik
Copy link
Contributor Author

@andyross: git bisect shows that the below commit is the culprit.
I have tested on nrf51_pca10028, nrf52_pca10040 and nrf52840_pca10056 boards and it fails on all of them.

0cc362f8737195a162a48c9b0ccaaf3ebec6e466 is the first bad commit
commit 0cc362f8737195a162a48c9b0ccaaf3ebec6e466
Author: Andy Ross <[email protected]>
Date:   Mon Dec 10 15:15:03 2018 -0800

    tests/kernel: Simplify timer spinning

@spoorthik spoorthik changed the title tests/kernel/sched/schedule_api fails on nrf52_pca10040 board tests/kernel/sched/schedule_api fails on nrf5* boards Jan 22, 2019
@nashif
Copy link
Member

nashif commented Jan 30, 2019

0cc362f confirmed to be the culprit

@aescolar
Copy link
Member

This test is failing very seldom in native_posix too. So I do not think this is a platform specific issue. Neither valgrind or address sanitizer detect the problem.

@galak galak added priority: medium Medium impact/importance bug and removed priority: low Low impact/importance bug labels Feb 19, 2019
@wentongwu
Copy link
Contributor

It seems this issue caused by nrf5* timer delay which will trigger same priority thread time slice schedule. And can we check timer delay issue from platform respective? Actually I do see the difference of system tick between nrf5* board and other Arm boards. Normal Arm boards use Systick interrupt to calculate system tick. And nrf5* board use RTC to do it with IRQ num NRF5_IRQ_RTC1_IRQn = 17, maybe a little lower priority than other arm boards. And from the code I also see that nrf5* boards has some other interrupts(like NRF5_IRQ_POWER_CLOCK_IRQn and NRF5_IRQ_TEMP_IRQn) with higher priority than RTC's.

@andyross
Copy link
Contributor

@wentongwu: Actually it's simpler than that. The patch @spoorthik and @nashif bisected to was just an emulator workaround. The original code would spin waiting on a specified number of ticks to pass, which was tickling bugs on x86_64 so I replaced it with a simpler busy wait, which worked everywhere except on nRF.

The difference in behavior is tick alignment: the old code was always exiting the wait loop immediately after a tick had expired, where the new one is exactly N milliseconds[1], and likely to be anywhere between two tick boundaries. Later code is then checking time deltas in ticks and sees a +/- 1 tick aliasing issue where the aligned code does not. nRF tends to see these issues more easily because it's underlying clock counter is running at a very different rate than other platforms; it makes a great bug magnet for timer stuff.

In this case I think the right solution is to revert the patch (I just checked and indeed a simple reversion fixes the issue) and come up with a more targetted workaround for x86_64. We've actually swapped qemu versions in CI since this was added; it's possible the bug isn't visible any more...

[1] "Exact" to the precision of the underlying timer, anyway.

andyross pushed a commit to andyross/zephyr that referenced this issue Feb 28, 2019
Commit 0cc362f ("tests/kernel: Simplify timer spinning") was
added to work around a qemu bug with dropped interrupts on x86_64.
But it turns out that the tick alignment that the original
implementation provided (fundamentally, it spins waiting on the timer
driver to report tick changes) was needed for correct operation on
nRF52.

The effectively revert that commit (and refactors all the spinning
into a single utility) and replaces it with a workaround targeted to
qemu on x86_64 only.  Fixes zephyrproject-rtos#11721

Signed-off-by: Andy Ross <[email protected]>
galak pushed a commit that referenced this issue Feb 28, 2019
Commit 0cc362f ("tests/kernel: Simplify timer spinning") was
added to work around a qemu bug with dropped interrupts on x86_64.
But it turns out that the tick alignment that the original
implementation provided (fundamentally, it spins waiting on the timer
driver to report tick changes) was needed for correct operation on
nRF52.

The effectively revert that commit (and refactors all the spinning
into a single utility) and replaces it with a workaround targeted to
qemu on x86_64 only.  Fixes #11721

Signed-off-by: Andy Ross <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The issue is a bug, or the PR is fixing a bug platform: nRF Nordic nRFx priority: medium Medium impact/importance bug
Projects
None yet
Development

No branches or pull requests

8 participants