tests/kernel/sched/schedule_api fails on nrf5* boards #11721

spoorthik · 2018-11-29T06:14:41Z

Describe the bug
The test case test_slice_reset in tests/kernel/sched/schedule_api failed on CI run but couldn't reproduce locally.

To Reproduce
Steps to reproduce the behavior:

mkdir build; cd build
cmake -DBOARD=nrf52_pca10040
make flash
See error

Expected behavior
All test cases should pass without any assert failure.

Screenshots or console output

***** Booting Zephyr OS zephyr-v1.13.0-2220-ga0b9e70e10 (delayed boot 1000ms) *****
Running test suite threads_scheduling
===================================================================
starting test - test_priority_cooperative
PASS - test_priority_cooperative
===================================================================
starting test - test_priority_preemptible
PASS - test_priority_preemptible
===================================================================
starting test - test_yield_cooperative
PASS - test_yield_cooperative
===================================================================
starting test - test_sleep_cooperative
PASS - test_sleep_cooperative
===================================================================
starting test - test_sleep_wakeup_preemptible
PASS - test_sleep_wakeup_preemptible
===================================================================
starting test - test_pending_thread_wakeup
PASS - test_pending_thread_wakeup
===================================================================
starting test - test_time_slicing_preemptible
PASS - test_time_slicing_preemptible
===================================================================
starting test - test_time_slicing_disable_preemptible
PASS - test_time_slicing_disable_preemptible
===================================================================
starting test - test_lock_preemptible
PASS - test_lock_preemptible
===================================================================
starting test - test_unlock_preemptible
PASS - test_unlock_preemptible
===================================================================
starting test - test_sched_is_preempt_thread
PASS - test_sched_is_preempt_thread
===================================================================
starting test - test_slice_reset

    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_sched_timeslice_reset.c:46: thread_tslice: t <= expected_slice_max is false


    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_sched_timeslice_reset.c:45: thread_tslice: t >= expected_slice_min is false


    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_sched_timeslice_reset.c:45: thread_tslice: t >= expected_slice_min is false

FAIL - test_slice_reset
===================================================================
starting test - test_slice_scheduling
PASS - test_slice_scheduling
===================================================================
starting test - test_priority_scheduling
ABCDEFGHI
ABCDEFGHI
ABCDEFGHI
ABCDEFGHI
ABCDEFGHI
PASS - test_priority_scheduling
===================================================================
starting test - test_wakeup_expired_timer_thread
PASS - test_wakeup_expired_timer_thread
===================================================================
Test suite threads_scheduling failed.
===================================================================
PROJECT EXECUTION FAILED

Environment (please complete the following information):

OS: Linux
Toolchain: Zephyr SDK 0.9.5
Commit SHA or Version used: ga0b9e70e10 (on which the failure was seen, may not be the first hit)

Since the failure is not seen locally, I couldn't git bisect to find on what commit the failure was first seen.

The text was updated successfully, but these errors were encountered:

spoorthik · 2018-11-29T06:15:40Z

@inakypg @nashif @kumarvikash1: JFYI

carlescufi · 2018-12-05T13:40:58Z

@pizi-nordic can you take a look?

pizi-nordic · 2018-12-07T16:04:21Z

It seems that this failure is caused by the same issue that triggered the #11722.

I was not able to reproduce it on unmodified code, however after altering the nrf_rtc_timer.c which enforced late execution of the z_clock_announce() (few ticks after expected time) the test failed in the same way as described in the bug report.

FYI: @carlescufi, @pdunaj, @andyross

spoorthik · 2019-01-14T09:29:46Z

On recent commit: a7afdc3 the test case: test_slice_scheduling is failing with assertion failure.
Please let me know if a separate bug has to be raised.

Execution log:

\***** Booting Zephyr OS zephyr-v1.13.0-3172-g5b9d5a217c (delayed boot 1000ms) *****
Running test suite threads_scheduling
===================================================================
starting test - test_priority_cooperative
PASS - test_priority_cooperative
===================================================================
starting test - test_priority_preemptible
PASS - test_priority_preemptible
===================================================================
starting test - test_yield_cooperative
PASS - test_yield_cooperative
===================================================================
starting test - test_sleep_cooperative
PASS - test_sleep_cooperative
===================================================================
starting test - test_sleep_wakeup_preemptible
PASS - test_sleep_wakeup_preemptible
===================================================================
starting test - test_pending_thread_wakeup
PASS - test_pending_thread_wakeup
===================================================================
starting test - test_time_slicing_preemptible
PASS - test_time_slicing_preemptible
===================================================================
starting test - test_time_slicing_disable_preemptible
PASS - test_time_slicing_disable_preemptible
===================================================================
starting test - test_lock_preemptible
PASS - test_lock_preemptible
===================================================================
starting test - test_unlock_preemptible
PASS - test_unlock_preemptible
===================================================================
starting test - test_sched_is_preempt_thread
PASS - test_sched_is_preempt_thread
===================================================================
starting test - test_slice_reset
PASS - test_slice_reset
===================================================================
starting test - test_slice_scheduling
ABCDEFGHI
A
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

B
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

C
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

D
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

E
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

F
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

G
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

H
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

I
    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false



    Assertion failed at /home/jenkins/workspace/zephyr-master-ci-master/zephyr.git/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:51: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && ((int)p1 == thread_idx)) is false

FAIL - test_slice_scheduling
===================================================================
starting test - test_priority_scheduling
PASS - test_priority_scheduling
===================================================================
starting test - test_wakeup_expired_timer_thread
PASS - test_wakeup_expired_timer_thread
===================================================================
Test suite threads_scheduling failed.
===================================================================
PROJECT EXECUTION FAILED

andyross · 2019-01-14T17:59:16Z

This was working as of the most recent timer work. This is still on nrf52840_pca100xx only? Any bisection info for when it was last working in the validation tree?

spoorthik · 2019-01-22T08:38:42Z

@andyross: git bisect shows that the below commit is the culprit.
I have tested on nrf51_pca10028, nrf52_pca10040 and nrf52840_pca10056 boards and it fails on all of them.

0cc362f8737195a162a48c9b0ccaaf3ebec6e466 is the first bad commit
commit 0cc362f8737195a162a48c9b0ccaaf3ebec6e466
Author: Andy Ross <[email protected]>
Date:   Mon Dec 10 15:15:03 2018 -0800

    tests/kernel: Simplify timer spinning

nashif · 2019-01-30T18:57:25Z

0cc362f confirmed to be the culprit

aescolar · 2019-02-15T18:13:11Z

This test is failing very seldom in native_posix too. So I do not think this is a platform specific issue. Neither valgrind or address sanitizer detect the problem.

wentongwu · 2019-02-26T12:00:09Z

It seems this issue caused by nrf5* timer delay which will trigger same priority thread time slice schedule. And can we check timer delay issue from platform respective? Actually I do see the difference of system tick between nrf5* board and other Arm boards. Normal Arm boards use Systick interrupt to calculate system tick. And nrf5* board use RTC to do it with IRQ num NRF5_IRQ_RTC1_IRQn = 17, maybe a little lower priority than other arm boards. And from the code I also see that nrf5* boards has some other interrupts(like NRF5_IRQ_POWER_CLOCK_IRQn and NRF5_IRQ_TEMP_IRQn) with higher priority than RTC's.

andyross · 2019-02-26T17:32:36Z

@wentongwu: Actually it's simpler than that. The patch @spoorthik and @nashif bisected to was just an emulator workaround. The original code would spin waiting on a specified number of ticks to pass, which was tickling bugs on x86_64 so I replaced it with a simpler busy wait, which worked everywhere except on nRF.

The difference in behavior is tick alignment: the old code was always exiting the wait loop immediately after a tick had expired, where the new one is exactly N milliseconds[1], and likely to be anywhere between two tick boundaries. Later code is then checking time deltas in ticks and sees a +/- 1 tick aliasing issue where the aligned code does not. nRF tends to see these issues more easily because it's underlying clock counter is running at a very different rate than other platforms; it makes a great bug magnet for timer stuff.

In this case I think the right solution is to revert the patch (I just checked and indeed a simple reversion fixes the issue) and come up with a more targetted workaround for x86_64. We've actually swapped qemu versions in CI since this was added; it's possible the bug isn't visible any more...

[1] "Exact" to the precision of the underlying timer, anyway.

Commit 0cc362f ("tests/kernel: Simplify timer spinning") was added to work around a qemu bug with dropped interrupts on x86_64. But it turns out that the tick alignment that the original implementation provided (fundamentally, it spins waiting on the timer driver to report tick changes) was needed for correct operation on nRF52. The effectively revert that commit (and refactors all the spinning into a single utility) and replaces it with a workaround targeted to qemu on x86_64 only. Fixes zephyrproject-rtos#11721 Signed-off-by: Andy Ross <[email protected]>

Commit 0cc362f ("tests/kernel: Simplify timer spinning") was added to work around a qemu bug with dropped interrupts on x86_64. But it turns out that the tick alignment that the original implementation provided (fundamentally, it spins waiting on the timer driver to report tick changes) was needed for correct operation on nRF52. The effectively revert that commit (and refactors all the spinning into a single utility) and replaces it with a workaround targeted to qemu on x86_64 only. Fixes #11721 Signed-off-by: Andy Ross <[email protected]>

spoorthik mentioned this issue Nov 29, 2018

tests/cmsis_rtos_v1 fails on nrf52840_pca10056 #11723

Closed

galak added priority: low Low impact/importance bug platform: nRF Nordic nRFx labels Dec 4, 2018

galak assigned carlescufi Dec 4, 2018

galak added the bug The issue is a bug, or the PR is fixing a bug label Dec 4, 2018

carlescufi assigned pizi-nordic Dec 5, 2018

spoorthik changed the title ~~tests/kernel/sched/schedule_api fails on nrf52_pca10040 board~~ tests/kernel/sched/schedule_api fails on nrf5* boards Jan 22, 2019

nashif mentioned this issue Feb 15, 2019

List of tests that keep failing sporadically #12553

Closed

18 tasks

galak added priority: medium Medium impact/importance bug and removed priority: low Low impact/importance bug labels Feb 19, 2019

galak assigned andyross Feb 19, 2019

andyross mentioned this issue Feb 27, 2019

tests/kernel/sched/schedule_api: Restore spinning for timer alignment #13916

Merged

galak closed this as completed in #13916 Feb 28, 2019

andyross mentioned this issue Mar 6, 2019

tests/kernel/sched/schedule_api: Assertion failed for test_slice_scheduling #13821

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tests/kernel/sched/schedule_api fails on nrf5* boards #11721

tests/kernel/sched/schedule_api fails on nrf5* boards #11721

spoorthik commented Nov 29, 2018 •

edited

Loading

spoorthik commented Nov 29, 2018

carlescufi commented Dec 5, 2018

pizi-nordic commented Dec 7, 2018

spoorthik commented Jan 14, 2019 •

edited

Loading

andyross commented Jan 14, 2019

spoorthik commented Jan 22, 2019

nashif commented Jan 30, 2019

aescolar commented Feb 15, 2019

wentongwu commented Feb 26, 2019

andyross commented Feb 26, 2019

tests/kernel/sched/schedule_api fails on nrf5* boards #11721

tests/kernel/sched/schedule_api fails on nrf5* boards #11721

Comments

spoorthik commented Nov 29, 2018 • edited Loading

spoorthik commented Nov 29, 2018

carlescufi commented Dec 5, 2018

pizi-nordic commented Dec 7, 2018

spoorthik commented Jan 14, 2019 • edited Loading

andyross commented Jan 14, 2019

spoorthik commented Jan 22, 2019

nashif commented Jan 30, 2019

aescolar commented Feb 15, 2019

wentongwu commented Feb 26, 2019

andyross commented Feb 26, 2019

spoorthik commented Nov 29, 2018 •

edited

Loading

spoorthik commented Jan 14, 2019 •

edited

Loading