stack_sentinel: rare ASSERTION FAIL [!(z_arch_curr_cpu()->nested != 0U)] @ ZEPHYR_BASE/kernel/thread.c:429 Threads may not be created in ISRs #16915
Comments
Another strange and also rare "Fatal fault in ISR!" failure: does it seem related?
Please file a separate GH issue for the other crash; the test cases and the errors produced are unrelated.
Another nearly identical instance:
Quoting bug "Configure QEMU to run independent of the host clock" zephyrproject-rtos#14173: We have been struggling for years with issues related to how QEMU attempts to synchronize guest timer interrupts with the host clock, for example zephyrproject-rtos#12553. The symptom is that heavily loaded sanitycheck runs have tests spuriously failing due to timing-related issues. This creates noise in our CI runs that masks true bugs which manifest only intermittently, causing real issues that will happen all the time at scale to be 'swept under the rug'. Right now, any time a test fails, sanitycheck retries it a few times and only consecutive failures produce an error. There is also a lot of relevant information and more links in "List of tests that keep failing sporadically" zephyrproject-rtos#12553.

This new "emu_time" tag helps by letting users either select or exclude the tests that really need accurate time to pass and therefore have a high chance of actually being affected by this emulation issue. As an example, it was only with 'sanitycheck --exclude emu_time' that I could spot and file the intermittent, non-emu_time issue zephyrproject-rtos#16915. As Andrew predicted above, it was drowned in emu_time noise before that. Conversely, "--tag emu_time" can be used by developers focusing on fixing QEMU's -icount feature, for instance zephyrproject-rtos#14173 or others.

Even before QEMU's -icount is fixed, Continuous Integration could be split into two separate runs: A. --tag emu_time and B. --exclude emu_time. Only A tests would be allowed retries, which would stop hiding other, unrelated intermittent issues affecting B tests.

This initial commit does not claim to tag all affected tests exhaustively. However, it is already a functional and useful start: 14 tests collected from, and tested over, weeks of local sanitycheck runs and _partially_ reviewed by QEMU clock expert Andy Ross.
This commit also increases the individual timeout of 7 tests that have been observed to consistently take much longer than the median (2-3s). This flags how unusually long these tests are, lets users temporarily reduce the very long 60s default timeout in their local workspace, and should reduce the chance of them timing out on a heavily loaded system. Their timeout is set to 3-4 times the duration observed in CI and locally.
Signed-off-by: Marc Herbert <[email protected]>
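To make the masking argument concrete, here is a hedged sketch of the proposed split retry policy in C. The attempt limits (3 for emu_time-tagged tests, 1 for everything else) are assumptions, since the text only says "a few" retries, and `reported_pass()` is a hypothetical helper, not part of sanitycheck.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy knobs: the commit only says "a few" retries. */
#define MAX_ATTEMPTS_EMU_TIME 3 /* run A: tests tagged emu_time */
#define MAX_ATTEMPTS_OTHER    1 /* run B: every other test      */

/* Returns true if the test would be reported as passing.
 * attempts[i] is the outcome of the i-th run (true = pass); only the
 * first `allowed` runs count, because the runner stops retrying there. */
static bool reported_pass(const bool *attempts, size_t n_attempts, bool emu_time)
{
    assert(attempts != NULL || n_attempts == 0);
    size_t allowed = emu_time ? MAX_ATTEMPTS_EMU_TIME : MAX_ATTEMPTS_OTHER;

    for (size_t i = 0; i < n_attempts && i < allowed; i++) {
        if (attempts[i]) {
            return true; /* any passing attempt hides the earlier failures */
        }
    }
    return false;
}
```

With the flaky sequence {fail, pass}, the emu_time policy reports PASS while the single-attempt policy reports FAIL, which is exactly the intermittent signal that run B is meant to preserve.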
This seems to be the culprit. The check is here:
I think there is some issue with how z_is_in_isr() is implemented on xtensa.
I'm currently unable to reproduce this issue with the current master branch, even on a heavily loaded system.
This is the code that increments/decrements nested.
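To illustrate the mechanism under discussion, here is a minimal sketch assuming a single CPU and a per-CPU `nested` counter like the one named in the assertion message. The names `struct cpu_state`, `isr_enter()`, `isr_exit()`, `in_isr()` and `may_create_thread()` are hypothetical; this is not Zephyr's actual implementation. The invariant it shows: the counter is incremented on every interrupt entry and decremented on every exit, and "in ISR" means `nested != 0`. One way the assertion could fire spuriously is if some exit path skipped the decrement, so that thread context would look like ISR context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU bookkeeping, loosely modeled on the `nested`
 * field in the assertion message. Not Zephyr's actual struct. */
struct cpu_state {
    uint32_t nested; /* interrupt nesting depth; 0 means thread context */
};

static struct cpu_state cpu0; /* single-CPU simulation, starts at 0 */

/* Called by a (hypothetical) interrupt entry stub. */
static void isr_enter(struct cpu_state *cpu)
{
    cpu->nested++;
}

/* Called by a (hypothetical) interrupt exit stub. Must mirror
 * isr_enter() exactly: a missed decrement leaves nested != 0
 * after control has returned to a thread. */
static void isr_exit(struct cpu_state *cpu)
{
    assert(cpu->nested > 0U); /* an exit with no matching entry is a bug */
    cpu->nested--;
}

/* Same predicate as the failing assertion: in an ISR iff nested != 0. */
static bool in_isr(const struct cpu_state *cpu)
{
    return cpu->nested != 0U;
}

/* Hypothetical guard: refuses thread creation instead of asserting. */
static bool may_create_thread(const struct cpu_state *cpu)
{
    return !in_isr(cpu);
}
```

For example, after `isr_enter(&cpu0)` with no matching `isr_exit(&cpu0)`, `may_create_thread(&cpu0)` returns false even though execution is conceptually back in a thread.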
Describe the bug
Intermittent failure of sanitycheck test:
Spotted only once. Generally doesn't fail. The previous run was under the exact same conditions and didn't fail.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
PASS
Impact
Low.
Environment (please complete the following information):
Additional context
JOBS=50
Screenshots or console output