Asymmetry between switched_in and switched_out events #76057
Hi @alexpaschoaletto! We appreciate you submitting your first issue for our open-source project. 🌟 Even though I'm a bot, I can assure you that the whole community is genuinely grateful for your time and effort. 🤖💙 |
Context
This comment only explains a bit of the actual feature I'm working on and how I found the bug in the first place. I'm developing a kernel feature which needs to know when a particular thread switches in/out of the CPU in order to work properly. My feature therefore directly uses the
The feature should start a timer whenever the concerned thread enters the CPU and stop the timer if the thread leaves the CPU before the timer expires. All the timer expiry function does is redefine the thread's EDF priority by recalculating its absolute deadline, so a preemption can happen depending on the other threads. The following showcases a normal execution of the feature - the only relevant logs for this analysis are:
The expected output when the thread finishes before the timer expires is: ...
[00:00:19.110,000] <inf> CBS: cbs_1 SWT_TO 200000 //the number is the time left (in cycles)
[00:00:19.120,000] <inf> CBS: cbs_1 B_STAT 34173
[00:00:19.120,000] <inf> CBS: cbs_1 SWT_AY 34173
... And the expected output if the timer expires before the thread finishes is: ...
[00:00:05.040,000] <inf> CBS: cbs_1 SWT_TO 200000
[00:00:05.060,000] <inf> CBS: cbs_1 B_STAT 9257
[00:00:05.070,000] <inf> CBS: cbs_1 B_ROUT 200000 // logged by timer expiry function
[00:00:05.070,000] <inf> CBS: cbs_1 B_STAT 191420
[00:00:05.070,000] <inf> CBS: cbs_1 SWT_AY 191420
... However, the actual output frequently observed when the timer expires is: ...
[00:00:05.040,000] <inf> CBS: cbs_1 SWT_TO 200000 // switch to
[00:00:05.050,000] <inf> CBS: cbs_1 B_STAT 21493
[00:00:05.060,000] <inf> CBS: cbs_1 B_ROUT 200000
[00:00:05.060,000] <inf> CBS: cbs_1 B_STAT 191045
[00:00:05.060,000] <inf> CBS: cbs_1 SWT_AY 191045 // switch away
[00:00:05.070,000] <inf> CBS: cbs_1 SWT_AY 191045 // switch away
... I assumed it could be my implementation and decided to test a simple program as a sanity check. And that's when I ended up with the example code of #75406.
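The duplicated switch-away event in the last log is the kind of thing a budget timer can defend against by remembering whether it is actually running. The following is a minimal host-side sketch of that idea, not the feature's real code: `cbs_state`, `cbs_on_switched_in` and `cbs_on_switched_out` are hypothetical names, and the cycle counts are passed in by hand instead of being read from a kernel timer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-server state; names are illustrative, not Zephyr API. */
struct cbs_state {
	bool running;         /* true between a switch-in and its switch-out */
	uint32_t budget_left; /* remaining budget, in cycles */
	uint32_t started_at;  /* cycle count at the last accepted switch-in */
	uint32_t unmatched;   /* events that arrived without a counterpart */
};

/* Called when the server thread enters the CPU. */
static void cbs_on_switched_in(struct cbs_state *s, uint32_t now)
{
	if (s->running) {
		s->unmatched++; /* second "in" without an "out": ignore it */
		return;
	}
	s->running = true;
	s->started_at = now;
}

/* Called when the server thread leaves the CPU. */
static void cbs_on_switched_out(struct cbs_state *s, uint32_t now)
{
	uint32_t used;

	if (!s->running) {
		s->unmatched++; /* duplicated "out": do not charge twice */
		return;
	}
	s->running = false;
	used = now - s->started_at; /* unsigned math survives counter wrap */
	s->budget_left = (used >= s->budget_left) ? 0 : s->budget_left - used;
}
```

With this guard a repeated switch-away only bumps the `unmatched` counter, which also gives a cheap way to measure how often a target emits unbalanced events.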
Quick update, I think line 879 in
static inline void set_current(struct k_thread *new_thread)
{
	z_thread_mark_switched_out(); // here - only the switched_out is called
	_current_cpu->current = new_thread;
}
This is the only moment in
xiao_esp32c3/esp32c3
[00:00:13.091,000] <inf> tracing: SWT_TO
[00:00:13.091,000] <inf> tracing: SWT_AY
[00:00:14.092,000] <inf> tracing: SWT_TO
[00:00:14.095,000] <inf> tracing: SWT_AY
xiao_esp32c3/esp32c3
xiao_esp32c3/esp32c3
[00:00:15.096,000] <inf> tracing: SWT_TO
[00:00:15.098,000] <inf> tracing: SWT_AY
[00:00:16.099,000] <inf> tracing: SWT_TO
[00:00:16.103,000] <inf> tracing: SWT_AY
And this is the output for the kernel feature I'm developing: ...
[00:00:13.080,000] <inf> CBS: cbs_1 SWT_TO 200000 // switch to
[00:00:13.100,000] <inf> CBS: cbs_1 B_STAT 10069
[00:00:13.110,000] <inf> CBS: cbs_1 B_ROUT 200000
[00:00:13.110,000] <inf> CBS: cbs_1 B_STAT 170632
[00:00:13.110,000] <inf> CBS: cbs_1 SWT_AY 170632 // switch away
... As expected, results with targets that did not output |
FWIW:
Then you really don't want to be doing that on top of the tracing layer, which is (as you've discovered) somewhat fragile. Trace points move/break/get-evaded all the time, we aren't (and don't want to be!) rigorous about low level ordering details. The kernel promises the behavior specified in its API. Tracing is a debug feature, not an API. Can you elaborate on what you're trying to achieve, either here or in a separate issue (or Discord, etc...). I suspect there are better options. Basically if you really need to be whiteboxing the scheduler then you want to be writing code in the scheduler. That's not to say there isn't a bug with the tracing hooks on context switch, mind you. Just that it's the kind of bug I'd expect to see. |
Thanks for the tip! I'm still new to Zephyr kernel development and would love to hear more suggestions. The feature I'm working on is the Constant Bandwidth Server (CBS). It is basically a soft real-time task wrapper that has a time budget to execute the jobs assigned to it. It is also meant to work alongside EDF, and its priority (i.e. deadline) is recalculated every time the budget expires or an internal condition is met. This budget can only be consumed while the jobs are being executed (i.e. on the CPU), which is why I need to know when the CBS thread enters and leaves the processor to trigger/stop the budget timer accordingly.
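For readers unfamiliar with CBS, the deadline recalculation can be sketched with the textbook rules. This is a sketch under assumptions, not the implementation discussed here: the struct and function names are hypothetical, and mapping the "internal condition" above to the textbook job-arrival test is a guess.

```c
#include <stdint.h>

/* Hypothetical CBS server parameters and state, in timer cycles. */
struct cbs {
	uint64_t max_budget; /* Q: budget granted per period */
	uint64_t period;     /* T: server period */
	uint64_t budget;     /* c: budget currently left */
	uint64_t deadline;   /* d: absolute deadline used by EDF */
};

/* Budget ran out: refill it and postpone the deadline by one period. */
static void cbs_budget_exhausted(struct cbs *s)
{
	s->budget = s->max_budget;
	s->deadline += s->period;
}

/*
 * A job arrives while the server is idle. If the leftover budget is too
 * large for the time left until the deadline, i.e. c >= (d - now) * Q / T,
 * start a fresh period; otherwise keep the current budget and deadline.
 * The comparison is cross-multiplied to avoid integer division.
 */
static void cbs_job_arrived(struct cbs *s, uint64_t now)
{
	if (now >= s->deadline ||
	    s->budget * s->period >= (s->deadline - now) * s->max_budget) {
		s->budget = s->max_budget;
		s->deadline = now + s->period;
	}
}
```

Either function changes the absolute deadline, which is why a preemption may follow immediately under EDF.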
I agree! And that's why I'm no longer using that. However, analyzing the kernel code so far the best places to put my |
Please suggest something better then.
Personally I don't see kprobe/ftrace-like tracing as an answer that makes any sense here.
What we have today works, and works well with the tools people are familiar with in this space (systemview, tracealyzer), so it's unclear to me what the problem is exactly, other than that it requires manually ensuring the hooks are called at the right time... which really doesn't appear to be much of a burden at all.
So on MIPS, we had one call to the switched_out hook missing. Something like this makes it work, though I'm still not sure this is the right spot:
For xtensa, there is something weird going on, the switched_in call is called but without any output, investigating still. |
Thank you for your help! If I can be useful for any further testing please let me know. |
What about syscalls? We already have a mechanism for detecting/emitting/indirecting around defined functions with lightweight/inlined stubs based on build time configuration (it's even smart enough to understand how to inspect the arguments and return values!). Seems like all the same tricks would apply to emitting tracing entry/exit hooks as you'd need for emitting un/marshalling; they could probably share 80% of their code. Imagine by doing tracing by putting a "
I mean, it doesn't have to be super gold plated. My complaint here is just things like this bug. Tracing in Zephyr is, kinda, "always broken". We get it working once and then look away and some yahoo has come in and moved stuff around and the trace points aren't there any more, or there's a new code path that skips them, or in cases like this one the entry/exits don't match (to be fair, context switching is highly nontrivial and arch-specific, so this specific bug is not amenable to obvious automation).
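To make the stub idea concrete, here is a toy sketch of a wrapper that is generated per function and always emits a matching entry/exit pair. It is not Zephyr's syscall machinery: `TRACE_ENABLED`, `TRACED`, the two counters and `impl_double` are all hypothetical names invented for illustration.

```c
/* Hypothetical build-time switch, standing in for a Kconfig option. */
#define TRACE_ENABLED 1

/* Stand-ins for real trace hooks: just count entries and exits. */
static unsigned int trace_enters;
static unsigned int trace_exits;

#if TRACE_ENABLED
/* Generate an inline stub that brackets the call with both hooks. */
#define TRACED(ret, name, impl, arg_t)   \
	static inline ret name(arg_t a)  \
	{                                \
		ret r;                   \
		trace_enters++;          \
		r = impl(a);             \
		trace_exits++;           \
		return r;                \
	}
#else
/* Tracing disabled: the stub collapses to a plain call. */
#define TRACED(ret, name, impl, arg_t)   \
	static inline ret name(arg_t a) { return impl(a); }
#endif

/* Example implementation function being wrapped. */
static int impl_double(int x)
{
	return 2 * x;
}

TRACED(int, traced_double, impl_double, int)
```

Because the entry and exit hooks live in one generated stub, a new code path cannot skip one of them without skipping the call itself. As noted above, context switching does not fit this shape, since the "exit" happens on a different thread.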
Hey there, I'm back after a few weeks off. I don't really think so, but any progress here? As for the specific case of the
/* Just a wrapper around _current = xxx with tracing */
static inline void set_current(struct k_thread *new_thread)
{
	/*
	FIXME this code appears to be inducing false switched_out events.
	When the switched_out function is commented, targets seem to work as expected.
	https://github.com/zephyrproject-rtos/zephyr/issues/76057#issuecomment-2236545540
	z_thread_mark_switched_out() below is proposed to
	be left commented until a proper solution arises.
	*/
	/* z_thread_mark_switched_out(); */
	_current_cpu->current = new_thread;
}
I've inspected a great part of the codebase already but haven't yet gained enough system-wide understanding of what's really going on, and I'm not sure I will anytime soon. I'm aware this proposal might seem a bad idea, though, so feel free to reject it completely if you will.
Not complete yet, will come back to this ASAP |
add missing switched_out trace point. Partially fixes zephyrproject-rtos#76057 Signed-off-by: Anas Nashif <[email protected]>
I'm sorry @mmahadevan108 but isn't this issue supposed to remain open as long as other architectures still need fixes? @nashif only solved for MIPS as far as I understood. |
Describe the bug
The z_thread_mark_switched_in and z_thread_mark_switched_out functions seem to be getting called asymmetrically across different targets, with (sometimes way) more out events being called than in events. I first mentioned this problem in discussion #75406, but after a bit of digging and a more comprehensive testing round I am now almost sure it's an actual bug and not a problem of my implementation (although that can always be the case).
The example below uses the user-defined tracing API, which internally is directly called by the aforementioned functions. I am also developing a kernel feature which is called directly by z_thread_mark_switched_in and out, and the problem continues, which leads me to think they are the source of the issue and not the user-defined tracing API.
To Reproduce
Set the following configs in prj.conf as well as whichever are needed for your device to log outputs.
Run the following example code written in main.c.
Expected behavior
The code should showcase an equal number of SWT_TO and SWT_AY events. The number itself is not relevant, but they should match. Like this:
The exact shape of the bug depends on the target; please refer to #75406 for more examples. But the worst-case scenario for me was certainly with the xiao_esp32c3 board:
Impact
Although it doesn't prevent my feature from running, it certainly cuts off the ability to adapt to preemptions, since I can't reliably know on all targets when a thread enters/leaves the CPU. Therefore the functionality can only be assured when no higher-priority thread shows up during execution.
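A captured log can be checked for the mismatch described under Expected behavior with a small host-side helper. This is only a sketch: `swt_stats` and `swt_check` are hypothetical names, and the only thing taken from this issue is the pair of tags, `SWT_TO` and `SWT_AY`.

```c
#include <stddef.h>
#include <string.h>

/* Tallies produced by the checker. */
struct swt_stats {
	unsigned int in;       /* lines containing SWT_TO */
	unsigned int out;      /* lines containing SWT_AY */
	unsigned int repeated; /* same event seen twice in a row */
};

/* Scan log lines, count switch events and flag back-to-back repeats. */
static struct swt_stats swt_check(const char *const lines[], size_t n)
{
	struct swt_stats st = {0, 0, 0};
	int last = 0; /* 0 = no event yet, 1 = in, 2 = out */

	for (size_t i = 0; i < n; i++) {
		int ev = 0;

		if (strstr(lines[i], "SWT_TO") != NULL) {
			ev = 1;
			st.in++;
		} else if (strstr(lines[i], "SWT_AY") != NULL) {
			ev = 2;
			st.out++;
		}
		if (ev != 0) {
			if (ev == last) {
				st.repeated++;
			}
			last = ev;
		}
	}
	return st;
}
```

A healthy trace has `in` and `out` differing by at most one (the capture may start or end mid-run) and `repeated` equal to zero; lines carrying neither tag are ignored.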
Environment (please complete the following information):