Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernel: Reorganize halt_thread() #81677

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

peter-mitsis
Copy link
Collaborator

Minor reorganization to halt_thread() to streamline the branching and comparisons.

Originally part of #81311, this commit has been split out so it could receive a ton of testing in isolation.

andyross
andyross previously approved these changes Nov 20, 2024
Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, did a line-by-line review and I'm convinced this indeed has zero logic or synchronization changes. No reason not to +1.

I'm a little dubious about the claimed performance benefit though. This is going to help the case of k_thread_suspend() (which I suspect is the benchmark being chased?) by potentially fixing the direction of the branches taken to be correctly predicted, and by skipping a needless second test of new_state (because there is only one possibility at that point). And that's not nothing, but really I have a hard time believing it's more than 2-5 cycles or so on shallow pipeline CPUs? Maybe we can retest this patch in isolation?

Also there's going to be some minor code size increase to emit the duplicated calls to the unpend_all/update_cache that used to be shared between the two state paths.

}
(void)z_abort_thread_timeout(thread);
unpend_all(&thread->join_queue);
if (likely(new_state == _THREAD_SUSPENDED)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like "suspend" is a rarer and more edgy case than "abort". Though plausibly you could argue that it's more likely to be part of a performance path and thus should be the fast case for code generation.

kernel/sched.c Outdated
if (thread == _current && arch_is_in_isr()) {
dummify = true;
}
/* New thread state is _THREAD_DEAD */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't hurt to add an assert here. Before, we'd test both and noop if someone passed a nonsense state, now it'll kill the thread. The former was arguably an OK way to respond to a coding mistake, the new behavior seems dangerous.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

nashif
nashif previously approved these changes Nov 20, 2024
@peter-mitsis
Copy link
Collaborator Author

For what it is worth, below are a few numbers from running the preemptive thread_metric benchmark on the disco_l475_iot1 board using CONFIG_SCHED_MULTIQ=y (higher numbers is better)

main branch: 5731301
main branch + this commit: 5553629
main branch + An empty clear_halting(): 5339260 (this patch can be found in #81311)
main branch + this commit + an empty clear_halting(): 5877246

It definitely seems weird that the performance for the individual patches drops, but it turns out that together, they work very well.

@andyross
Copy link
Contributor

Hold up: so performance drops by 3% with this patch (weird), drops even more if you then make clear_halting() optimize away (even weirder), but magically is 2.5% faster when you combine both tricks?

So, first, are you sure the numbers are reliable and not just noisy? Maybe try a few times or whatever to check a variance?

And if they are good measurements, we're probably looking at a compiler artifact instead, like the smaller code pushes us above/below the threshold for an inlining operation. And we should check the generated code to figure out what the deltas are. If it's just about "this function is fast when inlined" we can treat that more directly.

@peter-mitsis
Copy link
Collaborator Author

I did some analysis on the generated assembly code. Emptying either clear_halting() or reorganizing the suspend/dead checks in halt_thread() is enough to trigger the compiler to hit some kind of threshold as to how it organizes the branches. One obvious difference between the assembly outputs is that in the main branch, halt_thread() is getting inlined while z_thread_halt() is not. However, in the modified versions, it is z_thread_halt() that is getting inlined and halt_thread() is not. However, when both are modifications are present it is enough to overcome whatever its limitations are and result in better organization and performance.

Minor reorganization to halt_thread() to streamline the branching
and comparisons.

Signed-off-by: Peter Mitsis <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants