kernel: Reorganize halt_thread() #81677

peter-mitsis · 2024-11-20T19:03:02Z

Minor reorganization to halt_thread() to streamline the branching and comparisons.

Originally part of #81311, this commit has been split out so it could receive a ton of testing in isolation.

andyross

OK, did a line-by-line review and I'm convinced this indeed has zero logic or synchronization changes. No reason not to +1.

I'm a little dubious about the claimed performance benefit though. This is going to help the case of k_thread_suspend() (which I suspect is the benchmark being chased?) by potentially fixing the direction of the branches taken to be correctly predicted, and by skipping a needless second test of new_state (because there is only one possibility at that point). And that's not nothing, but really I have a hard time believing it's more than 2-5 cycles or so on shallow pipeline CPUs? Maybe we can retest this patch in isolation?

Also there's going to be some minor code size increase to emit the duplicated calls to the unpend_all/update_cache that used to be shared between the two state paths.

andyross · 2024-11-20T19:05:18Z

kernel/sched.c

-			}
-			(void)z_abort_thread_timeout(thread);
-			unpend_all(&thread->join_queue);
+		if (likely(new_state == _THREAD_SUSPENDED)) {


Seems like "suspend" is a rarer and more edgy case than "abort". Though plausibly you could argue that it's more likely to be part of a performance path and thus should be the fast case for code generation.

andyross · 2024-11-20T19:09:25Z

kernel/sched.c

-			if (thread == _current && arch_is_in_isr()) {
-				dummify = true;
-			}
+		/* New thread state is _THREAD_DEAD */


Wouldn't hurt to add an assert here. Before, we'd test both and noop if someone passed a nonsense state, now it'll kill the thread. The former was arguably an OK way to respond to a coding mistake, the new behavior seems dangerous.

peter-mitsis · 2024-11-20T23:57:36Z

For what it is worth, below are a few numbers from running the preemptive thread_metric benchmark on the disco_l475_iot1 board using CONFIG_SCHED_MULTIQ=y (higher numbers is better)

main branch: 5731301
main branch + this commit: 5553629
main branch + An empty clear_halting(): 5339260 (this patch can be found in #81311)
main branch + this commit + an empty clear_halting(): 5877246

It definitely seems weird that the performance for the individual patches drops, but it turns out that together, they work very well.

andyross · 2024-11-21T02:14:56Z

Hold up: so performance drops by 3% with this patch (weird), drops even more if you then make clear_halting() optimize away (even weirder), but magically is 2.5% faster when you combine both tricks?

So, first, are you sure the numbers are reliable and not just noisy? Maybe try a few times or whatever to check a variance?

And if they are good measurements, we're probably looking at a compiler artifact instead, like the smaller code pushes us above/below the threshold for an inlining operation. And we should check the generated code to figure out what the deltas are. If it's just about "this function is fast when inlined" we can treat that more directly.

peter-mitsis · 2024-11-22T19:41:41Z

I did some analysis on the generated assembly code. Emptying either clear_halting() or reorganizing the suspend/dead checks in halt_thread() is enough to trigger the compiler to hit some kind of threshold as to how it organizes the branches. One obvious difference between the assembly outputs is that in the main branch, halt_thread() is getting inlined while z_thread_halt() is not. However, in the modified versions, it is z_thread_halt() that is getting inlined and halt_thread() is not. However, when both are modifications are present it is enough to overcome whatever its limitations are and result in better organization and performance.

Minor reorganization to halt_thread() to streamline the branching and comparisons. Signed-off-by: Peter Mitsis <[email protected]>

zephyrbot added the area: Kernel label Nov 20, 2024

zephyrbot requested review from andyross, ceolin, cfriedt, dcpleung, nashif, npitre and TaiJuWu November 20, 2024 19:03

zephyrbot assigned andyross and peter-mitsis Nov 20, 2024

andyross previously approved these changes Nov 20, 2024

View reviewed changes

nashif previously approved these changes Nov 20, 2024

View reviewed changes

peter-mitsis dismissed stale reviews from nashif and andyross via a1f61c9 November 20, 2024 23:47

peter-mitsis force-pushed the pmitsis-halt-thread-reorder branch from 9a9f51a to a1f61c9 Compare November 20, 2024 23:47

peter-mitsis mentioned this pull request Nov 21, 2024

kernel: Add _THREAD_SLEEPING thread state #81684

Open

kernel: Reorganize halt_thread()

b75158d

Minor reorganization to halt_thread() to streamline the branching and comparisons. Signed-off-by: Peter Mitsis <[email protected]>

peter-mitsis force-pushed the pmitsis-halt-thread-reorder branch from a1f61c9 to b75158d Compare November 27, 2024 19:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kernel: Reorganize halt_thread() #81677

kernel: Reorganize halt_thread() #81677

peter-mitsis commented Nov 20, 2024

andyross left a comment

andyross Nov 20, 2024

andyross Nov 20, 2024

peter-mitsis Nov 20, 2024

peter-mitsis commented Nov 20, 2024

andyross commented Nov 21, 2024

peter-mitsis commented Nov 22, 2024

kernel: Reorganize halt_thread() #81677

Are you sure you want to change the base?

kernel: Reorganize halt_thread() #81677

Conversation

peter-mitsis commented Nov 20, 2024

andyross left a comment

Choose a reason for hiding this comment

andyross Nov 20, 2024

Choose a reason for hiding this comment

andyross Nov 20, 2024

Choose a reason for hiding this comment

peter-mitsis Nov 20, 2024

Choose a reason for hiding this comment

peter-mitsis commented Nov 20, 2024

andyross commented Nov 21, 2024

peter-mitsis commented Nov 22, 2024