-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sys/pm_layered: align pm_blocker_t for speed #18846
Conversation
Murdock results✔️ PASSED 72a1e93 sys/pm_layered: pm_get_blocker
ArtifactsThis only reflects a subset of all builds from https://ci-prod.riot-os.org. Please refer to https://ci.riot-os.org for a complete build for now. |
When you say speed can you put any numbers on this? |
i got no numbers but read a huge amount of assembly for the stm32-f767 and samr21 |
nucleo-f764zi master:
this PR
samr21 ante PR
post PR:
|
I'm a huge fan of switching GPIOs to compare before and after with an oscilloscope. On most platforms it's on the same clock domain as the CPU and should introduce a fixed latency. It should allow to show improvements. |
Not sure if this is really necessary if we already see a reduction on code generated. |
92408cf
to
a036042
Compare
a036042
to
72a1e93
Compare
I changed the pm shell command like below and ran the diff --git a/sys/shell/cmds/pm.c b/sys/shell/cmds/pm.c
index 1074bad99e..7178e1f300 100644
--- a/sys/shell/cmds/pm.c
+++ b/sys/shell/cmds/pm.c
@@ -26,6 +26,7 @@
#include "periph/pm.h"
#include "shell.h"
+#include "board.h"
#ifdef MODULE_PM_LAYERED
#include "pm_layered.h"
@@ -76,7 +77,9 @@ static int cmd_block(char *arg)
printf("Blocking power mode %d.\n", mode);
fflush(stdout);
+ LED0_OFF;
pm_block(mode);
+ LED0_ON;
return 0;
}
@@ -117,7 +120,9 @@ static int cmd_unblock(char *arg)
printf("Unblocking power mode %d.\n", mode);
fflush(stdout);
+ LED1_OFF;
pm_unblock(mode);
+ LED1_ON;
return 0;
} On master (ccbb304) I get:
With this PR (72a1e93) I get:
Am I holding it wrong? But I wouldn't block if other CPUs benefit from this patch! |
no your tests are right the improvements for block and unblock are low (they will be non if assert is removed and might be larger when the more verbose assert is used), I would expect 1-2 cycles less spend in block and unblock for a cortexm0 cpu since the fetch of the not taken jump to assert_panic is not in the direct path of the program counter (moved from 0x10 or 0x0e to the end of the function) this is not due to alignment but by the 'attribute((optimize(3)))'. the alignment will not benefit block and unblock since they are always byte access. m0: m7 the improvements of the alignment are very obvious in the for pm_set_lowest gain may be in the alignment (pm_blocker might have been missaligned before -> ldr (m0) needs two memory accesses ldrb (m7) might have both parts of pm_blocker in different cache-lines or they allign (in that case ther will be no gain by aligning). and at last these gains depend on the memory speed there is some place where microchip states memory access take 1 bus cycle but i dont know which bus they talk about |
@jue89:
Somehow the <stdatomic.h> atomic_fetch_add is a generic (applys to different datatypes) even in C -- some buildin magic or crazy macros I guess for the test i just used the same struct (did not change the type of .blocker[]) and it compiled to the same code than with changed types (might not be working for all architectures) stm32 f767 atomic without assert:
atomic with assert:
so counting lines atomic should be slower - return in 0x22 vs 0x18 for irq_disable for the samr21 atomic looks like this:
so there is no atomic support in the cpu but workaround function that dis- and enable irq |
18477: gnrc_static: add static network configuration r=miri64 a=benpicco 19155: Revert "sys/pm_layered: pm_(un)block add attribute optimize(3)" r=benpicco a=Teufelchen1 Revert "sys/pm_layered: pm_(un)block add attribute optimize(3) -shortens hotpath" This reverts commit 5447203. ### Contribution description Compiling `examples/gnrc_networking_mac` using `TOOLCHAIN=llvm` yields the following error: ``` RIOT/sys/pm_layered/pm.c:77:16: error: unknown attribute 'optimize' ignored [-Werror,-Wunknown-attributes] __attribute__((optimize(3))) ``` As indicated, this is because the attribute `optimize` is GCC only and not present in LLVM. Compare the manpages of [GCC](https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html) and [LLVM](https://clang.llvm.org/docs/AttributeReference.html). ### Testing procedure Since this should only affect performance and not behavior, no special testing is needed. I am not aware of any tests in RIOT which could verify that assumption. ### Issues/PRs references Introduced in #18846 There is another instance of this attribute being used in[ shell_lock.c](https://github.com/RIOT-OS/RIOT/blob/6fb340d654ac8da07759cb9199c6aaa478589aa7/sys/shell_lock/shell_lock.c#L80). Since the usage is security related, I omit it from this PR. Co-authored-by: Benjamin Valentin <[email protected]> Co-authored-by: Teufelchen1 <[email protected]>
18477: gnrc_static: add static network configuration r=miri64 a=benpicco 19155: Revert "sys/pm_layered: pm_(un)block add attribute optimize(3)" r=maribu a=Teufelchen1 Revert "sys/pm_layered: pm_(un)block add attribute optimize(3) -shortens hotpath" This reverts commit 5447203. ### Contribution description Compiling `examples/gnrc_networking_mac` using `TOOLCHAIN=llvm` yields the following error: ``` RIOT/sys/pm_layered/pm.c:77:16: error: unknown attribute 'optimize' ignored [-Werror,-Wunknown-attributes] __attribute__((optimize(3))) ``` As indicated, this is because the attribute `optimize` is GCC only and not present in LLVM. Compare the manpages of [GCC](https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html) and [LLVM](https://clang.llvm.org/docs/AttributeReference.html). ### Testing procedure Since this should only affect performance and not behavior, no special testing is needed. I am not aware of any tests in RIOT which could verify that assumption. ### Issues/PRs references Introduced in #18846 There is another instance of this attribute being used in[ shell_lock.c](https://github.com/RIOT-OS/RIOT/blob/6fb340d654ac8da07759cb9199c6aaa478589aa7/sys/shell_lock/shell_lock.c#L80). Since the usage is security related, I omit it from this PR. Co-authored-by: Benjamin Valentin <[email protected]> Co-authored-by: Teufelchen1 <[email protected]>
18477: gnrc_static: add static network configuration r=miri64 a=benpicco 19101: CI: update check-labels-action r=miri64 a=kaspar030 19155: Revert "sys/pm_layered: pm_(un)block add attribute optimize(3)" r=maribu a=Teufelchen1 Revert "sys/pm_layered: pm_(un)block add attribute optimize(3) -shortens hotpath" This reverts commit 5447203. ### Contribution description Compiling `examples/gnrc_networking_mac` using `TOOLCHAIN=llvm` yields the following error: ``` RIOT/sys/pm_layered/pm.c:77:16: error: unknown attribute 'optimize' ignored [-Werror,-Wunknown-attributes] __attribute__((optimize(3))) ``` As indicated, this is because the attribute `optimize` is GCC only and not present in LLVM. Compare the manpages of [GCC](https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html) and [LLVM](https://clang.llvm.org/docs/AttributeReference.html). ### Testing procedure Since this should only affect performance and not behavior, no special testing is needed. I am not aware of any tests in RIOT which could verify that assumption. ### Issues/PRs references Introduced in #18846 There is another instance of this attribute being used in[ shell_lock.c](https://github.com/RIOT-OS/RIOT/blob/6fb340d654ac8da07759cb9199c6aaa478589aa7/sys/shell_lock/shell_lock.c#L80). Since the usage is security related, I omit it from this PR. Co-authored-by: Benjamin Valentin <[email protected]> Co-authored-by: Kaspar Schleiser <[email protected]> Co-authored-by: Teufelchen1 <[email protected]>
pm_(un)block add attribute optimize(3) - shortens hotpath by moving the jump to panic behind return
pm_get_blocker
=
instead of memcpy to ease readabilityContribution description
this PR aligns pm_blocker_t that enables the compiler to use word loads instead of byte loads with
cortexm0: whole struct access (pm_get_blocker and pm_set_lowest used memcpy or bytewise access to read pm_blocker) with this it is ldr (word) and some register operations
Testing procedure
read
Issues/PRs references
#17607 started the journey
#18821 opened the rabbit hole
#18842