sys/pm_layered: align pm_blocker_t for speed #18846

kfessel · 2022-11-04T14:12:11Z

pm_(un)block: add attribute optimize(3); this shortens the hot path by moving the jump to panic behind the return

pm_get_blocker: use = instead of memcpy to ease readability

Contribution description

This PR aligns pm_blocker_t, which enables the compiler to use word loads instead of byte loads.
On Cortex-M0, whole-struct accesses (pm_get_blocker and pm_set_lowest) previously used memcpy or bytewise access to read pm_blocker; with this PR they become an ldr (word load) and some register operations.
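To illustrate the idea, here is a minimal sketch, not the code from this PR. The type `demo_blocker_t`, the constant `DEMO_NUM_MODES` and both accessor functions are hypothetical stand-ins. Once the byte array is forced to word alignment and the struct is exactly one word in size, a compiler is free to copy it with a single word load, and a plain assignment expresses the same copy as the memcpy call.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NUM_MODES 4 /* hypothetical: four power modes, one counter each */

/* Byte counters, forced to the alignment of a 32-bit word. */
typedef struct {
    alignas(uint32_t) uint8_t blockers[DEMO_NUM_MODES];
} demo_blocker_t;

/* With four modes the struct is exactly one aligned word. */
static_assert(alignof(demo_blocker_t) == alignof(uint32_t),
              "demo_blocker_t should be word aligned");
static_assert(sizeof(demo_blocker_t) == sizeof(uint32_t),
              "demo_blocker_t should be one word in size");

static demo_blocker_t demo_state = { .blockers = { 1, 0, 2, 0 } };

/* Copy via memcpy: the compiler may or may not inline this call. */
demo_blocker_t demo_get_memcpy(void)
{
    demo_blocker_t copy;
    memcpy(&copy, &demo_state, sizeof(copy));
    return copy;
}

/* Copy via struct assignment: same result, easier to read. */
demo_blocker_t demo_get_assign(void)
{
    return demo_state;
}
```

Both functions return the same bytes; which instructions the compiler emits for each still depends on target and optimization level.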

Testing procedure

read

Issues/PRs references

#17607 started the journey
#18821 opened the rabbit hole
#18842

riot-ci · 2022-11-04T14:19:27Z

Murdock results

✔️ PASSED

72a1e93 sys/pm_layered: pm_get_blocker = instead of memcopy -ease readability

Success	Failures	Total	Runtime
2000	0	2000	06m:47s

Artifacts

Documentation preview

This only reflects a subset of all builds from https://ci-prod.riot-os.org. Please refer to https://ci.riot-os.org for a complete build for now.

benpicco · 2022-11-04T14:34:02Z

When you say speed can you put any numbers on this?

kfessel · 2022-11-04T14:36:26Z

I have no numbers, but I read a huge amount of assembly for the stm32-f767 and samr21.

kfessel · 2022-11-04T14:43:51Z

nucleo-f767zi, master:


Disassembly of section .text.pm_set_lowest:

00000000 <pm_set_lowest>:
   0:	b510      	push	{r4, lr}
   2:	f3ef 8410 	mrs	r4, PRIMASK
   6:	b672      	cpsid	i
   8:	4b07      	ldr	r3, [pc, #28]	; (28 <pm_set_lowest+0x28>)
   a:	789a      	ldrb	r2, [r3, #2]
   c:	b93a      	cbnz	r2, 1e <pm_set_lowest+0x1e>
   e:	785a      	ldrb	r2, [r3, #1]
  10:	b942      	cbnz	r2, 24 <pm_set_lowest+0x24>
  12:	7818      	ldrb	r0, [r3, #0]
  14:	3800      	subs	r0, #0
  16:	bf18      	it	ne
  18:	2001      	movne	r0, #1
  1a:	f7ff fffe 	bl	0 <pm_set>
  1e:	f384 8810 	msr	PRIMASK, r4
  22:	bd10      	pop	{r4, pc}
  24:	2002      	movs	r0, #2
  26:	e7f8      	b.n	1a <pm_set_lowest+0x1a>
  28:	00000000 	.word	0x00000000

Disassembly of section .text.pm_block:

00000000 <pm_block>:
   0:	b508      	push	{r3, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a05      	ldr	r2, [pc, #20]	; (20 <pm_block+0x20>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	2bff      	cmp	r3, #255	; 0xff
   e:	d101      	bne.n	14 <pm_block+0x14>
  10:	f7ff fffe 	bl	0 <_assert_panic>
  14:	3301      	adds	r3, #1
  16:	5413      	strb	r3, [r2, r0]
  18:	f381 8810 	msr	PRIMASK, r1
  1c:	bd08      	pop	{r3, pc}
  1e:	bf00      	nop
  20:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	b508      	push	{r3, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a04      	ldr	r2, [pc, #16]	; (1c <pm_unblock+0x1c>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	b90b      	cbnz	r3, 12 <pm_unblock+0x12>
   e:	f7ff fffe 	bl	0 <_assert_panic>
  12:	3b01      	subs	r3, #1
  14:	5413      	strb	r3, [r2, r0]
  16:	f381 8810 	msr	PRIMASK, r1
  1a:	bd08      	pop	{r3, pc}
  1c:	00000000 	.word	0x00000000

Disassembly of section .text.pm_get_blocker:

00000000 <pm_get_blocker>:
   0:	b082      	sub	sp, #8
   2:	f3ef 8210 	mrs	r2, PRIMASK
   6:	b672      	cpsid	i
   8:	4b0b      	ldr	r3, [pc, #44]	; (38 <pm_get_blocker+0x38>)
   a:	8819      	ldrh	r1, [r3, #0]
   c:	789b      	ldrb	r3, [r3, #2]
   e:	f8ad 1000 	strh.w	r1, [sp]
  12:	f88d 3002 	strb.w	r3, [sp, #2]
  16:	f382 8810 	msr	PRIMASK, r2
  1a:	9b00      	ldr	r3, [sp, #0]
  1c:	2000      	movs	r0, #0
  1e:	b2da      	uxtb	r2, r3
  20:	f362 0007 	bfi	r0, r2, #0, #8
  24:	f3c3 2207 	ubfx	r2, r3, #8, #8
  28:	f3c3 4307 	ubfx	r3, r3, #16, #8
  2c:	f362 200f 	bfi	r0, r2, #8, #8
  30:	f363 4017 	bfi	r0, r3, #16, #8
  34:	b002      	add	sp, #8
  36:	4770      	bx	lr
  38:	00000000 	.word	0x00000000

this PR

Disassembly of section .text.pm_set_lowest:

00000000 <pm_set_lowest>:
   0:	b510      	push	{r4, lr}
   2:	f3ef 8410 	mrs	r4, PRIMASK
   6:	b672      	cpsid	i
   8:	4b07      	ldr	r3, [pc, #28]	; (28 <pm_set_lowest+0x28>)
   a:	789a      	ldrb	r2, [r3, #2]
   c:	b93a      	cbnz	r2, 1e <pm_set_lowest+0x1e>
   e:	785a      	ldrb	r2, [r3, #1]
  10:	b942      	cbnz	r2, 24 <pm_set_lowest+0x24>
  12:	7818      	ldrb	r0, [r3, #0]
  14:	3800      	subs	r0, #0
  16:	bf18      	it	ne
  18:	2001      	movne	r0, #1
  1a:	f7ff fffe 	bl	0 <pm_set>
  1e:	f384 8810 	msr	PRIMASK, r4
  22:	bd10      	pop	{r4, pc}
  24:	2002      	movs	r0, #2
  26:	e7f8      	b.n	1a <pm_set_lowest+0x1a>
  28:	00000000 	.word	0x00000000

Disassembly of section .text.pm_block:

00000000 <pm_block>:
   0:	b508      	push	{r3, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a05      	ldr	r2, [pc, #20]	; (20 <pm_block+0x20>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	2bff      	cmp	r3, #255	; 0xff
   e:	d004      	beq.n	1a <pm_block+0x1a>
  10:	3301      	adds	r3, #1
  12:	5413      	strb	r3, [r2, r0]
  14:	f381 8810 	msr	PRIMASK, r1
  18:	bd08      	pop	{r3, pc}
  1a:	f7ff fffe 	bl	0 <_assert_panic>
  1e:	bf00      	nop
  20:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	b508      	push	{r3, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a04      	ldr	r2, [pc, #16]	; (1c <pm_unblock+0x1c>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	b123      	cbz	r3, 18 <pm_unblock+0x18>
   e:	3b01      	subs	r3, #1
  10:	5413      	strb	r3, [r2, r0]
  12:	f381 8810 	msr	PRIMASK, r1
  16:	bd08      	pop	{r3, pc}
  18:	f7ff fffe 	bl	0 <_assert_panic>
  1c:	00000000 	.word	0x00000000

Disassembly of section .text.pm_get_blocker:

00000000 <pm_get_blocker>:
   0:	b082      	sub	sp, #8
   2:	f3ef 8310 	mrs	r3, PRIMASK
   6:	b672      	cpsid	i
   8:	4a03      	ldr	r2, [pc, #12]	; (18 <pm_get_blocker+0x18>)
   a:	6812      	ldr	r2, [r2, #0]
   c:	9200      	str	r2, [sp, #0]
   e:	f383 8810 	msr	PRIMASK, r3
  12:	9800      	ldr	r0, [sp, #0]
  14:	b002      	add	sp, #8
  16:	4770      	bx	lr
  18:	00000000 	.word	0x00000000

samr21 before this PR:

00000000 <pm_set_lowest>:
   0:	b510      	push	{r4, lr}
   2:	f3ef 8410 	mrs	r4, PRIMASK
   6:	b672      	cpsid	i
   8:	2304      	movs	r3, #4
   a:	4a08      	ldr	r2, [pc, #32]	; (2c <pm_set_lowest+0x2c>)
   c:	0018      	movs	r0, r3
   e:	3b01      	subs	r3, #1
  10:	5c99      	ldrb	r1, [r3, r2]
  12:	2900      	cmp	r1, #0
  14:	d105      	bne.n	22 <pm_set_lowest+0x22>
  16:	2b00      	cmp	r3, #0
  18:	d1f8      	bne.n	c <pm_set_lowest+0xc>
  1a:	0018      	movs	r0, r3
  1c:	f7ff fffe 	bl	0 <pm_set>
  20:	e001      	b.n	26 <pm_set_lowest+0x26>
  22:	2804      	cmp	r0, #4
  24:	d1fa      	bne.n	1c <pm_set_lowest+0x1c>
  26:	f384 8810 	msr	PRIMASK, r4
  2a:	bd10      	pop	{r4, pc}
  2c:	00000000 	.word	0x00000000

Disassembly of section .text.pm_block:

00000000 <pm_block>:
   0:	b510      	push	{r4, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a05      	ldr	r2, [pc, #20]	; (20 <pm_block+0x20>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	2bff      	cmp	r3, #255	; 0xff
   e:	d101      	bne.n	14 <pm_block+0x14>
  10:	f7ff fffe 	bl	0 <_assert_panic>
  14:	3301      	adds	r3, #1
  16:	5413      	strb	r3, [r2, r0]
  18:	f381 8810 	msr	PRIMASK, r1
  1c:	bd10      	pop	{r4, pc}
  1e:	46c0      	nop			; (mov r8, r8)
  20:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	b510      	push	{r4, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a05      	ldr	r2, [pc, #20]	; (20 <pm_unblock+0x20>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	2b00      	cmp	r3, #0
   e:	d101      	bne.n	14 <pm_unblock+0x14>
  10:	f7ff fffe 	bl	0 <_assert_panic>
  14:	3b01      	subs	r3, #1
  16:	5413      	strb	r3, [r2, r0]
  18:	f381 8810 	msr	PRIMASK, r1
  1c:	bd10      	pop	{r4, pc}
  1e:	46c0      	nop			; (mov r8, r8)
  20:	00000000 	.word	0x00000000

Disassembly of section .text.pm_get_blocker:

00000000 <pm_get_blocker>:
   0:	b513      	push	{r0, r1, r4, lr}
   2:	f3ef 8410 	mrs	r4, PRIMASK
   6:	b672      	cpsid	i
   8:	2204      	movs	r2, #4
   a:	4668      	mov	r0, sp
   c:	490a      	ldr	r1, [pc, #40]	; (38 <pm_get_blocker+0x38>)
   e:	f7ff fffe 	bl	0 <memcpy>
  12:	f384 8810 	msr	PRIMASK, r4
  16:	9b00      	ldr	r3, [sp, #0]
  18:	24ff      	movs	r4, #255	; 0xff
  1a:	0018      	movs	r0, r3
  1c:	0a19      	lsrs	r1, r3, #8
  1e:	0c1a      	lsrs	r2, r3, #16
  20:	4021      	ands	r1, r4
  22:	4020      	ands	r0, r4
  24:	0209      	lsls	r1, r1, #8
  26:	4022      	ands	r2, r4
  28:	0412      	lsls	r2, r2, #16
  2a:	4308      	orrs	r0, r1
  2c:	0e1b      	lsrs	r3, r3, #24
  2e:	061b      	lsls	r3, r3, #24
  30:	4310      	orrs	r0, r2
  32:	4318      	orrs	r0, r3
  34:	bd16      	pop	{r1, r2, r4, pc}
  36:	46c0      	nop			; (mov r8, r8)
  38:	00000000 	.word	0x00000000

post PR:

Disassembly of section .text.pm_set_lowest:

00000000 <pm_set_lowest>:
   0:	b510      	push	{r4, lr}
   2:	f3ef 8410 	mrs	r4, PRIMASK
   6:	b672      	cpsid	i
   8:	2304      	movs	r3, #4
   a:	4a08      	ldr	r2, [pc, #32]	; (2c <pm_set_lowest+0x2c>)
   c:	0018      	movs	r0, r3
   e:	3b01      	subs	r3, #1
  10:	5c99      	ldrb	r1, [r3, r2]
  12:	2900      	cmp	r1, #0
  14:	d105      	bne.n	22 <pm_set_lowest+0x22>
  16:	2b00      	cmp	r3, #0
  18:	d1f8      	bne.n	c <pm_set_lowest+0xc>
  1a:	0018      	movs	r0, r3
  1c:	f7ff fffe 	bl	0 <pm_set>
  20:	e001      	b.n	26 <pm_set_lowest+0x26>
  22:	2804      	cmp	r0, #4
  24:	d1fa      	bne.n	1c <pm_set_lowest+0x1c>
  26:	f384 8810 	msr	PRIMASK, r4
  2a:	bd10      	pop	{r4, pc}
  2c:	00000000 	.word	0x00000000

Disassembly of section .text.pm_block:

00000000 <pm_block>:
   0:	b510      	push	{r4, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a05      	ldr	r2, [pc, #20]	; (20 <pm_block+0x20>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	2bff      	cmp	r3, #255	; 0xff
   e:	d004      	beq.n	1a <pm_block+0x1a>
  10:	3301      	adds	r3, #1
  12:	5413      	strb	r3, [r2, r0]
  14:	f381 8810 	msr	PRIMASK, r1
  18:	bd10      	pop	{r4, pc}
  1a:	f7ff fffe 	bl	0 <_assert_panic>
  1e:	46c0      	nop			; (mov r8, r8)
  20:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	b510      	push	{r4, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a05      	ldr	r2, [pc, #20]	; (20 <pm_unblock+0x20>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	2b00      	cmp	r3, #0
   e:	d004      	beq.n	1a <pm_unblock+0x1a>
  10:	3b01      	subs	r3, #1
  12:	5413      	strb	r3, [r2, r0]
  14:	f381 8810 	msr	PRIMASK, r1
  18:	bd10      	pop	{r4, pc}
  1a:	f7ff fffe 	bl	0 <_assert_panic>
  1e:	46c0      	nop			; (mov r8, r8)
  20:	00000000 	.word	0x00000000

Disassembly of section .text.pm_get_blocker:

00000000 <pm_get_blocker>:
   0:	f3ef 8310 	mrs	r3, PRIMASK
   4:	b672      	cpsid	i
   6:	4a02      	ldr	r2, [pc, #8]	; (10 <pm_get_blocker+0x10>)
   8:	6810      	ldr	r0, [r2, #0]
   a:	f383 8810 	msr	PRIMASK, r3
   e:	4770      	bx	lr
  10:	00000000 	.word	0x00000000
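For orientation, the loop in the samr21 `pm_set_lowest` listings can be read back into C roughly as follows. This is a sketch reconstructed from the disassembly, not the source in sys/pm_layered/pm.c; `demo_lowest_mode` and `DEMO_NUM_MODES` are hypothetical names, and the value 4 is taken from the `movs r3, #4` at the start of the listing.

```c
#include <stdint.h>

#define DEMO_NUM_MODES 4 /* assumption: matches the "movs r3, #4" in the listing */

/* Scan the counters from the highest mode downwards and stop at the first
 * mode that is still blocked. Returns DEMO_NUM_MODES if the highest mode is
 * blocked and 0 if nothing is blocked at all. */
unsigned demo_lowest_mode(const uint8_t blockers[DEMO_NUM_MODES])
{
    unsigned mode = DEMO_NUM_MODES;

    while (mode) {
        if (blockers[mode - 1]) {
            break;
        }
        mode--;
    }
    return mode;
}
```

In the listing, pm_set is only called when the result differs from 4; every counter is read with its own ldrb, both before and after the PR.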

jue89 · 2022-11-04T14:47:54Z

I'm a huge fan of switching GPIOs to compare before and after with an oscilloscope. On most platforms it's on the same clock domain as the CPU and should introduce a fixed latency. It should allow to show improvements.

sys/pm_layered/pm.c

benpicco · 2022-11-04T15:46:05Z

I'm a huge fan of switching GPIOs to compare before and after with an oscilloscope. On most platforms it's on the same clock domain as the CPU and should introduce a fixed latency. It should allow to show improvements.

Not sure if this is really necessary if we already see a reduction on code generated.

jue89 · 2022-11-04T17:27:53Z

I changed the pm shell command like below and ran the tests/periph_pm application on the samr30-xpro:

diff --git a/sys/shell/cmds/pm.c b/sys/shell/cmds/pm.c
index 1074bad99e..7178e1f300 100644
--- a/sys/shell/cmds/pm.c
+++ b/sys/shell/cmds/pm.c
@@ -26,6 +26,7 @@
 
 #include "periph/pm.h"
 #include "shell.h"
+#include "board.h"
 
 #ifdef MODULE_PM_LAYERED
 #include "pm_layered.h"
@@ -76,7 +77,9 @@ static int cmd_block(char *arg)
     printf("Blocking power mode %d.\n", mode);
     fflush(stdout);
 
+    LED0_OFF;
     pm_block(mode);
+    LED0_ON;
 
     return 0;
 }
@@ -117,7 +120,9 @@ static int cmd_unblock(char *arg)
     printf("Unblocking power mode %d.\n", mode);
     fflush(stdout);
 
+    LED1_OFF;
     pm_unblock(mode);
+    LED1_ON;
 
     return 0;
 }

On master (ccbb304) I get:

pm_block: 1.16us
pm_unblock: 1.00us to 1.16us

With this PR (72a1e93) I get:

pm_block: 1.00us to 1.16us
pm_unblock: 1.00us to 1.16us

Am I holding it wrong?

But I wouldn't block if other CPUs benefit from this patch!

kfessel · 2022-11-04T18:42:16Z

@jue89

Am I holding it wrong?

No, your tests are right. The improvements for block and unblock are small (there will be none if the assert is removed, and they might be larger when the more verbose assert is used). I would expect 1-2 cycles less spent in block and unblock on a Cortex-M0 CPU, since the not-taken jump to _assert_panic is no longer fetched in the direct path of the program counter (it moved from 0x10 or 0x0e to the end of the function). This is not due to the alignment but to the '__attribute__((optimize(3)))'. The alignment will not benefit block and unblock since they always use byte accesses.

m0:
block runs from 0 to 0x1c before and 0 to 0x18 after PR
unblock runs from 0 to 0x1c before and 0 to 0x18 after PR

m7:
block runs from 0 to 0x1c before and 0 to 0x18 after PR
unblock runs from 0 to 0x1a before and 0 to 0x16 after PR

The improvements from the alignment are very obvious in pm_get_blocker (on the m0 the memcpy call was replaced by a single word copy, and on the m7 the ldrh/ldrb loads plus the bit-field shuffling were replaced by one ldr).

For pm_set_lowest the gain may lie in the alignment: pm_blocker might have been misaligned before, in which case an ldr (m0) needs two memory accesses and the ldrb accesses (m7) might find the parts of pm_blocker in different cache lines. Or it happened to be aligned already, in which case there is no gain from aligning.
With this PR it is always aligned, so one ldr is one memory access.
(pm_get_blocker shows how the reads might have been split before; after this PR they aren't.)

Finally, these gains depend on the memory speed. Somewhere Microchip states that a memory access takes 1 bus cycle, but I don't know which bus they are talking about.
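The hot-path effect described above (the rarely taken jump to the panic handler ends up after the return) can also be hinted at with branch annotations. The sketch below is not what this PR does: it swaps optimize(3) for `__builtin_expect` and the `cold` attribute, which both GCC and Clang accept. All `demo_` names are hypothetical, and the panic handler just counts calls so the example stays harmless. Whether the compiler actually reorders the code still depends on target and flags.

```c
#include <stdint.h>

#define DEMO_NUM_MODES 4 /* hypothetical number of power modes */

/* Tell the compiler that the condition is expected to be false. */
#define DEMO_UNLIKELY(x) __builtin_expect(!!(x), 0)

static uint8_t demo_blockers[DEMO_NUM_MODES];
static unsigned demo_panic_count;

/* Stand-in for a panic handler; marked cold and kept out of line so it
 * does not need to sit in the straight-line path of the caller. */
__attribute__((noinline, cold))
static void demo_panic(void)
{
    demo_panic_count++;
}

/* Saturation check first, increment on the expected path. */
void demo_block(unsigned mode)
{
    if (DEMO_UNLIKELY(demo_blockers[mode] == 255)) {
        demo_panic();
        return;
    }
    demo_blockers[mode]++;
}
```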

kfessel · 2022-11-04T20:28:45Z

@jue89:
I just thought there might be another gain to be had by switching to atomics (since I saw no stack frame being built for block and unblock using atomic access), but it turned out that no stack frame was built only because I didn't assert in the atomic variants.
The atomic versions with assert also build a stack frame.

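__attribute__((optimize(3)))
void pm_block(unsigned mode)
{
    DEBUG("[pm_layered] pm_block(%d)\n", mode);
#if 1
    /* atomic with assert; note that the increment sits inside assert() and
     * is compiled out together with it when NDEBUG is defined */
    assert(atomic_fetch_add(&pm_blocker.blockers[mode], 1) < 255);
#elif 1
    /* atomic without assert */
    atomic_fetch_add(&pm_blocker.blockers[mode], 1);
#else
    /* variant guarded by irq_disable()/irq_restore() */
    unsigned state = irq_disable();
    assert(pm_blocker.blockers[mode] != 255);
    pm_blocker.blockers[mode]++;
    irq_restore(state);
#endif
}

__attribute__((optimize(3)))
void pm_unblock(unsigned mode)
{
    DEBUG("[pm_layered] pm_unblock(%d)\n", mode);
#if 1
    /* atomic with assert; same NDEBUG caveat as in pm_block() */
    assert(atomic_fetch_sub(&pm_blocker.blockers[mode], 1) > 0);
#elif 1
    /* atomic without assert */
    atomic_fetch_sub(&pm_blocker.blockers[mode], 1);
#else
    /* variant guarded by irq_disable()/irq_restore() */
    unsigned state = irq_disable();
    assert(pm_blocker.blockers[mode] > 0);
    pm_blocker.blockers[mode]--;
    irq_restore(state);
#endif
}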

Somehow the <stdatomic.h> atomic_fetch_add is generic (it applies to different data types) even in C; some builtin magic or crazy macros, I guess.

For the test I just used the same struct (I did not change the type of .blockers[]), and it compiled to the same code as with changed types (this might not work on all architectures).
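As a side note on the atomic variants, here is a minimal, self-contained sketch of the same counters using explicitly atomic types from C11 `<stdatomic.h>`. The names `demo_counters`, `demo_atomic_block` and `demo_atomic_unblock` are hypothetical. `atomic_fetch_add` and `atomic_fetch_sub` return the value held before the operation, so the range check can live outside of assert() and the update survives a build with NDEBUG.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NUM_MODES 4 /* hypothetical number of power modes */

/* One atomic byte counter per mode, zero-initialized. */
static _Atomic uint8_t demo_counters[DEMO_NUM_MODES];

/* Increment; returns false if the counter was already saturated
 * (the counter then wraps, just as in the assert-based variant). */
bool demo_atomic_block(unsigned mode)
{
    uint8_t previous = atomic_fetch_add(&demo_counters[mode], 1);
    return previous != 255;
}

/* Decrement; returns false if the counter was already zero. */
bool demo_atomic_unblock(unsigned mode)
{
    uint8_t previous = atomic_fetch_sub(&demo_counters[mode], 1);
    return previous != 0;
}
```

The caller decides what to do with a false return value, for example assert on it.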

stm32 f767 atomic without assert:

00000000 <pm_block>:
   0:	4b06      	ldr	r3, [pc, #24]	; (1c <pm_block+0x1c>)
   2:	f3bf 8f5b 	dmb	ish
   6:	4418      	add	r0, r3
   8:	e8d0 3f4f 	ldrexb	r3, [r0]
   c:	3301      	adds	r3, #1
   e:	e8c0 3f42 	strexb	r2, r3, [r0]
  12:	2a00      	cmp	r2, #0
  14:	d1f8      	bne.n	8 <pm_block+0x8>
  16:	f3bf 8f5b 	dmb	ish
  1a:	4770      	bx	lr
  1c:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	4b06      	ldr	r3, [pc, #24]	; (1c <pm_unblock+0x1c>)
   2:	f3bf 8f5b 	dmb	ish
   6:	4418      	add	r0, r3
   8:	e8d0 3f4f 	ldrexb	r3, [r0]
   c:	3301      	adds	r3, #1
   e:	e8c0 3f42 	strexb	r2, r3, [r0]
  12:	2a00      	cmp	r2, #0
  14:	d1f8      	bne.n	8 <pm_unblock+0x8>
  16:	f3bf 8f5b 	dmb	ish
  1a:	4770      	bx	lr
  1c:	00000000 	.word	0x00000000

atomic with assert:

00000000 <pm_block>:
   0:	b508      	push	{r3, lr}
   2:	f3bf 8f5b 	dmb	ish
   6:	4b08      	ldr	r3, [pc, #32]	; (28 <pm_block+0x28>)
   8:	4418      	add	r0, r3
   a:	e8d0 3f4f 	ldrexb	r3, [r0]
   e:	1c5a      	adds	r2, r3, #1
  10:	e8c0 2f41 	strexb	r1, r2, [r0]
  14:	2900      	cmp	r1, #0
  16:	d1f8      	bne.n	a <pm_block+0xa>
  18:	b2db      	uxtb	r3, r3
  1a:	f3bf 8f5b 	dmb	ish
  1e:	2bff      	cmp	r3, #255	; 0xff
  20:	d000      	beq.n	24 <pm_block+0x24>
  22:	bd08      	pop	{r3, pc}
  24:	f7ff fffe 	bl	0 <_assert_panic>
  28:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	b508      	push	{r3, lr}
   2:	f3bf 8f5b 	dmb	ish
   6:	4b08      	ldr	r3, [pc, #32]	; (28 <pm_unblock+0x28>)
   8:	4418      	add	r0, r3
   a:	e8d0 3f4f 	ldrexb	r3, [r0]
   e:	1c5a      	adds	r2, r3, #1
  10:	e8c0 2f41 	strexb	r1, r2, [r0]
  14:	2900      	cmp	r1, #0
  16:	d1f8      	bne.n	a <pm_unblock+0xa>
  18:	b2db      	uxtb	r3, r3
  1a:	f3bf 8f5b 	dmb	ish
  1e:	2bff      	cmp	r3, #255	; 0xff
  20:	d000      	beq.n	24 <pm_unblock+0x24>
  22:	bd08      	pop	{r3, pc}
  24:	f7ff fffe 	bl	0 <_assert_panic>
  28:	00000000 	.word	0x00000000

So, counting instructions, atomic should be slower: the return is at 0x22 vs. 0x18 for the irq_disable variant.

for the samr21 atomic looks like this:

00000000 <pm_block>:
   0:	4b05      	ldr	r3, [pc, #20]	; (18 <pm_block+0x18>)
   2:	b510      	push	{r4, lr}
   4:	2205      	movs	r2, #5
   6:	2101      	movs	r1, #1
   8:	1818      	adds	r0, r3, r0
   a:	f7ff fffe 	bl	0 <__atomic_fetch_add_1>
   e:	28ff      	cmp	r0, #255	; 0xff
  10:	d000      	beq.n	14 <pm_block+0x14>
  12:	bd10      	pop	{r4, pc}
  14:	f7ff fffe 	bl	0 <_assert_panic>
  18:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	4b05      	ldr	r3, [pc, #20]	; (18 <pm_unblock+0x18>)
   2:	b510      	push	{r4, lr}
   4:	2205      	movs	r2, #5
   6:	2101      	movs	r1, #1
   8:	1818      	adds	r0, r3, r0
   a:	f7ff fffe 	bl	0 <__atomic_fetch_sub_1>
   e:	2800      	cmp	r0, #0
  10:	d000      	beq.n	14 <pm_unblock+0x14>
  12:	bd10      	pop	{r4, pc}
  14:	f7ff fffe 	bl	0 <_assert_panic>
  18:	00000000 	.word	0x00000000

So there is no atomic support in the CPU, only a workaround function that disables and re-enables IRQs.
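What such a workaround function might look like is sketched below. This is an illustration under assumptions, not the actual `__atomic_fetch_add_1` implementation linked on the samr21. The IRQ helpers are hypothetical stand-ins that only track a nesting depth so the example runs on a host, and `demo_fetch_add_1` is a made-up name.

```c
#include <stdint.h>

/* Hypothetical stand-ins for an IRQ API: a real implementation would mask
 * and restore interrupts, these only track a nesting depth. */
static unsigned demo_irq_depth;

static unsigned demo_irq_disable(void)
{
    return demo_irq_depth++;
}

static void demo_irq_restore(unsigned state)
{
    demo_irq_depth = state;
}

/* Read-modify-write of one byte inside a critical section; returns the
 * value held before the addition, like atomic_fetch_add. */
uint8_t demo_fetch_add_1(volatile uint8_t *ptr, uint8_t val)
{
    unsigned state = demo_irq_disable();
    uint8_t old = *ptr;

    *ptr = (uint8_t)(old + val);
    demo_irq_restore(state);
    return old;
}
```

On a single-core CPU without exclusive load/store instructions, masking interrupts around the read-modify-write is one common way to make it appear atomic.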

18477: gnrc_static: add static network configuration r=miri64 a=benpicco

19155: Revert "sys/pm_layered: pm_(un)block add attribute optimize(3)" r=benpicco a=Teufelchen1

Revert "sys/pm_layered: pm_(un)block add attribute optimize(3) -shortens hotpath"

This reverts commit 5447203.

### Contribution description

Compiling `examples/gnrc_networking_mac` using `TOOLCHAIN=llvm` yields the following error:

```
RIOT/sys/pm_layered/pm.c:77:16: error: unknown attribute 'optimize' ignored [-Werror,-Wunknown-attributes]
__attribute__((optimize(3)))
```

As indicated, this is because the attribute `optimize` is GCC only and not present in LLVM. Compare the manpages of [GCC](https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html) and [LLVM](https://clang.llvm.org/docs/AttributeReference.html).

### Testing procedure

Since this should only affect performance and not behavior, no special testing is needed. I am not aware of any tests in RIOT which could verify that assumption.

### Issues/PRs references

Introduced in #18846

There is another instance of this attribute being used in [shell_lock.c](https://github.com/RIOT-OS/RIOT/blob/6fb340d654ac8da07759cb9199c6aaa478589aa7/sys/shell_lock/shell_lock.c#L80). Since the usage is security related, I omit it from this PR.

Co-authored-by: Benjamin Valentin <[email protected]>
Co-authored-by: Teufelchen1 <[email protected]>
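One way to keep a GCC-only attribute without breaking Clang builds is to hide it behind a macro. The following is a sketch under assumptions, not what the revert or RIOT does; `DEMO_OPTIMIZE_O3` and `demo_increment` are hypothetical names. Clang also defines `__GNUC__`, so the check has to exclude `__clang__` explicitly.

```c
/* Expand to the GCC-only attribute on GCC and to nothing elsewhere. */
#if defined(__GNUC__) && !defined(__clang__)
#  define DEMO_OPTIMIZE_O3 __attribute__((optimize(3)))
#else
#  define DEMO_OPTIMIZE_O3
#endif

/* Behaves the same with or without the attribute; only the generated
 * code may differ. */
DEMO_OPTIMIZE_O3
unsigned demo_increment(unsigned value)
{
    return value + 1;
}
```

With such a guard, Clang builds simply skip the attribute and `-Wunknown-attributes` has nothing to report.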


kfessel requested a review from kaspar030 as a code owner November 4, 2022 14:12

github-actions bot added the Area: sys Area: System label Nov 4, 2022

kfessel added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed Area: sys Area: System labels Nov 4, 2022

kfessel requested a review from benpicco November 4, 2022 14:27

benpicco requested a review from jue89 November 4, 2022 14:30

jue89 reviewed Nov 4, 2022


sys/pm_layered: align pm_blocker_t for speed

2197396

kfessel force-pushed the p-pm-layerd-speedup1 branch from 92408cf to a036042 Compare November 4, 2022 15:52

github-actions bot added the Area: sys Area: System label Nov 4, 2022

kfessel added 2 commits November 4, 2022 16:59

sys/pm_layered: pm_(un)block add attribute optimize(3) -shortens hotpath

5447203

sys/pm_layered: pm_get_blocker = instead of memcopy -ease readability

72a1e93

kfessel force-pushed the p-pm-layerd-speedup1 branch from a036042 to 72a1e93 Compare November 4, 2022 16:00

benpicco approved these changes Nov 8, 2022


kfessel merged commit c354ab6 into RIOT-OS:master Nov 8, 2022

Teufelchen1 mentioned this pull request Jan 16, 2023

Revert "sys/pm_layered: pm_(un)block add attribute optimize(3)" #19155

Merged

kaspar030 added this to the Release 2023.01 milestone Jan 19, 2023
