Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys/pm_layered: move (un)block assert for minor speedup #18842

Merged
merged 1 commit into from
Nov 4, 2022

Conversation

kfessel
Copy link
Contributor

@kfessel kfessel commented Nov 3, 2022

@jue89 is doing good work adding pm_layerd where its needed, this PR shall optimize pm_block and unblock

Contribution description

this moves the assert into the irq guard guard which will spare us the extra load that is need for it (irq disable invalidate memory to ensure there is not reordering)

Testing procedure

read

compile

do speed tests

read assembly

I do not have a slow arm at hand -> i read the assembly for
old:

00000000 <pm_block>:
   0:	b508      	push	{r3, lr}
   2:	4b07      	ldr	r3, [pc, #28]	; (20 <pm_block+0x20>)
   4:	5c1a      	ldrb	r2, [r3, r0]
   6:	2aff      	cmp	r2, #255	; 0xff
   8:	d101      	bne.n	e <pm_block+0xe>
   a:	f7ff fffe 	bl	0 <_assert_panic>
   e:	f3ef 8110 	mrs	r1, PRIMASK
  12:	b672      	cpsid	i
  14:	5c1a      	ldrb	r2, [r3, r0]
  16:	3201      	adds	r2, #1
  18:	541a      	strb	r2, [r3, r0]
  1a:	f381 8810 	msr	PRIMASK, r1
  1e:	bd08      	pop	{r3, pc}
  20:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	b508      	push	{r3, lr}
   2:	4b07      	ldr	r3, [pc, #28]	; (20 <pm_unblock+0x20>)
   4:	5c1a      	ldrb	r2, [r3, r0]
   6:	b90a      	cbnz	r2, c <pm_unblock+0xc>
   8:	f7ff fffe 	bl	0 <_assert_panic>
   c:	f3ef 8110 	mrs	r1, PRIMASK
  10:	b672      	cpsid	i
  12:	5c1a      	ldrb	r2, [r3, r0]
  14:	3a01      	subs	r2, #1
  16:	541a      	strb	r2, [r3, r0]
  18:	f381 8810 	msr	PRIMASK, r1
  1c:	bd08      	pop	{r3, pc}
  1e:	bf00      	nop
  20:	00000000 	.word	0x00000000

this PR:

00000000 <pm_block>:
   0:	b508      	push	{r3, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a05      	ldr	r2, [pc, #20]	; (20 <pm_block+0x20>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	2bff      	cmp	r3, #255	; 0xff
   e:	d101      	bne.n	14 <pm_block+0x14>
  10:	f7ff fffe 	bl	0 <_assert_panic>
  14:	3301      	adds	r3, #1
  16:	5413      	strb	r3, [r2, r0]
  18:	f381 8810 	msr	PRIMASK, r1
  1c:	bd08      	pop	{r3, pc}
  1e:	bf00      	nop
  20:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	b508      	push	{r3, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a04      	ldr	r2, [pc, #16]	; (1c <pm_unblock+0x1c>)
   a:	5c13      	ldrb	r3, [r2, r0]
   c:	b90b      	cbnz	r3, 12 <pm_unblock+0x12>
   e:	f7ff fffe 	bl	0 <_assert_panic>
  12:	3b01      	subs	r3, #1
  14:	5413      	strb	r3, [r2, r0]
  16:	f381 8810 	msr	PRIMASK, r1
  1a:	bd08      	pop	{r3, pc}
  1c:	00000000 	.word	0x00000000

atomic:

00000000 <pm_block>:
   0:	4b07      	ldr	r3, [pc, #28]	; (20 <pm_block+0x20>)
   2:	f3bf 8f5b 	dmb	ish
   6:	eb03 0080 	add.w	r0, r3, r0, lsl #2
   a:	e850 3f00 	ldrex	r3, [r0]
   e:	3301      	adds	r3, #1
  10:	e840 3200 	strex	r2, r3, [r0]
  14:	2a00      	cmp	r2, #0
  16:	d1f8      	bne.n	a <pm_block+0xa>
  18:	f3bf 8f5b 	dmb	ish
  1c:	4770      	bx	lr
  1e:	bf00      	nop
  20:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	4b07      	ldr	r3, [pc, #28]	; (20 <pm_unblock+0x20>)
   2:	f3bf 8f5b 	dmb	ish
   6:	eb03 0080 	add.w	r0, r3, r0, lsl #2
   a:	e850 3f00 	ldrex	r3, [r0]
   e:	3b01      	subs	r3, #1
  10:	e840 3200 	strex	r2, r3, [r0]
  14:	2a00      	cmp	r2, #0
  16:	d1f8      	bne.n	a <pm_unblock+0xa>
  18:	f3bf 8f5b 	dmb	ish
  1c:	4770      	bx	lr
  1e:	bf00      	nop
  20:	00000000 	.word	0x00000000

the atomic has no assert i just wanted to see what the ++ would look like without irq disable. dmb ish is a memory barrier

this pr but using uint_fast8_t:

00000000 <pm_block>:
   0:	b508      	push	{r3, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a06      	ldr	r2, [pc, #24]	; (24 <pm_block+0x24>)
   a:	f852 3020 	ldr.w	r3, [r2, r0, lsl #2]
   e:	2bff      	cmp	r3, #255	; 0xff
  10:	d101      	bne.n	16 <pm_block+0x16>
  12:	f7ff fffe 	bl	0 <_assert_panic>
  16:	3301      	adds	r3, #1
  18:	f842 3020 	str.w	r3, [r2, r0, lsl #2]
  1c:	f381 8810 	msr	PRIMASK, r1
  20:	bd08      	pop	{r3, pc}
  22:	bf00      	nop
  24:	00000000 	.word	0x00000000

Disassembly of section .text.pm_unblock:

00000000 <pm_unblock>:
   0:	b508      	push	{r3, lr}
   2:	f3ef 8110 	mrs	r1, PRIMASK
   6:	b672      	cpsid	i
   8:	4a05      	ldr	r2, [pc, #20]	; (20 <pm_unblock+0x20>)
   a:	f852 3020 	ldr.w	r3, [r2, r0, lsl #2]
   e:	b90b      	cbnz	r3, 14 <pm_unblock+0x14>
  10:	f7ff fffe 	bl	0 <_assert_panic>
  14:	3b01      	subs	r3, #1
  16:	f842 3020 	str.w	r3, [r2, r0, lsl #2]
  1a:	f381 8810 	msr	PRIMASK, r1
  1e:	bd08      	pop	{r3, pc}
  20:	00000000 	.word	0x00000000

from what I read about arm memory access times the last variant may be faster (even though it is longer) since the ldr.w is aligned for all modes while ldrb might be slower if not aligned (mode 1,2,3). but in both cases the assert should be moved to reduce the number of ldr.

Issues/PRs references

#18821
#17607

@github-actions github-actions bot added the Area: sys Area: System label Nov 3, 2022
@benpicco benpicco added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Nov 3, 2022
@riot-ci
Copy link

riot-ci commented Nov 3, 2022

Murdock results

✔️ PASSED

f3ed268 sys/pm_layered: move (un)block assert for minor speedup

Success Failures Total Runtime
2000 0 2000 06m:36s

Artifacts

This only reflects a subset of all builds from https://ci-prod.riot-os.org. Please refer to https://ci.riot-os.org for a complete build for now.

@benpicco benpicco added the Impact: minor The PR is small in size and might only require a quick look of a knowledgeable reviewer label Nov 3, 2022
@benpicco benpicco merged commit f954a8b into RIOT-OS:master Nov 4, 2022
@jue89
Copy link
Contributor

jue89 commented Nov 4, 2022

I'm curios: could you share the code behind your experiments? I already thought about using something like atomic_fetch_add_u8() (it's not available on cortexm, I guess?).

@maribu
Copy link
Member

maribu commented Nov 4, 2022

Note that for RIOT internal code the atomic utils can be a better option, as they are trivially inline-able and easier to optimize.

(The compilers typically assume that falling back to OS support is expensive and implemented via a mutex, while we just disable IRQs which typically is cheap.)

@kfessel
Copy link
Contributor Author

kfessel commented Nov 4, 2022

I used stdatomic.h atomic_fetch_add on a atomic_uint_fast8_t for the .blockers inside pm_blocker_t
I hope these are enough hits since I would redo to share

i just realized cortexm0 an m4 are quite different -> the assembly i posted above was m4 (nucleo-f767zi) (the m0 assembly also improved)

@kfessel
Copy link
Contributor Author

kfessel commented Nov 4, 2022

atomic_utils.h got me irq disable and enable on stm32-f767 and samr21 (no arm atomic operations where used)

@kaspar030 kaspar030 added this to the Release 2023.01 milestone Jan 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: sys Area: System CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Impact: minor The PR is small in size and might only require a quick look of a knowledgeable reviewer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants