Speed up compilation of all our WriteAttribute machinery. #11603

bzbarsky-apple · 2021-11-09T20:55:37Z

It turns out that instantiating fairly heavy-weight templates hundreds
of times is slow to compile.

Instead of having an instantiation per attribute, switch to only
instantiating the complex templates per type of attribute, with thin
per-attribute wrappers for auto-deriving the cluster id and attribute
id. This shaves over a minute of wall-clock time off compiling
chip-tool for me, and close to 2 minutes of total CPU time.

Problem

Slower compiles than we want.

Change overview

See above.

Testing

No behavior changes. I measured compile times extensively before and after the change.

github-actions · 2021-11-09T21:22:40Z

PR #11603: Size comparison from 89898f8 to fbc6e3d

Full report (9 builds for k32w, p6, qpg, telink)

platform	target	config	section	`89898f8`	`fbc6e3d`
k32w	lock-app	k32w061+debug	(read/write)	592360	592360
			.bss	68524	68524
			.data	1880	1880
			.text	516156	516156
	shell	k32w061+debug	(read/write)	658016	658016
			.bss	79324	79324
			.data	1848	1848
			.text	571044	571044
	lighting-app	k32w061+se05x+release	(read/write)	699648	699648
			.bss	77996	77996
			.data	1912	1912
			.text	613940	613940
p6	all-clusters-app	default	(read/write)	2299528	2299528
			.bss	112448	112448
			.data	2536	2536
			.heap	918360	918360
			.text	1257792	1257792
	lock-app	default	(read/write)	2212184	2212184
			.bss	101256	101256
			.data	2408	2408
			.heap	929680	929680
			.text	1170448	1170448
qpg	lighting-app	qpg6100+debug	(read only)	490776	490776
			(read/write)	114140	114140
			.bss	51152	51152
			.data	1012	1012
			.text	485456	485456
	lock-app	qpg6100+debug	(read only)	466988	466988
			(read/write)	114144	114144
			.bss	50096	50096
			.data	968	968
			.text	461668	461668
	persistent-storage-app	qpg6100+debug	(read only)	153400	153400
			(read/write)	114140	114140
			.bss	19616	19616
			.data	364	364
			.text	148080	148080
telink	lighting-app	tlsr9518adk80d	(read/write)	663750	663750
			bss	69272	69272
			noinit	33216	33216
			text	458596	458596

github-actions · 2021-11-09T21:47:14Z

PR #11603: Size comparison from 89898f8 to edb6f1c

Decreases (2 builds for linux)

platform	target	config	section	`89898f8`	`edb6f1c`	change	% change
linux	chip-tool	debug	(read only)	4995029	4615541	-379488	-7.6
			.rodata	242064	241360	-704	-0.3
			.text	4481717	4102933	-378784	-8.5
	tv-app	debug	.data.rel.ro	59448	59432	-16	-0.0

Full report (38 builds for efr32, esp32, k32w, linux, mbed, nrfconnect, p6, qpg, telink)

platform	target	config	section	`89898f8`	`edb6f1c`	change	% change
efr32	lighting-app	BRD4161A	(read only)	742904	742904	0	0.0
			(read/write)	116268	116268	0	0.0
			.bss	114484	114484	0	0.0
			.data	1784	1784	0	0.0
			.text	742896	742896	0	0.0
		BRD4161A+rpc	(read only)	730440	730440	0	0.0
			(read/write)	132892	132892	0	0.0
			.bss	130988	130988	0	0.0
			.data	1900	1900	0	0.0
			.text	730432	730432	0	0.0
	lock-app	BRD4161A	(read only)	722192	722192	0	0.0
			(read/write)	114084	114084	0	0.0
			.bss	112340	112340	0	0.0
			.data	1744	1744	0	0.0
			.text	722184	722184	0	0.0
	window-app	BRD4161A	(read only)	723088	723088	0	0.0
			(read/write)	114412	114412	0	0.0
			.bss	112660	112660	0	0.0
			.data	1748	1748	0	0.0
			.text	723080	723080	0	0.0
esp32	all-clusters-app	c3devkit	(read only)	880694	880694	0	0.0
			(read/write)	1306536	1306536	0	0.0
			.dram0.bss	58464	58464	0	0.0
			.dram0.data	16472	16472	0	0.0
			.flash.rodata	198360	198360	0	0.0
			.flash.text	880694	880694	0	0.0
			.iram0.text	57526	57526	0	0.0
		m5stack	(read only)	911867	911867	0	0.0
			(read/write)	423864	423864	0	0.0
			.dram0.bss	60968	60968	0	0.0
			.dram0.data	32108	32108	0	0.0
			.flash.rodata	204624	204624	0	0.0
			.flash.text	911867	911867	0	0.0
			.iram0.text	125115	125115	0	0.0
k32w	lighting-app	k32w061+se05x+release	(read/write)	699648	699648	0	0.0
			.bss	77996	77996	0	0.0
			.data	1912	1912	0	0.0
			.text	613940	613940	0	0.0
	lock-app	k32w061+debug	(read/write)	592360	592360	0	0.0
			.bss	68524	68524	0	0.0
			.data	1880	1880	0	0.0
			.text	516156	516156	0	0.0
	shell	k32w061+debug	(read/write)	658016	658016	0	0.0
			.bss	79324	79324	0	0.0
			.data	1848	1848	0	0.0
			.text	571044	571044	0	0.0
linux	all-clusters-app	debug	(read only)	1710601	1710601	0	0.0
			(read/write)	126528	126528	0	0.0
			.bss	57872	57872	0	0.0
			.data	1042	1042	0	0.0
			.data.rel.ro	62352	62352	0	0.0
			.dynamic	592	592	0	0.0
			.got	4088	4088	0	0.0
			.init	27	27	0	0.0
			.init_array	552	552	0	0.0
			.rodata	139765	139765	0	0.0
			.text	1437346	1437346	0	0.0
	bridge-app	debug+rpc	(read only)	1298253	1298253	0	0.0
			(read/write)	77072	77072	0	0.0
			.bss	42768	42768	0	0.0
			.data	1568	1568	0	0.0
			.data.rel.ro	27760	27760	0	0.0
			.dynamic	592	592	0	0.0
			.got	3952	3952	0	0.0
			.init	27	27	0	0.0
			.init_array	408	408	0	0.0
			.rodata	111540	111540	0	0.0
			.text	1090821	1090821	0	0.0
	chip-tool	debug	(read only)	4995029	4615541	-379488	-7.6
			(read/write)	134760	134760	0	0.0
			.bss	25840	25840	0	0.0
			.data	2256	2256	0	0.0
			.data.rel.ro	101232	101232	0	0.0
			.dynamic	592	592	0	0.0
			.got	4368	4368	0	0.0
			.init	27	27	0	0.0
			.init_array	432	432	0	0.0
			.rodata	242064	241360	-704	-0.3
			.text	4481717	4102933	-378784	-8.5
	lighting-app	debug+rpc	(read only)	1557945	1557945	0	0.0
			(read/write)	110088	110088	0	0.0
			.bss	48432	48432	0	0.0
			.data	1202	1202	0	0.0
			.data.rel.ro	55168	55168	0	0.0
			.dynamic	608	608	0	0.0
			.got	4112	4112	0	0.0
			.init	27	27	0	0.0
			.init_array	528	528	0	0.0
			.rodata	128977	128977	0	0.0
			.text	1295410	1295410	0	0.0
	ota-provider-app	debug	(read only)	1259721	1259721	0	0.0
			(read/write)	75336	75336	0	0.0
			.bss	44864	44864	0	0.0
			.data	752	752	0	0.0
			.data.rel.ro	24616	24616	0	0.0
			.dynamic	592	592	0	0.0
			.got	4016	4016	0	0.0
			.init	27	27	0	0.0
			.init_array	448	448	0	0.0
			.rodata	113216	113216	0	0.0
			.text	1050258	1050258	0	0.0
	ota-requestor-app	debug	(read only)	1344281	1344281	0	0.0
			(read/write)	79104	79104	0	0.0
			.bss	47328	47328	0	0.0
			.data	816	816	0	0.0
			.data.rel.ro	25880	25880	0	0.0
			.dynamic	592	592	0	0.0
			.got	3992	3992	0	0.0
			.init	27	27	0	0.0
			.init_array	472	472	0	0.0
			.rodata	124232	124232	0	0.0
			.text	1121250	1121250	0	0.0
	shell	debug	(read only)	789065	789065	0	0.0
			(read/write)	65480	65480	0	0.0
			.bss	23912	23912	0	0.0
			.data	242	242	0	0.0
			.data.rel.ro	36816	36816	0	0.0
			.dynamic	592	592	0	0.0
			.got	3528	3528	0	0.0
			.init	27	27	0	0.0
			.init_array	344	344	0	0.0
			.rodata	78191	78191	0	0.0
			.text	609362	609362	0	0.0
	tv-app	debug	(read only)	1842281	1842281	0	0.0
			(read/write)	407936	407936	0	0.0
			.bss	340112	340112	0	0.0
			.data	2736	2736	0	0.0
			.data.rel.ro	59448	59432	-16	-0.0
			.dynamic	592	592	0	0.0
			.got	4408	4408	0	0.0
			.init	27	27	0	0.0
			.init_array	616	616	0	0.0
			.rodata	156456	156456	0	0.0
			.text	1541906	1541906	0	0.0
mbed	all-clusters-app	CY8CPROTO_062_4343W+release	(read only)	6224	6224	0	0.0
			(read/write)	2290856	2290856	0	0.0
			.bss	179436	179436	0	0.0
			.data	5232	5232	0	0.0
			.heap	851776	851776	0	0.0
			.text	1253456	1253456	0	0.0
	lighting-app	CY8CPROTO_062_4343W+release	(read only)	6224	6224	0	0.0
			(read/write)	2270952	2270952	0	0.0
			.bss	172492	172492	0	0.0
			.data	5584	5584	0	0.0
			.heap	858368	858368	0	0.0
			.text	1233552	1233552	0	0.0
	lock-app	CY8CPROTO_062_4343W+release	(read only)	6224	6224	0	0.0
			(read/write)	2248672	2248672	0	0.0
			.bss	171388	171388	0	0.0
			.data	5568	5568	0	0.0
			.heap	859488	859488	0	0.0
			.text	1211272	1211272	0	0.0
	pigweed-app	CY8CPROTO_062_4343W+release	(read only)	6224	6224	0	0.0
			(read/write)	1139744	1139744	0	0.0
			.bss	11752	11752	0	0.0
			.data	4368	4368	0	0.0
			.heap	1020328	1020328	0	0.0
			.text	103128	103128	0	0.0
	shell	CY8CPROTO_062_4343W+release	(read only)	6224	6224	0	0.0
			(read/write)	2048864	2048864	0	0.0
			.bss	156456	156456	0	0.0
			.data	4976	4976	0	0.0
			.heap	875016	875016	0	0.0
			.text	1011464	1011464	0	0.0
nrfconnect	lighting-app	nrf52840dk_nrf52840	(read/write)	862155	862155	0	0.0
			bss	111460	111460	0	0.0
			rodata	96924	96924	0	0.0
			text	578128	578128	0	0.0
		nrf52840dk_nrf52840+rpc	(read/write)	824503	824503	0	0.0
			bss	107812	107812	0	0.0
			rodata	88104	88104	0	0.0
			text	552276	552276	0	0.0
		nrf5340dk_nrf5340_cpuapp	(read/write)	787162	787162	0	0.0
			bss	112832	112832	0	0.0
			rodata	92180	92180	0	0.0
			text	507600	507600	0	0.0
	lock-app	nrf52840dk_nrf52840	(read/write)	838831	838831	0	0.0
			bss	110492	110492	0	0.0
			rodata	93296	93296	0	0.0
			text	559612	559612	0	0.0
		nrf5340dk_nrf5340_cpuapp	(read/write)	764142	764142	0	0.0
			bss	111904	111904	0	0.0
			rodata	88600	88600	0	0.0
			text	489176	489176	0	0.0
	pigweed-app	nrf52840dk_nrf52840	(read/write)	497327	497327	0	0.0
			bss	51824	51824	0	0.0
			rodata	45780	45780	0	0.0
			text	339436	339436	0	0.0
	pump-app	nrf52840dk_nrf52840	(read/write)	844955	844955	0	0.0
			bss	110632	110632	0	0.0
			rodata	95004	95004	0	0.0
			text	563772	563772	0	0.0
	pump-controller-app	nrf52840dk_nrf52840	(read/write)	838699	838699	0	0.0
			bss	110528	110528	0	0.0
			rodata	93292	93292	0	0.0
			text	559348	559348	0	0.0
	shell	nrf52840dk_nrf52840	(read/write)	776431	776431	0	0.0
			bss	109280	109280	0	0.0
			rodata	72564	72564	0	0.0
			text	520004	520004	0	0.0
		nrf5340dk_nrf5340_cpuapp	(read/write)	691482	691482	0	0.0
			bss	110264	110264	0	0.0
			rodata	67204	67204	0	0.0
			text	440612	440612	0	0.0
p6	all-clusters-app	default	(read/write)	2299528	2299528	0	0.0
			.bss	112448	112448	0	0.0
			.data	2536	2536	0	0.0
			.heap	918360	918360	0	0.0
			.text	1257792	1257792	0	0.0
	lock-app	default	(read/write)	2212184	2212184	0	0.0
			.bss	101256	101256	0	0.0
			.data	2408	2408	0	0.0
			.heap	929680	929680	0	0.0
			.text	1170448	1170448	0	0.0
qpg	lighting-app	qpg6100+debug	(read only)	490776	490776	0	0.0
			(read/write)	114140	114140	0	0.0
			.bss	51152	51152	0	0.0
			.data	1012	1012	0	0.0
			.text	485456	485456	0	0.0
	lock-app	qpg6100+debug	(read only)	466988	466988	0	0.0
			(read/write)	114144	114144	0	0.0
			.bss	50096	50096	0	0.0
			.data	968	968	0	0.0
			.text	461668	461668	0	0.0
	persistent-storage-app	qpg6100+debug	(read only)	153400	153400	0	0.0
			(read/write)	114140	114140	0	0.0
			.bss	19616	19616	0	0.0
			.data	364	364	0	0.0
			.text	148080	148080	0	0.0
telink	lighting-app	tlsr9518adk80d	(read/write)	663750	663750	0	0.0
			bss	69272	69272	0	0.0
			noinit	33216	33216	0	0.0
			text	458596	458596	0	0.0

bzbarsky-apple · 2021-11-09T23:06:54Z

@msandstedt @saurabhst @jepenven-silabs @jmartinez-silabs @Damian-Nordic Please take a look?

bzbarsky-apple requested a review from mrjerryjohns November 9, 2021 20:55

boring-cyborg bot added app controller darwin labels Nov 9, 2021

pullapprove bot requested review from vivien-apple, wbschiller, woody-apple, yufengwangca and yunhanw-google November 9, 2021 20:55

pullapprove bot added the review - pending label Nov 9, 2021

bzbarsky-apple force-pushed the faster-write-compile branch from fbc6e3d to edb6f1c Compare November 9, 2021 21:25

woody-apple approved these changes Nov 9, 2021

yunhanw-google approved these changes Nov 9, 2021

mrjerryjohns approved these changes Nov 9, 2021

jmartinez-silabs approved these changes Nov 9, 2021

pullapprove bot added review - approved and removed review - pending labels Nov 9, 2021

woody-apple merged commit be75489 into project-chip:master Nov 9, 2021

bzbarsky-apple deleted the faster-write-compile branch November 9, 2021 23:35
