-
Notifications
You must be signed in to change notification settings - Fork 6.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Zephyr instrumentation subsystem #57371
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments on retained memory usage - would suggest using retention subsystem.
|
||
*/ | ||
|
||
const static struct device *retained_mem_dev = DEVICE_DT_GET(DT_ALIAS(retainedmemdevice)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would use a chosen node for this, similar to this: https://github.com/zephyrproject-rtos/zephyr/blob/main/tests/boot/mcuboot_recovery_retention/boards/nrf52840dk_nrf52840_mem.overlay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nordicjm chosen
in this example is select boot mode, which is not really necessary, but I'm not sure I'm following you here. Could you please expand on it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a board dts or an overlay, you could have something like this:
retainedmem {
compatible = "zephyr,retained-ram";
status = "okay";
#address-cells = <1>;
#size-cells = <1>;
instru0: instru@0 {
compatible = "zephyr,retention";
status = "okay";
reg = <0x0 0x100>;
prefix = [08 04];
checksum = <1>;
};
};
chosen {
zephyr,instrumentation-area = &instru0;
};
Then that way you can access it by using static const struct device *retained_mem_dev = DEVICE_DT_GET(DT_CHOSEN(zephyr_instrumentation_area));
This is not a required change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, got it. Thanks for the explanation!
void* stopper_callee; | ||
|
||
/* | ||
* TODO(gromero): remove hack and use new RETAINED_MEM_MUTEXES=n config. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise retention subsystem has a similar Kconfig, note that both retained memory and retention subsystem mutex Kconfigs would need to be set to n
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nordicjm Hi. I was not able to find any Kconfig syntax that would allow forcing RETAINED_MEM_MUTEXES=n
and RETENTION_MUTEXES=n
from subsys/instrumentation/Kconfig
. I put them in prj.conf
(which works, but is annoying for the user that would need to know about it and set it per project). Any idea on how to set them in Kconfig
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it is not possible to de-select a Kconfig from a Kconfig, I will need to rework how it is done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have posted this: #59254 will have a look at it after the 3.4 release is complete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nordicjm Hi Jamie. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A first pass look see and spotted some things, will wait for the non-draft to look harder
} | ||
|
||
__no_instrumentation__ | ||
void instr_event_handler(enum instr_event_types type, void *callee, void *caller) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to account for SMP in any way. Presumably it would be some metadata in the event or an entirely different stream (probably preferable to avoid locking) per core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@teburd Right, it doesn't currently account for SMP. But indeed, what I had in mind was to use one stream per CPU to avoid locking/lock contention issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only downside of this is that output streams need to be muxed in some way then, and will have some lock likely around it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@teburd Not sure if I understood your last comment correctly. From your first comment I understood that to avoid locking issues you're suggesting one buffer -- so one stream -- per CPU (you said "core" but since a core may have multiple CPUs, I assumed it would be actually one buffer/stream per CPU to avoid the locking, because each CPU will execute, for instance, different threads, hence different code and stack to trace/profile) -- which I agreed. I'm confused about which approach you referred to as "this" in your last comment tho :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll need to come back later to this
@nordicjm @teburd @carlocaione @nashif Thanks a lot for the first pass and for the valuable suggestions! I'm working on getting the CI green now, but I'm also going on vacation today, so bear with me. I'll address your comments next :-) |
Another question that came to mind, could this work with a sampling profiler, e.g. ITM periodically sampling of the program counter? |
The underlying polling mechanism would be quite different than the compiler instrumentation this is based on, where entry/exit hooks are inserted at compile time. The subsystem could certainly be updated to accommodate that in a second stage, using the underlying buffer processing tools, transports, etc., but I think doing one job well with the compiler-generated entry/exit hooks first would be more efficient to start. |
Architecture WG:
|
ddceaed
to
aecf328
Compare
|
||
#include <zephyr/instrumentation/instrumentation.h> | ||
|
||
#define DEBUG 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think this should be here
|
||
void instr_rb_init(void); | ||
|
||
struct instr_record *instr_rb_put_item_claim(enum instr_event_types type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs documentation
struct instr_record *instr_rb_get_item(struct instr_record *record); | ||
int instr_rb_get_item_wrapping(struct instr_record *record); | ||
|
||
/* TODO(gromero): get_item_size() probably not for external use. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO needs to be done
subsys/instrumentation/Kconfig
Outdated
default 256 | ||
range 1 4096 | ||
help | ||
Maximum number of functions to collect statistics. Set the maximum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Thanks.
* | ||
*/ | ||
|
||
const struct device *retention0 = DEVICE_DT_GET(DT_NODELABEL(retention0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retention works in uninitialised RAM already, if it needs additional code to work across another core than it can be added as a new retention driver, that's not a problem
#endif | ||
|
||
/* See instr_fundamentals_initialized() */ | ||
uint16_t magic = 0xABBA; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would want to be static
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, fixed! Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, that's not right. Making it static
defeats the purpose of 'magic'. static
variables are initialized before program execution. What is needed here is a dynamic variable, which will be initialized at runtime, this way it's possible to use that variable (magic
) to verify that all other important dynamic variables are properly initialized too. I'm reverting this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The likelyhood of another variable in a program or library being just called "magic" seems fairly high, so by not making this static you're creating a high chance of variable name collision.
head = 0; | ||
|
||
/* Second, write remaining item bytes not written yet. */ | ||
second_chunk = instr_rb_get_item_size(rec->header.type) - first_chunk; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason the zephyr ring buffer isn't used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nordicjm The reason is that we planned at the beginning to use certain fields in the packet to inform variable length records allowing "maximum compression" when transferring the records from the target to the host. But the data stream is decoded in the host using lib babeltrace. However, we found out that it currently only supports Common Trace Format v1.8, which does not support decoding variable length data based on a length field (only for char[] iirc). Hence I ended up using fixed-length records, which now can be used with the Zephyr ring buffer API (which only supports fixed-length items). I'm investigating now if I can replace it with Zephyr ring buffer API to simply the PR for the moment.
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. |
I just would like to say I'm finding some cycles to address the latest comments on this PR this week. |
d09b4ee
to
998d743
Compare
56b8103
to
93767f7
Compare
Cool, would be great to see this and improvements on it in the tree |
93767f7
to
8c966ad
Compare
I've updated the PR addressing most of the @nordicjm comments. While I wait for the CI to get green I'm investigating if I can replace my variable-length ring buffer API with Zephyr's ring buffer API to simply the PR for the moment. |
9e2142c
to
5f573cc
Compare
Blocked by issue #66334 |
This is merged in, so hopefully unblocks progress here? |
@microbuilder Yep, thanks! |
I was able to get this working in a virtualized environment ( Steps:
There are existing knobs to tie the serial to a |
We should get that into any eventual documentation as an example. |
92be12d
to
f86f3b9
Compare
@carlocaione @evgeniy-paltsev cc @nashif @nordicjm @teburd @erwango Hi! Folks, this PR is done for another review in my view. @carlocaione Erik (@Ablu) addressed your comment on allowing the instrumentation to run on a VM, so you don't have to touch any cable to try it -- see the example in Erik's comment above. I've rebased on main and updated the branch with his changes. So now it's trivial to experiment with the instrumentation using the mps2_an385 board (QEMU VM). @evgeniy-paltsev, your comments were addressed but are still blocking the PR. Could you please have another look? In general, how can we move on and get this PR merged? Over the months since my presentation in Prague a handful of people showed interest in this work and also expressed that it's a reasonable approach. Is there anything controversial that needs more discussion? |
The CI errors seem not related to the changes in this PR. Should I re-trigger? |
f86f3b9
to
2ca67b5
Compare
I accidentally closed this RFC PR when trying to understand the CI errors. But no worries, I think it's time to remove the RFC status and update the description text since the code was tested on new boards since its first version and after several review rounds. Please check PR: #70253 |
Hello Zephyr community!
This is the PoC / initial implementation for the proposed Zephyr instrumentation subsystem. It leverages the compiler function instrumentation to collect tracing and profiling information in Zephyr RTOS.
This has been tested against the ST
b_u585i_iot02a
board. Please note that, since it relies on theretained_mem
driver, if another board is used for testing the proper device-tree overlay for the target board needs to be added toboards/
dir accordingly.Please see
samples/subsys/instrumentation/src/main.c
sample for details on how to experiment with it.RFC: #57373
Thanks!
Cheers,
Gustavo & Kevin