Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROF-9476] Managed string storage PoC #3628

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

AlexJF
Copy link
Contributor

@AlexJF AlexJF commented May 6, 2024

What does this PR do?
Make use of new string storage APIs introduced in DataDog/libdatadog#414 to prevent having to keep a lot of strings being duplicated and managed by heap recorder.

This is currently a PoC and not production ready.

Motivation:

A bottleneck analysis using Datadog's native profiler has shown that stacktrace management in heap recorder is one of the biggest bottlenecks with heap profiling:

Picture 1 Picture 2
Stacktrace storage in heap recorder (copying, hashing, cleanup)
~4% of entire CPU profile
~30% of the overhead during sampling
~16.3% of the overhead during flushing

This work is similar to that being done already in libdatadog and thus is work we can potentially remove entirely from the equation if libdatadog exposed APIs that allowed stacktrace storage to survive across profiles.

This first implementation accomplishes this solely on the strings table but could potentially be expanded to the entire stacktrace. Nevertheless, it already shows some great promise by allowing sampleRate=1 with negligible overhead compared to the currently merged implementation with sampleRate=10 in our test app:

image image

Additional Notes:

How to test the change?

Unsure? Have a question? Request a review!

@github-actions github-actions bot added the profiling Involves Datadog profiling label May 6, 2024
@AlexJF AlexJF force-pushed the alexjf/prof-9476-managed-string-storage branch 3 times, most recently from 419511d to c255595 Compare August 29, 2024 15:05
@AlexJF AlexJF force-pushed the alexjf/prof-9476-managed-string-storage branch from 46f1c96 to 872fe65 Compare September 2, 2024 13:40
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a bunch of notes/ideas :)

@@ -59,7 +59,7 @@ typedef struct {
} heap_recorder_iteration_data;

// Initialize a new heap recorder.
heap_recorder* heap_recorder_new(void);
heap_recorder* heap_recorder_new(const ddog_prof_ManagedStringStorage *string_storage);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Doesn't const ddog_prof_ManagedStringStorage * mean the data being pointed to is immutable? 🤔

Because I'm pretty sure we're mutating it via the libdatadog helpers

Comment on lines +197 to +198
uint32_t label_key_allocation_class;
uint32_t label_key_gc_gen_age;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: It occurs to me that it may be nice for libdatadog to specify a typedef (of a one-item-struct) to represent this, so we get a better type than "a naked integer".

Comment on lines +1044 to +1050
static u_int32_t intern_or_raise(struct stack_recorder_state *state, const char *str) {
if (str == NULL) {
return 0;
}
ddog_CharSlice char_slice;
char_slice.len = strlen(str);
char_slice.ptr = str;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: It may be worth introducing a macro version of this to allow the compiler to hardcode the strlen at compile-time for static strings, like DDOG_CHARSLICE_C does.

Comment on lines +601 to +604
.name = DDOG_CHARSLICE_C(""),
.name_id = frame->name,
.filename = DDOG_CHARSLICE_C(""),
.filename_id = frame->filename,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we'll see how this evolves on the libdatadog but I still think it's really weird/bizarre/meh that libdatadog doesn't allow us to just use NULL for strings in situations where they can be empty, as this looks a lot more awkward because of it.

Comment on lines +829 to 831
st_index_t hash = st_hash(&frame->name, sizeof(frame->name), seed);
hash = st_hash(&frame->filename, sizeof(frame->filename), hash);
hash = st_hash(&frame->line, sizeof(frame->line), hash);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I guess we could probably hash the whole frame object in one now?

E.g. st_hash(frame, sizeof(frame), ...)?

Comment on lines -926 to +852
.name = string_from_char_slice(location->function.name),
.filename = string_from_char_slice(location->function.filename),
.name = intern_or_raise(recorder, &location->function.name),
.filename = intern_or_raise(recorder, &location->function.filename),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess for the PoC (or even for v1?) this is enough, although conceptually this could be done way earlier, e.g. in the stack recorder, since by definition we know all of these strings must have already been interned, because they've been sampled by the allocation profiler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no, I just realized that the managed string storage is different from the regular profile string storage so no, actually, these strings have not been interned already (but they could be...)

Comment on lines +861 to 865
void heap_stack_free(heap_recorder *recorder, heap_stack *stack) {
for (u_int16_t i = 0; i < stack->frames_len; i++) {
unintern_or_raise(recorder, stack->frames[i].filename);
unintern_or_raise(recorder, stack->frames[i].name);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: More of a naming thing, but I had to check exactly what unintern does on the libdatadog side, since I was thinking "wait, does this mean we're deleting the string right now?"

Maybe release or something like that may be a bit more explicit? (Perhaps even decrement_usage?)

Comment on lines +888 to 903
for (size_t i = 0; i < stack1->frames_len; i++) {
heap_frame* frame1 = &stack1->frames[i];
heap_frame* frame2 = &stack2->frames[i];

// Then come names, they are usually smaller than filenames
const char *name1, *name2;
size_t name1_len = heap_record_key_entry_name(key_record1, i, &name1);
size_t name2_len = heap_record_key_entry_name(key_record2, i, &name2);
if (name1_len != name2_len) {
return ((int) name1_len) - ((int) name2_len);
}
int name_cmp_result = strncmp(name1, name2, name1_len);
if (name_cmp_result != 0) {
return name_cmp_result;
if (frame1->name != frame2->name) {
return ((int) frame1->name) - ((int) frame2->name);
}

// Then come filenames
const char *filename1, *filename2;
int64_t filename1_len = heap_record_key_entry_filename(key_record1, i, &filename1);
int64_t filename2_len = heap_record_key_entry_filename(key_record2, i, &filename2);
if (filename1_len != filename2_len) {
return ((int) filename1_len) - ((int) filename2_len);
if (frame1->filename != frame2->filename) {
return ((int) frame1->filename) - ((int) frame2->filename);
}
int filename_cmp_result = strncmp(filename1, filename2, filename1_len);
if (filename_cmp_result != 0) {
return filename_cmp_result;

if (frame1->line != frame2->line) {
return ((int) frame1->line) - ((int)frame2->line);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an optimization here, now that this is all just a bunch of numbers, maybe we can just memcmp the two stacks? (I think this would be a stable comparison similar to the current implementation?)

Comment on lines 870 to 872
st_index_t heap_stack_hash(heap_stack *stack, st_index_t seed) {
st_index_t hash = seed;
for (uint64_t i = 0; i < stack->frames_len; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that this is all a bunch of numbers, I wonder if we could hash it all in one go (rather than going frame by frame).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this along with the cmp version are great points!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
profiling Involves Datadog profiling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants