-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROF-9476] Managed string storage PoC #3628
base: master
Are you sure you want to change the base?
Conversation
419511d
to
c255595
Compare
46f1c96
to
872fe65
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a bunch of notes/ideas :)
@@ -59,7 +59,7 @@ typedef struct { | |||
} heap_recorder_iteration_data; | |||
|
|||
// Initialize a new heap recorder. | |||
heap_recorder* heap_recorder_new(void); | |||
heap_recorder* heap_recorder_new(const ddog_prof_ManagedStringStorage *string_storage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Doesn't const ddog_prof_ManagedStringStorage *
mean the data being pointed to is immutable? 🤔
Because I'm pretty sure we're mutating it via the libdatadog helpers
uint32_t label_key_allocation_class; | ||
uint32_t label_key_gc_gen_age; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: It occurs to me that it may be nice for libdatadog to specify a typedef (of a one-item-struct) to represent this, so we get a better type than "a naked integer".
static u_int32_t intern_or_raise(struct stack_recorder_state *state, const char *str) { | ||
if (str == NULL) { | ||
return 0; | ||
} | ||
ddog_CharSlice char_slice; | ||
char_slice.len = strlen(str); | ||
char_slice.ptr = str; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: It may be worth introducing a macro version of this to allow the compiler to hardcode the strlen at compile-time for static strings, like DDOG_CHARSLICE_C
does.
.name = DDOG_CHARSLICE_C(""), | ||
.name_id = frame->name, | ||
.filename = DDOG_CHARSLICE_C(""), | ||
.filename_id = frame->filename, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we'll see how this evolves on the libdatadog but I still think it's really weird/bizarre/meh that libdatadog doesn't allow us to just use NULL for strings in situations where they can be empty, as this looks a lot more awkward because of it.
st_index_t hash = st_hash(&frame->name, sizeof(frame->name), seed); | ||
hash = st_hash(&frame->filename, sizeof(frame->filename), hash); | ||
hash = st_hash(&frame->line, sizeof(frame->line), hash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: I guess we could probably hash the whole frame object in one now?
E.g. st_hash(frame, sizeof(frame), ...)
?
.name = string_from_char_slice(location->function.name), | ||
.filename = string_from_char_slice(location->function.filename), | ||
.name = intern_or_raise(recorder, &location->function.name), | ||
.filename = intern_or_raise(recorder, &location->function.filename), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess for the PoC (or even for v1?) this is enough, although conceptually this could be done way earlier, e.g. in the stack recorder, since by definition we know all of these strings must have already been interned, because they've been sampled by the allocation profiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah no, I just realized that the managed string storage is different from the regular profile string storage so no, actually, these strings have not been interned already (but they could be...)
void heap_stack_free(heap_recorder *recorder, heap_stack *stack) { | ||
for (u_int16_t i = 0; i < stack->frames_len; i++) { | ||
unintern_or_raise(recorder, stack->frames[i].filename); | ||
unintern_or_raise(recorder, stack->frames[i].name); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: More of a naming thing, but I had to check exactly what unintern
does on the libdatadog side, since I was thinking "wait, does this mean we're deleting the string right now?"
Maybe release or something like that may be a bit more explicit? (Perhaps even decrement_usage?)
for (size_t i = 0; i < stack1->frames_len; i++) { | ||
heap_frame* frame1 = &stack1->frames[i]; | ||
heap_frame* frame2 = &stack2->frames[i]; | ||
|
||
// Then come names, they are usually smaller than filenames | ||
const char *name1, *name2; | ||
size_t name1_len = heap_record_key_entry_name(key_record1, i, &name1); | ||
size_t name2_len = heap_record_key_entry_name(key_record2, i, &name2); | ||
if (name1_len != name2_len) { | ||
return ((int) name1_len) - ((int) name2_len); | ||
} | ||
int name_cmp_result = strncmp(name1, name2, name1_len); | ||
if (name_cmp_result != 0) { | ||
return name_cmp_result; | ||
if (frame1->name != frame2->name) { | ||
return ((int) frame1->name) - ((int) frame2->name); | ||
} | ||
|
||
// Then come filenames | ||
const char *filename1, *filename2; | ||
int64_t filename1_len = heap_record_key_entry_filename(key_record1, i, &filename1); | ||
int64_t filename2_len = heap_record_key_entry_filename(key_record2, i, &filename2); | ||
if (filename1_len != filename2_len) { | ||
return ((int) filename1_len) - ((int) filename2_len); | ||
if (frame1->filename != frame2->filename) { | ||
return ((int) frame1->filename) - ((int) frame2->filename); | ||
} | ||
int filename_cmp_result = strncmp(filename1, filename2, filename1_len); | ||
if (filename_cmp_result != 0) { | ||
return filename_cmp_result; | ||
|
||
if (frame1->line != frame2->line) { | ||
return ((int) frame1->line) - ((int)frame2->line); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an optimization here, now that this is all just a bunch of numbers, maybe we can just memcmp
the two stacks? (I think this would be a stable comparison similar to the current implementation?)
st_index_t heap_stack_hash(heap_stack *stack, st_index_t seed) { | ||
st_index_t hash = seed; | ||
for (uint64_t i = 0; i < stack->frames_len; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that this is all a bunch of numbers, I wonder if we could hash it all in one go (rather than going frame by frame).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this along with the cmp version are great points!
What does this PR do?
Make use of new string storage APIs introduced in DataDog/libdatadog#414 to prevent having to keep a lot of strings being duplicated and managed by heap recorder.
This is currently a PoC and not production ready.
Motivation:
A bottleneck analysis using Datadog's native profiler has shown that stacktrace management in heap recorder is one of the biggest bottlenecks with heap profiling:
This work is similar to that being done already in libdatadog and thus is work we can potentially remove entirely from the equation if libdatadog exposed APIs that allowed stacktrace storage to survive across profiles.
This first implementation accomplishes this solely on the strings table but could potentially be expanded to the entire stacktrace. Nevertheless, it already shows some great promise by allowing sampleRate=1 with negligible overhead compared to the currently merged implementation with sampleRate=10 in our test app:
Additional Notes:
How to test the change?
Unsure? Have a question? Request a review!