Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROF-8667] Heap Profiling - Part 2 - Heap Recorder #3287

Merged

Conversation

AlexJF
Copy link
Contributor

@AlexJF AlexJF commented Nov 28, 2023

What does this PR do?

This PR follows #3281 by actually implementing the heap recorder native module to support tracking the number (not size yet) of live heap objects.

How does it work?

The heap recorder essentially plugs into the allocation sampling mechanism and will record the object ids (via Object::object_id) and allocation stacktraces of all objects whose allocation we sampled.

At profile serialization time, we flush/update the heap recorder, at which point we make use of ObjectSpace::_id2ref to try to obtain an object reference from our stored ids. This will fail only in these cases:

  1. The id isn't valid (shouldn't happen since we record the id directly by calling Object::object_id.
  2. The object the id refers to is no longer alive -> This leads us to clean up the associated recorded live object record and acts as a more robust alternative to listening to free tracepoints (which are easy to miss).
  3. The VM is operating in multi-ractor mode and the object we sampled is not shareable between ractors -> We consciously choose not to support ractors initially for this feature so we'll also handle these as free.

For all ids for which we were still able to retrieve a valid object reference, we'll iterate through them and generate corresponding libdatadog samples with heap-live-samples values (properly scaled according to allocation sampling rate).

NOTE: This implementation relies on the new object id mechanism introduced in Ruby >= 2.7 which guarantees uniqueness (i.e. never re-used) and stickiness across eventual memory compaction. Prior to Ruby 2.7 object ids were related to memory locations and it'd thus be very easy to "miss" frees if the memory location got re-used for a different object. If that happened, we'd keep on reporting the original object as alive when it really wasn't.

Motivation:
Actually implement basic heap profiling (count only, no size yet).

Additional Notes:

How to test the change?
Non-zero heap-live-samples values should now be included in the resulting Ruby profiles when an app is executed with:

DD_PROFILING_EXPERIMENTAL_ALLOCATION_ENABLEd=true DD_PROFILING_EXPERIMENTAL_HEAP_ENABLED=true ddtracerb exec ...

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

Unsure? Have a question? Request a review!

@github-actions github-actions bot added the profiling Involves Datadog profiling label Nov 28, 2023
@AlexJF AlexJF changed the base branch from alexjf/prof-8667-heap-profiling-part3-free-and-queue to alexjf/prof-8667-heap-profiling-part2-allocations November 28, 2023 18:21
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch 2 times, most recently from ca8be88 to 37d81ae Compare November 28, 2023 19:15
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few notes, but this is a pretty interesting alternative!

@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch from 37d81ae to 89a15dc Compare December 12, 2023 17:44
@github-actions github-actions bot added appsec Application Security monitoring product core Involves Datadog core libraries integrations Involves tracing integrations tracing labels Dec 12, 2023
@AlexJF AlexJF changed the base branch from alexjf/prof-8667-heap-profiling-part2-allocations to master December 12, 2023 18:17
@AlexJF AlexJF changed the title [PROF-8667] Heap Profiling - Part 2 - Live Tracking [PROF-8667] Heap Profiling - Part 2 - Heap Recorder Dec 12, 2023
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch from 89a15dc to a8357de Compare December 12, 2023 18:26
@github-actions github-actions bot removed integrations Involves tracing integrations appsec Application Security monitoring product tracing core Involves Datadog core libraries labels Dec 12, 2023
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch from a8357de to 12c22c5 Compare December 13, 2023 13:28
@AlexJF AlexJF marked this pull request as ready for review December 13, 2023 13:30
@AlexJF AlexJF requested review from a team as code owners December 13, 2023 13:30
[PROF-8667] Some comments and location fixes.

[PROF-8667] Fix typo and enable 'records live heap objects' test.

[PROF-8667] Heap Profiling - Part 2 - Live Tracking

[PROF-8667] Remove one remaining usage of sum.

[PROF-8667] Address comments
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch from 12c22c5 to 051c803 Compare December 13, 2023 15:47
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a pass on everything but the heap_recorder.c -- I was halfway there still. Will come back tomorrow to finish it :)

ext/ddtrace_profiling_native_extension/ruby_helpers.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/ruby_helpers.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/ruby_helpers.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/ruby_helpers.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/ruby_helpers.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Dec 13, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (14b40d4) 98.23% compared to head (1106473) 98.23%.
Report is 22 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3287   +/-   ##
=======================================
  Coverage   98.23%   98.23%           
=======================================
  Files        1253     1253           
  Lines       72982    73015   +33     
  Branches     3429     3430    +1     
=======================================
+ Hits        71693    71728   +35     
+ Misses       1289     1287    -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch 2 times, most recently from b5d8df2 to 5e6537d Compare December 14, 2023 18:11
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch from e9c26ce to ed57a26 Compare December 14, 2023 18:52
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took me a while, but this looks great! Left a final set of comments/suggestions, but here sir, take my 👍

ext/ddtrace_profiling_native_extension/ruby_helpers.c Outdated Show resolved Hide resolved
Comment on lines 747 to 769
ddog_prof_Location *locations_arr = ruby_xcalloc(locations_len, sizeof(ddog_prof_Location));
for (size_t i = 0; i < locations_len; i++) {
VALUE location = rb_ary_entry(locations, i);
ENFORCE_TYPE(location, T_ARRAY);
VALUE name = rb_ary_entry(location, 0);
VALUE filename = rb_ary_entry(location, 1);
VALUE line = rb_ary_entry(location, 2);
ENFORCE_TYPE(name, T_STRING);
ENFORCE_TYPE(filename, T_STRING);
ENFORCE_TYPE(line, T_FIXNUM);
locations_arr[i] = (ddog_prof_Location) {
.line = line,
.function = (ddog_prof_Function) {
.name = char_slice_from_ruby_string(name),
.filename = char_slice_from_ruby_string(filename),
}
};
}
ddog_prof_Slice_Location ddog_locations = {
.len = locations_len,
.ptr = locations_arr,
};
heap_recorder_testonly_assert_hash_matches(ddog_locations);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're leaking the locations_arr 🤣.

I guess to avoid having to rb_rescue2 and whatnot, maybe heap_recorder_testonly_assert_hash_matches could return Qnil or an exception (e.g. using rb_exc_new_str) and then this method would free the locations_arr and then rb_exc_raise(...) the result of the matches if it was not Qnil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! We are running in C99 so I think we can rely on variable length arrays here and keep everything sane?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that's reasonable.

To be honest in the past I've experimented with variable length arrays in ddtrace and did not fully grasp what happens on stack overflows. With what I know today, maybe what I saw was an interaction with Ruby's detection of stack overflows...? Need to retest again.

This to say, I'm not 100% confident of using them for the actual production code yet, but for the testing code I don't think it matters at all.

spec/datadog/profiling/stack_recorder_spec.rb Outdated Show resolved Hide resolved
spec/datadog/profiling/stack_recorder_spec.rb Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
Comment on lines +640 to +649
heap_record_key* heap_record_key_new(heap_stack *stack) {
heap_record_key *key = ruby_xmalloc(sizeof(heap_record_key));
key->type = HEAP_STACK;
key->heap_stack = stack;
return key;
}

void heap_record_key_free(heap_record_key *key) {
ruby_xfree(key);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: It looks to me that maybe we could have the heap_record_key own the stack as well?

E.g. whenever a heap_record key gets malloc'd, we know it's because there's a new stack being tracked, and they will also get freed together, so we could make freeing the stack become the responsibility of the key and simplify this a bit I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a kind of 1:1 mapping between a heap-allocated heap_record_key and a heap_record so whether stack is owned by key or record seems a bit equivalent from a "created together, freed together".

IMO, the more flexible nature of heap_record_key with its potential non-heap-allocated stacks or even not having a heap_stack at all, makes it slightly more confusing to give ownership to. Thus why I preferred to keep ownership to the record and not the key.

@AlexJF AlexJF merged commit bb96f8f into master Dec 19, 2023
218 checks passed
@AlexJF AlexJF deleted the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch December 19, 2023 12:18
@github-actions github-actions bot added this to the 1.19.0 milestone Dec 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
profiling Involves Datadog profiling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants