[PROF-8667] Heap Profiling - Part 2 - Heap Recorder #3287

AlexJF · 2023-11-28T18:20:51Z

What does this PR do?

This PR follows #3281 by actually implementing the heap recorder native module to support tracking the number (not size yet) of live heap objects.

How does it work?

The heap recorder essentially plugs into the allocation sampling mechanism and records the object ids (via Object#object_id) and allocation stacktraces of all objects whose allocation we sampled.

At profile serialization time, we flush/update the heap recorder, at which point we make use of ObjectSpace::_id2ref to try to obtain an object reference from our stored ids. This will fail only in these cases:

The id isn't valid (shouldn't happen since we record the id directly by calling Object#object_id).
The object the id refers to is no longer alive -> This leads us to clean up the associated recorded live object record and acts as a more robust alternative to listening to free tracepoints (which are easy to miss).
The VM is operating in multi-ractor mode and the object we sampled is not shareable between ractors -> We consciously choose not to support ractors initially for this feature so we'll also handle these as free.

For all ids for which we were still able to retrieve a valid object reference, we'll iterate through them and generate corresponding libdatadog samples with heap-live-samples values (properly scaled according to allocation sampling rate).
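A rough Ruby sketch of the flow described above, for illustration only: the real recorder is native code in heap_recorder.c and produces libdatadog samples rather than a Hash. The class name HeapRecorderSketch, its methods, the resolver parameter, and the weight argument are all hypothetical. It assumes CRuby, where ObjectSpace._id2ref raises RangeError for an id that no longer maps to a live (and, in multi-ractor mode, shareable) object.

```ruby
# Hypothetical illustration; none of these names exist in ddtrace.
class HeapRecorderSketch
  # resolver: callable that maps an object id back to a live object and
  # raises RangeError when it cannot (assumed default: ObjectSpace._id2ref).
  def initialize(resolver: ObjectSpace.method(:_id2ref))
    @resolver = resolver
    @records = {} # object id => { stack:, weight: }
  end

  # Called for each sampled allocation. Only the id is stored, so the
  # recorder itself never keeps the object alive. `weight` is how many
  # allocations this sample stands for (the sampling rate scaling).
  def track(obj, stack, weight)
    @records[obj.object_id] = { stack: stack, weight: weight }
    nil
  end

  # Called at serialization time: drop records whose id no longer resolves,
  # then sum the scaled live-object counts per allocation stack.
  def flush
    @records.delete_if { |id, _record| !alive?(id) }
    totals = Hash.new(0)
    @records.each_value { |record| totals[record[:stack]] += record[:weight] }
    totals
  end

  private

  # A RangeError from the resolver is treated as "object was freed".
  def alive?(id)
    @resolver.call(id)
    true
  rescue RangeError
    false
  end
end
```

In this sketch a record disappears on the first flush after its object has been collected, which mirrors the "handle as free" behaviour described for dead objects and for unshareable objects in multi-ractor mode.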

NOTE: This implementation relies on the new object id mechanism introduced in Ruby >= 2.7 which guarantees uniqueness (i.e. never re-used) and stickiness across eventual memory compaction. Prior to Ruby 2.7 object ids were related to memory locations and it'd thus be very easy to "miss" frees if the memory location got re-used for a different object. If that happened, we'd keep on reporting the original object as alive when it really wasn't.

Motivation:
Actually implement basic heap profiling (count only, no size yet).

Additional Notes:

How to test the change?
Non-zero heap-live-samples values should now be included in the resulting Ruby profiles when an app is executed with:

DD_PROFILING_EXPERIMENTAL_ALLOCATION_ENABLED=true DD_PROFILING_EXPERIMENTAL_HEAP_ENABLED=true ddtracerb exec ...

For Datadog employees:

If this PR touches code that signs or publishes builds or packages, or handles
credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
This PR doesn't touch any of that.


ivoanjo

Left a few notes, but this is a pretty interesting alternative!

ext/ddtrace_profiling_native_extension/collectors_cpu_and_wall_time_worker.c

ext/ddtrace_profiling_native_extension/heap_recorder.c

ext/ddtrace_profiling_native_extension/ruby_helpers.c

[PROF-8667] Some comments and location fixes.
[PROF-8667] Fix typo and enable 'records live heap objects' test.
[PROF-8667] Heap Profiling - Part 2 - Live Tracking
[PROF-8667] Remove one remaining usage of sum.
[PROF-8667] Address comments

ivoanjo

Here's a pass on everything but the heap_recorder.c -- I was halfway there still. Will come back tomorrow to finish it :)

ext/ddtrace_profiling_native_extension/ruby_helpers.c

ext/ddtrace_profiling_native_extension/heap_recorder.c

codecov-commenter · 2023-12-13T18:48:45Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (14b40d4) 98.23% compared to head (1106473) 98.23%.
Report is 22 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #3287   +/-   ##
=======================================
  Coverage   98.23%   98.23%           
=======================================
  Files        1253     1253           
  Lines       72982    73015   +33     
  Branches     3429     3430    +1     
=======================================
+ Hits        71693    71728   +35     
+ Misses       1289     1287    -2


ivoanjo

Took me a while, but this looks great! Left a final set of comments/suggestions, but here sir, take my 👍

ext/ddtrace_profiling_native_extension/ruby_helpers.c

ext/ddtrace_profiling_native_extension/heap_recorder.c

ivoanjo · 2023-12-14T20:08:04Z

ext/ddtrace_profiling_native_extension/stack_recorder.c

+  ddog_prof_Location *locations_arr = ruby_xcalloc(locations_len, sizeof(ddog_prof_Location));
+  for (size_t i = 0; i < locations_len; i++) {
+    VALUE location = rb_ary_entry(locations, i);
+    ENFORCE_TYPE(location, T_ARRAY);
+    VALUE name = rb_ary_entry(location, 0);
+    VALUE filename = rb_ary_entry(location, 1);
+    VALUE line = rb_ary_entry(location, 2);
+    ENFORCE_TYPE(name, T_STRING);
+    ENFORCE_TYPE(filename, T_STRING);
+    ENFORCE_TYPE(line, T_FIXNUM);
+    locations_arr[i] = (ddog_prof_Location) {
+      .line = line,
+        .function = (ddog_prof_Function) {
+          .name = char_slice_from_ruby_string(name),
+          .filename = char_slice_from_ruby_string(filename),
+        }
+    };
+  }
+  ddog_prof_Slice_Location ddog_locations = {
+    .len = locations_len,
+    .ptr = locations_arr,
+  };
+  heap_recorder_testonly_assert_hash_matches(ddog_locations);


We're leaking the locations_arr 🤣.

I guess to avoid having to rb_rescue2 and whatnot, maybe heap_recorder_testonly_assert_hash_matches could return Qnil or an exception (e.g. using rb_exc_new_str) and then this method would free the locations_arr and then rb_exc_raise(...) the result of the matches if it was not Qnil?

AlexJF · Dec 18, 2023

Good catch! We are running in C99, so I think we can rely on variable-length arrays here and keep everything sane?

ivoanjo · Jan 2, 2024

Yeah, I think that's reasonable.

To be honest in the past I've experimented with variable length arrays in ddtrace and did not fully grasp what happens on stack overflows. With what I know today, maybe what I saw was an interaction with Ruby's detection of stack overflows...? Need to retest again.

This is to say, I'm not 100% confident about using them in the actual production code yet, but for the testing code I don't think it matters at all.

spec/datadog/profiling/stack_recorder_spec.rb

ext/ddtrace_profiling_native_extension/heap_recorder.c

ivoanjo · 2023-12-15T18:40:31Z

ext/ddtrace_profiling_native_extension/heap_recorder.c

+heap_record_key* heap_record_key_new(heap_stack *stack) {
+  heap_record_key *key = ruby_xmalloc(sizeof(heap_record_key));
+  key->type = HEAP_STACK;
+  key->heap_stack = stack;
+  return key;
+}
+
+void heap_record_key_free(heap_record_key *key) {
+  ruby_xfree(key);
+}


Minor: It looks to me that maybe we could have the heap_record_key own the stack as well?

E.g. whenever a heap_record_key gets malloc'd, we know it's because there's a new stack being tracked, and they will also get freed together, so we could make freeing the stack the responsibility of the key and simplify this a bit, I guess.

AlexJF · Dec 18, 2023

There is a kind of 1:1 mapping between a heap-allocated heap_record_key and a heap_record, so whether the stack is owned by the key or the record seems roughly equivalent from a "created together, freed together" perspective.

IMO, the more flexible nature of heap_record_key, with its potentially non-heap-allocated stacks or even no heap_stack at all, makes it slightly more confusing to give ownership to. That's why I preferred to keep ownership with the record and not the key.

github-actions bot added the profiling Involves Datadog profiling label Nov 28, 2023

AlexJF changed the base branch from alexjf/prof-8667-heap-profiling-part3-free-and-queue to alexjf/prof-8667-heap-profiling-part2-allocations November 28, 2023 18:21

AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch 2 times, most recently from ca8be88 to 37d81ae Compare November 28, 2023 19:15

ivoanjo reviewed Nov 29, 2023


AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch from 37d81ae to 89a15dc Compare December 12, 2023 17:44

github-actions bot added appsec Application Security monitoring product core Involves Datadog core libraries integrations Involves tracing integrations tracing labels Dec 12, 2023

AlexJF changed the base branch from alexjf/prof-8667-heap-profiling-part2-allocations to master December 12, 2023 18:17

AlexJF changed the title ~~[PROF-8667] Heap Profiling - Part 2 - Live Tracking~~ [PROF-8667] Heap Profiling - Part 2 - Heap Recorder Dec 12, 2023

AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch from 89a15dc to a8357de Compare December 12, 2023 18:26

github-actions bot removed integrations Involves tracing integrations appsec Application Security monitoring product tracing core Involves Datadog core libraries labels Dec 12, 2023

AlexJF mentioned this pull request Dec 13, 2023

[PROF-8667] Heap Profiling - Part 2 - Allocations #3282

Closed

2 tasks

AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch from a8357de to 12c22c5 Compare December 13, 2023 13:28

AlexJF marked this pull request as ready for review December 13, 2023 13:30

AlexJF requested review from a team as code owners December 13, 2023 13:30

AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch from 12c22c5 to 051c803 Compare December 13, 2023 15:47

ivoanjo reviewed Dec 13, 2023


[PROF-8667] Address first batch of comments

774c96a

AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch 2 times, most recently from b5d8df2 to 5e6537d Compare December 14, 2023 18:11

[PROF-8667] Test to verify matching hash implementations in heap recorder

acab787

[PROF-8667] Fix heap recorder hash calculations after line type change

ed57a26

AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch from e9c26ce to ed57a26 Compare December 14, 2023 18:52

AlexJF mentioned this pull request Dec 15, 2023

[PROF-8667] Heap Profiling - Part 3 - Snapshot system #3328

Merged

2 tasks

ivoanjo approved these changes Dec 15, 2023


AlexJF added 4 commits December 18, 2023 13:02

[PROF-8667] Address more comments.

74ff074

[PROF-8667] Lock comment clarification

36f556d

[PROF-8667] Fix minor issues with heap tests in stack_recorder

1d14698

[PROF-8667] Fix lint error

1106473

AlexJF merged commit bb96f8f into master Dec 19, 2023
218 checks passed

AlexJF deleted the alexjf/prof-8667-heap-profiling-part2-allocations-with-ids branch December 19, 2023 12:18

github-actions bot added this to the 1.19.0 milestone Dec 19, 2023


ivoanjo Dec 14, 2023

AlexJF Dec 18, 2023

ivoanjo Jan 2, 2024

ivoanjo Dec 15, 2023

AlexJF Dec 18, 2023
