Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROF-8667] Heap Profiling - Part 5 - Size #3333

Merged
merged 5 commits into from
Jan 5, 2024

Conversation

AlexJF
Copy link
Contributor

@AlexJF AlexJF commented Dec 15, 2023

What does this PR do?

This PR follows #3329 by adding live-heap-size values to heap samples. These values give an idea of how much estimated heap memory the weighted samples tracked by live-heap-samples are using.

How does it work?

On each heap_recorder_flush, for all object records associated with objects that were deemed to be still alive, we call rb_obj_memsize_of to get an updated value for that object's estimated size in memory.

When writing a heap sample to a profile on serialization, we use the weighted object size (i.e. multiplied by the sampling weight of that object) as the value for the heap-live-size profile type for that sample.

Motivation:
Go one step further from understanding the approximate number of objects alive in the heap, to understanding the approximate space those objects are using in heap.

Additional Notes:

How to test the change?
Heap samples included in profiles when an app is executed with:

DD_PROFILING_EXPERIMENTAL_ALLOCATION_ENABLEd=true DD_PROFILING_EXPERIMENTAL_HEAP_ENABLED=true ddtracerb exec ...

should now include values for heap-live-size profile type. This can be trivially tested by downloading these profiles and looking at them with some pprof parsing tool like https://github.com/felixge/pprofutils

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

Unsure? Have a question? Request a review!

@github-actions github-actions bot added the profiling Involves Datadog profiling label Dec 15, 2023
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part5-size branch from 357bc28 to bf9c073 Compare December 15, 2023 18:27
@AlexJF AlexJF marked this pull request as ready for review December 15, 2023 18:28
@AlexJF AlexJF requested review from a team as code owners December 15, 2023 18:28
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM.

I'm curious about the performance impact of this one.

case T_ARRAY:
case T_HASH:
case T_REGEXP:
case T_DATA:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I soft-wonder if we'll run into trouble with T_DATA (or data/typeddata) objects, since the size for these is up to native extension authors. Since these methods don't get used very often, they may be buggy or inefficient. I guess... let's maybe keep an eye out on these ones?

// Wrapper around rb_obj_memsize_of to avoid hitting crashing paths.
size_t ruby_obj_memsize_of(VALUE obj) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be fair, the crashing path is only when an object is invalid -- all of the Ruby types are accounted for in rb_obj_memsize_of(...) so I'm not sure it's worth having this wrapper around it -- rb_bug(...) is a pretty big thing that the VM only uses when it truly believes there's something wrong at the VM/native level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an extra "hidden" rb_bug call for the T_NODE case which was the one that worried me a bit more. The fact that they use rb_bug in that place may be a hint that it should be impossible for us to accidentally track one. But I'm also not 100% sure if that's the case or if they just assumed whoever called obj_memsize_of would be doing that check first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right -- I had missed that one. I don't know enough about RNode; intuitively if they're calling rb_bug it would be weird if we ever got handed one of those objects but I definitely see your concern.

I guess it may worth leaving a bit more context as a comment on why we're doing this, but I'm convinced :)

ext/ddtrace_profiling_native_extension/stack_recorder.c Outdated Show resolved Hide resolved
spec/datadog/profiling/stack_recorder_spec.rb Outdated Show resolved Hide resolved
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part4-labels branch 2 times, most recently from f79c044 to 2d2f8a3 Compare December 21, 2023 12:08
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part5-size branch from bf9c073 to ea38d22 Compare December 21, 2023 12:41
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part4-labels branch 2 times, most recently from 458ed3c to 581b912 Compare December 21, 2023 14:13
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part5-size branch from ea38d22 to e73b8d7 Compare December 21, 2023 14:14
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part4-labels branch from 581b912 to 4610e55 Compare December 21, 2023 14:41
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part5-size branch from e73b8d7 to a6a9db6 Compare December 21, 2023 14:43
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part4-labels branch 2 times, most recently from c20d042 to 66dfb9d Compare December 21, 2023 15:03
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part5-size branch from a6a9db6 to d580e76 Compare December 21, 2023 15:04
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part4-labels branch from 66dfb9d to 4f3b556 Compare December 21, 2023 15:10
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part5-size branch 2 times, most recently from df7f59b to 1579489 Compare December 21, 2023 16:45
@codecov-commenter
Copy link

codecov-commenter commented Dec 21, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (98917ca) 98.23% compared to head (63c286e) 98.24%.
Report is 7 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #3333    +/-   ##
========================================
  Coverage   98.23%   98.24%            
========================================
  Files        1253     1254     +1     
  Lines       73039    73204   +165     
  Branches     3431     3429     -2     
========================================
+ Hits        71752    71917   +165     
  Misses       1287     1287            

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part4-labels branch from 4f3b556 to 021d9b5 Compare January 3, 2024 10:39
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part5-size branch from 72a056e to 13327a0 Compare January 3, 2024 11:24
Base automatically changed from alexjf/prof-8667-heap-profiling-part4-labels to master January 3, 2024 16:59
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part5-size branch from 13327a0 to 9806fcd Compare January 3, 2024 17:33
@AlexJF AlexJF force-pushed the alexjf/prof-8667-heap-profiling-part5-size branch from 9806fcd to bd49938 Compare January 3, 2024 17:44
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave it another pass! LGTM 👍

// Wrapper around rb_obj_memsize_of to avoid hitting crashing paths.
size_t ruby_obj_memsize_of(VALUE obj) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right -- I had missed that one. I don't know enough about RNode; intuitively if they're calling rb_bug it would be weird if we ever got handed one of those objects but I definitely see your concern.

I guess it may worth leaving a bit more context as a comment on why we're doing this, but I'm convinced :)

ext/ddtrace_profiling_native_extension/ruby_helpers.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/stack_recorder.c Outdated Show resolved Hide resolved
@@ -85,6 +85,7 @@ def self.build_profiler_component(settings:, agent_settings:, optional_tracer:)
cpu_time_enabled: RUBY_PLATFORM.include?('linux'), # Only supported on Linux currently
alloc_samples_enabled: allocation_profiling_enabled,
heap_samples_enabled: heap_profiling_enabled,
heap_size_enabled: heap_profiling_enabled,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to our future selves, a separate config for this gets introduced in #3360 .

ext/ddtrace_profiling_native_extension/stack_recorder.c Outdated Show resolved Hide resolved
ext/ddtrace_profiling_native_extension/heap_recorder.c Outdated Show resolved Hide resolved
let(:sample_rate) { 50 }
let(:metric_values) do
{ 'cpu-time' => 101, 'cpu-samples' => 1, 'wall-time' => 789, 'alloc-samples' => sample_rate, 'timeline' => 42 }
end
let(:labels) { { 'label_a' => 'value_a', 'label_b' => 'value_b', 'state' => 'unknown' }.to_a }

let(:a_string) { 'a beautiful string' }
let(:an_array) { (1..10).to_a }
let(:a_hash) { { 'a' => 1, 'b' => '2', 'c' => true } }
let(:an_array) { (1..100).to_a.compact }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Huh, interesting, I would've expected Ruby to right-size the underlying array (e.g. not need the compact) when doing #to_a on a range.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might not be needed but you had suggested calling it would enforce it in a previous comment so I added it for good measure

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's indeed harmless. Usually an array gets created it gets right-sized, and only later appends trigger the growth behavior (although I haven't looked at Range#to_a so I'm assuming it calls the right APIs to do that ;) )

spec/datadog/profiling/stack_recorder_spec.rb Outdated Show resolved Hide resolved
@AlexJF AlexJF merged commit d809f33 into master Jan 5, 2024
218 checks passed
@AlexJF AlexJF deleted the alexjf/prof-8667-heap-profiling-part5-size branch January 5, 2024 13:06
@github-actions github-actions bot added this to the 1.19.0 milestone Jan 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
profiling Involves Datadog profiling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants