From da099a975985ae19dc2e36c38b46135caff45456 Mon Sep 17 00:00:00 2001
From: Ivo Anjo
Date: Thu, 7 Nov 2024 14:55:15 +0000
Subject: [PATCH] [PROF-9470] Enable "heap clean after GC" profiler optimization by default
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

**What does this PR do?**

This PR changes the optimization added in #4020 to be enabled by default.

I've collected a fresh set of benchmarking results for this feature in [this Google doc](https://docs.google.com/document/d/143jmyzB7rMJ9W2hKN0JoDbjo2m3oCVCzvPToHVjLRAM/edit?tab=t.0#heading=h.f00wz5x8kwg6).

The TL;DR is that the results seem to be very close: sometimes we improve things slightly, but often the numbers are too close to tell apart. On the other hand, this also means there are no regressions, and thus no reason not to enable the feature by default.

**Motivation:**

As a recap, without this optimization, the Ruby heap profiler works by sampling allocated objects and collecting and keeping metadata about them (stack trace, etc.). Then, at serialization time (every 60 seconds), the profiler checks which objects are still alive: objects that are still alive get included in the heap profile, while objects that have since been garbage collected get their metadata dropped.

This scheme has a weak point: some objects are allocated and then garbage collected almost immediately. Because the profiler only checks object liveness at serialization time, in the extreme case an object born and collected at the beginning of the profiling period can still be tracked for almost 60 seconds, until the profiler finally notices that it is no longer alive. This has two consequences:

1. The profiler uses more memory, since it keeps metadata for already-dead objects.
2. The profiler has more work to do at the end of the 60-second period: it needs to check an entire 60 seconds' worth of sampled objects.

The "heap clean after GC" optimization adds an extra mechanism that, based on Ruby GC activity, triggers periodic checking of young objects (i.e. objects that have only been alive for a few GC generations). Thus:

a. The profiler identifies and clears garbage objects sooner, needing less memory overall.
b. The profiler has less work to do at the end of the 60-second period, trading this off against smaller periodic passes.

**Additional Notes:**

I've also removed the separate benchmarking configuration for this feature, to avoid having too many long-running benchmarking variants.

**How to test the change?**

I've updated the specs for the setting, and the optimization itself has existing test coverage that was added back in #4020.
---
 .gitlab/benchmarks.yml                           | 10 ----------
 lib/datadog/core/configuration/settings.rb       |  6 +++---
 spec/datadog/core/configuration/settings_spec.rb |  8 ++++----
 3 files changed, 7 insertions(+), 17 deletions(-)

diff --git a/.gitlab/benchmarks.yml b/.gitlab/benchmarks.yml
index fb0d18f3043..bd5bf1dbbea 100644
--- a/.gitlab/benchmarks.yml
+++ b/.gitlab/benchmarks.yml
@@ -104,16 +104,6 @@ only-profiling-heap:
     DD_PROFILING_EXPERIMENTAL_HEAP_ENABLED: "true"
     ADD_TO_GEMFILE: "gem 'datadog', github: 'datadog/dd-trace-rb', ref: '$CI_COMMIT_SHA'"
 
-only-profiling-heap-clean-after-gc:
-  extends: .benchmarks
-  variables:
-    DD_BENCHMARKS_CONFIGURATION: only-profiling
-    DD_PROFILING_ENABLED: "true"
-    DD_PROFILING_ALLOCATION_ENABLED: "true"
-    DD_PROFILING_EXPERIMENTAL_HEAP_ENABLED: "true"
-    DD_PROFILING_HEAP_CLEAN_AFTER_GC_ENABLED: "true"
-    ADD_TO_GEMFILE: "gem 'datadog', github: 'datadog/dd-trace-rb', ref: '$CI_COMMIT_SHA'"
-
 only-profiling-gvl:
   extends: .benchmarks
   variables:
diff --git a/lib/datadog/core/configuration/settings.rb b/lib/datadog/core/configuration/settings.rb
index f9e49c3d00b..eb0386b3004 100644
--- a/lib/datadog/core/configuration/settings.rb
+++ b/lib/datadog/core/configuration/settings.rb
@@ -518,13 +518,13 @@ def initialize(*_)
             # Controls if the heap profiler should attempt to clean young objects after GC, rather than just at
             # serialization time. This lowers memory usage and high percentile latency.
             #
-            # Only takes effect when used together with `gc_enabled: true` and `experimental_heap_enabled: true`.
+            # Only has effect when used together with `gc_enabled: true` and `experimental_heap_enabled: true`.
             #
-            # @default false
+            # @default true
             option :heap_clean_after_gc_enabled do |o|
               o.type :bool
               o.env 'DD_PROFILING_HEAP_CLEAN_AFTER_GC_ENABLED'
-              o.default false
+              o.default true
             end
           end
 
diff --git a/spec/datadog/core/configuration/settings_spec.rb b/spec/datadog/core/configuration/settings_spec.rb
index 1ad353dd737..8bc17004801 100644
--- a/spec/datadog/core/configuration/settings_spec.rb
+++ b/spec/datadog/core/configuration/settings_spec.rb
@@ -932,7 +932,7 @@
           context 'is not defined' do
             let(:environment) { nil }
 
-            it { is_expected.to be false }
+            it { is_expected.to be true }
           end
 
           [true, false].each do |value|
@@ -947,10 +947,10 @@
 
       describe '#heap_clean_after_gc_enabled=' do
         it 'updates the #heap_clean_after_gc_enabled setting' do
-          expect { settings.profiling.advanced.heap_clean_after_gc_enabled = true }
+          expect { settings.profiling.advanced.heap_clean_after_gc_enabled = false }
             .to change { settings.profiling.advanced.heap_clean_after_gc_enabled }
-            .from(false)
-            .to(true)
+            .from(true)
+            .to(false)
         end
       end
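The liveness-checking trade-off described under **Motivation** can be illustrated with a small, self-contained Ruby model. This is a toy sketch, not the profiler's actual heap recorder: `ToyHeapRecorder`, `simulate`, and the two-GC `young_age_threshold` are hypothetical, and real object liveness is replaced by a caller-supplied predicate so that runs are deterministic.

```ruby
# Toy model of the heap profiler's two cleanup strategies (hypothetical names;
# NOT the real heap recorder). Liveness comes from a caller-supplied predicate.
TrackedObject = Struct.new(:id, :alloc_gc_count, :metadata)

class ToyHeapRecorder
  attr_reader :tracked, :liveness_checks

  # young_age_threshold: how many GCs an object may survive and still count as
  # "young" (an assumed value, chosen only for illustration).
  def initialize(clean_after_gc:, young_age_threshold: 2)
    @clean_after_gc = clean_after_gc
    @young_age_threshold = young_age_threshold
    @tracked = {}        # id => TrackedObject
    @gc_count = 0
    @liveness_checks = 0 # total number of liveness checks performed so far
  end

  # Sampled allocation: remember the object and its metadata (e.g. a stack trace).
  def on_sampled_allocation(id, metadata)
    @tracked[id] = TrackedObject.new(id, @gc_count, metadata)
  end

  # Hook run after each GC. Only when the optimization is enabled does it check
  # the young objects and drop the ones that have already died.
  def on_gc(alive)
    @gc_count += 1
    return unless @clean_after_gc

    young = @tracked.values.select { |obj| @gc_count - obj.alloc_gc_count <= @young_age_threshold }
    young.each { |obj| @tracked.delete(obj.id) unless checked_alive?(obj, alive) }
  end

  # Serialization (every 60 seconds in the real profiler): check every object
  # still tracked, drop the dead ones, and report metadata for the live ones.
  def serialize(alive)
    @tracked.delete_if { |_id, obj| !checked_alive?(obj, alive) }
    @tracked.values.map(&:metadata)
  end

  private

  # Counts every liveness check so the two strategies' work can be compared.
  def checked_alive?(obj, alive)
    @liveness_checks += 1
    alive.call(obj.id)
  end
end

# Hypothetical helper: sample `count` objects, run `gcs` GCs, then serialize.
# Returns [objects tracked right before serialization,
#          liveness checks done during serialization,
#          live objects reported].
def simulate(clean_after_gc:, count:, gcs:, alive:)
  recorder = ToyHeapRecorder.new(clean_after_gc: clean_after_gc)
  count.times { |i| recorder.on_sampled_allocation(i, "stack for object #{i}") }
  gcs.times { recorder.on_gc(alive) }

  tracked_before = recorder.tracked.size
  checks_before = recorder.liveness_checks
  reported = recorder.serialize(alive)
  [tracked_before, recorder.liveness_checks - checks_before, reported.size]
end
```

For example, `simulate(clean_after_gc: false, count: 100, gcs: 1, alive: ->(id) { id.zero? })` returns `[100, 100, 1]`: all 100 sampled objects are still tracked and must be checked at serialization. With `clean_after_gc: true` it returns `[1, 1, 1]`, because the 99 dead objects were already dropped by the pass in `on_gc`; that extra pass is the cost being traded off. Objects older than the threshold are skipped by `on_gc`, so long-lived objects are not re-checked after every GC.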