-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROF-8667] Heap Profiling - Part 1 - Setup #3281
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is looking great! Did a first pass but ran out of time, so here's the comments I got so far :)
ext/ddtrace_profiling_native_extension/collectors_cpu_and_wall_time_worker.c
Outdated
Show resolved
Hide resolved
ext/ddtrace_profiling_native_extension/collectors_thread_context.c
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's another full detailed pass! I'm still missing a pass at the specs, will do so asap :)
ext/ddtrace_profiling_native_extension/collectors_cpu_and_wall_time_worker.c
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a pass on the specs!
spec/datadog/profiling/collectors/cpu_and_wall_time_worker_spec.rb
Outdated
Show resolved
Hide resolved
spec/datadog/profiling/collectors/cpu_and_wall_time_worker_spec.rb
Outdated
Show resolved
Hide resolved
spec/datadog/profiling/collectors/cpu_and_wall_time_worker_spec.rb
Outdated
Show resolved
Hide resolved
0cb2a47
to
b939766
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a last round of comments. Most of them are minor, and the others are more for later than for this PR.
I think this is in good shape to go in. Here sir, have my 👍
it 'do not corrupt/overwrite non-heap-samples' do | ||
expect(non_heap_samples.size).to eq(2) | ||
|
||
sum_cpu_time = 0 | ||
sum_alloc_samples = 0 | ||
|
||
non_heap_samples.each do |s| | ||
sum_cpu_time += s.values[:'cpu-time'] | ||
sum_alloc_samples += s.values[:'alloc-samples'] | ||
end | ||
|
||
expect(sum_cpu_time).to be > 0 | ||
expect(sum_alloc_samples).to eq(@num_allocations * sample_rate) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: I very rarely add tests for things that don't happen, since a lot of things don't happen in most cases. I'm curious to know your thinking on why this one is worth having around?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree with the intention in general. But in this case, because of the extra work needed to "mock" allocation tracking in these tests, this section is the only one where we actually have heap_samples_enabled
.
So all the other tests in this suite that validate non-heap profiling do not cover heap_samples_enabled=true
case which is also the only case where we do "out-of-band" sample writing that could, in theory, overwrite/mess with something that happens in the normal recording. The intention of this test is to serve as a canary for such a problem.
An alternative might be adding a positive test case with heap_samples_enabled=true
to the existing test context?
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #3281 +/- ##
==========================================
- Coverage 98.23% 98.23% -0.01%
==========================================
Files 1253 1253
Lines 72766 72991 +225
Branches 3417 3430 +13
==========================================
+ Hits 71481 71702 +221
- Misses 1285 1289 +4 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Did a final tiny pass, I think this is ready!
YAY! |
**What does this PR do?** This PR introduces a new setting for the profiler: `profiling.advanced.allocation_counting_enabled` which controls the profiler allocation counting feature. This setting is off by default, enabling us to reduce allocation profiling overhead slightly. **Motivation:** We actually used to have this setting back in the ddtrace 1.x series. We introduced it in #2635 and removed it in #3281 -- by tieing it directly to allocation profiling. I decided to re-introduce it so we can disable this feature by default. **Additional Notes:** This PR sits atop #3797 because it makes sense to measure both overhead improvements together, but is otherwise completely independent. **How to test the change?** Here's the results for running the `profiler_allocation` benchmark: ``` ruby 2.7.7p221 (2022-11-24 revision 168ec2b1e5) [x86_64-linux] Warming up -------------------------------------- Allocations (baseline) 1.419M i/100ms Calculating ------------------------------------- Allocations (baseline) 14.535M (± 2.1%) i/s - 146.197M in 10.062717s Warming up -------------------------------------- Allocations (alloc_profiling_enabled) 1.122M i/100ms Calculating ------------------------------------- Allocations (alloc_profiling_enabled) 11.636M (± 2.2%) i/s - 116.679M in 10.032209s Warming up -------------------------------------- Allocations (alloc_profiling_disabled) 1.124M i/100ms Calculating ------------------------------------- Allocations (alloc_profiling_disabled) 11.866M (± 2.6%) i/s - 119.175M in 10.050580s Comparison: Allocations (baseline): 14534979.3 i/s Allocations (alloc_profiling_disabled): 11865926.7 i/s - 1.22x slower Allocations (alloc_profiling_enabled): 11635919.9 i/s - 1.25x slower ``` The difference is close to the margin of error; nevertheless this feature was showing up on the native profiler, and since it was on the hot path for allocation profiling, I think it's worth it.
What does this PR do?
This PR paves the way for the introduction of heap profiling functionality by:
DD_PROFILING_EXPERIMENTAL_HEAP_ENABLED
, false by default).DD_PROFILING_EXPERIMENTAL_ALLOCATION_ENABLED
andDD_PROFILING_EXPERIMENTAL_ALLOCATION_SAMPLE_RATE
) and improving the warnings on broken rubies.heap-live-samples
data at profile serialization time.Future PRs will gradually build on the heap recorder implementation to actually enable heap data collection.
Motivation:
Start adding heap profiling capabilities in easy-to-review chunks.
Additional Notes:
How to test the change?
The only visible result of this PR is the inclusion of
heap-live-samples
values in the resulting Ruby profiles when an app is executed with:For Datadog employees:
credentials of any kind, I've requested a review from
@DataDog/security-design-and-guidance
.Unsure? Have a question? Request a review!