[PROF-4535] Report code provenance metadata with Ruby profiles #1813

Merged: ivoanjo merged 5 commits into master from ivoanjo/prof-4535-report-code-provenance on Jan 21, 2022

ivoanjo
Member

@ivoanjo ivoanjo commented Dec 16, 2021

The code provenance metadata will be used to power grouping and categorization of stack traces. It is basically a list of the names, versions, and paths of the gems that have been loaded into the Ruby app being profiled. (Gems that have not been loaded are not reported.)
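As a rough illustration of the kind of data described above, here is a minimal sketch that lists the currently loaded gems using RubyGems' `Gem.loaded_specs`. The helper name `loaded_gems_metadata` and the shape of the resulting hashes are assumptions made for this example; they are not the collector's actual implementation or output format.

```ruby
require 'json'
require 'rubygems'

# Hypothetical helper (not the ddtrace implementation): returns one entry per
# gem that has been loaded into the current process, with its name, version,
# and install path. Gems that were never loaded do not appear.
def loaded_gems_metadata
  Gem.loaded_specs.values.map do |spec|
    {
      name: spec.name,
      version: spec.version.to_s,
      paths: [spec.gem_dir], # assumed shape: a list with the gem's directory
    }
  end
end

puts JSON.pretty_generate(loaded_gems_metadata)
```

Because `Gem.loaded_specs` only contains activated gems, the result naturally excludes gems that are installed but unused by the app.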

This PR is pretty much feature-complete, but I'm marking it as a draft because:

  1. I still need to fix the benchmark that is broken
  2. The profiling backend needs to be updated to correctly accept this data (currently it seems to reject the entire profile)

In terms of the profiling architecture, the code provenance metadata doesn't quite fit the existing structure, so the current approach seems somewhat tacked on.

I left a comment in the Recorder discussing this in detail. TL;DR: the Ruby profiler will be switched to report data through libddprof, which will be a big architectural shift and means many classes (including the Recorder) will probably change quite a lot, so it's not worth doing a huge refactoring now that we'll throw away in Q1.

@ivoanjo ivoanjo requested a review from a team December 16, 2021 12:38
@ivoanjo ivoanjo marked this pull request as draft December 16, 2021 12:38
@ivoanjo ivoanjo changed the title Report code provenance metadata with Ruby profiles Draft: Report code provenance metadata with Ruby profiles Dec 16, 2021
@ivoanjo ivoanjo changed the title Draft: Report code provenance metadata with Ruby profiles Draft: [PROF-4535] Report code provenance metadata with Ruby profiles Dec 16, 2021
@ivoanjo ivoanjo self-assigned this Dec 16, 2021
@ivoanjo ivoanjo force-pushed the ivoanjo/prof-4535-report-code-provenance branch from 1e659f9 to b931c45 Compare December 16, 2021 18:11
@codecov-commenter

Codecov Report

Merging #1813 (b931c45) into master (0aeb038) will increase coverage by 0.00%.
The diff coverage is 98.86%.

@@           Coverage Diff            @@
##           master    #1813    +/-   ##
========================================
  Coverage   98.21%   98.21%            
========================================
  Files         931      933     +2     
  Lines       44920    45048   +128     
========================================
+ Hits        44120    44246   +126     
- Misses        800      802     +2     
| Impacted Files | Coverage Δ |
|---|---|
| spec/ddtrace/profiling/integration_spec.rb | 97.31% <ø> (ø) |
| spec/ddtrace/configuration/components_spec.rb | 99.40% <83.33%> (-0.40%) ⬇️ |
| lib/ddtrace/configuration/components.rb | 98.26% <100.00%> (+0.04%) ⬆️ |
| lib/ddtrace/configuration/settings.rb | 100.00% <100.00%> (ø) |
| lib/ddtrace/ext/profiling.rb | 100.00% <100.00%> (ø) |
| lib/ddtrace/profiling.rb | 100.00% <100.00%> (ø) |
| ...ib/ddtrace/profiling/collectors/code_provenance.rb | 100.00% <100.00%> (ø) |
| lib/ddtrace/profiling/flush.rb | 100.00% <100.00%> (ø) |
| lib/ddtrace/profiling/recorder.rb | 97.95% <100.00%> (+0.08%) ⬆️ |
| ...b/ddtrace/profiling/transport/http/api/endpoint.rb | 100.00% <100.00%> (ø) |
| ... and 7 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0aeb038...b931c45. Read the comment docs.

Member

@marcotc marcotc left a comment

Looks good so far!

ivoanjo added a commit that referenced this pull request Jan 5, 2022
Although not stated explicitly, the Ruby profiler was previously
using the 1.2 intake format.

The 1.3 format (lightly documented
[here](https://github.com/DataDog/profiling-backend/blob/prod/README.md#v3-intake-format-used-by-go-net-native)
[Datadog-internal link, apologies]) shuffles around the fields a bit:

* `recording-start` => `start`
* `recording-end` => `end`
* `data[0]` + `types[0]` => `data[somefilename]`
* `runtime` => `family`
* `format` is removed
* `version` is added

This change is not observable to customers, but it
is a requirement for submitting extra files along with profiles,
as we plan to do in #1813.
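
The field changes listed above can be sketched as a small transformation over a form represented as a Hash. This is only an illustration: the method name `to_v1_3_form`, the `filename` and `version` arguments, and the sample values are hypothetical, and only the field names come from the list in the commit message.

```ruby
# Renames taken from the commit message; everything else here is illustrative.
RENAMED_FIELDS = {
  'recording-start' => 'start',
  'recording-end' => 'end',
  'runtime' => 'family',
}.freeze

# Hypothetical converter from a 1.2-style form to a 1.3-style form.
# `filename` and `version` are placeholders supplied by the caller; their real
# values are not specified in the commit message.
def to_v1_3_form(old_form, filename:, version:)
  new_form = {}
  old_form.each do |key, value|
    case key
    when 'format', 'types[0]' then next # `format` is removed; `types[0]` folds into the data key
    when 'data[0]' then new_form["data[#{filename}]"] = value
    else new_form[RENAMED_FIELDS.fetch(key, key)] = value
    end
  end
  new_form['version'] = version # `version` is added
  new_form
end

old_form = {
  'recording-start' => '2022-01-05T10:00:00Z',
  'recording-end' => '2022-01-05T10:01:00Z',
  'runtime' => 'ruby',
  'format' => 'pprof',
  'types[0]' => 'auto',
  'data[0]' => '<pprof bytes>',
}
new_form = to_v1_3_form(old_form, filename: 'example.pprof', version: '3')
puts new_form.keys.inspect
```

Keying `data` by file name is what makes room for additional files in the same request, which is the stated motivation for the change.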
@ivoanjo ivoanjo force-pushed the ivoanjo/prof-4535-report-code-provenance branch from b931c45 to 29ac298 Compare January 5, 2022 17:17
@ivoanjo
Member Author

ivoanjo commented Jan 5, 2022

I previously stated why this was marked as a draft:

> This PR is pretty much feature-complete, but I'm marking it as a draft because:
>
>   1. I still need to fix the benchmark that is broken
>   2. The profiling backend needs to be updated to correctly accept this data (currently it seems to reject the entire profile)

Item 1 has since been fixed, and item 2 is fixed by #1820. Once that is merged, I'll rebase this PR again, and it should be good to go.

The `CodeProvenance` collector collects library metadata for loaded
files in the Ruby VM. This data powers grouping and categorization
of stack trace data.

Also updated the `ProfilingDevelopment.md` with the new class and
removed classes/modules that no longer exist.

Adding new arguments becomes really awkward and error-prone with this
many positional arguments (and many of them being optional), so I
decided to switch the Flush class to use keyword arguments.

Lots of support-for-older-rubies boilerplate here :(
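
To illustrate the motivation for keyword arguments, here is a minimal sketch of a flush-like object. `ExampleFlush` and every field except `code_provenance` are made up for this example and do not reflect the real `Flush` class. Note that required keyword arguments (without defaults) need Ruby 2.1 or newer; supporting older Rubies needs extra workarounds, which is presumably the kind of boilerplate the commit message refers to.

```ruby
# Hypothetical stand-in for a flush object. Only `code_provenance` is a field
# named in this PR; the other fields are illustrative assumptions.
class ExampleFlush
  attr_reader :start, :finish, :events, :code_provenance

  # With keywords, call sites name each argument, so adding an optional field
  # such as `code_provenance` does not disturb existing callers.
  def initialize(start:, finish:, events:, code_provenance: nil)
    @start = start
    @finish = finish
    @events = events
    @code_provenance = code_provenance
  end
end

flush = ExampleFlush.new(
  start: Time.at(0),
  finish: Time.at(60),
  events: [],
  code_provenance: '{"example": true}'
)
puts flush.code_provenance
```

Compared with a long positional list, a missing or misspelled keyword raises an `ArgumentError` at the call site instead of silently shifting values into the wrong field.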
I'm not quite happy with how complex it is to wire this in, nor with
how it looks (see also TODO on `Recording`), but I think it
strikes a good balance between respecting the current architecture
and also not requiring a massive refactoring.

The benchmark was broken by the addition of a `code_provenance` field
to the flush object, which is not relevant to this benchmark.

I did a bit of magic in a REPL to update the marshalled data to not
break the benchmark.

I ran into this issue in the tests being run on GitHub Actions,
since it installs our dependencies inside the dd-trace-rb folder.

It's unclear to me if it can happen in actual customer setups,
but I've decided to fix it anyway.
@ivoanjo ivoanjo force-pushed the ivoanjo/prof-4535-report-code-provenance branch from 29ac298 to d3b464b Compare January 7, 2022 10:34
@ivoanjo ivoanjo changed the title Draft: [PROF-4535] Report code provenance metadata with Ruby profiles [PROF-4535] Report code provenance metadata with Ruby profiles Jan 7, 2022
@ivoanjo ivoanjo marked this pull request as ready for review January 7, 2022 10:39
@ivoanjo
Member Author

ivoanjo commented Jan 7, 2022

All set, ready for review/re-review! :)

@ivoanjo ivoanjo merged commit 93cd757 into master Jan 21, 2022
@ivoanjo ivoanjo deleted the ivoanjo/prof-4535-report-code-provenance branch January 21, 2022 15:11
@github-actions github-actions bot added this to the 0.55.0 milestone Jan 21, 2022
@ivoanjo ivoanjo mentioned this pull request Jan 24, 2022
ivoanjo added a commit that referenced this pull request Feb 1, 2022
The "code provenance" metadata was added in #1813 but is not yet in
use (and was never in any released version of ddtrace), so it's
OK/safe to rename this field.
ivoanjo added a commit that referenced this pull request Feb 25, 2022
…e.json`

The profiling team decided to rename this file for consistency.

The code provenance feature (#1813) is not yet exposed to customers,
and the only release made with the old file name is 1.0.0.beta1 so
this does not cause any regression.
@ivoanjo ivoanjo modified the milestones: 0.55.0, 1.0.0.beta1 Mar 4, 2022