Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ddtrace/tracer: introduce priorities PriorityRuleSamplerKeep (3) and PriorityRuleSamplerReject (-3) for rulesSampler choices. #1012

Merged
merged 2 commits into from
Sep 26, 2021

Conversation

j-rufino
Copy link
Contributor

@j-rufino j-rufino commented Sep 22, 2021

Changed priority to 3,-3 for traces sampled/not based on sampling rate configuration

Problem statement and changes
So far there were 3 mechanisms to control sampling decisions in the agent, using the following priority values:

  • Sampling based on rates set by feedback loop between Agent and Tracer using priorities 1 (accept) and 0 (reject)
  • Sampling based on sampling rate rules set on the tracer using priorities 1 (accept) and 0 (reject)
  • Sampling based on explicit manual (programatic) user choice using priorities 2 (accept) and -2 (reject)

Priorities 0 and 1 being used by both the dynamic agent-tracer feedback loop and by the configuration rules sampling, has two problems, that we intend to resolve by using separate priority values (-3 and 3) when the sampling decision is based on config rules.

  • Problem 1: The agent will always reject spans/traces with negative priority, but it might decide to keep some with priority 0. This behaviour is not intuitive for users that intend to reject a specific % of spans/traces.

  • Problem 2: In the backend it's not possible to differentiate and produce stats on which spans/traces were sampled based on one mechanism vs the other.

Backwards compatibility
As far as I can tell, since the first commit of priority sampler, the behaviour of the agent is to reject spans with negative priority and accept those with priority > 0 and so no corresponding changes on the agent are required for this to be the behaviour.

See relevant code for this:

First commit of the priority sampler in the agent: https://github.com/DataDog/datadog-agent/blob/5f64165c8c719bdab64e63db3e4d1735f341f610/pkg/trace/sampler/prioritysampler.go#L107

I have verified that the current version of the agent works with this changes by locally setting up the agent and a test application using the modified tracer and I have verified that traces are correctly reported to the backend and visible in the UI.

@knusbaum knusbaum added this to the Triage milestone Sep 22, 2021
@knusbaum knusbaum requested a review from gbbr September 22, 2021 17:15
@knusbaum
Copy link
Contributor

Hi @j-rufino,

Can we clarify how this change will behave with older agents that don't know about the new priorities? Also would you link to the datadog-agent change that corresponds with this?

@j-rufino
Copy link
Contributor Author

Hi @knusbaum,

Thanks for looking into this so quickly and for the excellent point, I should have documented that in the PR in the first place.

As far as I can tell, since the first commit of priority sampler, the behaviour of the agent is to reject spans with negative priority and accept those with priority > 0 and so no corresponding changes on the agent are required for this to be the behaviour.

See relevant code for this:

First commit of the priority sampler in the agent: https://github.com/DataDog/datadog-agent/blob/5f64165c8c719bdab64e63db3e4d1735f341f610/pkg/trace/sampler/prioritysampler.go#L107

I have verified that the current version of the agent works with this changes by locally setting up the agent and a test application using the modified tracer and I have verified that traces are correctly reported to the backend and visible in the UI.

However, until we don't introduce some changes on the agent sampler statistics (and customers update the agent), the counter "datadog.trace_agent.receiver.traces_priority" will still report priority 2. See here.

Although the changes in the agent are decoupled from this, I will do create a separate PR for that and link it here as requested ASAP.

// PrioritySamplingRuleReject informs the backend that a trace should be rejected and not stored as specified by
// a sampling rule.
// This should be used by the rules sampler based on the rules configured by the user
PrioritySamplingRuleReject = -3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PrioritySamplingRuleReject = -3
PriorityRuleSamplerReject = -3

I think this will make it much clearer that this is here to clarify that the "rule sampler" made this choice. WDYT?

Comment on lines 12 to 14
// PrioritySamplingRuleReject informs the backend that a trace should be rejected and not stored as specified by
// a sampling rule.
// This should be used by the rules sampler based on the rules configured by the user
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// PrioritySamplingRuleReject informs the backend that a trace should be rejected and not stored as specified by
// a sampling rule.
// This should be used by the rules sampler based on the rules configured by the user
// PriorityRuleSamplerReject specifies that the rule sampler has decided that this trace should be rejected.

Copy link
Contributor

@gbbr gbbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi there! 👋🏻 Thanks for your first PR. I left a few suggestions. Hope it helps!

@j-rufino
Copy link
Contributor Author

j-rufino commented Sep 24, 2021

Hi there! 👋🏻 Thanks for your first PR. I left a few suggestions. Hope it helps!

Hi Gabriel, thanks for the suggestions, I agree is a better naming :) applied and committed.

Copy link
Contributor

@gbbr gbbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine by me. One thing that would be nice please, is if you could read our contribution guidelines: https://github.com/DataDog/dd-trace-go/blob/v1/CONTRIBUTING.md. I mean specifically to the PR title here, and the commit message.

For now, I will edit the title, but for the future, in your next PRs 🙂

@gbbr gbbr changed the title Changed priority to 3,-3 for traces sampled based on sampling rate config ddtrace/tracer: introduce priorities PriorityRuleSamplerKeep (3) and PriorityRuleSamplerReject (-3) for rulesSampler choices. Sep 24, 2021
@j-rufino
Copy link
Contributor Author

This is fine by me. One thing that would be nice please, is if you could read our contribution guidelines: https://github.com/DataDog/dd-trace-go/blob/v1/CONTRIBUTING.md. I mean specifically to the PR title here, and the commit message.

For now, I will edit the title, but for the future, in your next PRs 🙂

Thank you Gabriel, I really appreciate the help and flexibility here. I should definitely have paid more attention to that, and will certainly take it into account for the next PRs.

@knusbaum knusbaum modified the milestones: Triage, 1.34.0 Sep 26, 2021
Copy link
Contributor

@knusbaum knusbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks!

@knusbaum knusbaum merged commit d59eb44 into v1 Sep 26, 2021
@knusbaum knusbaum deleted the jrufino/p3 branch September 26, 2021 02:48
@j-rufino
Copy link
Contributor Author

j-rufino commented Oct 18, 2021

Pleas note that after discussing with other integration teams we will be re-encoding these priority as -2, 3.

See up to date list of priority encodings here: https://docs.google.com/document/d/1lVhgeG6uS0C9DVmN3t74tBDL7h44duwynIQqExljxRc/edit#

gbbr pushed a commit that referenced this pull request Oct 20, 2021
…} instead of new values (#1030)

This change partially reverts #1012 and takes a different approach. Instead of using newly defined priority values which turned out to be problematic in the pipeline, we use already existing User{Keep,Reject} values.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants