-
Notifications
You must be signed in to change notification settings - Fork 377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Single Span Sampling #2128
Single Span Sampling #2128
Conversation
Co-authored-by: Ivo Anjo <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looking good. Some questions.
Only hard ask here is to do some performance testing. I'm particularly interested in:
100+ spans per trace, all traces dropped, rule configured to keep one span
vs
100+ spans per trace, all traces kept, no rules evaluated
This should give us the relative cost increase in a worst case.
@@ -75,6 +77,7 @@ def build_tracer(settings, agent_settings) | |||
enabled: settings.tracing.enabled, | |||
trace_flush: trace_flush, | |||
sampler: sampler, | |||
span_sampler: build_span_sampler(settings), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't love that span_sampler
is its own component within the tracer, but it seems better to have smaller, simpler responsibilities than to have one sampler that's complex.
# These rules allow a span to be kept when its encompassing trace is dropped. | ||
# | ||
# The syntax for single span sampling rules can be found here: | ||
# TODO: <Single Span Sampling documentation URL here> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call out: let's update this when we can.
def get_trace(trace_op) | ||
trace_op.flush! | ||
trace_op.flush! do |spans| | ||
spans.select! { |span| single_sampled?(span) } unless trace_op.sampled? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this will add some cost: having to iterate over each span in each dropped trace. Off the top of my head, not sure how we avoid this, but it is worth noting in regards of possible performance impact.
Fine for now; let's just measure the performance of this before merging, if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmarks show it's not measurably slower, unless single span sampling actually matches, which causes 3 set_tag
operations, which are a bit slow.
Benchmarks have been added and attached results to this PR as a comment.
@@ -20,6 +20,9 @@ class RateSampler < Sampler | |||
# * +sample_rate+: the sample rate as a {Float} between 0.0 and 1.0. 0.0 | |||
# means that no trace will be sampled; 1.0 means that all traces will be | |||
# sampled. | |||
# | |||
# DEV-2.0: Allow for `sample_rate` zero (drop all) to be allowed. This eases | |||
# DEV-2.0: usage for many consumers of the {RateSampler} class. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Can you clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expanded the comment in the PR, but here's the gist:
All internal users of RateSampler
(RuleSampler and now Single Span Sampling) want sample_rate == 0 to mean "drop all", but they can't do that because of the validation that happens in the RateSampler
initializer.
The way they get around it is to not set the value in the initializer, but call:
sampler = RateSampler.new
sampler.sample_rate = sample_rate # There's no validation here
This bypasses the validation. Ideally, the RateSampler
would respect any rate between 0.0 and 1.0.
@@ -54,6 +56,13 @@ def match?(span) | |||
end | |||
end | |||
|
|||
def ==(other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a span really the same as another span if their service and name are the same? For sampling purposes?
Can you explain the logic behind this a little more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This equality method referrers to the Matcher
class, not to spans.
If two matchers have the exact same instance variables, which is the only state they can have, they are the same.
Currently, the name
and service
matchers are all the instance variables a Matcher
can have, so if two matchers have the same name
and service
they are effectively the same.
Here are the benchmark results. Memory usage does have any significant change. The following benchmarks are measuring execution time. The numbers TraceOperation is kept by trace-level sampling:1. And no single span sampling is configured (baseline):This code path does not consult the single span sampler.
TraceOperation is reject by trace-level sampling:2. And no single span sampling is configured:This code path does not consult the single span sampler.
3. Simple span sampling is configured and all spans are rejected:The difference between this benchmark and the previous one is the cost to consult single span rules.
4. Simple span sampling is configured and all spans are kept:One side effect of being Single Span Sampled is that 3 tags are added to each span successfully being single sampled, thus more overhead is expected.
ConclusionsThe only code path with meaningful performance impact is the The rules by themselves are not very expensive: the difference between Maybe not surprising, but dropped traces ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance looks acceptable for a worst case. Tests are passing (minus one annoying MacOS test) so I think this should be good to merge as soon as its rebased.
Nice work!
This PR adds support for single span sampling to the tracer.
Single Span Sampling allows you to:
It is configured through the documented environment variables:
DD_SPAN_SAMPLING_RULES
,ENV_SPAN_SAMPLING_RULES_FILE
All changes in this feature branch have been individually reviewed.