tailsampling: only send to next consumer once #1735
Codecov Report
Coverage diff, master vs. #1735:
            master    #1735     +/-
Coverage    91.15%    91.26%    +0.10%
Files       272       272
Lines       16256     16263     +7
Hits        14818     14842     +24
Misses      1010      996       -14
Partials    428       425       -3
statCountTracesSampled.M(int64(1)),
)
decisionNotSampled++
finalDecision := sampling.NotSampled
Can you please add some comments to explain what the code is doing? Even better would be to break this function into smaller ones; it is very large and difficult to understand.
I agree this code is difficult to understand. Instead of trying to fix or refactor the existing implementation, do you think it's worthwhile to rewrite this processor to rely on the new groupbytrace? The implementation would be much simpler.
I asked @jpkrohling about this and he was open to the idea. I can create an issue to discuss further if you'd like.
@chris-smith-zocdoc are you suggesting to discard this PR and instead submit a bigger refactoring? I am open to the idea, but it depends on what exactly the refactoring is.
I'm going to open a separate issue for it. With the deadlock bug I found, it's not something we could do yet anyway.
Force-pushed from a267df2 to 33755fd.
)
}

func (tsp *tailSamplingSpanProcessor) makeDecision(id idbatcher.ID, trace *sampling.TraceData) (sampling.Decision, *Policy, int64, int64, int64) {
@tigrannajaryan I pulled this method out; hopefully that makes the ontick a little clearer. A large amount of this code is purely for metrics.
// any single policy that decides to sample will cause the decision to be sampled
// the nextConsumer will get the context from the first matching policy
finalDecision = sampling.Sampled
if matchingPolicy == nil {
The only reason this is necessary is to get the matching context from the policy, which contains tags for the policy name.
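For readers following the thread, the rule described in the snippet's comments (any sampling policy makes the final decision Sampled, and the first matching policy is remembered for its context) can be sketched roughly as below. This is a minimal, self-contained illustration, not the processor's actual code: `Decision`, `policyResult`, and `combine` are hypothetical names standing in for the real `sampling` types.

```go
package main

import "fmt"

// Decision is a simplified stand-in for a per-policy sampling decision.
type Decision int

const (
	NotSampled Decision = iota
	Sampled
)

// policyResult pairs a (hypothetical) policy name with its decision.
type policyResult struct {
	Name     string
	Decision Decision
}

// combine returns Sampled if any policy chose to sample, together with
// the first policy that did so. The pointer is nil when nothing matched.
func combine(results []policyResult) (Decision, *policyResult) {
	final := NotSampled
	var matching *policyResult
	for i := range results {
		if results[i].Decision != Sampled {
			continue
		}
		final = Sampled
		// Only the first match is kept; later matches do not replace it.
		if matching == nil {
			matching = &results[i]
		}
	}
	return final, matching
}

func main() {
	final, first := combine([]policyResult{
		{"latency", NotSampled},
		{"errors", Sampled},
		{"always", Sampled},
	})
	fmt.Println(final == Sampled, first.Name)
}
```

The nil check is what keeps a later matching policy from overwriting the first one.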
trace.Lock()
traceBatches := trace.ReceivedBatches
trace.ReceivedBatches = nil
trace.Unlock()
Previously the lock was taken twice when a policy matched. I just combined them into a single acquisition.
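The single-acquisition pattern in the snippet above (read the buffered batches and clear the field while holding the lock once) can be shown in isolation. The sketch below assumes a simplified `traceData` type with a string slice; the real `sampling.TraceData` holds trace batches, and `takeBatches` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sync"
)

// traceData is a simplified stand-in for sampling.TraceData.
type traceData struct {
	sync.Mutex
	receivedBatches []string
}

// takeBatches detaches all buffered batches under one lock acquisition,
// so the caller can process them without holding the lock.
func (t *traceData) takeBatches() []string {
	t.Lock()
	defer t.Unlock()
	batches := t.receivedBatches
	t.receivedBatches = nil
	return batches
}

func main() {
	t := &traceData{receivedBatches: []string{"batch-1", "batch-2"}}
	first := t.takeBatches()
	second := t.takeBatches()
	fmt.Println(len(first), len(second))
}
```

Because the field is set to nil inside the same critical section, a second call finds nothing left to forward.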
unrelated test failure
another unrelated test failure
)
}

func (tsp *tailSamplingSpanProcessor) makeDecision(id idbatcher.ID, trace *sampling.TraceData) (sampling.Decision, *Policy, int64, int64, int64) {
It is very easy to introduce a bug when a function returns three int64 values: you can accidentally swap their order in the return statement or at the call site.
One possible way to avoid this is to return a struct with named fields instead.
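A rough sketch of the struct-return suggestion follows. The type and field names (`policyMetrics`, `EvaluateErrors`, and so on) are hypothetical and chosen only for illustration; the names used in the final change may differ.

```go
package main

import "fmt"

// policyMetrics groups the counters that would otherwise be returned as
// three positional int64 values. Field names here are hypothetical.
type policyMetrics struct {
	EvaluateErrors     int64
	DecisionSampled    int64
	DecisionNotSampled int64
}

// tally counts per-policy outcomes. Each true entry is a policy that
// chose to sample, and errs is the number of policies that failed.
func tally(sampled []bool, errs int64) policyMetrics {
	m := policyMetrics{EvaluateErrors: errs}
	for _, s := range sampled {
		if s {
			m.DecisionSampled++
		} else {
			m.DecisionNotSampled++
		}
	}
	return m
}

func main() {
	m := tally([]bool{true, false, false}, 1)
	// Call sites read fields by name, so the order cannot be swapped.
	fmt.Println(m.EvaluateErrors, m.DecisionSampled, m.DecisionNotSampled)
}
```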
will fix
if err != nil {
trace.Decisions[i] = sampling.NotSampled
evaluateErrorCount++
tsp.logger.Error("Sampling policy error", zap.Error(err))
Please change to logger.Debug. See https://github.com/open-telemetry/opentelemetry-collector/blob/master/CONTRIBUTING.md#logging
Please fix the conflicting files.
Did you see this locally? I cannot find this failure on CircleCI.
@tigrannajaryan the failure was in a previous commit, here is the run https://app.circleci.com/pipelines/github/open-telemetry/opentelemetry-collector/3605/workflows/78743f26-5ef1-4b37-a886-5cfadbd4877f/jobs/38229/steps
…ext consumer once fixes open-telemetry#1514
… a little more readable
Force-pushed from c70030f to fd7e40c.
Made requested changes. Contrib tests are also failing on master.
Description:
Previously, if multiple sampling policies chose to sample, the trace would be sent to the next consumer multiple times. This fixes it so the trace is sent only once.
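To make the before/after behavior concrete, here is a small illustration that counts how often a consumer is invoked under each approach. It is only a sketch of the behavior described above, not the processor's code: `countingConsumer`, `forwardPerPolicy`, and `forwardOnce` are hypothetical names.

```go
package main

import "fmt"

// countingConsumer is a hypothetical stand-in for the next consumer;
// it only records how many times a trace was handed to it.
type countingConsumer struct{ calls int }

func (c *countingConsumer) consume(traceID string) { c.calls++ }

// forwardPerPolicy models the old behavior: one send per sampling policy.
func forwardPerPolicy(c *countingConsumer, traceID string, sampled []bool) {
	for _, s := range sampled {
		if s {
			c.consume(traceID)
		}
	}
}

// forwardOnce models the fixed behavior: at most one send per trace.
func forwardOnce(c *countingConsumer, traceID string, sampled []bool) {
	for _, s := range sampled {
		if s {
			c.consume(traceID)
			return
		}
	}
}

func main() {
	decisions := []bool{true, false, true}
	before, after := &countingConsumer{}, &countingConsumer{}
	forwardPerPolicy(before, "trace-1", decisions)
	forwardOnce(after, "trace-1", decisions)
	fmt.Println(before.calls, after.calls)
}
```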
Link to tracking Issue: #1514
Testing:
Added a unit test and ran a simple manual test using the reproduction case from the original issue.
Documentation:
None. Do I need to add any?