tailsampling: only send to next consumer once #1735

chris-smith-zocdoc · 2020-09-03T16:13:01Z

Description:

Previously, if multiple sampling policies chose to sample, the trace would be sent to the next consumer multiple times. This change ensures it is sent only once.
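For readers who want to see the difference concretely, here is a minimal sketch of the two behaviors. It is not the processor's actual code: `Decision`, `countingConsumer`, `sendPerPolicy`, and `sendOnce` are hypothetical names invented for illustration, and the trace is reduced to a string ID.

```go
package main

import "fmt"

// Decision is a hypothetical stand-in for the processor's sampling decision type.
type Decision int

const (
	NotSampled Decision = iota
	Sampled
)

// countingConsumer is a hypothetical next consumer that only records how many
// times it receives a trace.
type countingConsumer struct{ calls int }

func (c *countingConsumer) consume(traceID string) { c.calls++ }

// sendPerPolicy mimics the old behavior: one send for every policy that samples.
func sendPerPolicy(c *countingConsumer, traceID string, decisions []Decision) {
	for _, d := range decisions {
		if d == Sampled {
			c.consume(traceID)
		}
	}
}

// sendOnce mimics the fixed behavior: fold all per-policy decisions into one
// final decision, then send at most once.
func sendOnce(c *countingConsumer, traceID string, decisions []Decision) {
	final := NotSampled
	for _, d := range decisions {
		if d == Sampled {
			final = Sampled
		}
	}
	if final == Sampled {
		c.consume(traceID)
	}
}

func main() {
	decisions := []Decision{Sampled, NotSampled, Sampled}
	before, after := &countingConsumer{}, &countingConsumer{}
	sendPerPolicy(before, "trace-1", decisions)
	sendOnce(after, "trace-1", decisions)
	fmt.Println(before.calls, after.calls) // prints "2 1"
}
```

With two of three policies sampling, the per-policy version delivers the trace twice, while the folded version delivers it once.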

Link to tracking Issue: #1514

Testing:
Unit test added as well as a simple manual test using the reproduction case in the original issue.

Documentation:
None. Do I need to add any?

codecov · 2020-09-03T16:22:05Z

Codecov Report

Merging #1735 into master will increase coverage by 0.10%.
The diff coverage is 97.67%.

@@            Coverage Diff             @@
##           master    #1735      +/-   ##
==========================================
+ Coverage   91.15%   91.26%   +0.10%     
==========================================
  Files         272      272              
  Lines       16256    16263       +7     
==========================================
+ Hits        14818    14842      +24     
+ Misses       1010      996      -14     
+ Partials      428      425       -3

Impacted Files	Coverage Δ
...mplingprocessor/tailsamplingprocessor/processor.go	`74.03% <97.67%> (+7.37%)`	⬆️
translator/internaldata/resource_to_oc.go	`91.48% <0.00%> (+4.25%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8690937...fd7e40c. Read the comment docs.

chris-smith-zocdoc · 2020-09-03T16:23:28Z

contrib-test is failing on an unrelated test.

--- FAIL: TestMetrics (2.00s)
    observability_test.go:93: timedout waiting for metrics to arrive

tigrannajaryan · 2020-09-16T13:57:13Z

processor/samplingprocessor/tailsamplingprocessor/processor.go

-					statCountTracesSampled.M(int64(1)),
-				)
-				decisionNotSampled++
+		finalDecision := sampling.NotSampled


Can you please add some comments to explain what the code is doing? Even better would be to break this function into smaller ones, it is very large and difficult to understand.

I agree this code is difficult to understand. Instead of trying to fix or refactor the existing implementation, do you think it's worthwhile to rewrite this processor to rely on the new groupbytrace? The implementation would be much simpler.

I asked @jpkrohling about this and he was open to the idea. I can create an issue to discuss further if you'd like.

@chris-smith-zocdoc are you suggesting to discard this PR and instead submit a bigger refactoring? I am open to the idea, but it depends on what exactly is the refactoring.

I'm going to open a separate issue for it. Given the deadlock bug I found, it's not something we could do yet anyway.

chris-smith-zocdoc · 2020-09-22T22:06:40Z

processor/samplingprocessor/tailsamplingprocessor/processor.go

+	)
+}
+
+func (tsp *tailSamplingSpanProcessor) makeDecision(id idbatcher.ID, trace *sampling.TraceData) (sampling.Decision, *Policy, int64, int64, int64) {


@tigrannajaryan I pulled this method out; hopefully that makes onTick a little clearer. A large amount of this code is purely for metrics.

chris-smith-zocdoc · 2020-09-22T22:09:48Z

processor/samplingprocessor/tailsamplingprocessor/processor.go

+				// any single policy that decides to sample will cause the decision to be sampled
+				// the nextConsumer will get the context from the first matching policy
+				finalDecision = sampling.Sampled
+				if matchingPolicy == nil {


The only reason this is necessary is to get the matching context from the policy which contains tags for the policy name
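As a rough illustration of the "first matching policy" rule in the hunk above, the sketch below keeps a pointer to the first policy that samples and ignores later matches. It makes simplifying assumptions: this `Policy` type, its `Decision` field, and `firstMatch` are hypothetical, and the real policy carries a context with tags rather than just a name.

```go
package main

import "fmt"

// Decision is a hypothetical stand-in for the processor's sampling decision type.
type Decision int

const (
	NotSampled Decision = iota
	Sampled
)

// Policy is a hypothetical, simplified policy: a name plus a precomputed decision.
type Policy struct {
	Name     string
	Decision Decision
}

// firstMatch returns the final decision and the first policy that chose to
// sample. The returned policy is nil when no policy sampled.
func firstMatch(policies []*Policy) (Decision, *Policy) {
	final := NotSampled
	var matching *Policy
	for _, p := range policies {
		if p.Decision == Sampled {
			// Any single sampling policy makes the final decision Sampled.
			final = Sampled
			// Only the first match is remembered; later matches are ignored.
			if matching == nil {
				matching = p
			}
		}
	}
	return final, matching
}

func main() {
	policies := []*Policy{
		{Name: "latency", Decision: NotSampled},
		{Name: "errors", Decision: Sampled},
		{Name: "always", Decision: Sampled},
	}
	final, match := firstMatch(policies)
	fmt.Println(final == Sampled, match.Name) // prints "true errors"
}
```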

chris-smith-zocdoc · 2020-09-22T22:11:16Z

processor/samplingprocessor/tailsamplingprocessor/processor.go

+		trace.Lock()
+		traceBatches := trace.ReceivedBatches
+		trace.ReceivedBatches = nil
+		trace.Unlock()


Previously the lock was taken twice when a policy matched; I combined the two into a single acquisition.

chris-smith-zocdoc · 2020-09-22T23:02:48Z

unrelated test failure

=== RUN   TestHTTPInvalidTLSCredentials
    otlp_test.go:615: 
        	Error Trace:	otlp_test.go:615
        	Error:      	Error message not equal:
        	            	expected: "failed to load TLS config: for auth via TLS, either both certificate and key must be supplied, or neither"
        	            	actual  : "listen tcp 127.0.0.1:50000: bind: address already in use"
        	Test:       	TestHTTPInvalidTLSCredentials
--- FAIL: TestHTTPInvalidTLSCredentials (0.00s)

chris-smith-zocdoc · 2020-09-22T23:32:17Z

another unrelated test failure

TestProcessManagerArgs - fluentbitextension

tigrannajaryan · 2020-09-24T15:30:41Z

processor/samplingprocessor/tailsamplingprocessor/processor.go

+	)
+}
+
+func (tsp *tailSamplingSpanProcessor) makeDecision(id idbatcher.ID, trace *sampling.TraceData) (sampling.Decision, *Policy, int64, int64, int64) {


When a function returns three int64 values, it is very easy to introduce a bug by accidentally swapping their order in the return statement or at the call site.
One possible way to avoid this is to return a struct with named fields instead.
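A minimal sketch of that suggestion, under the assumption that the three counters track sampled decisions, not-sampled decisions, and evaluation errors. The names `decisionCounts`, `outcome`, and `tally` are hypothetical and are not taken from the PR.

```go
package main

import "fmt"

// decisionCounts is a hypothetical struct that replaces three positional
// int64 return values with named fields, so they cannot be swapped silently.
type decisionCounts struct {
	sampled        int64
	notSampled     int64
	evaluateErrors int64
}

// outcome is a hypothetical per-policy result used only for this sketch.
type outcome int

const (
	outcomeSampled outcome = iota
	outcomeNotSampled
	outcomeError
)

// tally counts outcomes into a decisionCounts value.
func tally(outcomes []outcome) decisionCounts {
	var c decisionCounts
	for _, o := range outcomes {
		switch o {
		case outcomeSampled:
			c.sampled++
		case outcomeNotSampled:
			c.notSampled++
		case outcomeError:
			c.evaluateErrors++
		}
	}
	return c
}

func main() {
	c := tally([]outcome{outcomeSampled, outcomeError, outcomeSampled, outcomeNotSampled})
	// Call sites read fields by name instead of relying on return order.
	fmt.Println(c.sampled, c.notSampled, c.evaluateErrors) // prints "2 1 1"
}
```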

tigrannajaryan · 2020-09-24T15:33:17Z

processor/samplingprocessor/tailsamplingprocessor/processor.go

+		if err != nil {
+			trace.Decisions[i] = sampling.NotSampled
+			evaluateErrorCount++
+			tsp.logger.Error("Sampling policy error", zap.Error(err))


Please change to logger.Debug. See https://github.com/open-telemetry/opentelemetry-collector/blob/master/CONTRIBUTING.md#logging

tigrannajaryan · 2020-09-24T15:34:15Z

Please fix the conflicting files.

tigrannajaryan · 2020-09-24T15:51:24Z

related test failure

=== RUN   TestHTTPInvalidTLSCredentials
    otlp_test.go:615: 
        	Error Trace:	otlp_test.go:615
        	Error:      	Error message not equal:
        	            	expected: "failed to load TLS config: for auth via TLS, either both certificate and key must be supplied, or neither"
        	            	actual  : "listen tcp 127.0.0.1:50000: bind: address already in use"
        	Test:       	TestHTTPInvalidTLSCredentials
--- FAIL: TestHTTPInvalidTLSCredentials (0.00s)

Did you see this locally? I cannot find this failure on CircleCI.

chris-smith-zocdoc · 2020-09-24T16:49:40Z

@tigrannajaryan the failure was in a previous commit, here is the run https://app.circleci.com/pipelines/github/open-telemetry/opentelemetry-collector/3605/workflows/78743f26-5ef1-4b37-a886-5cfadbd4877f/jobs/38229/steps

chris-smith-zocdoc · 2020-09-24T17:38:06Z

Made requested changes.

Contrib tests are also failing on master

chris-smith-zocdoc requested review from bogdandrutu, dmitryax, james-bebbington, nilebox, owais, pjanotti and tigrannajaryan as code owners September 3, 2020 16:13

tigrannajaryan self-assigned this Sep 11, 2020

tigrannajaryan reviewed Sep 16, 2020

chris-smith-zocdoc force-pushed the issue_1514 branch from a267df2 to 33755fd Compare September 22, 2020 22:04

chris-smith-zocdoc commented Sep 22, 2020

tigrannajaryan reviewed Sep 24, 2020

chris-smith-zocdoc added 6 commits September 24, 2020 13:20

tailsampling: when multiple policies choose to sample, only send to next consumer once fixes open-telemetry#1514

fb6bf8c

move the decision making loop out of the onTick loop to make the code a little more readable

0449030

add test

430f144

add coverage for error case

3facf2a

move metric counters into a struct

cba2248

change log to debug

fd7e40c

chris-smith-zocdoc force-pushed the issue_1514 branch from c70030f to fd7e40c Compare September 24, 2020 17:28

tigrannajaryan approved these changes Sep 24, 2020

tigrannajaryan merged commit 3cfaa77 into open-telemetry:master Sep 24, 2020

chris-smith-zocdoc deleted the issue_1514 branch October 1, 2020 18:45

chris-smith-zocdoc mentioned this pull request Oct 1, 2020

Tail Sampling Processor emits trace data multiple times #1514

Closed

Troels51 pushed a commit to Troels51/opentelemetry-collector that referenced this pull request Jul 5, 2024

[Metrics SDK] Cleanup ENABLE_METRICS_PREVIEW (open-telemetry#1735)

b8b715f
