[processor/tailsampling] Decision cache for non-sampled trace IDs #33722
417d697 to 8c64b9c
Looks good, just a couple of comments.
@@ -36,7 +36,7 @@ func TestLoadConfig(t *testing.T) {
  DecisionWait: 10 * time.Second,
  NumTraces: 100,
  ExpectedNewTracesPerSec: 10,
- DecisionCache: DecisionCacheConfig{SampledCacheSize: 500},
+ DecisionCache: DecisionCacheConfig{SampledCacheSize: 1000, NonSampledCacheSize: 10000},
Suggested change
- DecisionCache: DecisionCacheConfig{SampledCacheSize: 1000, NonSampledCacheSize: 10000},
+ DecisionCache: DecisionCacheConfig{SampledCacheSize: 1_000, NonSampledCacheSize: 10_000},
I'm suggesting this because on a first read it wasn't clear that those were different numbers.
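Go has accepted underscores as digit separators in numeric literals since Go 1.13, so the suggested form compiles to exactly the same values. A minimal sketch, using a hypothetical stand-in struct; only the two field names come from the diff above:

```go
package main

import "fmt"

// decisionCacheConfig is a hypothetical stand-in for the processor's
// DecisionCacheConfig; only the two field names are taken from the diff.
type decisionCacheConfig struct {
	SampledCacheSize    int
	NonSampledCacheSize int
}

func main() {
	// Underscores are digit separators (Go 1.13+). They do not change the
	// value, but make it obvious at a glance that the sizes differ by 10x.
	cfg := decisionCacheConfig{SampledCacheSize: 1_000, NonSampledCacheSize: 10_000}
	fmt.Println(cfg.SampledCacheSize == 1000, cfg.NonSampledCacheSize == 10000) // true true
}
```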
@@ -365,7 +382,14 @@ func (tsp *tailSamplingSpanProcessor) processTraces(resourceSpans ptrace.Resourc
  traceTd := ptrace.NewTraces()
  appendToTraces(traceTd, resourceSpans, spans)
  tsp.releaseSampledTrace(tsp.ctx, id, traceTd)
- tsp.telemetry.ProcessorTailSamplingEarlyReleasesFromCacheDecision.Add(tsp.ctx, int64(len(spans)))
+ tsp.telemetry.ProcessorTailSamplingEarlyReleasesFromCacheDecision.
+ 	Add(tsp.ctx, int64(len(spans)), metric.WithAttributes(attribute.String("decision", "sample")))
Interestingly, metric.WithAttributes performs very poorly, so it should be avoided on the hot path. I typically keep the attributes in a static var and refer to it later.
Using metric.WithAttributeSet should not cause any allocations on the hot path.
opentelemetry-collector-contrib/exporter/loadbalancingexporter/wrapped_exporter.go, line 36 in b95672e:
successAttr: attribute.NewSet(ea, attribute.Bool("success", true)),
@@ -458,6 +482,9 @@ func (tsp *tailSamplingSpanProcessor) dropTrace(traceID pcommon.TraceID, deletio
  tsp.idToTrace.Delete(traceID)
  // Subtract one from numTracesOnMap per https://godoc.org/sync/atomic#AddUint64
  tsp.numTracesOnMap.Add(^uint64(0))
+ if trace.FinalDecision != sampling.Sampled {
+ 	tsp.nonSampledIDCache.Put(traceID, true)
I think this should be placed near line 450, where the NotSampled decision is made. As written, this code will not cover every NotSampled decision, especially when the idToTrace cache is full.
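The placement suggested above can be illustrated with a heavily simplified sketch: the trace ID is cached at the point the decision is made, rather than in the code path that removes the trace from idToTrace. Everything here is hypothetical (the Decision enum, the processor struct, the method names), and plain maps stand in for the real size-bounded caches:

```go
package main

import "fmt"

// TraceID mirrors pcommon.TraceID, which is a [16]byte array.
type TraceID [16]byte

// Decision is a hypothetical, reduced version of a sampling decision.
type Decision int

const (
	Pending Decision = iota
	Sampled
	NotSampled
)

// processor is a hypothetical stand-in for the tail sampling processor.
type processor struct {
	sampledIDs    map[TraceID]bool
	nonSampledIDs map[TraceID]bool
}

// decide caches the ID at the moment the final decision is made, so every
// NotSampled decision is recorded regardless of how the trace later leaves
// the in-memory trace map.
func (p *processor) decide(id TraceID, d Decision) {
	switch d {
	case Sampled:
		p.sampledIDs[id] = true
	case NotSampled:
		p.nonSampledIDs[id] = true
	}
}

// dropLateSpans reports whether spans arriving after the decision can be
// dropped immediately instead of being buffered again.
func (p *processor) dropLateSpans(id TraceID) bool {
	return p.nonSampledIDs[id]
}

func main() {
	p := &processor{sampledIDs: map[TraceID]bool{}, nonSampledIDs: map[TraceID]bool{}}
	id := TraceID{1}
	p.decide(id, NotSampled)
	fmt.Println(p.dropLateSpans(id)) // true
}
```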
This PR was marked stale due to lack of activity. It will be closed in 14 days.
@jamesmoessis, do you want to address the review comments in this PR or in a follow-up?
@jpkrohling I no longer have capacity to work on this PR, since we now use our own tail sampler internally (which, incidentally, uses these decision caches as mandatory mechanisms). If someone else wants to pick this up, feel free. Otherwise, feel free to close it.
I'm closing it for now, but others can start a new PR based on this. Thank you for laying down the work for this!
…6040) Re-opening #33722 so it can be finished. I've addressed @jpkrohling's open comments from that PR. Description: Adds a decision cache for non-sampled IDs. This is off by default. It's the same as the sampled IDs cache but is separate and has a separate size. Link to tracking issue: #31583 Testing: Unit test added. Cache type itself already tested. Documentation: Updated the readme with config options and descriptions. --------- Co-authored-by: Juraci Paixão Kröhling <[email protected]>
Description:
Adds a decision cache for non-sampled trace IDs. It is off by default.
It works like the sampled IDs cache, but it is a separate cache with its own size.
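To make "separate cache, separate size, off by default" concrete, here is a minimal sketch of a size-bounded LRU set of trace IDs, instantiated twice with independent sizes. It is not the processor's actual cache implementation; the idCache type and its methods are hypothetical, and treating a size of 0 as "disabled" is an assumption chosen to illustrate an off-by-default cache:

```go
package main

import (
	"container/list"
	"fmt"
)

// TraceID mirrors pcommon.TraceID, which is a [16]byte array.
type TraceID [16]byte

// idCache is a hypothetical size-bounded LRU set of trace IDs.
// A size of 0 disables it (assumption: one way to be off by default).
type idCache struct {
	size  int
	order *list.List // front = most recently used; values are TraceIDs
	items map[TraceID]*list.Element
}

func newIDCache(size int) *idCache {
	return &idCache{size: size, order: list.New(), items: map[TraceID]*list.Element{}}
}

// Put records id, evicting the least recently used ID once the cache is full.
func (c *idCache) Put(id TraceID) {
	if c.size <= 0 {
		return
	}
	if el, ok := c.items[id]; ok {
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.size {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(TraceID))
	}
	c.items[id] = c.order.PushFront(id)
}

// Contains reports whether id is cached, refreshing its recency if so.
func (c *idCache) Contains(id TraceID) bool {
	el, ok := c.items[id]
	if ok {
		c.order.MoveToFront(el)
	}
	return ok
}

func main() {
	// Two independent caches, each with its own size.
	sampled := newIDCache(1_000)
	nonSampled := newIDCache(2)

	nonSampled.Put(TraceID{1})
	nonSampled.Put(TraceID{2})
	nonSampled.Put(TraceID{3}) // evicts TraceID{1}, the least recently used

	fmt.Println(nonSampled.Contains(TraceID{1}), nonSampled.Contains(TraceID{3}), sampled.Contains(TraceID{3}))
	// false true false
}
```

Keeping the two caches separate lets the non-sampled cache be sized independently, which matters when most traces are dropped and non-sampled IDs far outnumber sampled ones.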
Link to tracking issue: #31583
Testing:
Unit test added. Cache type itself already tested.
Documentation: