[processor/tailsampling] Decision cache for non-sampled trace IDs #33722

jamesmoessis
Contributor

Description:

Adds a decision cache for non-sampled trace IDs. It is off by default.

It uses the same cache type as the sampled IDs cache, but it is a separate instance with its own configurable size.
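
To make the idea concrete, here is a minimal sketch of two independently sized decision caches and how a late-arriving span might be routed against them. It is not the processor's actual code: `TraceID`, `decisionCache`, `newDecisionCache`, and `route` are hypothetical names, the LRU eviction policy is an assumption, and treating a size of 0 as "disabled" is only one plausible reading of "off by default".

```go
package main

import (
	"container/list"
	"fmt"
)

// TraceID is a simplified stand-in for the collector's trace ID type.
type TraceID [16]byte

// decisionCache is a hypothetical fixed-size LRU set of trace IDs.
type decisionCache struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[TraceID]*list.Element
}

func newDecisionCache(capacity int) *decisionCache {
	return &decisionCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[TraceID]*list.Element),
	}
}

// Put records id. A capacity of 0 (assumed here to mean "off") stores nothing.
func (c *decisionCache) Put(id TraceID) {
	if c.capacity <= 0 {
		return
	}
	if el, ok := c.items[id]; ok {
		c.order.MoveToFront(el)
		return
	}
	c.items[id] = c.order.PushFront(id)
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(TraceID))
	}
}

// Get reports whether id is cached and refreshes its recency.
func (c *decisionCache) Get(id TraceID) bool {
	el, ok := c.items[id]
	if ok {
		c.order.MoveToFront(el)
	}
	return ok
}

// route decides what to do with a span whose trace may already have a decision.
func route(sampled, nonSampled *decisionCache, id TraceID) string {
	switch {
	case sampled.Get(id):
		return "release"
	case nonSampled.Get(id):
		return "drop"
	default:
		return "buffer"
	}
}

func main() {
	sampled := newDecisionCache(1_000)
	nonSampled := newDecisionCache(10_000)
	a, b, c := TraceID{1}, TraceID{2}, TraceID{3}
	sampled.Put(a)
	nonSampled.Put(b)
	fmt.Println(route(sampled, nonSampled, a), route(sampled, nonSampled, b), route(sampled, nonSampled, c))
}
```

Keeping the two caches separate lets the non-sampled one be sized much larger than the sampled one, which matters when most traces are dropped.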

Link to tracking Issue: #31583

Testing:
Unit test added. Cache type itself already tested.

Documentation:

  • Updated the readme with config options and descriptions.

@jamesmoessis jamesmoessis requested a review from jpkrohling as a code owner June 24, 2024 02:06
@jamesmoessis jamesmoessis requested a review from a team June 24, 2024 02:06
@github-actions github-actions bot added the processor/tailsampling Tail sampling processor label Jun 24, 2024
@jamesmoessis jamesmoessis force-pushed the jmoe/decision-cache-non-sampled branch from 417d697 to 8c64b9c Compare June 26, 2024 06:18
Member

@jpkrohling jpkrohling left a comment

Looks good, just a couple of comments.

@@ -36,7 +36,7 @@ func TestLoadConfig(t *testing.T) {
 DecisionWait: 10 * time.Second,
 NumTraces: 100,
 ExpectedNewTracesPerSec: 10,
-DecisionCache: DecisionCacheConfig{SampledCacheSize: 500},
+DecisionCache: DecisionCacheConfig{SampledCacheSize: 1000, NonSampledCacheSize: 10000},
Member

Suggested change
-DecisionCache: DecisionCacheConfig{SampledCacheSize: 1000, NonSampledCacheSize: 10000},
+DecisionCache: DecisionCacheConfig{SampledCacheSize: 1_000, NonSampledCacheSize: 10_000},

Member

This suggestion is because it wasn't very clear on a first read that those were different numbers.
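
For readers unfamiliar with the notation, underscores in Go numeric literals (available since Go 1.13) are purely visual digit separators and do not change the value. A small illustration, with hypothetical constant names:

```go
package main

import "fmt"

// Hypothetical constants; the underscores only aid readability.
const (
	sampledCacheSize    = 1_000
	nonSampledCacheSize = 10_000
)

func main() {
	// Both comparisons are true: the separators do not affect the value.
	fmt.Println(sampledCacheSize == 1000, nonSampledCacheSize == 10000)
}
```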

@@ -365,7 +382,14 @@ func (tsp *tailSamplingSpanProcessor) processTraces(resourceSpans ptrace.Resourc
 traceTd := ptrace.NewTraces()
 appendToTraces(traceTd, resourceSpans, spans)
 tsp.releaseSampledTrace(tsp.ctx, id, traceTd)
-tsp.telemetry.ProcessorTailSamplingEarlyReleasesFromCacheDecision.Add(tsp.ctx, int64(len(spans)))
+tsp.telemetry.ProcessorTailSamplingEarlyReleasesFromCacheDecision.
+	Add(tsp.ctx, int64(len(spans)), metric.WithAttributes(attribute.String("decision", "sample")))
Member

Interestingly, metric.WithAttributes performs very poorly, so it should be avoided on the hot path. I typically keep a static var for that and refer to it later.

Using metric.WithAttributeSet should not cause any allocations on the hot path.

successAttr: attribute.NewSet(ea, attribute.Bool("success", true)),

@@ -458,6 +482,9 @@ func (tsp *tailSamplingSpanProcessor) dropTrace(traceID pcommon.TraceID, deletio
 tsp.idToTrace.Delete(traceID)
 // Subtract one from numTracesOnMap per https://godoc.org/sync/atomic#AddUint64
 tsp.numTracesOnMap.Add(^uint64(0))
+if trace.FinalDecision != sampling.Sampled {
+	tsp.nonSampledIDCache.Put(traceID, true)
Member

I think this should be placed close to line 450, where the NotSampled decision is made. I think this code here will not apply to all NotSampled decisions, especially if the idToTrace cache is full.
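
One way to read this suggestion is sketched below: the cache write lives in the function that makes the decision, so it does not depend on a later cleanup step running for that trace. This is a heavily simplified model, not the processor's real control flow; `processor`, `decide`, `dropTrace`, and the map-based cache are hypothetical stand-ins.

```go
package main

import "fmt"

type decision int

const (
	pending decision = iota
	sampled
	notSampled
)

// processor is a hypothetical stand-in for the tail sampling processor;
// plain maps replace the real trace map and decision cache.
type processor struct {
	idToTrace     map[string]decision
	nonSampledIDs map[string]bool
}

func newProcessor() *processor {
	return &processor{
		idToTrace:     make(map[string]decision),
		nonSampledIDs: make(map[string]bool),
	}
}

// decide stores the final decision and, for notSampled, records the ID in
// the cache immediately, at the point where the decision is made.
func (p *processor) decide(id string, d decision) {
	p.idToTrace[id] = d
	if d == notSampled {
		p.nonSampledIDs[id] = true
	}
}

// dropTrace only frees the buffered trace; it does not touch the cache.
func (p *processor) dropTrace(id string) {
	delete(p.idToTrace, id)
}

func main() {
	p := newProcessor()
	p.idToTrace["still-pending"] = pending
	p.decide("rejected", notSampled)
	p.dropTrace("still-pending") // evicted before any decision was made
	fmt.Println(p.nonSampledIDs["rejected"], p.nonSampledIDs["still-pending"])
}
```

In this model a trace evicted while still pending is never recorded as non-sampled, whereas a check of the form `FinalDecision != Sampled` at drop time would also match it.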

Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jul 19, 2024
@jpkrohling
Member

@jamesmoessis , do you want to address the review comments on this PR or on a follow-up?

@github-actions github-actions bot removed the Stale label Jul 20, 2024
Contributor

github-actions bot commented Aug 4, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Aug 4, 2024
@jpkrohling jpkrohling removed the Stale label Aug 13, 2024
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Aug 28, 2024
@jamesmoessis
Contributor Author

@jpkrohling I don't have capacity to work on this PR anymore since we have used our own tail sampler internally (which, incidentally, uses these decision caches as mandatory mechanisms).

If someone else wants to pick this up feel free, since I won't have capacity. Otherwise feel free to close.

@github-actions github-actions bot removed the Stale label Aug 29, 2024
@jpkrohling
Member

I'm closing it for now, but others can start a new PR based on this. Thank you for laying down the work for this!

@jpkrohling jpkrohling closed this Aug 29, 2024
jpkrohling added a commit that referenced this pull request Nov 26, 2024
…6040)

Re-opening
#33722
so it can be finished.
I've addressed @jpkrohling's open comments from that PR.

Description:

Adds a decision cache for non sampled IDs. This is off-by-default.

It's the same as the sampled IDs cache but is separate and has a
separate size.

Link to tracking Issue:
#31583

Testing:
Unit test added. Cache type itself already tested.

Documentation:

Updated the readme with config options and descriptions.

---------

Co-authored-by: Juraci Paixão Kröhling <[email protected]>
shivanthzen pushed a commit to shivanthzen/opentelemetry-collector-contrib that referenced this pull request Dec 5, 2024
ZenoCC-Peng pushed a commit to ZenoCC-Peng/opentelemetry-collector-contrib that referenced this pull request Dec 6, 2024
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this pull request Dec 17, 2024
AkhigbeEromo pushed a commit to sematext/opentelemetry-collector-contrib that referenced this pull request Jan 13, 2025
Labels
processor/tailsampling Tail sampling processor
3 participants