
[processor/tailsamplingprocessor] Late arriving spans can get different final decision from original #14760

Closed
pcwiese opened this issue Oct 6, 2022 · 5 comments · Fixed by #16321
Labels
bug (Something isn't working), priority:p2 (Medium), processor/tailsampling (Tail sampling processor)

Comments

@pcwiese
Contributor

pcwiese commented Oct 6, 2022

What happened?

Description

I have a situation where a trace's spans are not sampled at the initial decision point, but late-arriving spans for the same trace are. This only reproduces for late-arriving spans and only under certain conditions:

Steps to Reproduce

Configure the sampler with a mix of policies that includes at least one string_attribute policy with invert_match: true plus at least one other policy:

  tail_sampling:
    decision_wait: 5s
    policies:
    - name: ignore_traces_with_http_target
      type: string_attribute
      string_attribute:
        enabled_regex_matching: true
        invert_match: true
        key: http.target
        values:
        - alive$
    - name: sample_traces_from_service_namespaces
      type: string_attribute
      string_attribute:
        enabled_regex_matching: true
        key: service.namespace
        values:
        - ^foobar$

Start a trace and send one span that causes the policies to evaluate to:
policies[0] = InvertSampled
policies[1] = NotSampled

Then wait for the configured decision time to elapse and send another span to complete the trace. The first span will be dropped while the second one will be sampled, which is not correct.
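
To make the mismatch easier to see, here is a minimal Go sketch. It is illustrative only, not the processor's actual source: the helper names (finalDecision, lateSpanDecision) and the combining rules are assumptions chosen to reproduce the behavior described above, where the first decision pass combines the per-policy results into a single trace decision while the late-arriving-span path appears to treat a stored InvertSampled per-policy decision as reason to forward the span.

    package main

    import "fmt"

    // Decision mirrors the per-policy outcomes named in this report.
    type Decision string

    const (
        NotSampled    Decision = "NotSampled"
        Sampled       Decision = "Sampled"
        InvertSampled Decision = "InvertSampled"
    )

    // finalDecision models the first decision pass as described above: the
    // trace is kept only if at least one policy returned Sampled, so the pair
    // (InvertSampled, NotSampled) yields NotSampled.
    // Assumed combine rule for illustration only.
    func finalDecision(perPolicy []Decision) Decision {
        for _, d := range perPolicy {
            if d == Sampled {
                return Sampled
            }
        }
        return NotSampled
    }

    // lateSpanDecision models the late-arriving-span path: the span is
    // forwarded if any stored per-policy decision is Sampled or InvertSampled.
    // Also assumed for illustration only.
    func lateSpanDecision(perPolicy []Decision) Decision {
        for _, d := range perPolicy {
            if d == Sampled || d == InvertSampled {
                return Sampled
            }
        }
        return NotSampled
    }

    func main() {
        // The scenario from this report: policies[0] = InvertSampled, policies[1] = NotSampled.
        stored := []Decision{InvertSampled, NotSampled}

        fmt.Println("first pass final decision:", finalDecision(stored)) // NotSampled -> first span dropped
        fmt.Println("late-arriving span:", lateSpanDecision(stored))     // Sampled -> late span forwarded
    }

With the stored decisions from the scenario above (InvertSampled, NotSampled), the first pass drops the trace but the late-span path forwards the new span, which is exactly the inconsistency reported here; whatever the real implementation detail, the expectation stated below is that both paths agree on one final decision per trace.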

I sent two spans from a .NET host, generated by code like this:

    Console.WriteLine("Hit any key to generate a new trace.");
    Console.ReadLine();
    Console.WriteLine("Trace started...");

    using var activity = source.StartActivity(
        "HTTP GET",
        ActivityKind.Server);
    activity?.SetTag($"http.url", "http://localhost/bar");
    activity?.SetTag($"http.method", "GET");
    activity?.SetTag($"http.path", "/bar");

    using var subActivity1 = source.StartActivity(
        "HTTP GET",
        ActivityKind.Client);
    subActivity1?.SetTag($"http.url", "http://localhost/foo");
    subActivity1?.SetTag($"http.method", "GET");
    subActivity1?.SetTag($"http.path", "/foo");
    subActivity1?.Dispose();

    Console.WriteLine("Hit any key to complete the trace...");
    Console.ReadLine();
    Console.WriteLine("Trace complete.\n.\n");

The span generated from subActivity1 is delivered immediately, and after the decision wait time expires, produces the policy evaluations above. The trace's final decision is NotSampled. After completing the trace, the final span is delivered and immediately sampled and forwarded along.

Expected Result

Late-arriving spans always get the same final decision as the original one, which in this case is NotSampled (if that is correct).

Actual Result

Late-arriving spans do not always get the same final decision as the original one.

Collector version

v0.61.0

Environment information

Environment

OS: Ubuntu 20.04

OpenTelemetry Collector configuration

exporters:
  logging:
    loglevel: debug
processors:
  tail_sampling:
    decision_wait: 5s
    policies:
    - name: ignore_traces_with_http_target
      type: string_attribute
      string_attribute:
        enabled_regex_matching: true
        invert_match: true
        key: http.target
        values:
        - alive$
    - name: sample_traces_from_service_namespaces
      type: string_attribute
      string_attribute:
        enabled_regex_matching: true
        key: service.namespace
        values:
        - ^testhost$
receivers:
  otlp/sampler:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
service:
  pipelines:
    traces:
      exporters:
      - logging
      processors:
      - tail_sampling
      receivers:
      - otlp/sampler
  telemetry:
    logs:
      level: DEBUG

Log output

No response

Additional context

No response

@pcwiese added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Oct 6, 2022
@pcwiese changed the title from "[TailSamplingProcessor] Late arriving spans can get different final decision from original" to "[processor/tailsamplingprocessor] Late arriving spans can get different final decision from original" on Oct 7, 2022
@pcwiese
Contributor Author

pcwiese commented Oct 7, 2022

@jpkrohling @fabiovn

@evan-bradley added the priority:p2 (Medium) and processor/tailsampling (Tail sampling processor) labels and removed the needs triage (New item requiring triage) label on Oct 7, 2022
@github-actions
Contributor

github-actions bot commented Oct 7, 2022

Pinging code owners: @jpkrohling. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jpkrohling
Member

Thanks for the report. I do not have time to work on this right now, but I'll add this to my queue. It would expedite fixing this if you could provide a test case or perhaps a PR.

@pcwiese
Contributor Author

pcwiese commented Nov 15, 2022

Thanks for the report. I do not have time to work on this right now, but I'll add this to my queue. It would expedite fixing this if you could provide a test case or perhaps a PR.

I put out a PR, please take a look and see if I'm off base here.

@jpkrohling
Member

Thanks, I'm adding this to my review queue.
