
[processor/tailsamplingprocessor] Late arriving spans can get different final decision from original #14760

Closed
pcwiese opened this issue Oct 6, 2022 · 5 comments · Fixed by #16321
Labels
bug (Something isn't working), priority:p2 (Medium), processor/tailsampling (Tail sampling processor)

Comments

@pcwiese
Contributor

pcwiese commented Oct 6, 2022

What happened?

Description

I have a situation where a trace's spans are not sampled at the initial decision point, but late-arriving spans for the same trace are. This only reproduces for late-arriving spans and only under certain conditions:

Steps to Reproduce

Configure the sampler with a mix of policies that includes at least one string_attribute policy with invert_match: true plus at least one other policy:

  tail_sampling:
    decision_wait: 5s
    policies:
    - name: ignore_traces_with_http_target
      type: string_attribute
      string_attribute:
        enabled_regex_matching: true
        invert_match: true
        key: http.target
        values:
        - alive$
    - name: sample_traces_from_service_namespaces
      type: string_attribute
      string_attribute:
        enabled_regex_matching: true
        key: service.namespace
        values:
        - ^foobar$

Start a trace and send one span that causes the policies to evaluate to:
policies[0] = InvertSampled
policies[1] = NotSampled

Then wait for the configured decision time to elapse and send another span to complete the trace. The first span will be dropped while the second one will be sampled, which is not correct.
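
To make the mismatch easier to see, here is a minimal Go sketch. It is illustrative only, not the processor's actual source: the helper names (finalDecision, lateSpanDecision) and the combining rules are assumptions chosen to reproduce the behavior described above, where the first decision pass combines the per-policy results into a single trace decision while the late-arriving-span path appears to treat a stored InvertSampled per-policy decision as reason to forward the span.

    package main

    import "fmt"

    // Decision mirrors the per-policy outcomes named in this report.
    type Decision string

    const (
        NotSampled    Decision = "NotSampled"
        Sampled       Decision = "Sampled"
        InvertSampled Decision = "InvertSampled"
    )

    // finalDecision models the first decision pass as described above: the
    // trace is kept only if at least one policy returned Sampled, so the pair
    // (InvertSampled, NotSampled) yields NotSampled.
    // Assumed combine rule for illustration only.
    func finalDecision(perPolicy []Decision) Decision {
        for _, d := range perPolicy {
            if d == Sampled {
                return Sampled
            }
        }
        return NotSampled
    }

    // lateSpanDecision models the late-arriving-span path: the span is
    // forwarded if any stored per-policy decision is Sampled or InvertSampled.
    // Also assumed for illustration only.
    func lateSpanDecision(perPolicy []Decision) Decision {
        for _, d := range perPolicy {
            if d == Sampled || d == InvertSampled {
                return Sampled
            }
        }
        return NotSampled
    }

    func main() {
        // The scenario from this report: policies[0] = InvertSampled, policies[1] = NotSampled.
        stored := []Decision{InvertSampled, NotSampled}

        fmt.Println("first pass final decision:", finalDecision(stored)) // NotSampled -> first span dropped
        fmt.Println("late-arriving span:", lateSpanDecision(stored))     // Sampled -> late span forwarded
    }

With the stored decisions from the scenario above (InvertSampled, NotSampled), the first pass drops the trace but the late-span path forwards the new span, which is exactly the inconsistency reported here; whatever the real implementation detail, the expectation stated below is that both paths agree on one final decision per trace.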

I sent two spans from a .NET host, generated by code like this:

    Console.WriteLine("Hit any key to generate a new trace.");
    Console.ReadLine();
    Console.WriteLine("Trace started...");

    using var activity = source.StartActivity(
        "HTTP GET",
        ActivityKind.Server);
    activity?.SetTag($"http.url", "http://localhost/bar");
    activity?.SetTag($"http.method", "GET");
    activity?.SetTag($"http.path", "/bar");

    using var subActivity1 = source.StartActivity(
        "HTTP GET",
        ActivityKind.Client);
    subActivity1?.SetTag($"http.url", "http://localhost/foo");
    subActivity1?.SetTag($"http.method", "GET");
    subActivity1?.SetTag($"http.path", "/foo");
    subActivity1?.Dispose();

    Console.WriteLine("Hit any key to complete the trace...");
    Console.ReadLine();
    Console.WriteLine("Trace complete.\n.\n");

The span generated from subActivity1 is delivered immediately, and after the decision wait time expires, produces the policy evaluations above. The trace's final decision is NotSampled. After completing the trace, the final span is delivered and immediately sampled and forwarded along.

Expected Result

Late-arriving spans always get the same final decision as the original one, which in this case is NotSampled (if that is correct).

Actual Result

Late-arriving spans do not always get the same final decision as the original one.

Collector version

v0.61.0

Environment information

Environment

OS: Ubuntu 20.04

OpenTelemetry Collector configuration

exporters:
  logging:
    loglevel: debug
processors:
  tail_sampling:
    decision_wait: 5s
    policies:
    - name: ignore_traces_with_http_target
      type: string_attribute
      string_attribute:
        enabled_regex_matching: true
        invert_match: true
        key: http.target
        values:
        - alive$
    - name: sample_traces_from_service_namespaces
      type: string_attribute
      string_attribute:
        enabled_regex_matching: true
        key: service.namespace
        values:
        - ^testhost$
receivers:
  otlp/sampler:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
service:
  pipelines:
    traces:
      exporters:
      - logging
      processors:
      - tail_sampling
      receivers:
      - otlp/sampler
  telemetry:
    logs:
      level: DEBUG

Log output

No response

Additional context

No response

@pcwiese added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Oct 6, 2022
@pcwiese changed the title from "[TailSamplingProcessor] Late arriving spans can get different final decision from original" to "[processor/tailsamplingprocessor] Late arriving spans can get different final decision from original" on Oct 7, 2022
@pcwiese
Contributor Author

pcwiese commented Oct 7, 2022

@jpkrohling @fabiovn

@evan-bradley added the priority:p2 (Medium) and processor/tailsampling (Tail sampling processor) labels and removed the needs triage (New item requiring triage) label on Oct 7, 2022
@github-actions
Contributor

github-actions bot commented Oct 7, 2022

Pinging code owners: @jpkrohling. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jpkrohling
Member

Thanks for the report. I do not have time to work on this right now, but I'll add this to my queue. It would expedite fixing this if you could provide a test case or perhaps a PR.

@pcwiese
Contributor Author

pcwiese commented Nov 15, 2022

Thanks for the report. I do not have time to work on this right now, but I'll add this to my queue. It would expedite fixing this if you could provide a test case or perhaps a PR.

I put out a PR, please take a look and see if I'm off base here.

@jpkrohling
Member

Thanks, I'm adding this to my review queue.
