This repository has been archived by the owner on May 23, 2023. It is now read-only.

Correct way how to not report fast spans #363

Open
rugpanov opened this issue Oct 21, 2019 · 12 comments

Comments

@rugpanov

I would like to trace slow operations only.
A manually chosen threshold determines whether an operation is slow.
I haven't found a description of the correct way to implement this in your documentation, but this is what came to mind:

    final long startTime = System.nanoTime();
    Scope scope = tracer.buildSpan(operationName).startActive(true);
    try {
      //do some work
    } finally {
      long durationInMillis = (System.nanoTime() - startTime) / 1000000;
      if (durationInMillis < THE_THRESHOLD_IN_MILLIS) {
        scope = null;
      } else {
        scope.close();
      }
    }

Is it the correct way to drop the spans? If not, what is the correct one?

@sjoerdtalsma
Contributor

I don't know how to do that (I hope someone else can answer how to not report fast spans).

However, the scope should always be closed; otherwise you risk having wrong parent spans if the thread is reused from a thread pool.

Closing the scope and finishing the span are separate concerns. It is generally a bad idea to mix them. Probably what you want is to close the scope but not finish the span (don't know if this will accomplish your goal though) like so:

    final long startTime = System.nanoTime();
    final Span span = tracer.buildSpan(operationName).start();
    try (Scope scope = tracer.scopeManager().activate(span)) {
        // ...
    } finally {
        long durationInMillis = (System.nanoTime() - startTime) / 1000000;
        if (durationInMillis >= THE_THRESHOLD_IN_MILLIS) {
            span.finish();
        } else {
            // I doubt whether this prevents a report of the span ...
        }
    }

@rugpanov changed the title from "Correct way to report only slow operations" to "Correct way how to not report fast spans" on Oct 21, 2019
@rugpanov
Author

rugpanov commented Oct 21, 2019

My issue is still unresolved.

@sjoerdtalsma 's solution has two problems/questions to discuss:

  1. I don't know for sure what happens to an unfinished span: does it pile up in my sampler, or cause some other memory leak?
  2. If the parent span is not finished but its children are, I will see them in the UI as <trace-without-root-span>

@tylerbenson

I don't think most tracing systems will allow this since there is no way to know if a distributed request was made. My suggestion would be to look for a way to discard the span after it's finished.

@whiskeysierra

The semantic conventions define sampling.priority as:

> If greater than 0, a hint to the Tracer to do its best to capture the trace. If 0, a hint to the trace to not-capture the trace. If absent, the Tracer should use its default sampling mechanism.

If I read this correctly you could add sampling.priority: 0 as a tag, if it's too short/fast. It does sound like it applies to the whole trace, not sure if that would be an issue in your case.

I also have no clue which implementations actually make use of that.
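
A minimal sketch of the tagging idea described above, assuming a tracer that honors the tag. `SpanLike` and `RecordingSpan` are hypothetical stand-ins for the `setTag`/`finish` part of `io.opentracing.Span`, included only so the example compiles without the OpenTracing jar; the threshold and class names are made up for illustration. In opentracing-java the same tag is also exposed as `Tags.SAMPLING_PRIORITY`. As noted in the comment, the hint may apply to the whole trace rather than to the single span.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the part of io.opentracing.Span used here.
interface SpanLike {
    SpanLike setTag(String key, Number value);
    void finish();
}

// In-memory span that only records tags, so the sketch runs without a tracer.
class RecordingSpan implements SpanLike {
    final Map<String, Number> tags = new LinkedHashMap<>();
    boolean finished;

    @Override
    public SpanLike setTag(String key, Number value) {
        tags.put(key, value);
        return this;
    }

    @Override
    public void finish() {
        finished = true;
    }
}

class FastSpanHint {
    // Tag name taken from the semantic conventions quoted above.
    static final String SAMPLING_PRIORITY = "sampling.priority";

    // Always finishes the span; tags it with sampling.priority=0 first if it was fast.
    static void finishWithHint(SpanLike span, long durationMillis, long thresholdMillis) {
        if (durationMillis < thresholdMillis) {
            span.setTag(SAMPLING_PRIORITY, 0);
        }
        span.finish();
    }

    public static void main(String[] args) {
        RecordingSpan fast = new RecordingSpan();
        finishWithHint(fast, 3, 50); // 3 ms is below the assumed 50 ms threshold
        System.out.println(fast.tags);
    }
}
```

The span is finished in both branches, so nothing is left dangling; whether the tag actually suppresses reporting depends on the tracer implementation.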

@yurishkuro
Member

Jaeger clients will respect sampling.priority as described. The downside, as you mentioned, is that you can only make a local decision, which can apply only to future or, at best, not-yet-finished spans (cf. jaegertracing/jaeger#1861) within the same process. That means the fast request may still have been sampled downstream (although downstream services could use similar logic across the stack, which would help).

@rugpanov
Author

rugpanov commented Nov 5, 2019

As a workaround, I am collecting all the operations under the trace myself.
After the operation is finished, I have all its children with their hierarchy, durations, and start times.
With all this data I decide whether I should report the trace or not.

My solution requires O(n) additional space, plus O(n) time to collect the data and to parse it when we do report the trace, but it gives me much more flexibility.

It still looks like there should be a more convenient way to filter out some traces when they finish, to avoid redundant traffic and database storage. This could probably be my feature request.

@yurishkuro , my request duplicates https://github.com/jaegertracing/jaeger/issues/1861 , right?

@whiskeysierra

> I am collecting all the operations under the trace by myself.

Where do you do that?

@rugpanov
Author

rugpanov commented Nov 5, 2019

To clarify: I do not use the tracing library API until all the data is collected and the decision is made.
I am collecting the data in my code. I made several interfaces to represent the trace and its operations, and when the decision is made, I use the data to transform it into calls to the tracing library.
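
One way to picture this approach, as a sketch only: the actual code is not shown in the thread, so `Operation`, `SpanSink`, and `DeferredTrace.reportIfSlow` are hypothetical names. Operations are recorded in a plain in-memory tree, and the tree is replayed into the tracing library only if the root turned out to be slow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory record of one operation; no tracing API is touched while collecting.
class Operation {
    final String name;
    final long startMicros;
    long durationMicros;
    final List<Operation> children = new ArrayList<>();

    Operation(String name, long startMicros) {
        this.name = name;
        this.startMicros = startMicros;
    }

    // Registers and returns a child operation.
    Operation child(String childName, long childStartMicros) {
        Operation c = new Operation(childName, childStartMicros);
        children.add(c);
        return c;
    }
}

// Hypothetical callback standing in for "calls to the tracing library".
interface SpanSink {
    void emit(String name, String parentName, long startMicros, long durationMicros);
}

class DeferredTrace {
    // Replays the whole tree into the sink only if the root was slow enough.
    static boolean reportIfSlow(Operation root, long thresholdMicros, SpanSink sink) {
        if (root.durationMicros < thresholdMicros) {
            return false; // fast trace: nothing is ever handed to the tracer
        }
        replay(root, null, sink);
        return true;
    }

    // Depth-first replay, parents before children.
    private static void replay(Operation op, String parentName, SpanSink sink) {
        sink.emit(op.name, parentName, op.startMicros, op.durationMicros);
        for (Operation c : op.children) {
            replay(c, op.name, sink);
        }
    }

    public static void main(String[] args) {
        Operation root = new Operation("request", 0);
        Operation query = root.child("query", 10_000);
        query.durationMicros = 5_000;
        root.durationMicros = 80_000;
        reportIfSlow(root, 50_000, (name, parent, start, duration) ->
                System.out.println(name + " parent=" + parent + " durationMicros=" + duration));
    }
}
```

With a real OpenTracing tracer, the sink could presumably build each span with an explicit start time via `SpanBuilder.withStartTimestamp(...)` and close it with `Span.finish(long)`; that wiring is an assumption and is left out here.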

@whiskeysierra

How do you deal with the distributed nature of tracing? I mean your trace might span multiple services before it ends.

@rugpanov
Author

rugpanov commented Nov 5, 2019

Currently, I do not have that problem: the infrastructure of my application is built in such a way that all long operations are reported in one place (not in real time, but that doesn't matter for my case), so I can catch them there and do not have to deal with the distributed-services problem.

@slto

slto commented Sep 7, 2021

Is the sampling.priority tag the proposed way to handle this?

My use case is a library that wraps JDBC calls to the database. We create a span around each JDBC call, but when the system is running normally these calls are fast enough that we don't need to see the spans in the trace. We want to see if we can enhance the code so that it creates a span only if the time is above a threshold. As far as I can tell there is no API to abort/cancel a span. Setting sampling.priority would affect all subsequent spans, and in this use case I just want this one span not to be emitted. I am not sure that constantly flipping sampling.priority is the right thing to do.

@aiguofer

Well... I've been googling for a while now trying to figure this out. We have a similar use case with JDBC. In our case, we want to write a Proxy-based wrapper for JDBC that records spans for any JDBC calls that make external requests. Since the interface API does not specify which methods fetch data from the underlying data source and which read some form of cached state, it's very hard to do this manually. I was hoping to start a trace for all method calls but drop any traces that took less than 50 ms.
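
A rough sketch of such a Proxy-based wrapper, using only `java.lang.reflect.Proxy` from the JDK. It times every call on an interface and invokes a callback only for calls at or above the threshold. `SlowCallProxy`, `Lookup`, and the `onSlow` callback are hypothetical names; `Lookup` stands in for a JDBC interface such as `java.sql.Connection`, and what the callback does with a slow call (for example, creating a span after the fact) is left open because the thread does not settle it.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a JDBC-like interface.
interface Lookup {
    String find(String key);
}

class SlowCallProxy {
    // Wraps target so that onSlow(methodName, elapsedNanos) fires only for
    // calls whose elapsed time is at or above thresholdNanos.
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface, long thresholdNanos,
                      BiConsumer<String, Long> onSlow) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the wrapped method's own exception
            } finally {
                long elapsed = System.nanoTime() - start;
                if (elapsed >= thresholdNanos) {
                    onSlow.accept(method.getName(), elapsed);
                }
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        Lookup real = key -> "value-of-" + key;
        long thresholdNanos = 50_000_000L; // 50 ms, the figure mentioned above
        Lookup traced = wrap(real, Lookup.class, thresholdNanos,
                (method, nanos) -> System.out.println(method + " took " + nanos + " ns"));
        System.out.println(traced.find("a"));
    }
}
```

Because the decision is made only after the call returns, this sketch cannot give a slow call's child spans a parent that existed while they ran, which is the same limitation discussed earlier in the thread.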
