Tail sampling #263
Conversation
def shutdown(self) -> None:
    self.processor.shutdown()

def force_flush(self, timeout_millis: int = 30000) -> bool:
Is this the method we were using to get logfire to work with AWS Lambda? If so, I guess we need it.
It's moved to the base class.
Welp I guess you merged this already but please address comments
tail_sampling: TailSamplingOptions | None
"""Tail sampling options"""
Could just be a typeddict?
I tried that. I've also wanted ConsoleOptions and PydanticPlugin to be typed dicts, but they seemed worse. They can't define defaults in the class, and passing in a plain dict is actually not that user friendly.
Makes sense
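To illustrate the trade-off discussed above, here is a minimal sketch (hypothetical names, not the actual logfire classes): a dataclass carries its defaults on the class itself, while a TypedDict forces the consuming code to re-apply defaults and gives callers a plain dict with no construction-time completion.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import TypedDict


# Dataclass style: defaults live on the class, and callers get
# attribute access and completion when constructing it.
@dataclass
class TailSamplingOptions:
    level: str = 'error'
    duration: float | None = 30.0


# TypedDict style: every key can be made optional via total=False,
# but defaults must be duplicated wherever the dict is consumed.
class TailSamplingDict(TypedDict, total=False):
    level: str
    duration: float | None


def resolve(options: TailSamplingDict) -> TailSamplingOptions:
    # Defaults repeated here -- one reason the dataclass was preferred.
    return TailSamplingOptions(
        level=options.get('level', 'error'),
        duration=options.get('duration', 30.0),
    )


opts = resolve({'level': 'warn'})
print(opts.level, opts.duration)  # warn 30.0
```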
# Avoid using the usual sampler if we're using tail-based sampling.
# The TailSamplingProcessor will handle the random sampling part as well.
sampler = (
    ParentBasedTraceIdRatio(self.trace_sample_rate)
    if self.trace_sample_rate < 1 and self.tail_sampling is None
    else None
)
How do we document / tell users they can't combine these? Is there a world where I want to head sample down to 10% (to reduce overhead in the SDK) and then tail sample down to 1%? There's still advantages to head sampling.
They can combine them, see the PR body or the new test_random_sampling.
Is there a world where I want to head sample down to 10% (to reduce overhead in the SDK) and then tail sample down to 1%?
I don't know what this means. Tail sample down to a percentage?
It sounds like you want to be able to discard most spans up front randomly regardless of whether tail-sampling would include them, so that a span only gets through if it's 'notable' AND 'lucky'.
Yes exactly
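The "notable AND lucky" idea from this exchange can be sketched as a toy filter. This is purely illustrative (the function name, `is_notable` flag, and rate parameters are invented here, not logfire's implementation): a span is discarded up front by the head-sampling lottery, and a survivor is then kept only if it is notable or wins a second, stricter lottery.

```python
import random


def keep_span(is_notable: bool, head_rate: float, tail_rate: float,
              rng: random.Random) -> bool:
    """Toy combination of head and tail sampling.

    A span survives only if it passes the up-front random (head) check,
    AND it is either notable (kept by tail sampling) or passes a
    second random check at tail_rate.
    """
    if rng.random() >= head_rate:    # head sampling: discard early
        return False
    if is_notable:                   # tail sampling keeps notable spans
        return True
    return rng.random() < tail_rate  # otherwise a second lottery


rng = random.Random(0)
kept = sum(keep_span(False, head_rate=0.1, tail_rate=0.1, rng=rng)
           for _ in range(100_000))
# Roughly 1% of unremarkable spans survive (0.1 * 0.1).
print(kept)
```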
return self.started[0][0]

class TailSamplingProcessor(WrapperSpanProcessor):
I'm really starting to feel like we should stop adding layers / wrapping processors like this. I would prefer to have a single processor that handles everything (sampling, batching, retries, etc.). I feel like it could be optimized more and would be easier to understand. It can still have pluggable bits just more explicit and less abstract.
This layer wraps all processors, including the console and user-defined processors.
I get that. But I feel like we should just make LogfireSpanProcessor which does all of those things in one place. In particular we avoid double buffering.
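For context, the wrapper pattern being debated looks roughly like this. It is a simplified sketch, not logfire's actual code: the `BufferingTailSampler` class, the `Recorder` helper, and the `notable` flag are all illustrative stand-ins showing how a wrapping processor buffers spans and only forwards them to the inner processor once something notable occurs.

```python
from types import SimpleNamespace


class SpanProcessor:
    def on_start(self, span): ...
    def on_end(self, span): ...


class WrapperSpanProcessor(SpanProcessor):
    """Delegates everything to an inner processor."""
    def __init__(self, processor: SpanProcessor) -> None:
        self.processor = processor

    def on_start(self, span) -> None:
        self.processor.on_start(span)

    def on_end(self, span) -> None:
        self.processor.on_end(span)


class BufferingTailSampler(WrapperSpanProcessor):
    """Buffers ended spans; flushes them all once one is notable."""
    def __init__(self, processor: SpanProcessor) -> None:
        super().__init__(processor)
        self.buffer: list = []

    def on_end(self, span) -> None:
        self.buffer.append(span)
        if getattr(span, 'notable', False):
            for buffered in self.buffer:  # flush the whole buffer
                super().on_end(buffered)
            self.buffer.clear()


class Recorder(SpanProcessor):
    """Innermost processor that just records what reaches it."""
    def __init__(self) -> None:
        self.ended: list = []

    def on_end(self, span) -> None:
        self.ended.append(span)


rec = Recorder()
sampler = BufferingTailSampler(rec)
sampler.on_end(SimpleNamespace(name='a'))                # buffered
sampler.on_end(SimpleNamespace(name='b', notable=True))  # triggers flush
print([s.name for s in rec.ended])  # ['a', 'b']
```

The double-buffering concern arises because the inner processor (e.g. a batch exporter) keeps its own queue on top of this buffer.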
self.duration: float = (
    float('inf') if options.duration is None else options.duration * ONE_SECOND_IN_NANOSECONDS
)
I'd like these durations to have the unit in their name. So self.duration -> self.duration_ns and TailSamplingOptions.duration -> TailSamplingOptions.duration_sec, or something like that.
There's some details which I'm unsure about such as defaults, but I'd like to get this through so we can test it out for ourselves. It could be quite useful in our backend, especially in places where we're using random trace sampling. Documentation and maybe some tweaking (including env var configuration) can come after we've used it.
Usage in a nutshell:
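The original usage snippet did not survive extraction; the following is a hedged reconstruction based only on the options discussed in this thread (the `level` and `duration` parameter names and their values are assumptions and may differ from the released API):

```python
import logfire

# Assumed sketch: keep a trace if it contains a span at or above the
# given level, or if it takes longer than the given duration (seconds).
logfire.configure(
    tail_sampling=logfire.TailSamplingOptions(
        level='warn',
        duration=1.0,
    ),
)
```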
Improper usage will eat up memory as the spans are buffered.