Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support trace-level sampling #7

Merged
merged 2 commits into from
Dec 20, 2020
Merged

Support trace-level sampling #7

merged 2 commits into from
Dec 20, 2020

Conversation

lilymara-onesignal
Copy link
Contributor

The libhoney-rust library has a sample_rate configuration option,
which provides random sampling of events sent to honeycomb.
Unfortunately, this sampling is done at the event level, rather than the
trace level. This means that spans or events belonging to a single trace
will have separate sampling decisions made for them. This in turn means
that when libhoney's sampling is enabled, traces will never have
complete span information within them. This severely limits the utility
of this sampling.

This proposed change introduces sampling based on the trace ID, meaning
that spans and events which are a part of the same trace will always
have the same sampling decision made for them. This will ensure that a
complete picture of the trace is sent up to honeycomb.

There are no breaking changes in behavior or API in this changeset, you
can still use the sampling logic built into libhoney if desired, but the
new function new_honeycomb_telemetry_layer_with_trace_sampling has
been introduced, which creates a telemetry layer that uses this new
sampling logic.

Alternatives:

  • Use the sample_rate from the libhoney config struct as the
    trace-level sample rate, and reset the field to 1 before giving it to
    libhoney. This is undesirable as it makes the current behavior
    impossible to use, however it also makes it impossible to misuse this
    new function by providing a sample_rate to this function, and libhoney
    at the same time.
  • Perform this trace-level sampling within libhoney. This is
    undesirable as it would require either parsing the traceID from the
    string provided to libhoney, or using a hash of this string. It is
    simpler to do the calculation in this library, where we have the raw
    trace ID. This would also be a breaking change for libhoney, whereas
    here it can be opt-in.

Example

I ran the following code with libhoney's sample rate set to 5, and the new trace-level sample rate set to 5.

fn main() {
    register_global_subscriber();

    for i in 0..10 {
        let s = info_span!("running process loop", i);
        let _g = s.enter();

        register_dist_tracing_root(TraceId::generate(), None).unwrap();

        foo();
    }

    std::thread::sleep(std::time::Duration::from_secs(5));
}

#[instrument]
fn foo() {
    sleep(Duration::from_millis(500));
    bar();
}

#[instrument]
fn bar() {
    sleep(Duration::from_millis(500));
    baz();
}


#[instrument]
fn baz() {
    sleep(Duration::from_millis(500));
}

This screenshot illustrates the change in behavior between the current libhoney sampling, and the new trace-level sampling.

old-vs-new-sampling-behavior

The libhoney-rust library has a `sample_rate` configuration option,
which provides random sampling of events sent to honeycomb.
Unfortunately, this sampling is done at the event level, rather than the
trace level. This means that spans or events belonging to a single trace
will have separate sampling decisions made for them. This in turn means
that when libhoney's sampling is enabled, traces will never have
complete span information within them. This severely limits the utility
of this sampling.

This proposed change introduces sampling based on the trace ID, meaning
that spans and events which are a part of the same trace will always
have the same sampling decision made for them. This will ensure that a
complete picture of the trace is sent up to honeycomb.

There are no breaking changes in behavior or API in this changeset, you
can still use the sampling logic built into libhoney if desired, but the
new function `new_honeycomb_telemetry_layer_with_trace_sampling` has
been introduced, which creates a telemetry layer that uses this new
sampling logic.

Alternatives:
- Use the `sample_rate` from the `libhoney` config struct as the
  trace-level sample rate, and reset the field to 1 before giving it to
  libhoney. This is undesirable as it makes the current behavior
  impossible to use, however it also makes it impossible to misuse this
  new function by providing a sample_rate to this function, and libhoney
  at the same time.
- Perform this trace-level sampling within `libhoney`. This is
  undesirable as it would require either parsing the traceID from the
  string provided to libhoney, or using a hash of this string. It is
  simpler to do the calculation in this library, where we have the raw
  trace ID. This would also be a breaking change for libhoney, whereas
  here it can be opt-in.
@Fishrock123
Copy link
Contributor

I'll need this soon too. @inanna-malick are you able to take a look sometime soon?

Copy link
Contributor

@Fishrock123 Fishrock123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think the constructor api could be reworked, but that can wait for 0.3 imo.

@@ -1,6 +1,6 @@
[package]
name = "tracing-honeycomb"
version = "0.2.0"
version = "0.2.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw 0.2 isn't even out yet, see #5

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.2 is out, I just never published it (sorry, 2020 has been.. very 2020 recently) https://github.com/inanna-malick/tracing-honeycomb/blob/master/tracing-honeycomb/Cargo.toml#L3

return true;
}

trace_id.0 as usize % self.sample_rate == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is clever but I would have expected the sample rate to be a 0 to 1 kind of range - is this sort of sample rate common in other tracing libraries?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also expect a float of some sort here - maybe an f32, with a 0 to 1 range

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it looks like you're using sample_rate <= 1 to signify 'sample everything' - how do you feel about making this an Option<SampleRate> instead, where type SampleRate = u32 (or f32 or w/e)

Copy link
Owner

@inanna-malick inanna-malick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, really the only thing I'd ask you to change is making sampling rate an Option. I'm fine with using a usize for the sampling rate, after talking to some friends it seems like that is a pretty standard way of handling sampling

@lilymara-onesignal
Copy link
Contributor Author

Thanks for the review @inanna-malick! I pushed up a fixup commit which changes the implementation to use an Option instead of a sentinel value. I did also change the rate type from a usize to a u128 though, in order to avoid casting the TraceID to a usize every time - because I believe this operation may panic if the u128 doesn't fit into a usize.

@inanna-malick
Copy link
Owner

This is great, thanks!

@inanna-malick inanna-malick merged commit d1c0290 into inanna-malick:master Dec 20, 2020
@lilymara-onesignal lilymara-onesignal deleted the trace-sampling branch December 23, 2020 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants