Skip to content
This repository has been archived by the owner on Aug 30, 2019. It is now read-only.

cmd/trace-agent: combine sampler rates #494

Merged
merged 8 commits into from
Oct 25, 2018
Merged

Conversation

raphaelgavache
Copy link
Member

@raphaelgavache raphaelgavache commented Oct 15, 2018

Traces are sampled by various independent samplers. If one decides to sample it, the trace is sampled. The usual samplers are: a scoreSampler and a prioritySampler.

In sampled traces we add the applied sampling_rate. This value is currently wrong as it does not take in account the sampling rate applied by the prioritySampler.

This PR aims at fixing the sample_rate values

  1. Fetching Priority sampler sample rate
    Retrieving client applied rate with the key _sampling_priority_rate_v1. If it does not exist, applying the next rate that will be sent to clients.

  2. Merging signature|error sampling rates with priority.
    Signature/Error samplers are run in parallel to the Priority sampler. Both are independent.
    Sampling rate is, signatureSamplingRate + prioritySamplingRate - signatureSamplingRate*prioritySamplingRate

  3. Add sampling rate to the pre-sampler and agent sampler rates that are already set in the trace

  4. Isolate sampling decision in a function and have specific tests for it

}

sampledTrace.Transactions = a.TransactionSampler.Extract(pt)
// TODO: attach to these transactions the client, pre-sampler and transaction sample rates.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure of what was the goal of Transactions.
Should I attach specific sample rates there somehow?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -24,7 +24,9 @@ import (
const (
// SamplingPriorityKey is the key of the sampling priority value in the metrics map of the root span
SamplingPriorityKey = "_sampling_priority_v1"
syncPeriod = 3 * time.Second
// SamplingPriorityRateKey is the key to get the sample rate applied to a trace
SamplingPriorityRateKey = "_sampling_priority_rate_v1"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to agree on the key format and it should be added to tracers payloads

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you agree? Are we keeping this?

Copy link
Contributor

@gbbr gbbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of questions here. I'd suggest we clarify all of them in Google Doc before starting the actual work. /cc @bmermet for questions regarding transactions.

It is not clear what this PR is solving. Also, I'd like very small tests running a suite of sub-tests that provide nothing but input and output so that it is clearly visible what is tested (e.g. check out table-driven tests). I'm happy to help with this. Next week I will spend time with you on this PR, hoping that you will be free. In the meantime, it'd be nice to put together a Google Doc where all the things that are outstanding questions here get clarified. Then, we can start the work on them.

@gbbr gbbr changed the title Fix sample_rate cmd/trace-agent: combine sampler rates Oct 23, 2018
Copy link
Contributor

@gbbr gbbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic is good, but we need to improve a bit on the testing and on making it slightly more idiomatic in terms of Go. I hope my comments make sense.

// The sampler will keep or discard the trace, but we send everything so that it
// gets the big picture and can set the sampling rates accordingly.
samplers = append(samplers, a.PrioritySampler)
sampledTrace := a.sample(pt, hasPriority)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part should look like this:

rate, ok := a.sample(pt)
if !ok {
    return
}
sampler.AddSampleRate(pt.Root, rate)

a.sampledTraceChan <- writer.SampledTrace{
    Trace: pt.Trace,
    Transactions: a.TransactionSampler.Extract(pt),
}

We shouldn't have to check if it's empty here. If we've reached this point, it must not be. If this check is needed here, it means it was missing somewhere else and we should add it there.

Also, if the Empty function is not used nor needed anywhere, let's remove it entirely.


// Trace sampling.
var sampledTrace writer.SampledTrace
func (a *Agent) sample(pt processedTrace, hasPriority bool) *writer.SampledTrace {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should have the following signature, just like all the other samplers:

// sample returns true if the trace was sampled and the rate at which it was sampled.
func (a *Agent) sample(pt processedTrace) (rate float64, ok bool) {

The hasPriority value can be obtained at no cost by doing:

_, ok := pt.Root.Metrics[sampler.SamplingPriorityKey]

sampledTrace.Trace = &pt.Trace
}

sampledTrace.Transactions = a.TransactionSampler.Extract(pt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this is called "sampler" it's not exactly the same as the other samplers in this function. Let's move it back to Process. It has no place here.

var sampledTrace writer.SampledTrace
func (a *Agent) sample(pt processedTrace, hasPriority bool) *writer.SampledTrace {
var sampledTrace writer.SampledTrace
var sampleRate, scoreSampleRate float64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea when multiple things are of the same type, to prefix them with that type, instead of suffixing. For example:

var (
    rate1, rate2 float64
    ok1, ok2 bool
)

or

var (
    ratePriority, rateScore float64
    ok1, ok2 bool
)

As you prefer.

Similar to the standard library's net/http constants.

cmd/trace-agent/sampler.go Outdated Show resolved Hide resolved

// MergeParallelSamplingRates merges two rates from Sampler1, Sampler2. Both samplers law are independant,
// and {sampled} = {sampled by Sampler1} or {sampled by Sampler2}
func MergeParallelSamplingRates(rate1 float64, rate2 float64) float64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call it CombineRates, it sounds much nicer when put into context. We should always consider the package name as part of the function name, in this case it's sampler.CombineRates vs. sampler.MergeParallelSamplingRates.

Do we need the world "parallel" or "merge"? I am not familiar with this formula.

sampler/scoresampler_test.go Show resolved Hide resolved
// Signature based on the hash of (env, service, name, resource, is_error) for the root, plus the set of
// (env, service, name, is_error) of each span.
func computeSignatureWithRootAndEnv(trace model.Trace, root *model.Span, env string) Signature {
func ComputeSignatureWithRootAndEnv(trace model.Trace, root *model.Span, env string) Signature {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert the changes in this file, we won't need to use these. We shouldn't export something (technically making it public API) for the sake of having it usable in tests.

}
}

func TestSamplingRace(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this test, we don't need it. Tests already run in parallel and any races will be picked up by the race detector (if it is enabled).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I might have missed it.
What I'm testing here is parallel calls of a same trace signature on a unique sampler. This would trigger a race condition with the race test.

cmd/trace-agent/agent_test.go Show resolved Hide resolved
@raphaelgavache raphaelgavache force-pushed the raphael/fix_sampling_rate branch from b83d6e8 to b1882e2 Compare October 24, 2018 15:09
Copy link
Contributor

@gbbr gbbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for making the changes. It is looking a lot better. You forgot to address some of my comments, and I've added a couple more.

cmd/trace-agent/agent.go Outdated Show resolved Hide resolved
if sampled {
sampledTrace.Trace = &pt.Trace
}
_, hasPriority := pt.Root.Metrics[sampler.SamplingPriorityKey]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if _, ok := pt.Root.Metrics[sampler.SamplingPriorityKey]; ok {

}()
sampled = sampled || scoreSampled
sampleRate = sampler.MergeParallelSamplingRates(scoreSampleRate, sampleRate)
sampler.AddSampleRate(pt.Root, sampleRate)
Copy link
Contributor

@gbbr gbbr Oct 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't follow this comment. Move this to Process and only apply it if the decision is to sample.

cmd/trace-agent/agent_test.go Show resolved Hide resolved
Service: "serv1",
Start: time.Now().UnixNano(),
Duration: (100 * time.Millisecond).Nanoseconds(),
Metrics: map[string]float64{}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put all closing curly braces on the next line.

cmd/trace-agent/sampler.go Outdated Show resolved Hide resolved
atomic.AddUint64(&s.totalTraceCount, 1)

if s.engine.Sample(t.Trace, t.Root, t.Env) {
sampled, sampleRate = s.engine.Sample(t.Trace, t.Root, t.Env)
Copy link
Contributor

@gbbr gbbr Oct 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow this comment. The word "sample" is repeated too much. Makes it hard to follow reading the code.

Suggested change
sampled, sampleRate = s.engine.Sample(t.Trace, t.Root, t.Env)
sampled, rate = s.engine.Sample(t.Trace, t.Root, t.Env)

sampler/coresampler.go Show resolved Hide resolved
SetTraceAppliedSampleRate(root, newRate)
}

// MockEngine mocks a sampler engine
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a test component, so it must go in a test file, maybe coresampler_test.go?

@@ -24,7 +24,9 @@ import (
const (
// SamplingPriorityKey is the key of the sampling priority value in the metrics map of the root span
SamplingPriorityKey = "_sampling_priority_v1"
syncPeriod = 3 * time.Second
// SamplingPriorityRateKey is the key to get the sample rate applied to a trace
SamplingPriorityRateKey = "_sampling_priority_rate_v1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you agree? Are we keeping this?

@@ -118,3 +119,12 @@ func (s *Sampler) logStats() {
}
}
}

type mockSamplerEngine struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must also go in a test file.


func newMockSampler(wantSampled bool, wantRate float64) *Sampler {
return &Sampler{
engine: sampler.NewMockEngine(wantSampled, wantRate)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curly on next line here too please 🙏

@raphaelgavache raphaelgavache force-pushed the raphael/fix_sampling_rate branch from 7c53e4f to 225e919 Compare October 25, 2018 12:43
Copy link
Contributor

@gbbr gbbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one! I like how it turned out. Thanks for being patient with all the changes.

@gbbr gbbr merged commit 456dc89 into master Oct 25, 2018
@gbbr gbbr deleted the raphael/fix_sampling_rate branch October 25, 2018 16:34
@gbbr gbbr added this to the 6.7.0 milestone Oct 25, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants