-
Notifications
You must be signed in to change notification settings - Fork 31
cmd/trace-agent: combine sampler rates #494
Conversation
ed7bcf1
to
fa7f681
Compare
fa7f681
to
b83d6e8
Compare
cmd/trace-agent/agent.go
Outdated
} | ||
|
||
sampledTrace.Transactions = a.TransactionSampler.Extract(pt) | ||
// TODO: attach to these transactions the client, pre-sampler and transaction sample rates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure of what was the goal of Transactions.
Should I attach specific sample rates there somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can forget about that TODO this will be addressed in https://github.com/DataDog/datadog-trace-agent/compare/benjamin/new-transaction-sampling?expand=1
@@ -24,7 +24,9 @@ import ( | |||
const ( | |||
// SamplingPriorityKey is the key of the sampling priority value in the metrics map of the root span | |||
SamplingPriorityKey = "_sampling_priority_v1" | |||
syncPeriod = 3 * time.Second | |||
// SamplingPriorityRateKey is the key to get the sample rate applied to a trace | |||
SamplingPriorityRateKey = "_sampling_priority_rate_v1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to agree on the key format and it should be added to tracers payloads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you agree? Are we keeping this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of questions here. I'd suggest we clarify all of them in Google Doc before starting the actual work. /cc @bmermet for questions regarding transactions.
It is not clear what this PR is solving. Also, I'd like very small tests running a suite of sub-tests that provide nothing but input and output so that it is clearly visible what is tested (e.g. check out table-driven tests). I'm happy to help with this. Next week I will spend time with you on this PR, hoping that you will be free. In the meantime, it'd be nice to put together a Google Doc where all the things that are outstanding questions here get clarified. Then, we can start the work on them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic is good, but we need to improve a bit on the testing and on making it slightly more idiomatic in terms of Go. I hope my comments make sense.
cmd/trace-agent/agent.go
Outdated
// The sampler will keep or discard the trace, but we send everything so that it | ||
// gets the big picture and can set the sampling rates accordingly. | ||
samplers = append(samplers, a.PrioritySampler) | ||
sampledTrace := a.sample(pt, hasPriority) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part should look like this:
rate, ok := a.sample(pt)
if !ok {
return
}
sampler.AddSampleRate(pt.Root, rate)
a.sampledTraceChan <- writer.SampledTrace{
Trace: pt.Trace,
Transactions: a.TransactionSampler.Extract(pt),
}
We shouldn't have to check if it's empty here. If we've reached this point, it must not be. If this check is needed here, it means it was missing somewhere else and we should add it there.
Also, if the Empty
function is not used nor needed anywhere, let's remove it entirely.
cmd/trace-agent/agent.go
Outdated
|
||
// Trace sampling. | ||
var sampledTrace writer.SampledTrace | ||
func (a *Agent) sample(pt processedTrace, hasPriority bool) *writer.SampledTrace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should have the following signature, just like all the other samplers:
// sample returns true if the trace was sampled and the rate at which it was sampled.
func (a *Agent) sample(pt processedTrace) (rate float64, ok bool) {
The hasPriority
value can be obtained at no cost by doing:
_, ok := pt.Root.Metrics[sampler.SamplingPriorityKey]
cmd/trace-agent/agent.go
Outdated
sampledTrace.Trace = &pt.Trace | ||
} | ||
|
||
sampledTrace.Transactions = a.TransactionSampler.Extract(pt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though this is called "sampler" it's not exactly the same as the other samplers in this function. Let's move it back to Process
. It has no place here.
cmd/trace-agent/agent.go
Outdated
var sampledTrace writer.SampledTrace | ||
func (a *Agent) sample(pt processedTrace, hasPriority bool) *writer.SampledTrace { | ||
var sampledTrace writer.SampledTrace | ||
var sampleRate, scoreSampleRate float64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a good idea when multiple things are of the same type, to prefix them with that type, instead of suffixing. For example:
var (
rate1, rate2 float64
ok1, ok2 bool
)
or
var (
ratePriority, rateScore float64
ok1, ok2 bool
)
As you prefer.
Similar to the standard library's net/http
constants.
sampler/coresampler.go
Outdated
|
||
// MergeParallelSamplingRates merges two rates from Sampler1, Sampler2. Both samplers law are independant, | ||
// and {sampled} = {sampled by Sampler1} or {sampled by Sampler2} | ||
func MergeParallelSamplingRates(rate1 float64, rate2 float64) float64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call it CombineRates
, it sounds much nicer when put into context. We should always consider the package name as part of the function name, in this case it's sampler.CombineRates
vs. sampler.MergeParallelSamplingRates
.
Do we need the world "parallel" or "merge"? I am not familiar with this formula.
sampler/signature.go
Outdated
// Signature based on the hash of (env, service, name, resource, is_error) for the root, plus the set of | ||
// (env, service, name, is_error) of each span. | ||
func computeSignatureWithRootAndEnv(trace model.Trace, root *model.Span, env string) Signature { | ||
func ComputeSignatureWithRootAndEnv(trace model.Trace, root *model.Span, env string) Signature { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert the changes in this file, we won't need to use these. We shouldn't export something (technically making it public API) for the sake of having it usable in tests.
cmd/trace-agent/agent_test.go
Outdated
} | ||
} | ||
|
||
func TestSamplingRace(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this test, we don't need it. Tests already run in parallel and any races will be picked up by the race detector (if it is enabled).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I might have missed it.
What I'm testing here is parallel calls of a same trace signature on a unique sampler. This would trigger a race condition with the race test.
b83d6e8
to
b1882e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for making the changes. It is looking a lot better. You forgot to address some of my comments, and I've added a couple more.
cmd/trace-agent/agent.go
Outdated
if sampled { | ||
sampledTrace.Trace = &pt.Trace | ||
} | ||
_, hasPriority := pt.Root.Metrics[sampler.SamplingPriorityKey] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if _, ok := pt.Root.Metrics[sampler.SamplingPriorityKey]; ok {
cmd/trace-agent/agent.go
Outdated
}() | ||
sampled = sampled || scoreSampled | ||
sampleRate = sampler.MergeParallelSamplingRates(scoreSampleRate, sampleRate) | ||
sampler.AddSampleRate(pt.Root, sampleRate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You didn't follow this comment. Move this to Process
and only apply it if the decision is to sample.
cmd/trace-agent/agent_test.go
Outdated
Service: "serv1", | ||
Start: time.Now().UnixNano(), | ||
Duration: (100 * time.Millisecond).Nanoseconds(), | ||
Metrics: map[string]float64{}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put all closing curly braces on the next line.
cmd/trace-agent/sampler.go
Outdated
atomic.AddUint64(&s.totalTraceCount, 1) | ||
|
||
if s.engine.Sample(t.Trace, t.Root, t.Env) { | ||
sampled, sampleRate = s.engine.Sample(t.Trace, t.Root, t.Env) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow this comment. The word "sample" is repeated too much. Makes it hard to follow reading the code.
sampled, sampleRate = s.engine.Sample(t.Trace, t.Root, t.Env) | |
sampled, rate = s.engine.Sample(t.Trace, t.Root, t.Env) |
sampler/coresampler.go
Outdated
SetTraceAppliedSampleRate(root, newRate) | ||
} | ||
|
||
// MockEngine mocks a sampler engine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a test component, so it must go in a test file, maybe coresampler_test.go
?
@@ -24,7 +24,9 @@ import ( | |||
const ( | |||
// SamplingPriorityKey is the key of the sampling priority value in the metrics map of the root span | |||
SamplingPriorityKey = "_sampling_priority_v1" | |||
syncPeriod = 3 * time.Second | |||
// SamplingPriorityRateKey is the key to get the sample rate applied to a trace | |||
SamplingPriorityRateKey = "_sampling_priority_rate_v1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you agree? Are we keeping this?
cmd/trace-agent/sampler.go
Outdated
@@ -118,3 +119,12 @@ func (s *Sampler) logStats() { | |||
} | |||
} | |||
} | |||
|
|||
type mockSamplerEngine struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must also go in a test file.
cmd/trace-agent/sampler.go
Outdated
|
||
func newMockSampler(wantSampled bool, wantRate float64) *Sampler { | ||
return &Sampler{ | ||
engine: sampler.NewMockEngine(wantSampled, wantRate)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curly on next line here too please 🙏
7c53e4f
to
225e919
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one! I like how it turned out. Thanks for being patient with all the changes.
Traces are sampled by various independent samplers. If one decides to sample it, the trace is sampled. The usual samplers are: a scoreSampler and a prioritySampler.
In sampled traces we add the applied
sampling_rate
. This value is currently wrong as it does not take in account the sampling rate applied by the prioritySampler.This PR aims at fixing the sample_rate values
Fetching Priority sampler sample rate
Retrieving client applied rate with the key
_sampling_priority_rate_v1
. If it does not exist, applying the next rate that will be sent to clients.Merging signature|error sampling rates with priority.
Signature/Error samplers are run in parallel to the Priority sampler. Both are independent.
Sampling rate is, signatureSamplingRate + prioritySamplingRate - signatureSamplingRate*prioritySamplingRate
Add sampling rate to the pre-sampler and agent sampler rates that are already set in the trace
Isolate sampling decision in a function and have specific tests for it