-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[agent] Process data loss stats from clients #2010
[agent] Process data loss stats from clients #2010
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could use a bit more logging to my taste, but looks like a promising first draft.
2cbd9e3
to
756bc71
Compare
Codecov Report
@@ Coverage Diff @@
## master #2010 +/- ##
==========================================
+ Coverage 97.38% 97.41% +0.02%
==========================================
Files 206 207 +1
Lines 10136 10243 +107
==========================================
+ Hits 9871 9978 +107
Misses 222 222
Partials 43 43
Continue to review full report at Codecov.
|
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
ef1bc48
to
fae5f77
Compare
Signed-off-by: Yuri Shkuro <[email protected]>
this is ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good so far. This PR assumes that clients will only talk to a single agent for the whole lifecycle of the client, which is a reasonable assumption. Once there's a PR ready to be tested on the client side, I'll run an end-to-end manual test.
@jpkrohling the working client version is already included here as dependency (jaegertracing/jaeger-client-go#482), and the functionality can be tested with all-in-one (since query service is self-tracing). |
Signed-off-by: Yuri Shkuro <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just figure out the debug-logging discussion and add the expected gauges/counters to a test, and it's ready to be merged.
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
@@ -25,22 +25,28 @@ import ( | |||
|
|||
// ProxyBuilder holds objects communicating with collector | |||
type ProxyBuilder struct { | |||
reporter reporter.Reporter | |||
reporter *reporter.ClientMetricsReporter | |||
manager configmanager.ClientConfigManager | |||
tchanRep *Reporter | |||
} | |||
|
|||
// NewCollectorProxy creates ProxyBuilder | |||
func NewCollectorProxy(builder *Builder, mFactory metrics.Factory, logger *zap.Logger) (*ProxyBuilder, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a little bit irrelevant to data loss, we don't need to change it in this PR. But usually Builder.NewXXX means build a new XXX, seems more reasonable for NewCollectorProxy to create a CollectorProxy instead of ProxyBuilder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, the naming here has become weird overtime.
@@ -43,10 +43,6 @@ func (c *tchanCollectorClient) SubmitBatches(ctx thrift.Context, batches []*Batc | |||
} | |||
success, err := c.client.Call(ctx, c.thriftService, "submitBatches", &args, &resp) | |||
if err == nil && !success { | |||
switch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems generated deletion looks weird. Why no logging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't help it, it's how the recent generator works. But the new code looks correct - propagating the error up, rather than introducing that odd override.
Also, low-level libraries are not supposed to log.
tchanRep: r, | ||
reporter: reporter.WrapWithMetrics(r, tchannelMetrics), | ||
manager: configmanager.WrapWithMetrics(tchannel.NewConfigManager(r.CollectorServiceName(), r.Channel()), tchannelMetrics), | ||
tchanRep: tchanRep, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to export tchanRep? Is tchanRep used any where? If not, we could remove tchanRep from below struct:
type ProxyBuilder struct { reporter reporter.Reporter manager configmanager.ClientConfigManager tchanRep *Reporter }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's unfortunate side effect of the way constructors are defined. I think the Sampling server depends on the underlying tchannel that is being created here.
@@ -39,9 +39,16 @@ func NewCollectorProxy(builder *ConnBuilder, agentTags map[string]string, mFacto | |||
return nil, err | |||
} | |||
grpcMetrics := mFactory.Namespace(metrics.NSOptions{Name: "", Tags: map[string]string{"protocol": "grpc"}}) | |||
r1 := NewReporter(conn, agentTags, logger) | |||
r2 := reporter.WrapWithMetrics(r1, grpcMetrics) | |||
r3 := reporter.WrapWithClientMetrics(reporter.ClientMetricsReporterParams{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have a generic WrappedReporter? Essentially r2 & r3 are wrapped reporter with callbacks around EmitBatch and EmitZipkinBatch. It will be great if we can have chained wrapper, something like:
NewReporter(...).wrapWith().wrapWith().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- in what sense "generic"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't thought it through, not sure if it's totally possible or not. Some rough idea as below:
type WrappedReporter interface {
EmitZipkinBatch(spans []*zipkincore.Span) (err error)
EmitBatch(batch *jaeger.Batch) (err error)
WrapWith(func onEmitZipkinBatch, func onEmitBatch) (*WrappedReporter, error)
}
We may need to solve the problem whether the callback is called before or after EmitBatch, seems like metrics.go & client_metrics.go do those operations in different order, one before while the other one after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what benefits would this provide? The two existing decorators have specific things they are doing before or after calling the underlying, there's little room to generalize there. And WrapWith as part of the interface smells like an anti-pattern due to unnecessary coupling, i.e. each of the two decorators does not need to know if it itself is being decorated further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I read the code, I felt confused when I first saw metrics.go and client_metrics.go, after spent some time reading, I understood they are reporters wrapped with different call backs.
Regarding detailed implementation of the callbacks, yes they will be very different and no need to be generalized. The generalization is for better readability and extendability I guess, it will be easy to wrap another callback around emit calls in the future instead of implement a new reader if necessary .
I agree, the decorators don't need to know if they will be decorated further or not. I just got the idea from chained promise from javascript, it will be good if we can support similar feature here. This way, you don't need:
r1 := NewReporter(conn, agentTags, logger)
r2 := reporter.WrapWithMetrics(r1, grpcMetrics)
r3 := reporter.WrapWithClientMetrics(...)
You just need:
NewReporter(...).wrapWith(//r2 functionalities).wrapWith(//r3 functinoalities)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, it just take me sometime to understand the difference between metrics.go (reporter wrapped with metrics) and client_metrics.go (reporter wrapped with client metrics). It will be great if we can generalize the wrapped reporter, if it's not feasible, we can merge the PR as of now.
I think I addressed @jpkrohling comments, so merging. |
Which problem is this PR solving?
Short description of the changes
jaegerThrift.Batch
, perclient-uuid
This is how new metrics look:
TODO