-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added HTTP reporter with support for sending auth credentials #526
Added HTTP reporter with support for sending auth credentials #526
Conversation
This PR adds auth support for the Agent when communicating with the Collector. It does not add auth support for communication between the Client (Tracer) and the Agent. At the current state, I need a few clarifications, specially regarding the sampling strategies and baggage restriction features. It does not look like they are fully implemented and I didn't find them exposed via regular HTTP endpoint, so, I did not implement this part. It also requires some documentation, but basically, this is how one should start the Agent to send an OAuth token with every request:
When a batch can't be submitted, it prints the following log:
As far as I know, this deviates from the TChannel reporter, which fails silently. |
By the way: besides the tests included in this PR, I also did some manual testing. With a properly configured Keycloak and an Auth Proxy running on port 8180 (see blog post on the Jaeger blog, soon to be published), the
An auth token can be obtained via:
Then, the agent can be started as:
Finally, a target application can just be started as usual and the client will send spans via UDP as it would normally do and I confirmed that spans generated by this application were visible on the Query UI. |
|
||
var httpClient = &http.Client{Timeout: 2 * time.Second} | ||
|
||
// Reporter forwards received spans to central collector tier over TChannel. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
over TChannel?
"github.com/uber/jaeger/cmd/collector/app/zipkin" | ||
"github.com/uber/jaeger/thrift-gen/jaeger" | ||
"github.com/uber/jaeger/thrift-gen/zipkincore" | ||
tchanThrift "github.com/uber/tchannel-go/thrift" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
second group
cmd/agent/app/flags.go
Outdated
@@ -56,6 +60,22 @@ func AddFlags(flags *flag.FlagSet) { | |||
"", | |||
"comma-separated string representing host:ports of a static list of collectors to connect to directly (e.g. when not using service discovery)") | |||
flags.String( | |||
scheme, | |||
"http", | |||
"protocol scheme to use when talking to the collector") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the scheme for the current protocol (agent-collector)? Can be made that default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. If TChannel is over HTTP (I don't think it is), then it's certainly http
:) In any case, this is either http
or https
, as it's used only for the new reporter and will be ignored when the TChannel reporter is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am asking because this flag adds more confusion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a clarification at the end:
used only when auth-related properties are specified
Hopefully, this clears up the confusion.
Scratch that. It does complain when it can't submit a batch:
|
func (b *Builder) useTChannelReporter() bool { | ||
// if we don't have credentials, we use the tchannel reporter | ||
// if we have an auth token or a pair of username+password, we should use the http reporter | ||
return b.AuthToken == "" && (b.Username == "" || b.Password == "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer not to introduce these indirect conditions. We should have a cmd line argument telling which reporter to use (default to tchannel). Then HTTP reporter can in addition check the other params.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A counter-argument is that we keep the actual implementation hidden behind the requirements: if we see auth data, we know we need to perform auth and we shall use whatever transport provides this feature. Once we change to gRPC, this distinction should disappear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's precisely the hidden magic that goes against Go's "no magic" principle - adding an argument changes which transport is used, which port needs to be configured, etc. I would much rather be explicit. When we implement gRPC we can revisit.
c.HostPort = defaultHTTPServerHostPort | ||
func (b *Builder) GetHTTPServer(r reporter.Reporter, mFactory metrics.Factory) *http.Server { | ||
// TODO: this manager is used for the sampling and baggage restrictions, not sure we need for this here: | ||
// is there a non-tchannel sampling/baggage restriction endpoint on the collector side? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is non-thrift endpoint for those.
func (b *Builder) GetHTTPServer(r reporter.Reporter, mFactory metrics.Factory) *http.Server { | ||
// TODO: this manager is used for the sampling and baggage restrictions, not sure we need for this here: | ||
// is there a non-tchannel sampling/baggage restriction endpoint on the collector side? | ||
// for now, we let it be nil for non-TChannel reporters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure how it would work then, when clients request their configs, the nil mgr will cause a panic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should be done then? Should I just implement a dummy endpoint, returning some default data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like the weak spot of this PR - we're trying to hack just one flow between the agent and collector where in practice we need to replace the whole API with TLS-capable transport. Maybe we shouldn't do it piecemeal and just Do The Right Thing by implementing gRPC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have misunderstood this back in November. Based on the last meeting, looks like we want gRPC in addition to this, but then, the problem reported on this comment would still exist, wouldn't it?
zipkinBatches = "zipkin" | ||
) | ||
|
||
type batchMetrics struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this different from tchannel reporter? seems like it could be reused, e.g. via reporter/metrics
package
|
||
authToken string | ||
username string | ||
password string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this really a possibility that someone will be running the agent with basic auth?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something. I understand that this reporter is to be used for the communication between the Agent and Collector. This is not for protecting the agent.
|
||
// Endpoint returns the endpoint used when communicating with the remote collector | ||
func (r *Reporter) Endpoint() string { | ||
// TODO: do we want to do client-side load balancing? Or use a retry logic? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes we do, if we care about performance. TChannel always sends to less-busy connection internally.
We could use LB from go-kit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be OK to have it in a next iteration, or is it required for this one? I would then create an issue and assign to myself.
@@ -57,6 +57,7 @@ type Reporter struct { | |||
peerListMgr *peerlistmgr.PeerListManager | |||
batchesMetrics map[string]batchMetrics | |||
logger *zap.Logger | |||
mFactory metrics.Factory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to store it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, this is a leftover from a previous version.
Signed-off-by: Juraci Paixão Kröhling <[email protected]>
Signed-off-by: Juraci Paixão Kröhling <[email protected]>
I'm not up to date on what's expected from Travis. Looks like |
@yurishkuro : is the direction of this PR still the path we want to follow, or should I close this? |
My main concern is that this PR is affecting a single flow between the agent and collector, rather than holistically all flows. I am not a big fan of doing that, especially when it comes to security features. There are other ways people can secure connections between agent and collector, e.g. with ssh tunnels. My preference is we wait for gRPC switch. |
Is there a task already for this switch? I think @pavolloffay had interest in doing this and I could also work on it. In any case, I'm closing this one as I don't think we want to invest more time in it. |
I have created #673 |
Signed-off-by: Juraci Paixão Kröhling [email protected]