Added HTTP reporter with support for sending auth credentials #526

Closed
jpkrohling wants to merge 2 commits into jaegertracing:master from jpkrohling:JPK-AddedSenderWithAuth

jpkrohling
Contributor

Signed-off-by: Juraci Paixão Kröhling [email protected]

@jpkrohling
Contributor Author

This PR adds auth support for the Agent when communicating with the Collector. It does not add auth support for communication between the Client (Tracer) and the Agent.

In its current state, I need a few clarifications, especially regarding the sampling strategies and baggage restriction features. They do not appear to be fully implemented, and I did not find them exposed via a regular HTTP endpoint, so I did not implement that part.

It also requires some documentation, but basically, this is how one should start the Agent to send an OAuth token with every request:

go run cmd/agent/main.go --collector.host-port 192.168.178.20:8180 --collector.auth-token eyJhbGciOiJ...
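
As a rough illustration of what "sending a token with every request" amounts to, here is a minimal sketch. It is not the code from this PR: the `credentials` type, the `newBatchRequest` helper, and the `application/x-thrift` content type are assumptions made for the example. A token, if present, is sent as a bearer token; otherwise a username/password pair falls back to basic auth.

```go
package main

import (
	"bytes"
	"net/http"
)

// credentials is a hypothetical holder for the auth settings discussed in
// this PR: either a token, or a username/password pair.
type credentials struct {
	AuthToken string
	Username  string
	Password  string
}

// newBatchRequest builds a POST request for an already-serialized batch and
// attaches whichever credentials are configured. With no credentials, the
// request is sent without an Authorization header.
func newBatchRequest(endpoint string, c credentials, payload []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	// Assumption: the collector accepts Thrift-encoded batches over HTTP.
	req.Header.Set("Content-Type", "application/x-thrift")

	switch {
	case c.AuthToken != "":
		// A token takes precedence and is sent as a bearer token.
		req.Header.Set("Authorization", "Bearer "+c.AuthToken)
	case c.Username != "" && c.Password != "":
		req.SetBasicAuth(c.Username, c.Password)
	}
	return req, nil
}
```

A caller would pass the resulting request to an `http.Client` and treat any non-2xx response as a failed submission.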

When a batch can't be submitted, it prints the following log:

{"level":"error","ts":1510314302.2487879,"caller":"http/reporter.go:214","msg":"Could not submit jaeger batch","error":"failed to submit batch: 401 Unauthorized","stacktrace":"github.com/uber/jaeger/cmd/agent/app/reporter/http.(*Reporter).submitAndReport\n\t/mnt/storage/jpkroehling/Projects/src/github.com/uber/jaeger/cmd/agent/app/reporter/http/reporter.go:214\ngithub.com/uber/jaeger/cmd/agent/app/reporter/http.(*Reporter).EmitBatch\n\t/mnt/storage/jpkroehling/Projects/src/github.com/uber/jaeger/cmd/agent/app/reporter/http/reporter.go:200\ngithub.com/uber/jaeger/thrift-gen/jaeger.(*agentProcessorEmitBatch).Process\n\t/mnt/storage/jpkroehling/Projects/src/github.com/uber/jaeger/thrift-gen/jaeger/agent.go:137\ngithub.com/uber/jaeger/thrift-gen/jaeger.(*AgentProcessor).Process\n\t/mnt/storage/jpkroehling/Projects/src/github.com/uber/jaeger/thrift-gen/jaeger/agent.go:111\ngithub.com/uber/jaeger/cmd/agent/app/processors.(*ThriftProcessor).processBuffer\n\t/mnt/storage/jpkroehling/Projects/src/github.com/uber/jaeger/cmd/agent/app/processors/thrift_processor.go:110"}                               

As far as I know, this deviates from the TChannel reporter, which fails silently.

@coveralls

Coverage Status

Coverage decreased (-0.4%) to 99.644% when pulling 846ac77 on jpkrohling:JPK-AddedSenderWithAuth into a2ed9b8 on jaegertracing:master.

@jpkrohling
Contributor Author

By the way: besides the tests included in this PR, I also did some manual testing. With a properly configured Keycloak and an Auth Proxy running on port 8180 (see blog post on the Jaeger blog, soon to be published), the all-in-one can be started like this (without exposing the agent ports):

docker run \
    -p 16686:16686 \
    -p 14268:14268 \
    --name=jaeger \
    jaegertracing/all-in-one:latest

An auth token can be obtained via:

curl \
    -X POST \
    -u instrumented-application:THE_SECRET \
    http://YOUR_IP:8080/auth/realms/jaeger/protocol/openid-connect/token \
    -d 'grant_type=client_credentials'

Then, the agent can be started as:

go run cmd/agent/main.go --collector.host-port 192.168.178.20:8180 --collector.auth-token eyJhbGciOiJ...

Finally, the target application can be started as usual: the client sends spans via UDP as it normally would. I confirmed that spans generated by this application were visible in the Query UI.

@coveralls

Coverage Status

Coverage decreased (-0.3%) to 99.688% when pulling 9fbcc71 on jpkrohling:JPK-AddedSenderWithAuth into a2ed9b8 on jaegertracing:master.

@coveralls

Coverage Status

Coverage decreased (-0.3%) to 99.748% when pulling 9fbcc71 on jpkrohling:JPK-AddedSenderWithAuth into a2ed9b8 on jaegertracing:master.

@coveralls

Coverage Status

Coverage decreased (-0.1%) to 99.851% when pulling 12db414 on jpkrohling:JPK-AddedSenderWithAuth into a2ed9b8 on jaegertracing:master.


var httpClient = &http.Client{Timeout: 2 * time.Second}

// Reporter forwards received spans to central collector tier over TChannel.
Member

over TChannel?

"github.com/uber/jaeger/cmd/collector/app/zipkin"
"github.com/uber/jaeger/thrift-gen/jaeger"
"github.com/uber/jaeger/thrift-gen/zipkincore"
tchanThrift "github.com/uber/tchannel-go/thrift"
Member

second group

@@ -56,6 +60,22 @@ func AddFlags(flags *flag.FlagSet) {
"",
"comma-separated string representing host:ports of a static list of collectors to connect to directly (e.g. when not using service discovery)")
flags.String(
scheme,
"http",
"protocol scheme to use when talking to the collector")
Member

What is the scheme for the current protocol (agent-collector)? Can that be made the default?

Contributor Author

Good question. If TChannel is over HTTP (I don't think it is), then it's certainly http :) In any case, this is either http or https, as it's used only for the new reporter and will be ignored when the TChannel reporter is used.

Member

I am asking because this flag adds more confusion.

Contributor Author

@jpkrohling, Nov 10, 2017

Added a clarification at the end:

used only when auth-related properties are specified

Hopefully, this clears up the confusion.
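
To make the role of the scheme flag concrete, here is a small sketch of how a scheme and a `host:port` value could be combined into the URL an HTTP reporter posts to. The `collectorEndpoint` helper and the `/api/traces` path are assumptions for illustration, not names taken from this PR.

```go
package main

import (
	"fmt"
	"net/url"
)

// collectorEndpoint builds the URL an HTTP reporter would post batches to.
// Only http and https are accepted, mirroring the comment above that the
// scheme "is either http or https". The "/api/traces" path is an assumption
// made for this sketch.
func collectorEndpoint(scheme, hostPort string) (string, error) {
	if scheme != "http" && scheme != "https" {
		return "", fmt.Errorf("unsupported scheme %q: expected http or https", scheme)
	}
	if hostPort == "" {
		return "", fmt.Errorf("collector host:port must not be empty")
	}
	u := url.URL{Scheme: scheme, Host: hostPort, Path: "/api/traces"}
	return u.String(), nil
}
```

Rejecting anything other than `http` or `https` up front keeps a typo in the flag from surfacing later as an opaque connection error.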

@coveralls

Coverage Status

Coverage remained the same at 100.0% when pulling 43e2944 on jpkrohling:JPK-AddedSenderWithAuth into a2ed9b8 on jaegertracing:master.

@jpkrohling
Contributor Author

> As far as I know, this deviates from the TChannel reporter, which fails silently.

Scratch that. It does complain when it can't submit a batch:

{"level":"error","ts":1510323644.892156,"caller":"tchannel/reporter.go:133","msg":"Could not submit jaeger batch","error":"no peers available","stacktrace":"..."}

@jpkrohling changed the title from "Added HTTP reporter with support for sending auth credentials" to "[WIP] Added HTTP reporter with support for sending auth credentials" on Nov 10, 2017
@coveralls

Coverage Status

Coverage remained the same at 100.0% when pulling 5eeaef3 on jpkrohling:JPK-AddedSenderWithAuth into a2ed9b8 on jaegertracing:master.

@jpkrohling changed the title from "[WIP] Added HTTP reporter with support for sending auth credentials" to "Added HTTP reporter with support for sending auth credentials" on Nov 10, 2017
@coveralls

Coverage Status

Coverage remained the same at 100.0% when pulling 1c81420 on jpkrohling:JPK-AddedSenderWithAuth into a2ed9b8 on jaegertracing:master.

@coveralls

Coverage Status

Coverage remained the same at 100.0% when pulling dff2c16 on jpkrohling:JPK-AddedSenderWithAuth into a2ed9b8 on jaegertracing:master.

func (b *Builder) useTChannelReporter() bool {
// if we don't have credentials, we use the tchannel reporter
// if we have an auth token or a pair of username+password, we should use the http reporter
return b.AuthToken == "" && (b.Username == "" || b.Password == "")
Member

I prefer not to introduce these indirect conditions. We should have a cmd line argument telling which reporter to use (default to tchannel). Then HTTP reporter can in addition check the other params.

Contributor Author

A counter-argument is that we keep the actual implementation hidden behind the requirements: if we see auth data, we know we need to perform auth and we shall use whatever transport provides this feature. Once we change to gRPC, this distinction should disappear.

Member

yes, that's precisely the hidden magic that goes against Go's "no magic" principle - adding an argument changes which transport is used, which port needs to be configured, etc. I would much rather be explicit. When we implement gRPC we can revisit.
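
For reference, the explicit alternative argued for in this thread could look roughly like the sketch below. The `collector.reporter-type` flag name, the `reporterOptions` type, and the `parseReporterOptions` helper are hypothetical; only `collector.auth-token` appears in this PR. Rejecting a token when the TChannel reporter is selected is one possible design choice, made here so that credentials are never silently ignored.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// reporterOptions holds the settings relevant to choosing a reporter.
type reporterOptions struct {
	ReporterType string
	AuthToken    string
}

// parseReporterOptions parses the command line and validates the result.
// The transport is chosen only by collector.reporter-type (a hypothetical
// flag), never inferred from the presence of credentials.
func parseReporterOptions(args []string) (*reporterOptions, error) {
	opts := &reporterOptions{}
	fs := flag.NewFlagSet("agent", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // errors are returned to the caller instead of printed
	fs.StringVar(&opts.ReporterType, "collector.reporter-type", "tchannel",
		"reporter to use: tchannel or http (hypothetical flag)")
	fs.StringVar(&opts.AuthToken, "collector.auth-token", "",
		"token sent with every request by the http reporter")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}

	switch opts.ReporterType {
	case "tchannel":
		// Credentials do not switch the transport; they are rejected instead.
		if opts.AuthToken != "" {
			return nil, fmt.Errorf("collector.auth-token requires collector.reporter-type=http")
		}
	case "http":
		// The HTTP reporter may validate its own parameters further.
	default:
		return nil, fmt.Errorf("unknown reporter type %q", opts.ReporterType)
	}
	return opts, nil
}
```

With this shape, adding `--collector.auth-token` alone fails fast with a clear message rather than quietly changing which transport and port are in use.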

c.HostPort = defaultHTTPServerHostPort
func (b *Builder) GetHTTPServer(r reporter.Reporter, mFactory metrics.Factory) *http.Server {
// TODO: this manager is used for the sampling and baggage restrictions, not sure we need for this here:
// is there a non-tchannel sampling/baggage restriction endpoint on the collector side?
Member

I don't think there is non-thrift endpoint for those.

func (b *Builder) GetHTTPServer(r reporter.Reporter, mFactory metrics.Factory) *http.Server {
// TODO: this manager is used for the sampling and baggage restrictions, not sure we need for this here:
// is there a non-tchannel sampling/baggage restriction endpoint on the collector side?
// for now, we let it be nil for non-TChannel reporters
Member

Not sure how that would work: when clients request their configs, the nil mgr will cause a panic.

Contributor Author

What should be done then? Should I just implement a dummy endpoint, returning some default data?

Member

That seems like the weak spot of this PR - we're trying to hack just one flow between the agent and collector where in practice we need to replace the whole API with TLS-capable transport. Maybe we shouldn't do it piecemeal and just Do The Right Thing by implementing gRPC.

Contributor Author

I might have misunderstood this back in November. Based on the last meeting, it looks like we want gRPC in addition to this. But then the problem reported in this comment would still exist, wouldn't it?
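
One way to picture the "dummy endpoint returning default data" idea from this thread is a nil guard in front of the manager, sketched below. Everything here is hypothetical: the `configManager` interface, the `samplingStrategy` type, and the placeholder default value are invented for the example and do not reflect the agent's actual types.

```go
package main

import "errors"

// samplingStrategy is a simplified, hypothetical stand-in for the data a
// client receives when it asks the agent for its configuration.
type samplingStrategy struct {
	Type  string
	Param float64
}

// configManager is a hypothetical interface for whatever backs the
// sampling endpoint.
type configManager interface {
	GetSamplingStrategy(service string) (samplingStrategy, error)
}

// defaultStrategy is a placeholder default; the value is an assumption.
var defaultStrategy = samplingStrategy{Type: "probabilistic", Param: 0.001}

// strategyFor avoids calling a method on a nil manager, which would panic,
// and serves the default instead.
func strategyFor(mgr configManager, service string) (samplingStrategy, error) {
	if service == "" {
		return samplingStrategy{}, errors.New("service name is required")
	}
	if mgr == nil {
		return defaultStrategy, nil
	}
	return mgr.GetSamplingStrategy(service)
}

// staticManager is a trivial in-memory manager used to exercise the sketch.
type staticManager map[string]samplingStrategy

func (m staticManager) GetSamplingStrategy(service string) (samplingStrategy, error) {
	if s, ok := m[service]; ok {
		return s, nil
	}
	return defaultStrategy, nil
}
```

This only removes the panic. It does not address the broader concern raised above, namely that the sampling and baggage flows would still lack an authenticated transport.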

zipkinBatches = "zipkin"
)

type batchMetrics struct {
Member

is this different from tchannel reporter? seems like it could be reused, e.g. via reporter/metrics package


authToken string
username string
password string
Member

Is it really likely that someone will be running the agent with basic auth?

Contributor Author

I might be missing something. I understand that this reporter is to be used for the communication between the Agent and Collector. This is not for protecting the agent.


// Endpoint returns the endpoint used when communicating with the remote collector
func (r *Reporter) Endpoint() string {
// TODO: do we want to do client-side load balancing? Or use a retry logic?
Member

yes we do, if we care about performance. TChannel always sends to less-busy connection internally.

We could use LB from go-kit.

Contributor Author

Would it be OK to do this in a follow-up iteration, or is it required for this one? If a follow-up is fine, I will create an issue and assign it to myself.
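
As a point of reference for the load-balancing question, the simplest client-side approach is round-robin over a static list of collector endpoints, sketched below. Note that this is a deliberately simpler technique than the least-busy selection attributed to TChannel above, and it does not use go-kit. The `roundRobin` type and its methods are hypothetical names for this example.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// roundRobin hands out collector endpoints in rotation. It is safe for
// concurrent use. The counter is the first field so that it stays 64-bit
// aligned on 32-bit platforms, as sync/atomic requires.
type roundRobin struct {
	next      uint64
	endpoints []string
}

// newRoundRobin copies the endpoint list so later changes by the caller do
// not affect the balancer.
func newRoundRobin(endpoints []string) (*roundRobin, error) {
	if len(endpoints) == 0 {
		return nil, errors.New("no collector endpoints configured")
	}
	return &roundRobin{endpoints: append([]string(nil), endpoints...)}, nil
}

// pick returns the next endpoint in rotation.
func (r *roundRobin) pick() string {
	n := atomic.AddUint64(&r.next, 1) - 1
	return r.endpoints[n%uint64(len(r.endpoints))]
}
```

Round-robin spreads requests evenly but ignores how busy or healthy each collector is, so a retry on failure or a least-loaded strategy would still be needed to match the behavior described for TChannel.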

@@ -57,6 +57,7 @@ type Reporter struct {
peerListMgr *peerlistmgr.PeerListManager
batchesMetrics map[string]batchMetrics
logger *zap.Logger
mFactory metrics.Factory
Member

why do we need to store it?

Contributor Author

Oops, this is a leftover from a previous version.

@coveralls

Coverage Status

Coverage increased (+0.03%) to 100.0% when pulling 5ee16fa on jpkrohling:JPK-AddedSenderWithAuth into acfbf29 on jaegertracing:master.

@jpkrohling
Contributor Author

I'm not up to date on what's expected from Travis. Looks like master is also failing, so, I don't think it's related to this PR.

@jpkrohling
Contributor Author

@yurishkuro : is the direction of this PR still the path we want to follow, or should I close this?

@yurishkuro
Member

> is the direction of this PR still the path we want to follow, or should I close this?

My main concern is that this PR is affecting a single flow between the agent and collector, rather than holistically all flows. I am not a big fan of doing that, especially when it comes to security features. There are other ways people can secure connections between agent and collector, e.g. with ssh tunnels. My preference is we wait for gRPC switch.

@jpkrohling
Contributor Author

> My preference is we wait for gRPC switch.

Is there a task already for this switch? I think @pavolloffay had interest in doing this and I could also work on it.

In any case, I'm closing this one as I don't think we want to invest more time in it.

@jpkrohling jpkrohling closed this Feb 1, 2018
@ghost ghost removed the review label Feb 1, 2018
@pavolloffay pavolloffay mentioned this pull request Feb 1, 2018
@pavolloffay
Member

I have created #673

@jpkrohling jpkrohling deleted the JPK-AddedSenderWithAuth branch July 28, 2021 19:22