-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[m3db] Add Jaeger tracing to m3db #1506
Conversation
benraskin92
commented
Mar 28, 2019
@@ -137,6 +137,9 @@ type DBConfiguration struct { | |||
|
|||
// Write new series asynchronously for fast ingestion of new ID bursts. | |||
WriteNewSeriesAsync bool `yaml:"writeNewSeriesAsync"` | |||
|
|||
// Tracing configures opentracing. If not provided, tracing is disabled. | |||
Tracing instrument.TracingConfiguration `yaml:"tracing"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd make this a pointer type to make it optional to specify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be fine since we check if its a no-op in server.go
:
if _, ok := tracer.(opentracing.NoopTracer); ok {
logger.Info("tracing disabled for m3dbnode; set `tracing.backend` to enable")
}
@@ -40,7 +40,9 @@ func RegisterServer(channel *tchannel.Channel, service thrift.TChanServer, conte | |||
server := thrift.NewServer(channel) | |||
server.Register(service, thrift.OptPostResponse(postResponseFn)) | |||
server.SetContextFn(func(ctx xnetcontext.Context, method string, headers map[string]string) thrift.Context { | |||
ctxWithValue := xnetcontext.WithValue(ctx, contextKey, contextPool.Get()) | |||
xCtx := contextPool.Get() | |||
xCtx.SetGoContext(xnetcontext.TODO()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can replace this line with xCtx.SetGoContext(ctx)
@@ -66,6 +70,9 @@ const ( | |||
maxSegmentArrayPooledLength = 32 | |||
// Any pooled error slices that grow beyond this capcity will be thrown away. | |||
writeBatchPooledReqPoolMaxErrorsSliceSize = 4096 | |||
|
|||
fetchTaggedSpanID = "fetch" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about just "fetchTagged"
5cb48e6
to
2af7e0d
Compare
@@ -11,7 +11,8 @@ fi | |||
|
|||
echo "Bringing up nodes in the background with docker compose, remember to run ./stop.sh when done" | |||
docker-compose -f docker-compose.yml up $DOCKER_ARGS m3coordinator01 | |||
docker-compose -f docker-compose.yml up $DOCKER_ARGS m3db_seed | |||
docker-compose -f docker-compose.yml up --build -d --renew-anon-volumes m3db_seed | |||
# docker-compose -f docker-compose.yml up $DOCKER_ARGS m3db_seed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me testing - will revert before landing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wanna just add an environment variable for it? :P
@@ -211,6 +212,11 @@ if [[ "$AGGREGATOR_PIPELINE" = true ]]; then | |||
curl http://localhost:7206/api/v1/json/report -X POST -d '{"metrics":[{"type":"gauge","value":42,"tags":{"__name__":"foo_metric","foo":"bar"}}]}' | |||
fi | |||
|
|||
if [[ "$START_JAEGER" = true ]] ; then | |||
docker-compose -f docker-compose.yml up $DOCKER_ARGS jaeger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add check to make sure localhost:16686
is available.
Codecov Report
@@ Coverage Diff @@
## master #1506 +/- ##
========================================
+ Coverage 71.7% 71.8% +<.1%
========================================
Files 938 938
Lines 76588 76693 +105
========================================
+ Hits 54974 55088 +114
+ Misses 18092 18083 -9
Partials 3522 3522
Continue to review full report at Codecov.
|
@@ -27,6 +27,9 @@ coordinator: | |||
idScheme: quoted | |||
|
|||
db: | |||
tracing: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove before landing.
src/dbnode/x/opentracing/context.go
Outdated
@@ -0,0 +1,82 @@ | |||
// Copyright (c) 2019 Uber Technologies, Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this once m3db/m3x#214 lands
@@ -0,0 +1,106 @@ | |||
// Copyright (c) 2019 Uber Technologies, Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this once m3db/m3x#214 lands
docker-compose -f docker-compose.yml up $DOCKER_ARGS jaeger | ||
sleep 5 | ||
JAEGER_STATUS=$(curl -s -o /dev/null -w '%{http_code}' localhost:14269) | ||
if [ $JAEGER_STATUS -ne 204 ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once jaegertracing/jaeger#1450 is fixed, we can change this logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for posterity, could you put in a comment saying the same thing in the file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a nit, but is there not an explicit way to state the dependency in docker-compose.yml
? https://docs.docker.com/compose/compose-file/ mentions depends_on
|
||
To start Jaeger, add the following to your `m3dbnode.yml` file under `db`: | ||
|
||
```yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason we can't just include this by default? Does it break if the jaeger image isnt running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes- it'll break if Jaeger isn't running here:
localAgentHostPort: jaeger:6831
@@ -11,7 +11,8 @@ fi | |||
|
|||
echo "Bringing up nodes in the background with docker compose, remember to run ./stop.sh when done" | |||
docker-compose -f docker-compose.yml up $DOCKER_ARGS m3coordinator01 | |||
docker-compose -f docker-compose.yml up $DOCKER_ARGS m3db_seed | |||
docker-compose -f docker-compose.yml up --build -d --renew-anon-volumes m3db_seed | |||
# docker-compose -f docker-compose.yml up $DOCKER_ARGS m3db_seed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wanna just add an environment variable for it? :P
@@ -66,6 +70,9 @@ const ( | |||
maxSegmentArrayPooledLength = 32 | |||
// Any pooled error slices that grow beyond this capcity will be thrown away. | |||
writeBatchPooledReqPoolMaxErrorsSliceSize = 4096 | |||
|
|||
fetchTaggedSpanID = "fetch" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about just "fetchTagged"
goCtx, exists := ctx.GoContext() | ||
if exists { | ||
sp, spCtx = opentracingutil.StartSpanFromContext(goCtx, querySpanID) | ||
sp.LogFields( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this get sampled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, can we only go down this path if this request is being sampled? Is there a way to query whether this context is being sampled or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the default sampling rate is quite low for Jaeger.
) | ||
|
||
ctx.SetGoContext(spCtx) | ||
defer sp.Finish() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This finish only applies to the jaeger trace right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I believe so.
) | ||
|
||
ctx.SetGoContext(spCtx) | ||
defer sp.Finish() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to close the span? The context cant do that for us?
@@ -147,6 +151,13 @@ func Run(runOpts RunOptions) { | |||
os.Exit(1) | |||
} | |||
|
|||
// Currently, the zapLogger is being used for tracing only. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how come we have to use zap for tracing? is that just how it works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing Andrew wrote to create the tracer in m3x uses jaegerzap.NewLogger()
- also @prateek wants to move to using a zap logger anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i'll try to get around to this soon. want all our code to use zap instead of m3/x/log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, it is time to kill the x logger soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS we should really add the ZapLogger to instrument options (or pull the Zap logger from there? Does it not already get created in instrument options by default?).
} else { | ||
// setup tracer | ||
tracer, traceCloser, err = cfg.Tracing.NewTracer(serviceName, scope, zapLogger) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way where we can keep the tracing configuration in config by default and have it just not break anything if jaeger is not available? I don't know how the buffering / the library work internally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so iiuc, traces are sent over UDP and will just silently fail if there is no Jaeger collector setup. However, if you don't have the Jaeger config at all, a no-op tracer is created which doesn't send any traces.
src/dbnode/storage/database.go
Outdated
|
||
goCtx, exists := ctx.GoContext() | ||
if exists { | ||
sp, spCtx = opentracingutil.StartSpanFromContext(goCtx, dbQuerySpanID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this span automatically get tied together with the span from the RPC layer?
Also could we add a method to our context that makes adding each span less redundant? Maybe something like:
sp, closer := ctx.StartSpan(
spanID,
opentracinglog.String("query", query.String()),
opentracinglog.String("namespace", namespace.String()),
opentracinglog.Int("limit", opts.Limit),
opentracinglog.Time("start", opts.StartInclusive),
opentracinglog.Time("start", opts.StartInclusive),
)
defer closer()
sp.LogFields()
...
and all of the ctx.GoContext()
if exists
etc boilerplate can get shoved into that helper method
That way its a little cleaner/shorter and the API allows you to add the logs that you have immediately right away and then if you need to add more later you can use the returned sp
object.
src/dbnode/storage/database.go
Outdated
if exists { | ||
sp, spCtx = opentracingutil.StartSpanFromContext(goCtx, dbQuerySpanID) | ||
sp.LogFields( | ||
opentracinglog.String("query", query.String()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add the query to the span at every level? Same for StartInclusive
and EndExclusive
.
maybe I don't know how tracing works but I thought you could just add that information in one span and it was would associated with the entire trace and you wouldn't need to do it at each layer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe logs are associated with just a single span, but I'll double check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe logs are associated with just a single span, but I'll double check
So I think that's true, but I don't think that means we need to associate every span with the query. If we've already found the whole trace (multiple spans), we can get the query from the relevant span there. No point in logging it everywhere (and in fact, it's kind of confusing to do so, since that info isn't really available further down in the call stack, I assume)
opentracingutil.Time("end", time.Unix(0, req.RangeStart)), | ||
) | ||
|
||
ctx.SetGoContext(spCtx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this seems racey if multiple goroutines have reference to the same ctx
.
src/dbnode/server/server.go
Outdated
// setup tracer | ||
tracer, traceCloser, err = cfg.Tracing.NewTracer(serviceName, scope, zapLogger) | ||
if err != nil { | ||
logger.Fatalf("could not initialize tracing for m3dbnode: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change this to log an error, create a NOOP reporter, then continue.
6f827ca
to
86a9183
Compare
@benraskin92 let us know when this is good for another pass. This will be a great addition. |
0df089d
to
8ca8d2f
Compare
@@ -12,6 +12,9 @@ import: | |||
- package: github.com/m3db/stackadler32 | |||
version: bfebcd73ef6ffe0ee30489227f0330c39064b674 | |||
|
|||
- package: github.com/MichaelTJones/pcg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can probably remove this dependency - will take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nah, we use it in m3db/src/x/sync/pooled_worker_pool.go
docker/jaeger/Dockerfile
Outdated
@@ -0,0 +1 @@ | |||
FROM jaegertracing/all-in-one:1.9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this file? as in, can we just refer to the upstream image in the docker-compose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe.. let me test.
To start Jaeger, you need to set the environment variable `START_JAEGER` to `true` when you run `start_m3.sh`. | ||
|
||
``` | ||
START_JAEGER=true ./start_m3.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should this be USE_JAEGER
instead? cause we do have to stop it in stop_m3.sh
jaeger: | ||
networks: | ||
- backend | ||
build: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more a question: can you skip the build:
section here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears so..
@@ -2,6 +2,15 @@ db: | |||
logging: | |||
level: info | |||
|
|||
tracing: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm are you going to leave this in? if so, why does the readme need the instructions for config updates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the logic so that if it errors creating the tracer, we just log it and default to a noop - so it is safe to leave it in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case someone wants to change the sampling rate, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the logic so that if it errors creating the tracer, we just log it and default to a noop - so it is safe to leave it in here.
I'm not sure I agree with that change; the most likely reason for that is invalid configuration, which it seems worthwhile to surface. Since it's analogous, do we handle failure to initialize tally etc in the same way? Would be good to be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I definitely log the error, but it would be kind of bad if m3db couldn't start if your jaeger host was down for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough! My assumption was that the jaeger client would come up even if the host is down (and then just fail to log), and that it would error primarily in cases of misconfiguration. However, I could see the argument for not going down if jaeger for instance changes their configuration in a backwards incompatible way (though then the user upon upgrading will just have to notice that their traces are no longer working).
I'm good with this though; thanks for discussion!
docker-compose -f docker-compose.yml up $DOCKER_ARGS m3coordinator01 | ||
# docker-compose -f docker-compose.yml up --build -d --renew-anon-volumes m3db_seed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this need to be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - sorry will remove before i land.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, please put a TODO(braskin)
or something else in case you're going to keep it around.
# Uncomment this to enable local jaeger tracing. See https://www.jaegertracing.io/docs/1.9/getting-started/ | ||
# for quick local setup (which this config will send data to). | ||
|
||
tracing: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, should this be commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, should it be in all the config files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - but instead of this, I'd rather just have an annotated config section like http://m3db.github.io/m3/query_engine/config/annotated_config.yaml. I can add them to all of the configs but I think that's overkill.
@@ -66,6 +69,9 @@ const ( | |||
maxSegmentArrayPooledLength = 32 | |||
// Any pooled error slices that grow beyond this capcity will be thrown away. | |||
writeBatchPooledReqPoolMaxErrorsSliceSize = 4096 | |||
|
|||
fetchTaggedSpanID = "rpc_fetch_tagged" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should do some standardisation of the names we use when creating spans. The ideal is to have a easy 1-1 mapping from name to the code, without requiring users to look at the code itself (to make it easy to read the trace output in jaeger).
I'd suggest a convention like: packageName.objectName.Method
or packageName.Method
in case there isn't an object. where packageName is abbreviated enough to make it clear without being too verbose. So in this case, it'd be tchannelthrift/node.Service.Query
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me read a bit about opentracing/jaeger naming conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about keeping these names, but including tags? for example, this one would still be called rpc_fetch_tagged
, but we could have a series of tags like:
method:query
package:node
component:tchannelthrift
@@ -66,6 +69,9 @@ const ( | |||
maxSegmentArrayPooledLength = 32 | |||
// Any pooled error slices that grow beyond this capcity will be thrown away. | |||
writeBatchPooledReqPoolMaxErrorsSliceSize = 4096 | |||
|
|||
fetchTaggedSpanID = "rpc_fetch_tagged" | |||
querySpanID = "rpc_query" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another point (this one I'm not as sure of) - how would you feel about moving all these constants as exported in a separate package (say src/dbnode/tracepoint
). Would help discoverability. And also provide a nice single place to document the convention of naming these things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go with the tags approach, i'd say leave it in the individual files.
var sp opentracing.Span | ||
result, sp, err := s.query(tctx, req) | ||
|
||
if sp != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, can you rewire the span creation to give you back a NoOpSpan
when we're not tracing? that way we'd be able to avoid if sp != nil
everywhere
|
||
ctx = tchannelthrift.Context(tctx) | ||
|
||
ctx, sp, ok = ctx.StartTraceSpan(querySpanID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, how about starting the span in the Query()
method. just to minimise the amount of observability code we're putting side by side with execution logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm why do we need to return ok from StartTraceSpan
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We technically don't but I guess it's a little cleaner than doing a sp != nil
if err != nil { | ||
sp.LogFields(opentracinglog.Error(err)) | ||
} | ||
sp.Finish() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: probably easiest to follow the defer sp.Finish()
pattern. i.e. issue the defer right after constructing the span.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this method since it is not in the super hot path that makes sense. If this was a hot high frequency path I would say avoiding defer is a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave without the defer
to establish a pattern.
) | ||
|
||
ctx = tchannelthrift.Context(tctx) | ||
ctx, sp, ok = ctx.StartTraceSpan(tracepoint.FetchTaggedSpanID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to leave comment about avoiding the string casts and varargs slice alloc for the call to LogFields(...)
when trace is not being sampled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In low frequency methods it probably doesn't matter all that much, but we may as well establish the pattern so that in high frequency hot methods we're avoiding allocs by default with the pattern we establish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will leave this as is.
@@ -316,7 +320,7 @@ func (c *ctx) spanIsSampled(sp opentracing.Span) bool { | |||
func (c *ctx) StartTraceSpan(name string) (Context, opentracing.Span, bool) { | |||
goCtx, exists := c.GoContext() | |||
if !exists || c.checkedAndNotSampled { | |||
return c, nil, false | |||
return c, noopTracer.StartSpan(name), false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can change the signature to StartTraceSpan(name string) (Context, opentracing.Span)
to make this easier to use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, as per earlier comment I left - knowing whether it's being sampled or not is important so for hot path code we can avoid doing anything that might cause an allocation unless the trace is absolutely being sampled. Hence the bool
is important.
QuerySpanID = "tchannelthrift/node.service.Query" | ||
|
||
// DBQueryIDs is the operation name for the db QueryIDs path | ||
DBQueryIDs = "storage.db.QueryIDs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: first two vars in this file end with ...SpanID
, the last two don't. Mind sticking to one convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably remove the ...SpanID
suffix, since someone will forget it at some point hah.
src/dbnode/tracepoint/tracepoint.go
Outdated
// If there isn't an object, use `packageName.method`. | ||
|
||
const ( | ||
// FetchTaggedSpanID is the operation name for the tchannelthrift FetchTagged path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: end comments with a period.
"fmt" | ||
"sort" | ||
"testing" | ||
"time" | ||
|
||
"github.com/m3db/m3/src/dbnode/tracepoint" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Unnecessary empty endline?
return index.QueryResult{}, | ||
xerrors.NewRetryableError(errIndexNotBootstrappedToRead) | ||
xerrors.NewRetryableError(err) | ||
} | ||
|
||
res, err := n.reverseIndex.Query(ctx, query, opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, maybe should also if err != nil && sp != nil { sp.LogFields(opentracinglog.Error(err)) }
here?
Updates Docker Add docker compose Update tests Update docs Log more errors Make tracing config a pointer Remove config Fix test Add to start_m3 Return 1 Comments Updates Fix tests Update m3x package Pin pcg package version Remove comments Update glide Address comments Update naming convention Address more comments remove build
a8b2713
to
2e4b90b
Compare
@@ -255,13 +259,35 @@ func (s *service) Bootstrapped(ctx thrift.Context) (*rpc.NodeBootstrappedResult_ | |||
} | |||
|
|||
func (s *service) Query(tctx thrift.Context, req *rpc.QueryRequest) (*rpc.QueryResult_, error) { | |||
var ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't need these allocs ahead of time now?
sp opentracing.Span | ||
) | ||
|
||
ctx = tchannelthrift.Context(tctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this be:
ctx, sp := tchannelthrift.Context(tctx).StartTraceSpan(tracepoint.Query)
) | ||
|
||
ctx = tchannelthrift.Context(tctx) | ||
ctx, sp = ctx.StartTraceSpan(tracepoint.FetchTagged) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
src/dbnode/server/server.go
Outdated
@@ -21,6 +21,7 @@ | |||
package server | |||
|
|||
import ( | |||
stdlib "context" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: need a name that's clearer about defining the import "stdcontext" or "stdlibcontext" or "context" instead of "stdlib" please
src/dbnode/storage/database.go
Outdated
@@ -708,8 +712,21 @@ func (d *db) QueryIDs( | |||
query index.Query, | |||
opts index.QueryOptions, | |||
) (index.QueryResult, error) { | |||
var sp opentracing.Span | |||
ctx, sp = ctx.StartTraceSpan(tracepoint.DBQueryIDs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does ctx, sp := ...
get :( here?
src/dbnode/storage/database_test.go
Outdated
@@ -21,13 +21,16 @@ | |||
package storage | |||
|
|||
import ( | |||
stdlib "context" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
src/dbnode/storage/database_test.go
Outdated
"errors" | ||
"fmt" | ||
"sort" | ||
"sync" | ||
"testing" | ||
"time" | ||
|
||
"github.com/m3db/m3/src/dbnode/tracepoint" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import order. interestingly, did metalint not pick this up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird - ill look into it
src/dbnode/storage/namespace_test.go
Outdated
@@ -21,12 +21,15 @@ | |||
package storage | |||
|
|||
import ( | |||
stdlib "context" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here too
src/dbnode/storage/namespace_test.go
Outdated
"errors" | ||
"fmt" | ||
"sync" | ||
"testing" | ||
"time" | ||
|
||
"github.com/m3db/m3/src/dbnode/tracepoint" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here too
@@ -313,10 +317,10 @@ func (c *ctx) spanIsSampled(sp opentracing.Span) bool { | |||
return true | |||
} | |||
|
|||
func (c *ctx) StartTraceSpan(name string) (Context, opentracing.Span, bool) { | |||
func (c *ctx) StartSampledTraceSpan(name string) (Context, opentracing.Span, bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add comments to exported functions. write a blurb bout using StartSampledTraceSpan v the other one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh nvm, saw you have them in the interface def
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the convention was to just comment on the interface?
@@ -75,8 +75,12 @@ type Context interface { | |||
SetGoContext(stdctx.Context) | |||
|
|||
// StartTraceSpan starts a new span and returns a child ctx | |||
// if the span is being sampled. | |||
StartTraceSpan(name string) (Context, opentracing.Span, bool) | |||
StartTraceSpan(string) (Context, opentracing.Span) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: period after comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM