Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[m3db] Add Jaeger tracing to m3db #1506

Merged
merged 7 commits into from
Apr 10, 2019
Merged

[m3db] Add Jaeger tracing to m3db #1506

merged 7 commits into from
Apr 10, 2019

Conversation

benraskin92
Copy link
Collaborator

Screen Shot 2019-03-28 at 10 44 30 AM

@@ -137,6 +137,9 @@ type DBConfiguration struct {

// Write new series asynchronously for fast ingestion of new ID bursts.
WriteNewSeriesAsync bool `yaml:"writeNewSeriesAsync"`

// Tracing configures opentracing. If not provided, tracing is disabled.
Tracing instrument.TracingConfiguration `yaml:"tracing"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd make this a pointer type to make it optional to specify

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fine since we check if its a no-op in server.go:

	if _, ok := tracer.(opentracing.NoopTracer); ok {
		logger.Info("tracing disabled for m3dbnode; set `tracing.backend` to enable")
	}

@@ -40,7 +40,9 @@ func RegisterServer(channel *tchannel.Channel, service thrift.TChanServer, conte
server := thrift.NewServer(channel)
server.Register(service, thrift.OptPostResponse(postResponseFn))
server.SetContextFn(func(ctx xnetcontext.Context, method string, headers map[string]string) thrift.Context {
ctxWithValue := xnetcontext.WithValue(ctx, contextKey, contextPool.Get())
xCtx := contextPool.Get()
xCtx.SetGoContext(xnetcontext.TODO())
Copy link
Collaborator

@prateek prateek Mar 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can replace this line with xCtx.SetGoContext(ctx)

@@ -66,6 +70,9 @@ const (
maxSegmentArrayPooledLength = 32
// Any pooled error slices that grow beyond this capcity will be thrown away.
writeBatchPooledReqPoolMaxErrorsSliceSize = 4096

fetchTaggedSpanID = "fetch"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better names

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about just "fetchTagged"

@benraskin92 benraskin92 force-pushed the braskin/m3db_jaeger branch from 5cb48e6 to 2af7e0d Compare March 29, 2019 17:32
@@ -11,7 +11,8 @@ fi

echo "Bringing up nodes in the background with docker compose, remember to run ./stop.sh when done"
docker-compose -f docker-compose.yml up $DOCKER_ARGS m3coordinator01
docker-compose -f docker-compose.yml up $DOCKER_ARGS m3db_seed
docker-compose -f docker-compose.yml up --build -d --renew-anon-volumes m3db_seed
# docker-compose -f docker-compose.yml up $DOCKER_ARGS m3db_seed
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me testing - will revert before landing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wanna just add an environment variable for it? :P

@@ -211,6 +212,11 @@ if [[ "$AGGREGATOR_PIPELINE" = true ]]; then
curl http://localhost:7206/api/v1/json/report -X POST -d '{"metrics":[{"type":"gauge","value":42,"tags":{"__name__":"foo_metric","foo":"bar"}}]}'
fi

if [[ "$START_JAEGER" = true ]] ; then
docker-compose -f docker-compose.yml up $DOCKER_ARGS jaeger
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add check to make sure localhost:16686 is available.

@codecov
Copy link

codecov bot commented Mar 29, 2019

Codecov Report

Merging #1506 into master will increase coverage by <.1%.
The diff coverage is 72.3%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #1506     +/-   ##
========================================
+ Coverage    71.7%   71.8%   +<.1%     
========================================
  Files         938     938             
  Lines       76588   76693    +105     
========================================
+ Hits        54974   55088    +114     
+ Misses      18092   18083      -9     
  Partials     3522    3522
Flag Coverage Δ
#aggregator 82.3% <ø> (ø) ⬆️
#cluster 85.7% <ø> (-0.1%) ⬇️
#collector 63.7% <ø> (ø) ⬆️
#dbnode 80.6% <72.1%> (ø) ⬆️
#m3em 73.2% <ø> (ø) ⬆️
#m3ninx 74% <ø> (+0.1%) ⬆️
#m3nsch 51.1% <ø> (ø) ⬆️
#metrics 17.5% <ø> (ø) ⬆️
#msg 74.9% <ø> (ø) ⬆️
#query 66.5% <100%> (ø) ⬆️
#x 85.7% <ø> (+0.3%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2228f13...491f5da. Read the comment docs.

@@ -27,6 +27,9 @@ coordinator:
idScheme: quoted

db:
tracing:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove before landing.

@benraskin92 benraskin92 marked this pull request as ready for review March 29, 2019 21:55
@benraskin92 benraskin92 changed the title [WIP] Add Jaeger tracing to m3db [m3db] Add Jaeger tracing to m3db Mar 29, 2019
@@ -0,0 +1,82 @@
// Copyright (c) 2019 Uber Technologies, Inc.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this once m3db/m3x#214 lands

@@ -0,0 +1,106 @@
// Copyright (c) 2019 Uber Technologies, Inc.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this once m3db/m3x#214 lands

docker-compose -f docker-compose.yml up $DOCKER_ARGS jaeger
sleep 5
JAEGER_STATUS=$(curl -s -o /dev/null -w '%{http_code}' localhost:14269)
if [ $JAEGER_STATUS -ne 204 ]; then
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once jaegertracing/jaeger#1450 is fixed, we can change this logic.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for posterity, could you put in a comment saying the same thing in the file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a nit, but is there not an explicit way to state the dependency in docker-compose.yml? https://docs.docker.com/compose/compose-file/ mentions depends_on


To start Jaeger, add the following to your `m3dbnode.yml` file under `db`:

```yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason we can't just include this by default? Does it break if the jaeger image isnt running?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes- it'll break if Jaeger isn't running here:

localAgentHostPort: jaeger:6831

@@ -11,7 +11,8 @@ fi

echo "Bringing up nodes in the background with docker compose, remember to run ./stop.sh when done"
docker-compose -f docker-compose.yml up $DOCKER_ARGS m3coordinator01
docker-compose -f docker-compose.yml up $DOCKER_ARGS m3db_seed
docker-compose -f docker-compose.yml up --build -d --renew-anon-volumes m3db_seed
# docker-compose -f docker-compose.yml up $DOCKER_ARGS m3db_seed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wanna just add an environment variable for it? :P

@@ -66,6 +70,9 @@ const (
maxSegmentArrayPooledLength = 32
// Any pooled error slices that grow beyond this capcity will be thrown away.
writeBatchPooledReqPoolMaxErrorsSliceSize = 4096

fetchTaggedSpanID = "fetch"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about just "fetchTagged"

goCtx, exists := ctx.GoContext()
if exists {
sp, spCtx = opentracingutil.StartSpanFromContext(goCtx, querySpanID)
sp.LogFields(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this get sampled?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, can we only go down this path if this request is being sampled? Is there a way to query whether this context is being sampled or not?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the default sampling rate is quite low for Jaeger.

)

ctx.SetGoContext(spCtx)
defer sp.Finish()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This finish only applies to the jaeger trace right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I believe so.

)

ctx.SetGoContext(spCtx)
defer sp.Finish()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to close the span? The context cant do that for us?

@@ -147,6 +151,13 @@ func Run(runOpts RunOptions) {
os.Exit(1)
}

// Currently, the zapLogger is being used for tracing only.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how come we have to use zap for tracing? is that just how it works?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing Andrew wrote to create the tracer in m3x uses jaegerzap.NewLogger() - also @prateek wants to move to using a zap logger anyway.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i'll try to get around to this soon. want all our code to use zap instead of m3/x/log

Copy link
Collaborator

@robskillington robskillington Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it is time to kill the x logger soon.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS we should really add the ZapLogger to instrument options (or pull the Zap logger from there? Does it not already get created in instrument options by default?).

} else {
// setup tracer
tracer, traceCloser, err = cfg.Tracing.NewTracer(serviceName, scope, zapLogger)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way where we can keep the tracing configuration in config by default and have it just not break anything if jaeger is not available? I don't know how the buffering / the library work internally

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so iiuc, traces are sent over UDP and will just silently fail if there is no Jaeger collector setup. However, if you don't have the Jaeger config at all, a no-op tracer is created which doesn't send any traces.


goCtx, exists := ctx.GoContext()
if exists {
sp, spCtx = opentracingutil.StartSpanFromContext(goCtx, dbQuerySpanID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this span automatically get tied together with the span from the RPC layer?

Also could we add a method to our context that makes adding each span less redundant? Maybe something like:

sp, closer := ctx.StartSpan(
    spanID,
    opentracinglog.String("query", query.String()),
    opentracinglog.String("namespace", namespace.String()),
    opentracinglog.Int("limit", opts.Limit),
    opentracinglog.Time("start", opts.StartInclusive),
    opentracinglog.Time("start", opts.StartInclusive),
)
defer closer()
sp.LogFields()
...

and all of the ctx.GoContext() if exists etc boilerplate can get shoved into that helper method

That way its a little cleaner/shorter and the API allows you to add the logs that you have immediately right away and then if you need to add more later you can use the returned sp object.

if exists {
sp, spCtx = opentracingutil.StartSpanFromContext(goCtx, dbQuerySpanID)
sp.LogFields(
opentracinglog.String("query", query.String()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add the query to the span at every level? Same for StartInclusive and EndExclusive.

maybe I don't know how tracing works but I thought you could just add that information in one span and it was would associated with the entire trace and you wouldn't need to do it at each layer

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe logs are associated with just a single span, but I'll double check

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe logs are associated with just a single span, but I'll double check

So I think that's true, but I don't think that means we need to associate every span with the query. If we've already found the whole trace (multiple spans), we can get the query from the relevant span there. No point in logging it everywhere (and in fact, it's kind of confusing to do so, since that info isn't really available further down in the call stack, I assume)

opentracingutil.Time("end", time.Unix(0, req.RangeStart)),
)

ctx.SetGoContext(spCtx)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this seems racey if multiple goroutines have reference to the same ctx.

// setup tracer
tracer, traceCloser, err = cfg.Tracing.NewTracer(serviceName, scope, zapLogger)
if err != nil {
logger.Fatalf("could not initialize tracing for m3dbnode: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this to log an error, create a NOOP reporter, then continue.

@benraskin92 benraskin92 force-pushed the braskin/m3db_jaeger branch from 6f827ca to 86a9183 Compare April 6, 2019 00:13
@robskillington
Copy link
Collaborator

@benraskin92 let us know when this is good for another pass. This will be a great addition.

@benraskin92 benraskin92 force-pushed the braskin/m3db_jaeger branch from 0df089d to 8ca8d2f Compare April 8, 2019 15:22
@@ -12,6 +12,9 @@ import:
- package: github.com/m3db/stackadler32
version: bfebcd73ef6ffe0ee30489227f0330c39064b674

- package: github.com/MichaelTJones/pcg
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably remove this dependency - will take a look.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nah, we use it in m3db/src/x/sync/pooled_worker_pool.go

@@ -0,0 +1 @@
FROM jaegertracing/all-in-one:1.9
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this file? as in, can we just refer to the upstream image in the docker-compose?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe.. let me test.

To start Jaeger, you need to set the environment variable `START_JAEGER` to `true` when you run `start_m3.sh`.

```
START_JAEGER=true ./start_m3.sh
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should this be USE_JAEGER instead? cause we do have to stop it in stop_m3.sh

jaeger:
networks:
- backend
build:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more a question: can you skip the build: section here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears so..

@@ -2,6 +2,15 @@ db:
logging:
level: info

tracing:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm are you going to leave this in? if so, why does the readme need the instructions for config updates?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the logic so that if it errors creating the tracer, we just log it and default to a noop - so it is safe to leave it in here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case someone wants to change the sampling rate, etc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the logic so that if it errors creating the tracer, we just log it and default to a noop - so it is safe to leave it in here.

I'm not sure I agree with that change; the most likely reason for that is invalid configuration, which it seems worthwhile to surface. Since it's analogous, do we handle failure to initialize tally etc in the same way? Would be good to be consistent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I definitely log the error, but it would be kind of bad if m3db couldn't start if your jaeger host was down for some reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough! My assumption was that the jaeger client would come up even if the host is down (and then just fail to log), and that it would error primarily in cases of misconfiguration. However, I could see the argument for not going down if jaeger for instance changes their configuration in a backwards incompatible way (though then the user upon upgrading will just have to notice that their traces are no longer working).

I'm good with this though; thanks for discussion!

docker-compose -f docker-compose.yml up $DOCKER_ARGS m3coordinator01
# docker-compose -f docker-compose.yml up --build -d --renew-anon-volumes m3db_seed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - sorry will remove before i land.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, please put a TODO(braskin) or something else in case you're going to keep it around.

# Uncomment this to enable local jaeger tracing. See https://www.jaegertracing.io/docs/1.9/getting-started/
# for quick local setup (which this config will send data to).

tracing:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, should this be commented out?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, should it be in all the config files?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - but instead of this, I'd rather just have an annotated config section like http://m3db.github.io/m3/query_engine/config/annotated_config.yaml. I can add them to all of the configs but I think that's overkill.

@@ -66,6 +69,9 @@ const (
maxSegmentArrayPooledLength = 32
// Any pooled error slices that grow beyond this capcity will be thrown away.
writeBatchPooledReqPoolMaxErrorsSliceSize = 4096

fetchTaggedSpanID = "rpc_fetch_tagged"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do some standardisation of the names we use when creating spans. The ideal is to have a easy 1-1 mapping from name to the code, without requiring users to look at the code itself (to make it easy to read the trace output in jaeger).

I'd suggest a convention like: packageName.objectName.Method or packageName.Method in case there isn't an object. where packageName is abbreviated enough to make it clear without being too verbose. So in this case, it'd be tchannelthrift/node.Service.Query

What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me read a bit about opentracing/jaeger naming conventions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about keeping these names, but including tags? for example, this one would still be called rpc_fetch_tagged, but we could have a series of tags like:

method:query
package:node
component:tchannelthrift

@@ -66,6 +69,9 @@ const (
maxSegmentArrayPooledLength = 32
// Any pooled error slices that grow beyond this capcity will be thrown away.
writeBatchPooledReqPoolMaxErrorsSliceSize = 4096

fetchTaggedSpanID = "rpc_fetch_tagged"
querySpanID = "rpc_query"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another point (this one I'm not as sure of) - how would you feel about moving all these constants as exported in a separate package (say src/dbnode/tracepoint). Would help discoverability. And also provide a nice single place to document the convention of naming these things.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with the tags approach, i'd say leave it in the individual files.

var sp opentracing.Span
result, sp, err := s.query(tctx, req)

if sp != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, can you rewire the span creation to give you back a NoOpSpan when we're not tracing? that way we'd be able to avoid if sp != nil everywhere


ctx = tchannelthrift.Context(tctx)

ctx, sp, ok = ctx.StartTraceSpan(querySpanID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, how about starting the span in the Query() method. just to minimise the amount of observability code we're putting side by side with execution logic

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm why do we need to return ok from StartTraceSpan ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We technically don't but I guess it's a little cleaner than doing a sp != nil

if err != nil {
sp.LogFields(opentracinglog.Error(err))
}
sp.Finish()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: probably easiest to follow the defer sp.Finish() pattern. i.e. issue the defer right after constructing the span.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this method since it is not in the super hot path that makes sense. If this was a hot high frequency path I would say avoiding defer is a good idea.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave without the defer to establish a pattern.

)

ctx = tchannelthrift.Context(tctx)
ctx, sp, ok = ctx.StartTraceSpan(tracepoint.FetchTaggedSpanID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to leave comment about avoiding the string casts and varargs slice alloc for the call to LogFields(...) when trace is not being sampled.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In low frequency methods it probably doesn't matter all that much, but we may as well establish the pattern so that in high frequency hot methods we're avoiding allocs by default with the pattern we establish.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will leave this as is.

@@ -316,7 +320,7 @@ func (c *ctx) spanIsSampled(sp opentracing.Span) bool {
func (c *ctx) StartTraceSpan(name string) (Context, opentracing.Span, bool) {
goCtx, exists := c.GoContext()
if !exists || c.checkedAndNotSampled {
return c, nil, false
return c, noopTracer.StartSpan(name), false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can change the signature to StartTraceSpan(name string) (Context, opentracing.Span) to make this easier to use

Copy link
Collaborator

@robskillington robskillington Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, as per earlier comment I left - knowing whether it's being sampled or not is important so for hot path code we can avoid doing anything that might cause an allocation unless the trace is absolutely being sampled. Hence the bool is important.

QuerySpanID = "tchannelthrift/node.service.Query"

// DBQueryIDs is the operation name for the db QueryIDs path
DBQueryIDs = "storage.db.QueryIDs"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: first two vars in this file end with ...SpanID, the last two don't. Mind sticking to one convention.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably remove the ...SpanID suffix, since someone will forget it at some point hah.

// If there isn't an object, use `packageName.method`.

const (
// FetchTaggedSpanID is the operation name for the tchannelthrift FetchTagged path
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: end comments with a period.

"fmt"
"sort"
"testing"
"time"

"github.com/m3db/m3/src/dbnode/tracepoint"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Unnecessary empty endline?

return index.QueryResult{},
xerrors.NewRetryableError(errIndexNotBootstrappedToRead)
xerrors.NewRetryableError(err)
}

res, err := n.reverseIndex.Query(ctx, query, opts)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, maybe should also if err != nil && sp != nil { sp.LogFields(opentracinglog.Error(err)) } here?

Benjamin Raskin added 2 commits April 9, 2019 09:44
Updates

Docker

Add docker compose

Update tests

Update docs

Log more errors

Make tracing config a pointer

Remove config

Fix test

Add to start_m3

Return 1

Comments

Updates

Fix tests

Update m3x package

Pin pcg package version

Remove comments

Update glide

Address comments

Update naming convention

Address more comments

remove build
@benraskin92 benraskin92 force-pushed the braskin/m3db_jaeger branch from a8b2713 to 2e4b90b Compare April 9, 2019 14:08
@@ -255,13 +259,35 @@ func (s *service) Bootstrapped(ctx thrift.Context) (*rpc.NodeBootstrappedResult_
}

func (s *service) Query(tctx thrift.Context, req *rpc.QueryRequest) (*rpc.QueryResult_, error) {
var (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need these allocs ahead of time now?

sp opentracing.Span
)

ctx = tchannelthrift.Context(tctx)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this be:

ctx, sp := tchannelthrift.Context(tctx).StartTraceSpan(tracepoint.Query)

)

ctx = tchannelthrift.Context(tctx)
ctx, sp = ctx.StartTraceSpan(tracepoint.FetchTagged)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -21,6 +21,7 @@
package server

import (
stdlib "context"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: need a name that's clearer about defining the import "stdcontext" or "stdlibcontext" or "context" instead of "stdlib" please

@@ -708,8 +712,21 @@ func (d *db) QueryIDs(
query index.Query,
opts index.QueryOptions,
) (index.QueryResult, error) {
var sp opentracing.Span
ctx, sp = ctx.StartTraceSpan(tracepoint.DBQueryIDs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does ctx, sp := ... get :( here?

@@ -21,13 +21,16 @@
package storage

import (
stdlib "context"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

"errors"
"fmt"
"sort"
"sync"
"testing"
"time"

"github.com/m3db/m3/src/dbnode/tracepoint"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import order. interestingly, did metalint not pick this up?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird - ill look into it

@@ -21,12 +21,15 @@
package storage

import (
stdlib "context"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too

"errors"
"fmt"
"sync"
"testing"
"time"

"github.com/m3db/m3/src/dbnode/tracepoint"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too

@@ -313,10 +317,10 @@ func (c *ctx) spanIsSampled(sp opentracing.Span) bool {
return true
}

func (c *ctx) StartTraceSpan(name string) (Context, opentracing.Span, bool) {
func (c *ctx) StartSampledTraceSpan(name string) (Context, opentracing.Span, bool) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add comments to exported functions. write a blurb bout using StartSampledTraceSpan v the other one.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh nvm, saw you have them in the interface def

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the convention was to just comment on the interface?

@@ -75,8 +75,12 @@ type Context interface {
SetGoContext(stdctx.Context)

// StartTraceSpan starts a new span and returns a child ctx
// if the span is being sampled.
StartTraceSpan(name string) (Context, opentracing.Span, bool)
StartTraceSpan(string) (Context, opentracing.Span)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: period after comment.

Copy link
Collaborator

@prateek prateek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@benraskin92 benraskin92 merged commit 08cbbef into master Apr 10, 2019
@justinjc justinjc deleted the braskin/m3db_jaeger branch November 27, 2019 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants