Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[m3db] Add Jaeger tracing to m3db #1506

Merged
merged 7 commits into from
Apr 10, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 16 additions & 18 deletions glide.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 5 additions & 2 deletions glide.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,9 @@ import:
- package: github.com/m3db/stackadler32
version: bfebcd73ef6ffe0ee30489227f0330c39064b674

- package: github.com/MichaelTJones/pcg
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably remove this dependency - will take a look.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nah, we use it in m3db/src/x/sync/pooled_worker_pool.go

version: df440c6ed7ed8897ac98a408365e5e89c7becf1a

- package: github.com/willf/bitset
version: e553b05586428962bf7058d1044519d87ca72d74

Expand Down Expand Up @@ -191,10 +194,10 @@ import:

# START_JAEGER_DEPS
- package: github.com/uber/jaeger-lib
version: ^1.5.0
version: ^2.0.0

- package: github.com/uber/jaeger-client-go
version: ~2.15.0
version: ~2.16.0

- package: github.com/opentracing-contrib/go-stdlib
# Pin this on recommendation of the repo (no stable release yet). Still arguably better than rewriting
Expand Down
29 changes: 26 additions & 3 deletions scripts/development/m3_stack/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,19 +9,42 @@ This docker-compose file will setup the following environment:

## Usage

Use the `start_m3.sh` and `stop_m3.sh` scripts
Use the `start_m3.sh` and `stop_m3.sh` scripts.

## Grafana

Use Grafana by navigating to `http://localhost:3000` and using `admin` for both the username and password. The M3DB dashboard should already be populated and working.

## Jaeger

To start Jaeger, you need to set the environment variable `USE_JAEGER` to `true` when you run `start_m3.sh`.

```
USE_JAEGER=true ./start_m3.sh
```

To modify the sampling rate, etc. you can modify the following in your `m3dbnode.yml` file under `db`:

```yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason we can't just include this by default? Does it break if the jaeger image isnt running?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes- it'll break if Jaeger isn't running here:

localAgentHostPort: jaeger:6831

tracing:
backend: jaeger
jaeger:
reporter:
localAgentHostPort: jaeger:6831
sampler:
type: const
param: 1
```

Use Jaeger by navigating to `http://localhost:16686`.

## Prometheus

Use Prometheus by navigating to `http://localhost:9090`
Use Prometheus by navigating to `http://localhost:9090`.

## Increasing Load

Load can easily be increased by modifying the `prometheus.yml` file to reduce the scrape interval to `1s`
Load can easily be increased by modifying the `prometheus.yml` file to reduce the scrape interval to `1s`.

## Containers Hanging / Unresponsive

Expand Down
15 changes: 15 additions & 0 deletions scripts/development/m3_stack/docker-compose.yml
Original file line number Diff line number Diff line change
Expand Up @@ -109,5 +109,20 @@ services:
networks:
- backend
image: m3grafana:latest
jaeger:
networks:
- backend
image: jaegertracing/all-in-one:1.9
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n1

environment:
- COLLECTOR_ZIPKIN_HTTP_PORT=9411
ports:
- "0.0.0.0:5775:5775/udp"
- "0.0.0.0:6831:6831/udp"
- "0.0.0.0:6832:6832/udp"
- "0.0.0.0:5778:5778"
- "0.0.0.0:16686:16686"
- "0.0.0.0:14268:14268"
- "0.0.0.0:14269:14269"
- "0.0.0.0:9411:9411"
networks:
backend:
9 changes: 9 additions & 0 deletions scripts/development/m3_stack/m3dbnode.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,15 @@ db:
logging:
level: info

tracing:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm are you going to leave this in? if so, why does the readme need the instructions for config updates?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the logic so that if it errors creating the tracer, we just log it and default to a noop - so it is safe to leave it in here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case someone wants to change the sampling rate, etc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the logic so that if it errors creating the tracer, we just log it and default to a noop - so it is safe to leave it in here.

I'm not sure I agree with that change; the most likely reason for that is invalid configuration, which it seems worthwhile to surface. Since it's analogous, do we handle failure to initialize tally etc in the same way? Would be good to be consistent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I definitely log the error, but it would be kind of bad if m3db couldn't start if your jaeger host was down for some reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough! My assumption was that the jaeger client would come up even if the host is down (and then just fail to log), and that it would error primarily in cases of misconfiguration. However, I could see the argument for not going down if jaeger for instance changes their configuration in a backwards incompatible way (though then the user upon upgrading will just have to notice that their traces are no longer working).

I'm good with this though; thanks for discussion!

backend: jaeger
jaeger:
reporter:
localAgentHostPort: jaeger:6831
sampler:
type: const
param: 1

metrics:
prometheus:
handlerPath: /metrics
Expand Down
16 changes: 16 additions & 0 deletions scripts/development/m3_stack/start_m3.sh
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,19 @@ if [[ "$FORCE_BUILD" = true ]] ; then
fi

echo "Bringing up nodes in the background with docker compose, remember to run ./stop.sh when done"

# need to start Jaeger before m3db or else m3db will not be able to talk to the Jaeger agent.
if [[ "$USE_JAEGER" = true ]] ; then
docker-compose -f docker-compose.yml up $DOCKER_ARGS jaeger
sleep 3
# rely on 204 status code until https://github.com/jaegertracing/jaeger/issues/1450 is resolved.
JAEGER_STATUS=$(curl -s -o /dev/null -w '%{http_code}' localhost:14269)
if [ $JAEGER_STATUS -ne 204 ]; then
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once jaegertracing/jaeger#1450 is fixed, we can change this logic.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for posterity, could you put in a comment saying the same thing in the file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a nit, but is there not an explicit way to state the dependency in docker-compose.yml? https://docs.docker.com/compose/compose-file/ mentions depends_on

echo "Jaeger could not start"
return 1
fi
fi

docker-compose -f docker-compose.yml up $DOCKER_ARGS m3coordinator01
docker-compose -f docker-compose.yml up $DOCKER_ARGS m3db_seed
docker-compose -f docker-compose.yml up $DOCKER_ARGS prometheus01
Expand Down Expand Up @@ -211,6 +224,9 @@ if [[ "$AGGREGATOR_PIPELINE" = true ]]; then
curl http://localhost:7206/api/v1/json/report -X POST -d '{"metrics":[{"type":"gauge","value":42,"tags":{"__name__":"foo_metric","foo":"bar"}}]}'
fi

if [[ "$USE_JAEGER" = true ]] ; then
echo "Jaeger UI available at localhost:16686"
fi
echo "Prometheus available at localhost:9090"
echo "Grafana available at localhost:3000"
echo "Run ./stop.sh to shutdown nodes when done"
4 changes: 4 additions & 0 deletions src/cmd/services/m3dbnode/config/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ import (
"github.com/m3db/m3/src/x/config/hostid"
"github.com/m3db/m3/src/x/instrument"
xlog "github.com/m3db/m3/src/x/log"
"github.com/m3db/m3/src/x/opentracing"

"github.com/coreos/etcd/embed"
"github.com/coreos/etcd/pkg/transport"
Expand Down Expand Up @@ -140,6 +141,9 @@ type DBConfiguration struct {

// Proto contains the configuration specific to running in the ProtoDataMode.
Proto *ProtoConfiguration `yaml:"proto"`

// Tracing configures opentracing. If not provided, tracing is disabled.
Tracing *opentracing.TracingConfiguration `yaml:"tracing"`
}

// InitDefaultsAndValidate initializes all default values and validates the Configuration.
Expand Down
51 changes: 50 additions & 1 deletion src/cmd/services/m3dbnode/config/config_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -21,16 +21,19 @@
package config

import (
"context"
"io/ioutil"
"os"
"testing"

"github.com/m3db/m3/src/dbnode/environment"
xtest "github.com/m3db/m3/src/x/test"
"github.com/m3db/m3/src/query/util/logging"
xconfig "github.com/m3db/m3/src/x/config"
xtest "github.com/m3db/m3/src/x/test"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"github.com/uber-go/tally"
yaml "gopkg.in/yaml.v2"
)

Expand Down Expand Up @@ -300,6 +303,9 @@ db:
hashing:
seed: 42
writeNewSeriesAsync: true

tracing:
backend: jaeger
`

func TestConfiguration(t *testing.T) {
Expand Down Expand Up @@ -624,6 +630,19 @@ func TestConfiguration(t *testing.T) {
seed: 42
writeNewSeriesAsync: true
proto: null
tracing:
serviceName: ""
backend: jaeger
jaeger:
serviceName: ""
disabled: false
rpc_metrics: false
tags: []
sampler: null
reporter: null
headers: null
baggage_restrictions: null
throttler: null
coordinator: null
`

Expand Down Expand Up @@ -796,3 +815,33 @@ func TestNewEtcdEmbedConfig(t *testing.T) {

assert.Equal(t, "existing", embedCfg.ClusterState)
}

func TestNewJaegerTracer(t *testing.T) {
fd, err := ioutil.TempFile("", "config_jaeger.yaml")
require.NoError(t, err)
defer func() {
assert.NoError(t, fd.Close())
assert.NoError(t, os.Remove(fd.Name()))
}()

_, err = fd.Write([]byte(testBaseConfig))
require.NoError(t, err)

// Verify is valid
var cfg Configuration
err = xconfig.LoadFile(&cfg, fd.Name(), xconfig.Options{})
require.NoError(t, err)

logging.InitWithCores(nil)
ctx := context.Background()
zapLogger := logging.WithContext(ctx)
defer zapLogger.Sync()

testScope := tally.NewTestScope("test_scope", map[string]string{"app": "test"})
tracer, closer, err := cfg.DB.Tracing.NewTracer("m3dbnode", testScope, zapLogger)
require.NoError(t, err)
defer closer.Close()

// Verify tracer gets created
require.NotNil(t, tracer)
}
6 changes: 6 additions & 0 deletions src/dbnode/config/m3dbnode-local-etcd.yml
Original file line number Diff line number Diff line change
Expand Up @@ -98,3 +98,9 @@ db:
initialCluster:
- hostID: m3db_local
endpoint: http://127.0.0.1:2380

# un-comment the lines below to enable Jaeger tracing. See https://www.jaegertracing.io/docs/1.9/getting-started/
# for quick local setup (which this config will send data to).

# tracing:
# backend: jaeger
4 changes: 3 additions & 1 deletion src/dbnode/network/server/tchannelthrift/context.go
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,9 @@ func RegisterServer(channel *tchannel.Channel, service thrift.TChanServer, conte
server := thrift.NewServer(channel)
server.Register(service, thrift.OptPostResponse(postResponseFn))
server.SetContextFn(func(ctx xnetcontext.Context, method string, headers map[string]string) thrift.Context {
ctxWithValue := xnetcontext.WithValue(ctx, contextKey, contextPool.Get())
xCtx := contextPool.Get()
xCtx.SetGoContext(ctx)
ctxWithValue := xnetcontext.WithValue(ctx, contextKey, xCtx)
return thrift.WithHeaders(ctxWithValue, headers)
})
}
Expand Down
Loading