[m3db] Add Jaeger tracing to m3db #1506
Changes from all commits
c78bc94
2e4b90b
575fbec
6f6b0f7
8a2de85
77b1994
0eb3104
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,19 +9,42 @@ This docker-compose file will setup the following environment: | |
|
||
## Usage | ||
|
||
Use the `start_m3.sh` and `stop_m3.sh` scripts | ||
Use the `start_m3.sh` and `stop_m3.sh` scripts. | ||
|
||
## Grafana | ||
|
||
Use Grafana by navigating to `http://localhost:3000` and using `admin` for both the username and password. The M3DB dashboard should already be populated and working. | ||
|
||
## Jaeger | ||
|
||
To start Jaeger, you need to set the environment variable `USE_JAEGER` to `true` when you run `start_m3.sh`. | ||
|
||
``` | ||
USE_JAEGER=true ./start_m3.sh | ||
``` | ||
|
||
To modify the sampling rate, etc. you can modify the following in your `m3dbnode.yml` file under `db`: | ||
|
||
```yaml | ||
Review comment: Is there any reason we can't just include this by default? Does it break if the Jaeger image isn't running?
Reply: Yes, it'll break if Jaeger isn't running here:
|
||
tracing: | ||
backend: jaeger | ||
jaeger: | ||
reporter: | ||
localAgentHostPort: jaeger:6831 | ||
sampler: | ||
type: const | ||
param: 1 | ||
``` | ||
|
||
Use Jaeger by navigating to `http://localhost:16686`. | ||
|
||
## Prometheus | ||
|
||
Use Prometheus by navigating to `http://localhost:9090` | ||
Use Prometheus by navigating to `http://localhost:9090`. | ||
|
||
## Increasing Load | ||
|
||
Load can easily be increased by modifying the `prometheus.yml` file to reduce the scrape interval to `1s` | ||
Load can easily be increased by modifying the `prometheus.yml` file to reduce the scrape interval to `1s`. | ||
|
||
## Containers Hanging / Unresponsive | ||
|
||
|
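The README's "Increasing Load" note above says to lower the scrape interval in `prometheus.yml`. As a rough sketch of doing that from the command line (not part of this PR), the hypothetical helper `set_scrape_interval` below rewrites every `scrape_interval:` line in a file. It assumes the file uses the standard Prometheus `scrape_interval` key, one setting per line, and it changes global and per-job intervals alike.

```shell
# Hypothetical helper, not part of the PR.
# set_scrape_interval FILE INTERVAL
# Rewrites the value of every `scrape_interval:` line in FILE to INTERVAL,
# preserving the original indentation.
set_scrape_interval() {
  local file="$1"
  local interval="$2"
  # -i.bak edits in place and keeps FILE.bak as a backup; this form is
  # accepted by both GNU and BSD sed.
  sed -i.bak -E "s/^([[:space:]]*scrape_interval:[[:space:]]*).*/\1${interval}/" "$file"
}
```

For example, `set_scrape_interval prometheus.yml 1s` would apply the `1s` interval mentioned in the README; the containers would presumably need a restart to pick up the change.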
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -109,5 +109,20 @@ services: | |
networks: | ||
- backend | ||
image: m3grafana:latest | ||
jaeger: | ||
networks: | ||
- backend | ||
image: jaegertracing/all-in-one:1.9 | ||
Review comment: n1
||
environment: | ||
- COLLECTOR_ZIPKIN_HTTP_PORT=9411 | ||
ports: | ||
- "0.0.0.0:5775:5775/udp" | ||
- "0.0.0.0:6831:6831/udp" | ||
- "0.0.0.0:6832:6832/udp" | ||
- "0.0.0.0:5778:5778" | ||
- "0.0.0.0:16686:16686" | ||
- "0.0.0.0:14268:14268" | ||
- "0.0.0.0:14269:14269" | ||
- "0.0.0.0:9411:9411" | ||
networks: | ||
backend: |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,6 +2,15 @@ db: | |
logging: | ||
level: info | ||
|
||
tracing: | ||
Review comment: hm are you going to leave this in? if so, why does the readme need the instructions for config updates?
Reply: I changed the logic so that if it errors creating the tracer, we just log it and default to a noop - so it is safe to leave it in here.
Reply: In case someone wants to change the sampling rate, etc.
Reply: fair enough.
Reply: I'm not sure I agree with that change; the most likely reason for that is invalid configuration, which it seems worthwhile to surface. Since it's analogous, do we handle failure to initialize tally etc in the same way? Would be good to be consistent.
Reply: So I definitely log the error, but it would be kind of bad if m3db couldn't start if your jaeger host was down for some reason.
Reply: Fair enough! My assumption was that the jaeger client would come up even if the host is down (and then just fail to log), and that it would error primarily in cases of misconfiguration. However, I could see the argument for not going down if jaeger for instance changes their configuration in a backwards incompatible way (though then the user upon upgrading will just have to notice that their traces are no longer working). I'm good with this though; thanks for discussion!
||
backend: jaeger | ||
jaeger: | ||
reporter: | ||
localAgentHostPort: jaeger:6831 | ||
sampler: | ||
type: const | ||
param: 1 | ||
|
||
metrics: | ||
prometheus: | ||
handlerPath: /metrics | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,6 +10,19 @@ if [[ "$FORCE_BUILD" = true ]] ; then | |
fi | ||
|
||
echo "Bringing up nodes in the background with docker compose, remember to run ./stop.sh when done" | ||
|
||
# need to start Jaeger before m3db or else m3db will not be able to talk to the Jaeger agent. | ||
if [[ "$USE_JAEGER" = true ]] ; then | ||
docker-compose -f docker-compose.yml up $DOCKER_ARGS jaeger | ||
sleep 3 | ||
# rely on 204 status code until https://github.com/jaegertracing/jaeger/issues/1450 is resolved. | ||
JAEGER_STATUS=$(curl -s -o /dev/null -w '%{http_code}' localhost:14269) | ||
if [ $JAEGER_STATUS -ne 204 ]; then | ||
Review comment: Once jaegertracing/jaeger#1450 is fixed, we can change this logic.
Reply: nit: for posterity, could you put in a comment saying the same thing in the file
Reply: Perhaps a nit, but is there not an explicit way to state the dependency in
||
echo "Jaeger could not start" | ||
return 1 | ||
fi | ||
fi | ||
|
||
docker-compose -f docker-compose.yml up $DOCKER_ARGS m3coordinator01 | ||
docker-compose -f docker-compose.yml up $DOCKER_ARGS m3db_seed | ||
docker-compose -f docker-compose.yml up $DOCKER_ARGS prometheus01 | ||
|
@@ -211,6 +224,9 @@ if [[ "$AGGREGATOR_PIPELINE" = true ]]; then | |
curl http://localhost:7206/api/v1/json/report -X POST -d '{"metrics":[{"type":"gauge","value":42,"tags":{"__name__":"foo_metric","foo":"bar"}}]}' | ||
fi | ||
|
||
if [[ "$USE_JAEGER" = true ]] ; then | ||
echo "Jaeger UI available at localhost:16686" | ||
fi | ||
echo "Prometheus available at localhost:9090" | ||
echo "Grafana available at localhost:3000" | ||
echo "Run ./stop.sh to shutdown nodes when done" |
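The startup script above waits a fixed three seconds and then probes Jaeger's admin port once. A bounded retry loop is one possible alternative. The sketch below is not part of the PR: the helper names `http_status` and `wait_for_status` are hypothetical, and it keeps the PR's reliance on a 204 response from `localhost:14269`.

```shell
# Hypothetical helper, not part of start_m3.sh.
# Prints the HTTP status code returned for a URL (curl prints 000 if it cannot connect).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# wait_for_status URL EXPECTED [ATTEMPTS] [DELAY_SECONDS] [PROBE]
# Calls PROBE (default: http_status) on URL until it prints EXPECTED.
# Returns 0 on the first match, 1 after ATTEMPTS unsuccessful probes.
wait_for_status() {
  local url="$1"
  local expected="$2"
  local attempts="${3:-10}"
  local delay="${4:-1}"
  local probe="${5:-http_status}"
  local i=0
  local status=""
  while [ "$i" -lt "$attempts" ]; do
    # `|| true` keeps a failing probe from aborting scripts that run under `set -e`.
    status="$("$probe" "$url" || true)"
    if [ "$status" = "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up on $url after $attempts attempts (last status: $status)" >&2
  return 1
}
```

In `start_m3.sh`, the `sleep 3` plus single `curl` could then become something like `wait_for_status localhost:14269 204 10 1 || { echo "Jaeger could not start"; exit 1; }`. Because the README runs the script directly rather than sourcing it, `exit` is the safer way to abort there; bash only accepts `return` inside a function or a sourced script.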
Review comment: Can probably remove this dependency - will take a look.
Reply: nah, we use it in m3db/src/x/sync/pooled_worker_pool.go