Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add jaeger/opentracing tracing to m3query #1321

Merged
merged 4 commits into from
Feb 21, 2019
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
74 changes: 74 additions & 0 deletions docs/operational_guide/monitoring.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
## Metrics

TODO

## Logs

TODO
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to remove these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh I don't see much harm in having stubs. I switched it to "TODO: document how to retrieve metrics for M3DB components." so that people don't think the metrics themselves are TODO.


## Tracing

M3DB is integrated with [opentracing](https://opentracing.io/) to provide
insight into query performance and errors.

### Configuration
Currently, only https://www.jaegertracing.io/ is supported as a backend.
andrewmains12 marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe only [Jaeger](https://www.jaegertracing.io/) is supported as a tracing backend.


To enable it, set tracing.backend to "jaeger":

```
tracing:
andrewmains12 marked this conversation as resolved.
Show resolved Hide resolved
backend: jaeger # enables jaeger with default configs
jaeger:
# optional configuration for jaeger -- see
# https://github.com/jaegertracing/jaeger-client-go/blob/master/config/config.go#L37
# for options
andrewmains12 marked this conversation as resolved.
Show resolved Hide resolved
...
```

Jaeger can be run locally with docker as described in
https://www.jaegertracing.io/docs/1.9/getting-started/.

The default configuration will report traces via udp to localhost:6831;
using the all-in-one jaeger container, they will be accessible at

http://localhost:16686


#### Alternative backends

If you'd like additional backends, we'd love to support them!

File an issue against M3 and we can work with you on how best to add
the backend. The first time's going to be a little rough--opentracing
unfortunately doesn't support Go plugins (yet--see
https://github.com/opentracing/opentracing-go/issues/133), and `glide`'s
update model means that adding dependencies directly will update
*everything*, which isn't ideal for an isolated dependency change.
These problems are all solvable though,
and we'll work with you to make it happen!

### Use cases

Note: all URLs assume a local jaeger setup as described in Jaeger's
[docs](https://www.jaegertracing.io/docs/1.9/getting-started/).


#### Finding slow queries

To find prom queries longer than <threshold>, filter for `minDuration >= <threshold>` on
`operation="GET /api/v1/query_range"`.

Sample query:
http://localhost:16686/search?end=1548876672544000&limit=20&lookback=1h&maxDuration&minDuration=1ms&operation=GET%20%2Fapi%2Fv1%2Fquery_range&service=m3query&start=1548873072544000
andrewmains12 marked this conversation as resolved.
Show resolved Hide resolved

#### Finding queries with errors

Search for `error=true` on `operation="GET /api/v1/query_range"`
http://localhost:16686/search?operation=GET%20%2Fapi%2Fv1%2Fquery_range&service=m3query&tags=%7B%22error%22%3A%22true%22%7D

#### Finding 500 (Internal Server Error) responses

Search for `http.status_code=500`.

http://localhost:16686/search?limit=20&lookback=24h&maxDuration&minDuration&operation=GET%20%2Fapi%2Fv1%2Fquery_range&service=m3query&start=1548802430108000&tags=%7B"http.status_code"%3A"500"%7D
andrewmains12 marked this conversation as resolved.
Show resolved Hide resolved
51 changes: 38 additions & 13 deletions glide.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

18 changes: 16 additions & 2 deletions glide.yaml
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
package: github.com/m3db/m3
import:
- package: github.com/m3db/m3x
version: e98ec326dd7b4d4bc97390c709dacf73d49a0bfc
version: b66c9c466c4726e3c9b47b1f837abbbe0f14be81
vcs: git
subpackages:
- checked
Expand Down Expand Up @@ -64,7 +64,7 @@ import:
version: f85c78b1dd998214c5f2138155b320a4a43fbe36

- package: github.com/opentracing/opentracing-go
version: 855519783f479520497c6b3445611b05fc42f009
version: 1.0.2

- package: github.com/spaolacci/murmur3
version: 9f5d223c60793748f04a9d5b4b4eacddfc1f755d
Expand Down Expand Up @@ -201,6 +201,20 @@ import:
subpackages:
- capnslog

# START_JAEGER_DEPS
- package: github.com/uber/jaeger-lib
version: ^1.5.0

- package: github.com/uber/jaeger-client-go
version: ^2.7.0

- package: github.com/opentracing-contrib/go-stdlib
# Pin this on recommendation of the repo (no stable release yet). Still arguably better than rewriting
# the same code.
version: 77df8e8e70b403c6b13c0fffaa4867c9044ff4e9

# END_JAEGER_DEPS

# To avoid conflicting packages not resolving the latest GRPC
- package: google.golang.org/grpc
version: ~1.7.3
Expand Down
3 changes: 3 additions & 0 deletions src/cmd/services/m3query/config/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,9 @@ type Configuration struct {
// Metrics configuration.
Metrics instrument.MetricsConfiguration `yaml:"metrics"`

// Tracing configures opentracing. If not provided, tracing is disabled.
Tracing instrument.TracingConfiguration `yaml:"tracing"`
andrewmains12 marked this conversation as resolved.
Show resolved Hide resolved

// Clusters is the DB cluster configurations for read, write and
// query endpoints.
Clusters m3.ClustersStaticConfiguration `yaml:"clusters"`
Expand Down
10 changes: 8 additions & 2 deletions src/query/api/v1/handler/prometheus/native/read.go
Original file line number Diff line number Diff line change
Expand Up @@ -32,9 +32,12 @@ import (
"github.com/m3db/m3/src/query/executor"
"github.com/m3db/m3/src/query/models"
"github.com/m3db/m3/src/query/ts"
"github.com/m3db/m3/src/query/util/httperrors"
"github.com/m3db/m3/src/query/util/logging"
xhttp "github.com/m3db/m3/src/x/net/http"
opentracingutil "github.com/m3db/m3/src/query/util/opentracing"

opentracingext "github.com/opentracing/opentracing-go/ext"
opentracinglog "github.com/opentracing/opentracing-go/log"
"github.com/uber-go/tally"
"go.uber.org/zap"
)
Expand Down Expand Up @@ -124,7 +127,7 @@ func (h *PromReadHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {

result, params, respErr := h.ServeHTTPWithEngine(w, r, h.engine)
if respErr != nil {
xhttp.Error(w, respErr.Err, respErr.Code)
httperrors.ErrorWithReqID(w, r, respErr.Err, respErr.Code)
return
}

Expand Down Expand Up @@ -167,6 +170,9 @@ func (h *PromReadHandler) ServeHTTPWithEngine(

result, err := read(ctx, engine, h.tagOpts, w, params)
if err != nil {
sp := opentracingutil.SpanFromContextOrRoot(ctx)
sp.LogFields(opentracinglog.Error(err))
opentracingext.Error.Set(sp, true)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you have sp.Finish() here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, no; that function is intended to extract the span from context if it exists, and return a noop/dummy if not. In the first case, the calling function is already in charge of calling Finish(); in the second, it shouldn't be needed. Does that make sense?

I updated the function to be opentracingutil.SpanFromContextOrNoop, making it return a noop span, which should ensure that it doesn't need to be closed even if called with a context without a span.

logger.Error("unable to fetch data", zap.Error(err))
h.promReadMetrics.fetchErrorsServer.Inc(1)
return nil, emptyReqParams, &RespError{Err: err, Code: http.StatusInternalServerError}
Expand Down
12 changes: 12 additions & 0 deletions src/query/api/v1/handler/prometheus/native/read_common.go
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,9 @@ import (
"github.com/m3db/m3/src/query/models"
"github.com/m3db/m3/src/query/parser/promql"
"github.com/m3db/m3/src/query/ts"
opentracingutil "github.com/m3db/m3/src/query/util/opentracing"

opentracinglog "github.com/opentracing/opentracing-go/log"
)

func read(
Expand All @@ -45,6 +48,15 @@ func read(
ctx, cancel := context.WithTimeout(reqCtx, params.Timeout)
andrewmains12 marked this conversation as resolved.
Show resolved Hide resolved
defer cancel()

sp := opentracingutil.SpanFromContextOrRoot(ctx)
sp.LogFields(
opentracinglog.String("params.query", params.Query),
opentracingutil.Time("params.start", params.Start),
opentracingutil.Time("params.end", params.End),
opentracingutil.Time("params.now", params.Now),
opentracingutil.Duration("params.step", params.Step),
)
andrewmains12 marked this conversation as resolved.
Show resolved Hide resolved

opts := &executor.EngineOptions{}
// Detect clients closing connections
handler.CloseWatcher(ctx, cancel, w)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -28,8 +28,8 @@ import (
"github.com/m3db/m3/src/query/api/v1/handler/prometheus"
"github.com/m3db/m3/src/query/executor"
"github.com/m3db/m3/src/query/models"
"github.com/m3db/m3/src/query/util/httperrors"
"github.com/m3db/m3/src/query/util/logging"
xhttp "github.com/m3db/m3/src/x/net/http"

"go.uber.org/zap"
)
Expand Down Expand Up @@ -69,7 +69,7 @@ func (h *PromReadInstantHandler) ServeHTTP(w http.ResponseWriter, r *http.Reques
logger := logging.WithContext(ctx)
params, rErr := parseInstantaneousParams(r, h.timeoutOpts)
if rErr != nil {
xhttp.Error(w, rErr.Inner(), rErr.Code())
httperrors.ErrorWithReqID(w, r, rErr, rErr.Code())
return
}

Expand All @@ -80,7 +80,7 @@ func (h *PromReadInstantHandler) ServeHTTP(w http.ResponseWriter, r *http.Reques
result, err := read(ctx, h.engine, h.tagOpts, w, params)
if err != nil {
logger.Error("unable to fetch data", zap.Error(err))
xhttp.Error(w, err, http.StatusBadRequest)
httperrors.ErrorWithReqID(w, r, rErr, http.StatusBadRequest)
return
}

Expand Down
15 changes: 12 additions & 3 deletions src/query/api/v1/httpd/handler.go
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ package httpd
import (
"encoding/json"
"errors"
"fmt"
"net/http"
_ "net/http/pprof" // needed for pprof handler registration
"time"
Expand Down Expand Up @@ -52,6 +53,8 @@ import (
"github.com/m3db/m3/src/x/net/http/cors"

"github.com/gorilla/mux"
"github.com/opentracing-contrib/go-stdlib/nethttp"
"github.com/opentracing/opentracing-go"
"github.com/uber-go/tally"
)

Expand Down Expand Up @@ -102,13 +105,19 @@ func NewHandler(
) (*Handler, error) {
r := mux.NewRouter()

// apply middleware. Just CORS for now, but we could add more here as needed.
withMiddleware := &cors.Handler{
// apply middleware.
andrewmains12 marked this conversation as resolved.
Show resolved Hide resolved
withMiddleware := http.Handler(&cors.Handler{
Handler: r,
Info: &cors.Info{
"*": true,
},
}
})

// add jaeger tracing to our endpoints
andrewmains12 marked this conversation as resolved.
Show resolved Hide resolved
withMiddleware = nethttp.Middleware(opentracing.GlobalTracer(), withMiddleware,
nethttp.OperationNameFunc(func(r *http.Request) string {
return fmt.Sprintf("%s %s", r.Method, r.URL.Path)
}))

var timeoutOpts = &prometheus.TimeoutOpts{}
if embeddedDbCfg == nil || embeddedDbCfg.Client.FetchTimeout == nil {
Expand Down
6 changes: 6 additions & 0 deletions src/query/config/m3query-dev-etcd.yml
Original file line number Diff line number Diff line change
Expand Up @@ -63,3 +63,9 @@ writeWorkerPoolPolicy:

tagOptions:
idScheme: quoted

andrewmains12 marked this conversation as resolved.
Show resolved Hide resolved
# Uncomment this to enable local jaeger tracing. See https://www.jaegertracing.io/docs/1.9/getting-started/
# for quick local setup (which this config will send data to).

# tracing:
# backend: jaeger
Loading