feat(localenv): add trace collection (with Tempo) in local playground #2816
Conversation
# Conflicts:
#   localenv/telemetry/README.md
#   localenv/telemetry/docker-compose.yml
#   localenv/telemetry/grafana/provisioning/dashboards/default.yaml
#   localenv/telemetry/grafana/provisioning/dashboards/example.json
#   localenv/telemetry/grafana/provisioning/datasources/datasources.yaml
#   localenv/telemetry/otel-collector-config.yaml
#   packages/backend/src/telemetry/service.ts
instrumentations: [
  new UndiciInstrumentation(),
  new HttpInstrumentation(),
  new PgInstrumentation(),
  new GraphQLInstrumentation({
    mergeItems: true,
    ignoreTrivialResolveSpans: true,
    ignoreResolveSpans: true
  })
UndiciInstrumentation: auto-instruments Node's fetch (undici) module. Useful for tracking requests made with the Open Payments client, since it uses fetch under the hood.
HttpInstrumentation: auto-instruments the http and https modules. Useful for tracking client requests made with axios, and also lets us trace requests received by our HTTP servers (e.g. you can see how happy-life-bank handles each request).
PgInstrumentation: auto-instruments all of our Postgres queries.
GraphQLInstrumentation: auto-instruments our GraphQL APIs, particularly resolvers. The settings reduce the amount of noise we get; otherwise we would see spans for every single field that gets resolved. We can adjust if necessary.
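For context, here is a minimal, self-contained sketch of wiring these up. Whether the project registers them via registerInstrumentations or the NodeSDK isn't shown in the excerpt above, so treat the wrapper here as an assumption:

```ts
import { registerInstrumentations } from '@opentelemetry/instrumentation'
import { UndiciInstrumentation } from '@opentelemetry/instrumentation-undici'
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http'
import { PgInstrumentation } from '@opentelemetry/instrumentation-pg'
import { GraphQLInstrumentation } from '@opentelemetry/instrumentation-graphql'

// Wire up the auto-instrumentations. The GraphQL options suppress per-field
// resolver spans so that only the interesting resolver spans remain.
registerInstrumentations({
  instrumentations: [
    new UndiciInstrumentation(),
    new HttpInstrumentation(),
    new PgInstrumentation(),
    new GraphQLInstrumentation({
      mergeItems: true,
      ignoreTrivialResolveSpans: true,
      ignoreResolveSpans: true
    })
  ]
})
```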
We are always instrumenting these, regardless of enableTelemetryTraces. Is that OK? I mean, the important thing is that we're not sending them anywhere, so maybe it's fine. But I wonder if this could have a performance impact or cause any unnecessary concern.
Although there shouldn't be much of a difference at all, it would probably be safer to put this behind the flag.
I moved the code behind the flags, tested with them enabled and disabled, and everything seems to work. It also works if we make telemetry non-optional and try to increment a counter while everything is disabled: it just performs a no-op.
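As a small illustration of that no-op behaviour (only the @opentelemetry/api package is taken from the codebase; the meter and counter names below are made up): when no meter provider has been registered globally, the API hands back no-op instruments, so recording is harmless.

```ts
import { metrics } from '@opentelemetry/api'

// With no MeterProvider registered, getMeter() returns a no-op meter, so this
// counter records nothing and costs essentially nothing.
const meter = metrics.getMeter('example-meter') // name is illustrative
const counter = meter.createCounter('example_counter') // name is illustrative
counter.add(1) // no-op while telemetry is disabled
```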
if (Config.enableTelemetryTraces) {
  for (const url of Config.openTelemetryTraceCollectorUrls) {
    const traceExporter = new OTLPTraceExporter({
      url
    })

    tracerProvider.addSpanProcessor(new BatchSpanProcessor(traceExporter))
  }
}
Separate config variables, to differentiate between metrics and traces. (IMO we want metrics on by default, but not traces.)
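A rough sketch of what that split could look like; the environment variable names below are illustrative, not necessarily the ones Rafiki actually uses:

```ts
// Illustrative only: separate switches and collector URL lists for metrics
// vs. traces, with metrics on by default and traces opt-in.
export const Config = {
  enableTelemetry: process.env.ENABLE_TELEMETRY !== 'false',
  enableTelemetryTraces: process.env.ENABLE_TELEMETRY_TRACES === 'true',
  openTelemetryCollectorUrls: parseUrlList(process.env.OPEN_TELEMETRY_COLLECTOR_URLS),
  openTelemetryTraceCollectorUrls: parseUrlList(
    process.env.OPEN_TELEMETRY_TRACE_COLLECTOR_URLS
  )
}

// Split a comma-separated list of URLs into a cleaned array.
function parseUrlList(value?: string): string[] {
  return (value ?? '')
    .split(',')
    .map((url) => url.trim())
    .filter((url) => url.length > 0)
}
```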
@@ -59,4 +59,4 @@ COPY --from=builder /home/rafiki/packages/backend/dist ./packages/backend/dist
 COPY --from=builder /home/rafiki/packages/token-introspection/dist ./packages/token-introspection/dist
 COPY --from=builder /home/rafiki/packages/backend/knexfile.js ./packages/backend/knexfile.js

-CMD ["node", "/home/rafiki/packages/backend/dist/index.js"]
+CMD ["node", "-r", "/home/rafiki/packages/backend/dist/telemetry/index.js", "/home/rafiki/packages/backend/dist/index.js"]
Because auto-instrumentation essentially needs to wrap multiple modules, it all has to be loaded as the very first thing, before app startup:
https://opentelemetry.io/docs/languages/js/getting-started/nodejs/#setup
Otherwise, auto-instrumentation will not work.
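As an aside, an alternative to the -r preload flag is to import the telemetry bootstrap as the very first statement of the application entry point; the module paths below are hypothetical, not the actual Rafiki files:

```ts
// index.ts (sketch): the side-effect import runs first, so the instrumentation
// wrappers are installed before 'pg', 'http', 'undici', or the GraphQL server
// are ever loaded by the application code.
import './telemetry' // hypothetical path to the instrumentation setup

import { startServer } from './app' // hypothetical application bootstrap

void startServer()
```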
We could just have this CMD call pnpm start or something, but maybe it's OK like this?
Just curious: what happens if we don't include the --require option? Some error? Does it fail silently, or maybe coincidentally work sometimes?
I don't have a very strong opinion on the pnpm start command vs. this, but I'm leaning towards this, since we may only want to start it this way in this context, as opposed to Dockerfile.dev, from the host machine, etc. Could always have different start commands in that case, I guess.
It just "fails" silently, and doesn't generate traces. For example, if the pg module gets loaded before this postgres auto instrumentation, it just won't pick up those traces.
If you uncomnent the debugger in the telemetry file, you'd be able to see those warnings.
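For reference, the debug logger being referred to is OpenTelemetry's diagnostic logger (the commented-out diag.setLogger line further down); enabling it looks like this:

```ts
import { diag, DiagConsoleLogger, DiagLogLevel } from '@opentelemetry/api'

// Route OpenTelemetry's internal diagnostics to the console at DEBUG level.
// This surfaces the warnings mentioned above, e.g. when a module was loaded
// before its instrumentation or the exporter cannot reach the collector.
diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG)
```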
Looks like there are merge conflicts with the telemetry service refactor but generally looks good to me. Spun it up and viewed the traces locally. Just had the one open-ended comment here: #2816 (comment)
Ping me when ready for re-review
# Conflicts:
#   localenv/telemetry/grafana/provisioning/dashboards/example.json
#   packages/backend/src/telemetry/service.ts
LGTM, just some dangling comments to be cleaned up. Was able to get the trace logs in the Grafana dashboard on my machine as well.
import { UndiciInstrumentation } from '@opentelemetry/instrumentation-undici'

// debug logger:
// diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG)
nit: cleanup
I was thinking about this one, but it's actually quite useful to keep for checking whether the telemetry service is working at all, particularly the connection between the backend service and the collector itself. Otherwise we'd need to dig into the docs to find this.
It might be a rare exception where keeping commented-out code in is worth it.
Changes proposed in this pull request
I added a small panel in the example Grafana dashboard that shows some of the longer traces that can be explored.
Context
Fixes #2808