Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open telemetry support #803

Merged
merged 1 commit into from
Jun 11, 2021
Merged

Open telemetry support #803

merged 1 commit into from
Jun 11, 2021

Conversation

BrynCooke
Copy link
Contributor

This PR adds instrumentation to emit the following spans:

  • federation.request - total time to server a request.
  • federation.validate - time to validate the federated query.
  • federation.plan - time to plan the federated query.
  • federation.execute - the total time executing query.
  • federation.fetch - time fetching data from a subgraph.
  • federation.postprocessing - time to render the response from all the subgraph data.

The instrumentation looks a little messy due to the requirement to set error status on spans on exception, but also if errors are added to the response.

@apollo-cla
Copy link

@BrynCooke: Thank you for submitting a pull request! Before we can merge it, you'll need to sign the Apollo Contributor License Agreement here: https://contribute.apollographql.com/

@BrynCooke BrynCooke force-pushed the opentelemetry_support branch 2 times, most recently from 1d64468 to f34d876 Compare June 9, 2021 16:51
Copy link
Member

@trevor-scheer trevor-scheer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really great! I don't have any majorly blocking issues, and I think my couple larger suggestions are optional/food for thought anyway.

One separate comment I will make is that consistency in style would be nice. Things that come to mind are indentation, bracket spacing, semicolons, and newlines. Note that this is explicitly not enforced on this repo and it's certainly not correct everywhere, either, so do with that what you will. It's not something I'll go around and leaving nitpick comments about or block on.

gateway-js/package.json Outdated Show resolved Hide resolved
gateway-js/src/executeQueryPlan.ts Outdated Show resolved Hide resolved
gateway-js/src/__tests__/gateway/opentelemetry.test.ts Outdated Show resolved Hide resolved
gateway-js/src/__tests__/gateway/opentelemetry.test.ts Outdated Show resolved Hide resolved
gateway-js/src/__tests__/gateway/opentelemetry.test.ts Outdated Show resolved Hide resolved
Comment on lines +1103 to +1109
catch (err) {
span.setStatus({ code:SpanStatusCode.ERROR });
throw err;
}
finally {
span.end();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This try/catch/finally pattern is pretty repetitive, particularly in the catch/finally bits. It gets a bit harder to stomach with the nesting / longer try blocks.

I'm on the fence about suggesting this still but worth surfacing the idea since maybe something even better will come out of it. I don't feel strongly about this one (and not even sure of its correctness or completeness, just pseudo-ing something).

async function wrapInSpan<T>(
  tracer: Tracer,
  spanName: string,
  do: (span: Span) => Promise<T>,
  onError?: (err) => Promise<T>
): Promise<T> {
  return tracer.startActiveSpan(spanName, async (span) => {
    try {
      return do(span);
    } catch (err) {
      span.setStatus({ code:SpanStatusCode.ERROR });
      if (onError) return onError(err);
      throw err;
    } finally {
      span.end();
    }
  };
}

// ...inside execution
return wrapInSpan(
  tracer,
  OpenTelemetrySpanNames.EXECUTE,
  async (span) => {
  // execution logic
  },
);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also wondering about such a helper function. It seems strange that they didn't bake this in to the OpenTelemetry API.
startActiveSpan actually takes a bunch more parameters, and I realised that I should have been using one of these overloads at least once.

Let's leave it for now, but agree it's not pretty.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sgtm!

Comment on lines 10 to 30
const reconstructed: any = [];
const spanMap = new Map<ReadableSpan | undefined, any>();
spanMap.set(undefined, reconstructed);
spans.forEach((s) => spanMap.set(s, {name: s.name, attributes:s.attributes, children: [], status: s.status}));
spans.forEach((s) => {
const spanMirror = spanMap.get(s);
const parentSpanMirror = spanMap.get(
spans.find((s2) => s2.spanContext().spanId === s.parentSpanId)
);
if(Array.isArray(parentSpanMirror)) {
parentSpanMirror.push(spanMirror);
}
else {
parentSpanMirror.children.push(spanMirror);
}
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide some comments here? I think we're building a tree of spans where nested spans are added to a span-like object under the children property, which gives it the tree shape. I just get a bit mixed up with the mirror stuff in the process.

If I understand what's happening correctly, maybe we could build a map of parentSpanId => childrenSpanId[] and walk that map starting with the undefined root? You can take this with a grain of salt...this really just trades complexity from the data construction over to the data printing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments and also simplified the algorithm.
At least part of the issue was that I had named things badly. Hopefully it looks better now.

@trevor-scheer
Copy link
Member

I'm less familiar with the nuances of lockfiles generated by npm@7, but I would expect a diff caused by the move of the devDependencies. This change would be represented after running an npm install now that they're in the correct place within the package.json root. Can you please give that a run and see if there's any change?

Copy link
Member

@trevor-scheer trevor-scheer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from my package-lock.json concern, this looks ready to me!

@@ -4,7 +4,7 @@

> The changes noted within this `vNEXT` section have not been released yet. New PRs and commits which introduce changes should include an entry in this `vNEXT` section as part of their development. When a release is being prepared, a new header will be (manually) created below and the appropriate changes within that release will be moved into the new section.

- _Nothing yet! Stay tuned!_
- OpenTelemetry support PR TBA
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's update this with the PR # and link. I think the description is fine but if you feel there's anything worth adding here that's great too.

@BrynCooke
Copy link
Contributor Author

Thanks for all the feedback @trevor-scheer !

This PR adds instrumentation to emit the following spans:
* federation.request - total time to server a request.
* federation.validate - time to validate the federated query.
* federation.plan - time to plan the federated query.
* federation.execute - the total time executing query.
* federation.fetch - time fetching data from a subgraph.
* federation.postprocessing - time to render the response from all the subgraph data.

The instrumentation looks a little messy due to the requirement to set error status on spans on exception, but also if errors are added to the response.
@BrynCooke BrynCooke merged commit 7c5d2b9 into main Jun 11, 2021
@BrynCooke BrynCooke deleted the opentelemetry_support branch June 11, 2021 14:58
trevor-scheer added a commit that referenced this pull request Jun 11, 2021
…ort"

This reverts commit 7c5d2b9, reversing
changes made to 4482d9c.

Temporarily reverting to relocate onto a release branch which will
be merged in the near future. See branch release-gateway-0.29.0.
trevor-scheer added a commit that referenced this pull request Jun 11, 2021
Revert "Revert "Merge pull request #803 from apollographql/opentelemetry_support""

This reverts commit 01d0518.
@trevor-scheer trevor-scheer added this to the @apollo/gateway v0.29.0 milestone Jun 11, 2021
BrynCooke added a commit that referenced this pull request Jun 25, 2021
This PR adds instrumentation to emit the following spans:

federation.request - total time to server a request.
federation.validate - time to validate the federated query.
federation.plan - time to plan the federated query.
federation.execute - the total time executing query.
federation.fetch - time fetching data from a subgraph.
federation.postprocessing - time to render the response from all the subgraph data.
The instrumentation looks a little messy due to the requirement to set error status on spans on exception, but also if errors are added to the response.

It is a reissue of the following PRs
#803
#828
trevor-scheer pushed a commit that referenced this pull request Jun 29, 2021
This PR adds instrumentation to emit the following spans:

federation.request - total time to server a request.
federation.validate - time to validate the federated query.
federation.plan - time to plan the federated query.
federation.execute - the total time executing query.
federation.fetch - time fetching data from a subgraph.
federation.postprocessing - time to render the response from all the subgraph data.
The instrumentation looks a little messy due to the requirement to set error status on spans on exception, but also if errors are added to the response.

It is a reissue of the following PRs
#803
#828
trevor-scheer added a commit that referenced this pull request Jun 29, 2021
* Add OpenTelemetry support

This PR adds instrumentation to emit the following spans:

federation.request - total time to server a request.
federation.validate - time to validate the federated query.
federation.plan - time to plan the federated query.
federation.execute - the total time executing query.
federation.fetch - time fetching data from a subgraph.
federation.postprocessing - time to render the response from all the subgraph data.
The instrumentation looks a little messy due to the requirement to set error status on spans on exception, but also if errors are added to the response.

It is a reissue of the following PRs
#803
#828

* WIP on opentelemetry docs

* First run at OT docs ready for review

* Incorporate feedback from @BrynCooke

* Add note to update @apollo/gateway

* Release

 - [email protected]
 - @apollo/[email protected]
 - @apollo/[email protected]

* Update changelog for publish

Co-authored-by: bryncooke <[email protected]>
Co-authored-by: Stephen Barlow <[email protected]>
Co-authored-by: Stephen Barlow <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants