GraphQL Debugger Performance

This issue tracks the progress of reporting a potential performance issue with the standard OpenTelemetry lib.

Related:

@graphql-debugger/trace-schema
#287

What is the issue?
Using GraphQL debugger introduces significant latency, primarily because it wraps a GraphQL resolver with logic that interacts with standard OpenTelemetry (OTEL) libraries.
We investigated the potential overhead caused by this resolver wrapping and identified several ways to improve performance on our end.
Despite these improvements, our benchmarks still show significant overhead when using standard OpenTelemetry, and even more so with our middleware.
How do we measure the performance?
In the process of debugging performance and assessing the impact of our work, we created a few benchmarks to demonstrate our case. Initially, we forked graphql-crystal/benchmarks to our own repository, rocket-connect/benchmarks, and began modifying it to target only the JS runtimes and GraphQL servers that came with it.
We saw an impact coming from OpenTelemetry when implementing the yoga-otel benchmark. By simply using the standard OTEL libraries 'raw' and creating a span inside a GraphQL resolver, we observed the same performance issue. Our investigation revealed that the problem was not specific to GraphQL debugger, in how we wrap the resolvers and store various attributes; it was, in fact, an issue with the usage of the standard OTEL libraries themselves.

The benchmark used the standard OpenTelemetry libraries within the resolver to create a span:
This resulted in an increase in latency of up to 100%.
Given our findings, we first moved the benchmarks into the monorepo rocket-connect/graphql-debugger/benchmarks, where they are invoked on each commit to the main branch. Additionally, we created an isolated repository, rocket-connect/otel-js-server-benchmarks, to demonstrate the performance impact of using OTEL inside basic Node HTTP and Express endpoints.
Extracts
Initial Finding
This extract comes from our initial fork rocket-connect/benchmarks, where we discovered that just using OTEL in isolation, without the debugger, massively impacted the performance of yoga, taking latency from 15.33ms to 35.39ms and requests from 13kps to 5.7kps.

Move to monorepo
After our initial findings, we moved the benchmarks to the graphql debugger monorepo rocket-connect/graphql-debugger/benchmarks, which gives a better view of all GraphQL JS runtimes with and without OpenTelemetry. This also enabled us to iterate on the performance impact our own middleware did have, reducing the latency of yoga-debugger from 92.52ms to 52.72ms and increasing requests from 2.1kps to 3.8kps.

Isolate OpenTelemetry benchmarks
Finally, given that our initial work indicated the problem was isolated to the OTEL libraries and propagated from our middleware, we decided to move beyond GraphQL and demonstrate the same examples using standard Node HTTP versus Express: rocket-connect/otel-js-server-benchmarks. Our results show that adding just a few lines of OTEL code to an HTTP or Express handler significantly reduces the performance of the API. For example, a basic HTTP endpoint operating at 6.26ms latency more than triples its average response time to 22.03ms when OTEL is added, rendering it unusable for any production setting.