Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenTelemetry: By default high HTTP span name cardinality for HTTP server spans #26216

Closed
calohmn opened this issue Jun 18, 2022 · 9 comments · Fixed by #26356 or #26871
Closed

OpenTelemetry: By default high HTTP span name cardinality for HTTP server spans #26216

calohmn opened this issue Jun 18, 2022 · 9 comments · Fixed by #26356 or #26871
Assignees
Labels
area/tracing kind/bug Something isn't working
Milestone

Comments

@calohmn
Copy link

calohmn commented Jun 18, 2022

Describe the bug

This is a follow-up of #16952.
When using the quarkus-opentelemetry extension and observing Vert.x HTTP server request spans, the default is still to see span names containing the full request URI path. This means you will possibly get a whole range of span names like

/products/myProductId123
/products/myProductId234
/someWeirdPathTryingToExploitSomeVulnerability?someParamX

The OTEL spec declares such high cardinality span names as unsuitable. In practice, this means the span selection combobox in the Jaeger UI becomes unusable.

#17676 was supposed to fix this by introducing span names with parameterized paths (e..g /products/{productId}), but there are quite some steps involved to fully prevent the high cardinality span names.

To prevent the issue

  • A "match-all" Vert.x Router route has to be added. That route must either have a name set or ctx.request().routed(someName) has to be invoked in its handler. This prevents full request URI span names for any unsupported request paths (that lead to a 404 response).
  • The "match-all" route also needs a failure handler, so that invalid HTTP requests don't show up in tracing with their full request paths. (But you want to ensure that the failure handler route name isn't used as span name for all requests where a specific route already got matched - you want the specific route name as span name then (see example code below).)
  • [EDIT: If the application uses the quarkus-micrometer dependency], the Vert.x HTTP server metrics have to be enabled by setting quarkus.micrometer.binder.http-server.enabled=true.
    Without that, no parameterized paths get used.
Example matchAll route
        final Route matchAllRoute = router.route();
        matchAllRoute.failureHandler(ctx -> {
            if (ctx.get(KEY_MATCH_ALL_ROUTE_APPLIED) == null) {
                // handler of matchAllRoute not applied, meaning this request is invalid/failed from the start;
                // ensure span name is set to fixed string instead of the request path
                ctx.request().routed(MATCH_ALL_ROUTE_NAME);
            }
            ctx.next();
        });
        matchAllRoute.handler(ctx -> {
            // ensure span name is set to fixed string instead of the request path
            ctx.request().routed(MATCH_ALL_ROUTE_NAME);
            ctx.put(KEY_MATCH_ALL_ROUTE_APPLIED, true);
            ctx.next();
        });

Summary

FMPOV, the issue here is that

  • the default span name is still the full request URI path, if no named route gets applied (or ctx.request().routed(name) isn't invoked).
  • with quarkus-micrometer being used and without Vert.x HTTP server metrics enabled, still the full request URI paths get used as span names for all requests

Expected behavior

I would expect the HTTP request method name (GET, POST, PUT, etc.) to be used as span name if Vert.x HTTP server metrics are not enabled and also if no request route name got set/applied (i.e. no named route got matched and ctx.request().routed(name) wasn't invoked).

It seems to me, this can be implemented by just letting the route(HttpRequest) method in HttpInstrumenterVertxTracer.ServerAttributesExtractor always return null.

Actual behavior

High cardinality span names are used by default.

How to Reproduce?

Make HTTP requests to quarkus-opentelemetry application having Vert.x HTTP server endpoint.

  • Make requests with unsupported paths, leading to 404 responses.
  • Make invalid HTTP requests (e.g. with printf 'GET invalid HTTP/1.0\r\n\r\n' | nc -q 3 localhost 80).
    Observe span names in Jaeger UI.
  • EDIT: Add quarkus-micrometer dependency and make requests to be handled by routes with parameterized paths.

Observe span names.

Output of uname -a or ver

Linux be6z00br-vm 5.4.0-117-generic #132-Ubuntu SMP Thu Jun 2 00:39:06 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Output of java -version

openjdk 17.0.3 2022-04-19

GraalVM version (if different from Java)

No response

Quarkus version or git rev

2.8.3

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

@calohmn calohmn added the kind/bug Something isn't working label Jun 18, 2022
@quarkus-bot
Copy link

quarkus-bot bot commented Jun 18, 2022

/cc @radcortez

@radcortez radcortez self-assigned this Jun 20, 2022
@radcortez
Copy link
Member

Can you please provide a reproducer?

I've tried a simple Quarkus Application with a Vert.x Router with a template path and it works as expected. We even have an integration test for it:

@Test
void spanPath() {
given()
.get("/hello/{name}", "Naruto")
.then()
.statusCode(HTTP_OK)
.body(equalTo("hello Naruto"));
await().atMost(5, TimeUnit.SECONDS).until(() -> getSpans().size() == 1);
List<Map<String, Object>> spans = getSpans();
assertEquals(1, spans.size());
assertEquals(SERVER.toString(), spans.get(0).get("kind"));
assertEquals("/hello/:name", spans.get(0).get("name"));
assertEquals(HTTP_OK, ((Map<?, ?>) spans.get(0).get("attributes")).get(HTTP_STATUS_CODE.toString()));
assertEquals(HttpMethod.GET.toString(), ((Map<?, ?>) spans.get(0).get("attributes")).get(HTTP_METHOD.toString()));
assertEquals("/hello/:name", ((Map<?, ?>) spans.get(0).get("attributes")).get(HTTP_ROUTE.toString()));
}

I've also checked Jaeger and the names were showing correctly.

This does not require the Metrics Extension to be available. The span name is only set on the span end because the route is only determined and selected after we start the span. Maybe you are running into a scenario where the span name is not being correctly updated.

@radcortez radcortez added the triage/needs-reproducer We are waiting for a reproducer. label Jun 23, 2022
@calohmn
Copy link
Author

calohmn commented Jun 24, 2022

(I've updated the issue description above - see "How to reproduce".)

Test concerning handling of "status 404" requests (with no explicit failure handler route getting matched):
In the HelloRouterTest class, such a test can be added:

    @Test
    void requestWith404Response() {
        given()
                .get("/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php")
                .then()
                .statusCode(HTTP_NOT_FOUND);

        await().atMost(5, TimeUnit.SECONDS).until(() -> getSpans().size() == 1);
        List<Map<String, Object>> spans = getSpans();
        assertEquals(1, spans.size());

        assertEquals("HTTP GET", spans.get(0).get("name"));
    }

Test concerning invalid HTTP requests
It seems, tests involving the handling of invalid HTTP requests cannot be done with rest-assured (rest-assured/rest-assured#644), so a different kind of unit test would be needed. See "How to reproduce" above for a manual test.

Test concerning parameterized paths:
Route names with parameterized paths don't get set as span names if the quarkus-micrometer dependency is used and quarkus.micrometer.binder.http-server.enabled is false.
So, to reproduce with the HelloRouterTest class, these dependencies have to be added to the quarkus-integration-test-opentelemetry-vertx module:

<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-micrometer</artifactId>
</dependency>
<dependency>
    <groupId>org.jboss.spec.javax.ws.rs</groupId>
    <artifactId>jboss-jaxrs-api_2.1_spec</artifactId>
</dependency>

Running the spanPath() method then fails:

Expected :/hello/:name
Actual   :/hello/Naruto

@radcortez
Copy link
Member

Thank you. I guess I've got confused by reading the title and the description which are actually 3 separate things (even if related).

I've fixed the 404 span name by setting the span name to /*. This seems to be how OTel is doing it with other libraries.

When the Micrometer extension is available, we fall back to its Metrics (including the resolution of the route name). We now force the OTel Vertx Metrics registration even if Micrometer is available and disabled.

I'm just not sure if we can do something about invalid/bad requests. For instance, it is perfectly valid for the implementing application to return a 400 from an existent endpoint, and in that case, you want the span name to match the route. We do return an HTTP GET when there is an Unhandled exception in Vertx router

@radcortez radcortez removed the triage/needs-reproducer We are waiting for a reproducer. label Jun 24, 2022
@calohmn
Copy link
Author

calohmn commented Jun 27, 2022

I've fixed the 404 span name by setting the span name to /*. This seems to be how OTel is doing it with other libraries.
[...]
it is perfectly valid for the implementing application to return a 400 from an existent endpoint, and in that case, you want the span name to match the route.

I think, you can equally say that it is valid for an application to return a 404 from an existent endpoint.
For a /products/myNonExistingProductId request, I would rather want to see a /products/:productId span name.

Concerning the unit tests, I would expect that with a router.get("/hello/:name") route definition in HelloRouter adapted like this:

        router.get("/hello/:name").handler(rc -> {
            final String name = rc.pathParam("name");
            if ("Nessie".equalsIgnoreCase(name)) {
                rc.response().setStatusCode(404).end("not found");
            } else {
                rc.response().end("hello " + name);
            }
        });

the following test should pass:

    @Test
    void spanPathForMatchedRoute404() {
        given()
                .get("/hello/{name}", "Nessie")
                .then()
                .statusCode(HTTP_NOT_FOUND)
                .body(equalTo("not found"));

        await().atMost(5, TimeUnit.SECONDS).until(() -> getSpans().size() == 1);
        List<Map<String, Object>> spans = getSpans();
        assertEquals(1, spans.size());

        assertEquals(SERVER.toString(), spans.get(0).get("kind"));
        assertEquals("/hello/:name", spans.get(0).get("name"));
        assertEquals(HTTP_NOT_FOUND, ((Map<?, ?>) spans.get(0).get("attributes")).get(HTTP_STATUS_CODE.toString()));
        assertEquals(HttpMethod.GET.toString(), ((Map<?, ?>) spans.get(0).get("attributes")).get(HTTP_METHOD.toString()));
        assertEquals("/hello/:name", ((Map<?, ?>) spans.get(0).get("attributes")).get(HTTP_ROUTE.toString()));
    }

I'm just not sure if we can do something about invalid/bad requests.

To me it looks like the above 404 case and the 400 case can both be solved by just letting the route(HttpRequest) method in HttpInstrumenterVertxTracer.ServerAttributesExtractor always return null (instead of returning request.uri()).

@radcortez
Copy link
Member

I think, you can equally say that it is valid for an application to return a 404 from an existent endpoint.
For a /products/myNonExistingProductId request, I would rather want to see a /products/:productId span name.

We can change it to use the route if one exists even for a 404.

To me it looks like the above 404 case and the 400 case can both be solved by just letting the route(HttpRequest) method in HttpInstrumenterVertxTracer.ServerAttributesExtractor always return null (instead of returning request.uri()).

I guess that we can remove that code there, but just because it will be updated later when the route is actually determined by the HttpRouteHolder.updateHttpRoute.

@radcortez
Copy link
Member

To me it looks like the above 404 case and the 400 case can both be solved by just letting the route(HttpRequest) method in HttpInstrumenterVertxTracer.ServerAttributesExtractor always return null (instead of returning request.uri()).

I guess that we can remove that code there, but just because it will be updated later when the route is actually determined by the HttpRouteHolder.updateHttpRoute.

For now, I won't change this, because we need it for the DropNamesSampler. I'll have to figure this out and I prefer to handle it in a separate PR.

@calohmn
Copy link
Author

calohmn commented Jul 18, 2022

For now, I won't change this, because we need it for the DropNamesSampler. I'll have to figure this out and I prefer to handle it in a separate PR.

Right, the exclusion of spans for health endpoint requests would need to be handled differently, then. I wonder if it wouldn't be possible to not create these spans in the first place, instead of dropping them afterwards.

In any case, regarding the handling of the invalid HTTP request spans, I've created #26776 to track this (also because there are cases for which the "matchAll" route failure handler workaround doesn't work).

@radcortez
Copy link
Member

Right, the exclusion of spans for health endpoint requests would need to be handled differently, then. I wonder if it wouldn't be possible to not create these spans in the first place, instead of dropping them afterwards.

Not sure if this is possible, because Spans are handled in a very low-level API (to catch all http requests), meaning that there is no knowing if the endpoint is the health one (aside from looking into the url).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment