Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements parsing of http route and uses it by default for span name #602

Merged
merged 1 commit into from
Feb 22, 2018

Conversation

codefromthecrypt
Copy link
Member

@codefromthecrypt codefromthecrypt commented Jan 31, 2018

"http.route" is an expression such as "/items/:itemId" representing an application endpoint. This adds support for parsing this, employing it as the default span name policy when present. A route based span name looks like "/users/{userId}", "not_found" or "redirected". More below:

Http Route

The http route is an expression such as /items/:itemId representing an
application endpoint. HttpAdapter.route() parses this from a response,
returning the route that matched the request, empty if no route matched,
or null if routes aren't supported. This value is either used to create
a tag "http.route" or as an input to a span naming function.

Http route cardinality

The http route groups similar requests together, so results in limited
cardinality, often a better choice for a span name vs the http method.

For example, the route /users/{userId}, matches /users/25f4c31d and
/users/e3c553be. If a span name function used the http path instead,
it could DOS-style attack vector on your span name index, as it would
grow unbounded vs /users/{userId}. Even if different frameworks use
different formats, such as /users/[0-9a-f]+ or /users/:userId, the
cardinality is still fixed with regards to request count.

The http route can be "" (empty) on redirect or not-found. If you use
http route for metrics, coerce empty to constants like "redirected" or
"not_found" with the http status. Knowing the difference between not
found and redirected can be a simple intrusion detection signal. The
default span name policy uses constants when a route isn't known for
reasons including sharing the span name as a metrics correlation field.

Supporting HttpAdapter.route(response)

Although the route is associated with the request, not the response,
it is parsed from the response object. The reason is that many server
implementations process the request before they can identify the route.

Instrumentation authors implement support via extending HttpAdapter
accordingly. There are a few patterns which might help.

Callback with non-final type

When a framework uses an callback model, you are in control of the type
being parsed. If the response type isn't final, simply subclass it with
the route data.

For example, if Spring MVC, it would be this:

    // check for a wrapper type which holds the template
    handler = HttpServerHandler.create(httpTracing, new HttpServletAdapter() {
      @Override public String template(HttpServletResponse response) {
        return response instanceof HttpServletResponseWithTemplate
            ? ((HttpServletResponseWithTemplate) response).route : null;
      }

      @Override public String toString() {
        return "WebMVCAdapter{}";
      }
    });

--snip--

// when parsing the response, scope the route from the request

    Object template = request.getAttribute(BEST_MATCHING_PATTERN_ATTRIBUTE);
    if (route != null) {
      response = new HttpServletResponseWithTemplate(response, route.toString());
    }
    handler.handleSend(response, ex, span);

Callback with final type

If the response type is final, you may be able to make a copy and stash
the route as a synthetic header. Since this is a copy of the response,
there's no chance a user will receive this header in a real response.

Here's an example for Play, where the header "brave-http-template" holds
the route temporarily until the parser can read it.

    result.onComplete {
      case Failure(t) => handler.handleSend(null, t, span)
      case Success(r) => {
        // add a synthetic header if there was a routing path set
        var resp = template.map(t => r.withHeaders("brave-http-template" -> t)).getOrElse(r)
        handler.handleSend(resp, null, span)
      }
    }
--snip--
    override def template(response: Result): String =
      response.header.headers.apply("brave-http-template")

Common mistakes

For grouping to work, we want routes that are effectively the same, to
in fact be the same. Here are a couple things on that.

  • Always start with a leading slash
    • This allows you to differentiate the root path from empty (no route)
    • This prevents accidental partitioning like users/:userId from /users/:userId
  • Take care not to duplicate slashes
    • When joining nested paths, avoid writing templates like /nested//users/:userId
    • The ITHttpServer test will catch some of this

Fixes openzipkin/zipkin#1874

@codefromthecrypt
Copy link
Member Author

@jplock do you think it makes sense to add jersey2 support to brave (which then your dropwizard project could use instead of normal jaxrs?) If so, we could get the http template as it isn't available at the jax-rs level ex https://github.com/micrometer-metrics/micrometer/blob/9ae63906cc78d44c444dbb43340b5b287d6b645e/micrometer-jersey2/src/main/java/io/micrometer/jersey2/server/DefaultJerseyTagsProvider.java

@jplock
Copy link
Member

jplock commented Jan 31, 2018

Yea we might have more control over things at the jersey level than just jaxrs

@codefromthecrypt
Copy link
Member Author

so revamped this based on some experience doing this with vert.x and spring web mvc. TL;DR: seems better to process the template in the response phase even if it is about the request.

here's why:

  • request parsing should happen reliably as otherwise if you have an
    error, you don't have good context for it. Deferring request parsing
    can interfere
  • if you have something at request time, but need it at response,
    propagating just a string representing the template is relatively safe
    (vs holding onto the routing object)
  • metrics from spans can only happen after response anyway, so they don't care
  • it is less confusing to have one place to parse than two (request/response)

So.. yeah thinking it is more than reasonable for the template
property to be parsed from the response.

@codefromthecrypt
Copy link
Member Author

next steps: verify this with play, ratpack and jersey2 (otherwise the api might be crap but don't know how yet)

@@ -68,6 +82,16 @@ public void afterCompletion(HttpServletRequest request, HttpServletResponse resp
Span span = tracer.currentSpan();
if (span == null) return;
((SpanInScope) request.getAttribute(SpanInScope.class.getName())).close();
handler.handleSend(response, ex, span);
Object template = request.getAttribute(BEST_MATCHING_PATTERN_ATTRIBUTE);
if (template == null) { // skip thread-local overhead if there's no attribute
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ps I know this stuff is mostly copy/paste between here and vert.x (except here the attribute is an object not a string). I'm letting that pressure build before generalizing code

@codefromthecrypt codefromthecrypt force-pushed the http-template branch 2 times, most recently from 906aa20 to ef3cac0 Compare February 12, 2018 09:56
@codefromthecrypt
Copy link
Member Author

Here's an example setting the span name to the http template (when using spring MVC on an endpoint with a path variable)

screen shot 2018-02-12 at 6 06 31 pm

@codefromthecrypt codefromthecrypt force-pushed the http-template branch 2 times, most recently from 534a916 to af5ef03 Compare February 12, 2018 14:37
handler.handleSend(response, ex, span);
}

static class HttpServletResponseWithTemplate extends HttpServletResponseWrapper {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @llinder @hyleung When an http library provides a wrapper, we get a big benefit, which is the ability to stash things through response-shaped holes. You can look at the note on vert.x for the opposite (not saying vert.x is bad, just that wrappers are safer for us to use)

cc @cescoffier

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think Ratpack has wrappers out-of-the-box, but iirc we had to introduce wrapper types to work around Ratpack's types (e.g. we don't really get a "Request", but a RequestSpec - so we have to wrap it in something that is more request-like).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very confident this api is fine for ratpack hyleung/ratpack-zipkin@bb49801

* We could also define a custom composite type like ResponseWithTemplate. However, this would
* interfere with people using "instanceof" in http samplers or parsers: they'd have to depend on a
* brave type. The least impact means was to use a thread local, as eventhough this costs a little,
* it prevents revision lock or routine use of custom types.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self "prevents version lock"

@codefromthecrypt
Copy link
Member Author

status update: verified this integrates cleanly with ratpack hyleung/ratpack-zipkin@bb49801

next step: check out play and jersey2

@codefromthecrypt
Copy link
Member Author

synthetic header approach should work with play (even if I can't get it to work right now.. this is more about me and scala+play vs this change)

    result.onComplete {
      case Failure(t) => handler.handleSend(null, t, span)
      case Success(r) => {
        // add a synthetic header if there was a routing path set
        var resp = template.map(t => r.withHeaders("brave-http-template" -> t)).getOrElse(r)
        handler.handleSend(resp, null, span)
      }
    }
--snip--
    override def template(response: Result): String =
      response.header.headers.apply("brave-http-template")

@codefromthecrypt codefromthecrypt changed the title WIP Begins support of http.template tag WIP Begins support of http.route tag Feb 17, 2018
@codefromthecrypt
Copy link
Member Author

Updated description based on various inputs. cc @rakyll @jkschneider

PS besides notes above (ex need to complete jersey2 prior to merge), I'd like to make a default span namer (which handles redirect etc). However, this requires knowing the difference between no matched route, and the framework doesn't support http routing. Needs some thought and advice welcome.

cc also @llinder @hyleung

@codefromthecrypt
Copy link
Member Author

added an issue about jersey support #611

@jkschneider
Copy link

@adriancole Careful about instrumenting WebMVC requests with a HandlerInterceptor because they are never called when a servlet filter intervenes earlier (e.g. spring-security blocks access to a resource).

Also, in the case of WebMVC, the code that sets BEST_PATTERN_MATCHING_ATTRIBUTE does not normalize away extra '/' in the URI. That is the purpose of this nonsense. I'd suggest this is something to test for in other implementations as well.

Sorry if this is a repeat of something you've heard before :)

@codefromthecrypt
Copy link
Member Author

codefromthecrypt commented Feb 18, 2018 via email

@codefromthecrypt
Copy link
Member Author

@jkschneider opened
#612 about the advice on impact of ordering. This should be commodity knowledge as it is indeed very very important

@codefromthecrypt
Copy link
Member Author

working on a slash deduper..

@codefromthecrypt
Copy link
Member Author

slash thing wasn't needed afterall. I figured out what jersey is doing

* <p>Matched templates are pairs of (resource path, method path). This code skips redundant
* slashes from either source caused by Path("/") or implicitly by Path("").
*/
@Override public String route(ContainerResponse response) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkschneider fyi. feel free to steal this.. managed to get to the bottom of why there were double-slashing. jersey actually normalizes "" to "/" btw. This is a lot more efficient in benches

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a commit.. maybe wait until merged :)

@codefromthecrypt
Copy link
Member Author

did some special casing of route parsing as it can be expensive (started about 3 ops/microsecond). Now, pretty good.

Benchmark                                                            Mode  Cnt   Score   Error   Units
TracingApplicationEventListenerAdapterBenchmarks.parseRoute         thrpt   15  80.678 ± 0.923  ops/us
TracingApplicationEventListenerAdapterBenchmarks.parseRoute_nested  thrpt   15  20.599 ± 0.238  ops/us


### Http route cardinality
The http route groups similar requests together, so results in limited
cardinality, often better choice for a span name vs the http method.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

often a better choice...


For example, the route `/users/{userId}`, matches `/users/25f4c31d` and
`/users/e3c553be`. If a span name function used the http path instead,
it could DOS-style attack vector on your span name index, as it would

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could result in a DOS-style...

different formats, such as `/users/[0-9a-f]+` or `/users/:userId`, the
cardinality is still fixed with regards to request count.

The http route can can be empty on redirect or not-found. If you are

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can can


The http route can can be empty on redirect or not-found. If you are
using the route for metrics, coerce empty to constants like "redirected"
or "not_found" using the http status. For example, the default span name

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, tThe default...

### Supporting HttpAdapter.route(response)

Eventhough the route is associated with the request, not the response,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Event though

implementations process the request before they can identify the route.

Instrumentation authors implement support via extending HttpAdapter

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extending HttpAdapter


#### Callback with non-final type
When a framework uses an callback model, you are in control of the type

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uses an callback

the route data.

For example, if spring MVC, it would be this:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in sSpring MVC

@codefromthecrypt codefromthecrypt changed the title WIP Begins support of http.route tag Implements parsing of http route and uses it by default for span name Feb 22, 2018
codefromthecrypt pushed a commit to openzipkin/zipkin-api that referenced this pull request Feb 22, 2018
See openzipkin/zipkin#1874
See also brave's recent pull request for more docs we might want to import:
openzipkin/brave#602
@codefromthecrypt codefromthecrypt merged commit 1c85f40 into master Feb 22, 2018
@codefromthecrypt codefromthecrypt deleted the http-template branch February 22, 2018 08:36
codefromthecrypt pushed a commit to openzipkin/zipkin-api that referenced this pull request Feb 22, 2018
See openzipkin/zipkin#1874
See census's definition census-instrumentation/opencensus-specs#49
See also brave's recent pull request for more docs we might want to import:
openzipkin/brave#602
codefromthecrypt pushed a commit to openzipkin/zipkin-api that referenced this pull request Feb 22, 2018
See openzipkin/zipkin#1874
See census's definition census-instrumentation/opencensus-specs#49
See also brave's recent pull request for more docs we might want to import:
openzipkin/brave#602
@codefromthecrypt
Copy link
Member Author

#621 about how we handle names when the status is 200 and we didn't match a route

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate an http.route tag
4 participants