Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle a fragment in a client-side URI properly #4789

Merged
merged 5 commits into from
Apr 7, 2023

Conversation

trustin
Copy link
Member

@trustin trustin commented Mar 28, 2023

Motivation:

When a user sends a request whose path contains a fragment (e.g. #foo),
Armeria behaves inconsistently depending on whether a user specified an
absolute URI in :path or not.

On an absolute URI, we rely on URI for parsing, which takes a fragment
into account. Otherwise, we use PathAndQuery, which doesn't treat
a fragment as a fragment and just normalizes # into %2A as a part of
path and query.

Modifications:

  • Evolve PathAndQuery into RequestTarget that is capable of parsing
    and normalizing a :path header.
    • RequestTarget now understands a fragment as well as an absolute URI.
    • Added RequestTargetForm
    • When normalizing a client-side path, RequestTarget doesn't clean up
      consecutive slashes anymore, e.g. foo///bar is not normalized into
      foo/bar on the client side.
  • Replaced path, query and fragment fields and parameters with
    RequestTarget where applicable, including:
    • RequestContext implementations
    • AbstractRequestContextBuilder and its subtypes
    • UserClient and its subtypes
    • RoutingContext implementations
      • Removed RoutingStatus.INVALID_PATH because RequestTarget always
        ensures that the path is valid now.
  • Split the path cache metrics into client-side and server-side ones
    • Old meter name: armeria.server.parsed.path.cache
    • New meter names:
      • armeria.path.cache{type=client}
      • armeria.path.cache{type=server}
  • HttpClientDelegate now makes sure ctx.request() == req to prevent
    the loophole where a decorator can send a request with an invalid path.
    • A user must call ctx.updateRequest() to validate the request first.
  • Renamed PathParsingBenchmark to RequestTargetBenchmark
    • Added client-side benchmarks
    • Fixed the incorrect JVM system property name

Result:

  • Armeria client now handles the fragment part of URI consistently.
  • (Defect) Closed the loophole that allowed a decorator to send a different
    request than ctx.request().
    • A decorator now must call ctx.updateRequest() when it replaces
      the current request.
  • (Improvement) Armeria client doesn't normalize consecutive slashes in a
    client request path anymore, giving a user freedom to send such a request.
    • Note: Please make sure that your server or service handles consecutive
      slashes (e.g. foo//bar) properly before an upgrade. Armeria server
      always cleans up such a path for you, so you don't need to worry.
  • (Breaking) RoutingStatus.INVALID_PATH has been removed because
    Armeria doesn't leak a request with an invalid state into router.
  • (Breaking) The signatures of UserClient.execute() have been changed.
  • (Breaking) The names of path cache meters have been changed.
    • Old meter name: armeria.server.parsed.path.cache
    • New meter names:
      • armeria.path.cache{type=client}
      • armeria.path.cache{type=server}

@trustin trustin added the defect label Mar 28, 2023
@trustin trustin added this to the 1.23.0 milestone Mar 28, 2023
@trustin trustin changed the title Handle a fragment in a client-side URI Handle a fragment in a client-side URI properly Mar 28, 2023
@trustin trustin force-pushed the fragment_in_path branch 18 times, most recently from 123224c to 6e104ea Compare March 31, 2023 10:05
@trustin
Copy link
Member Author

trustin commented Mar 31, 2023

RequestTarget's performance seems on-par with PathAndQuery. Just a tad bit slower probably because of more fields and an extra instanceof I guess?

Benchmark                                               Mode  Cnt         Score        Error  Units
PathParsingBenchmark.normal                            thrpt    5  30720979.413 ± 127357.209  ops/s
PathParsingBenchmark.normal_cacheDisabled              thrpt    5   8843342.871 ± 186193.957  ops/s
PathParsingBenchmark.cachedAndNotCached                thrpt    5   5462692.049 ±  80067.659  ops/s
PathParsingBenchmark.cachedAndNotCached_cacheDisabled  thrpt    5   4470356.389 ±  95833.338  ops/s

Benchmark                                          Mode  Cnt         Score         Error  Units
RequestTargetBenchmark.serverCached               thrpt    5  28021809.278 ±   98146.038  ops/s
RequestTargetBenchmark.serverUncached             thrpt    5   9018793.212 ± 4693846.562  ops/s
RequestTargetBenchmark.serverCachedAndUncached    thrpt    5   6285996.690 ± 4742762.253  ops/s
RequestTargetBenchmark.serverUncachedAndUncached  thrpt    5   4105219.973 ±  152538.098  ops/s
RequestTargetBenchmark.clientCached               thrpt    5  28319352.718 ±  143199.318  ops/s
RequestTargetBenchmark.clientUncached             thrpt    5   8859216.436 ± 5989037.889  ops/s
RequestTargetBenchmark.clientCachedAndUncached    thrpt    5   6312324.673 ± 4464064.829  ops/s
RequestTargetBenchmark.clientUncachedAndUncached  thrpt    5   4521083.153 ±   77946.419  ops/s
  • normal = serverCached
  • normal_cacheDisabled = serverUncached
  • cachedAndNotCached = serverCachedAndUncached
  • cachedAndNotCached_cacheDisabled = serverUncachedAndUncached

@trustin trustin force-pushed the fragment_in_path branch 2 times, most recently from ba1a2be to 6c2b215 Compare March 31, 2023 10:49
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file was actually renamed from PathParsingBenchmark.

@@ -166,6 +167,7 @@ final class HttpClientFactory implements ClientFactory {
this.options = options;

clientDelegate = new HttpClientDelegate(this, addressResolverGroup);
RequestTargetCache.registerClientMetrics(meterRegistry);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split the cache into server- and client-side ones because they parse request targets differently (absolute forms and fragments notably).

@trustin trustin force-pushed the fragment_in_path branch 2 times, most recently from 4b5bc64 to 4ebeb31 Compare March 31, 2023 12:27
Comment on lines +67 to +72
if (req != ctx.request()) {
return earlyFailedResponse(
new IllegalStateException("ctx.request() does not match the actual request; " +
"did you forget to call ctx.updateRequest() in your decorator?"),
ctx, req);
}
Copy link
Member Author

@trustin trustin Mar 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By making sure req == ctx.request(), we don't need to validate req.path anymore below because ctx.updateRequest() or ctx.newDerivedContext() validated it already.

@trustin trustin requested review from jrhee17 and minwoox as code owners March 31, 2023 14:24
@trustin
Copy link
Member Author

trustin commented Mar 31, 2023

I need to add more test cases to DefaultRequestTargetTest, but this pull request is ready for reviews!

@@ -151,6 +151,7 @@ public HttpResponse execute(ClientRequestContext ctx, HttpRequest req) throws Ex
final ContentPreviewer requestContentPreviewer =
contentPreviewerFactory.requestContentPreviewer(ctx, req.headers());
req = setUpRequestContentPreviewer(ctx, req, requestContentPreviewer, requestPreviewSanitizer);
ctx.updateRequest(req);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😱

@trustin trustin force-pushed the fragment_in_path branch 2 times, most recently from 9788547 to 42dc604 Compare April 4, 2023 08:46
Copy link
Contributor

@ikhoon ikhoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good.

@trustin trustin force-pushed the fragment_in_path branch from 6be84c6 to 6b9739b Compare April 5, 2023 08:17
@trustin trustin force-pushed the fragment_in_path branch from 1c84dbc to 40bf116 Compare April 5, 2023 09:36
Copy link
Contributor

@jrhee17 jrhee17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the cleanup and organization 👍 The new changes look much cleaner. Thanks ! 🙇 👍 🚀

return newDerivedContext(id, req, rpcReq, newHeaders, sessionProtocol(), endpoint, newPath);
if (reqTarget.form() != RequestTargetForm.ABSOLUTE) {
// Not an absolute URI.
return new DefaultClientRequestContext(this, id, req, rpcReq, endpoint, null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) This is probably unrelated to this PR since the old behavior was also like this, but I'm wondering if we should also update the path here as well

final HttpRequest newReq = req.withHeaders(req.headers().toBuilder().path(reqTarget.pathAndQuery()));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can leave it as it is for now because

  • there's currently no way for the caller to tell whether the request they specified has been normalized or not. We need to make more complex change for that, which is probably beyond the scope of this PR.
  • there's performance penalty for rebuilding the headers and its enclosing request.

if (req == null) {
throw connectionError(PROTOCOL_ERROR, "received a DATA Frame for an unknown stream: %d",
streamId);
if (encoder == null || encoder.findStream(streamId) == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think of this check 😅 Looks really nice 👍

Copy link
Contributor

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far. 👍
Let me review again after adding the guard logic to the updateRequest.

.. which was effectively impossible to use an absolute-form target
anyway because of the check in `HttpClientDelegate`
@trustin trustin force-pushed the fragment_in_path branch from 24f18aa to 288899e Compare April 6, 2023 09:26
Copy link
Contributor

@minwoox minwoox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's now more intuitive. Thanks!

Copy link
Contributor

@ikhoon ikhoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! 🙇‍♂️ 👍

@minwoox minwoox merged commit ce03ff4 into line:main Apr 7, 2023
@minwoox
Copy link
Contributor

minwoox commented Apr 7, 2023

Thanks a lot! 🙇

@trustin trustin deleted the fragment_in_path branch April 18, 2023 06:07
TheWeaVer pushed a commit to TheWeaVer/armeria that referenced this pull request May 31, 2023
Motivation:

When a user sends a request whose path contains a fragment (e.g. `#foo`),
Armeria behaves inconsistently depending on whether a user specified an
absolute URI in `:path` or not.

On an absolute URI, we rely on `URI` for parsing, which takes a fragment
into account. Otherwise, we use `PathAndQuery`, which doesn't treat
a fragment as a fragment and just normalizes `#` into `%2A` as a part of
path and query.

Modifications:

- Evolve `PathAndQuery` into `RequestTarget` that is capable of parsing
  and normalizing a `:path` header.
  - `RequestTarget` now understands a fragment as well as an absolute URI.
  - Added `RequestTargetForm`
  - When normalizing a client-side path, `RequestTarget` doesn't clean up
    consecutive slashes anymore, e.g. `foo///bar` is *not* normalized into
    `foo/bar` on the client side.
- Replaced `path`, `query` and `fragment` fields and parameters with
   `RequestTarget` where applicable, including:
  - `RequestContext` implementations
  - `AbstractRequestContextBuilder` and its subtypes
  - `UserClient` and its subtypes
  - `RoutingContext` implementations
    - Removed `RoutingStatus.INVALID_PATH` because `RequestTarget` always
      ensures that the path is valid now.
- Split the path cache metrics into client-side and server-side ones
  - Old meter name: `armeria.server.parsed.path.cache`
  - New meter names:
    - `armeria.path.cache{type=client}`
    - `armeria.path.cache{type=server}`
- `HttpClientDelegate` now makes sure `ctx.request() == req` to prevent
  the loophole where a decorator can send a request with an invalid path.
  - A user must call `ctx.updateRequest()` to validate the request first.
- Renamed `PathParsingBenchmark` to `RequestTargetBenchmark`
  - Added client-side benchmarks
  - Fixed the incorrect JVM system property name

Result:

- Armeria client now handles the fragment part of URI consistently.
- (Defect) Closed the loophole that allowed a decorator to send a different
  request than `ctx.request()`.
  - A decorator now must call `ctx.updateRequest()` when it replaces
    the current request.
- (Improvement) Armeria client doesn't normalize consecutive slashes in a
  client request path anymore, giving a user freedom to send such a request.
  - Note: Please make sure that your server or service handles consecutive
    slashes (e.g. `foo//bar`) properly before an upgrade. Armeria server
    always cleans up such a path for you, so you don't need to worry.
- (Breaking) `RoutingStatus.INVALID_PATH` has been removed because
  Armeria doesn't leak a request with an invalid state into router.
- (Breaking) The signatures of `UserClient.execute()` have been changed.
- (Breaking) The names of path cache meters have been changed.
  - Old meter name: `armeria.server.parsed.path.cache`
  - New meter names:
    - `armeria.path.cache{type=client}`
    - `armeria.path.cache{type=server}`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants