Efficient telemetry for requests that are rejected at Kestrel layer. #33949

avparuch · 2021-06-29T22:52:52Z

When running on Kestrel, requests typically fall into two buckets:

Requests that pass through Kestrel and reach the application middleware.
Requests that are rejected within Kestrel for various reasons and never reach the application layer.

This is a feature request to enable efficient telemetry/logging for the 2nd category to increase observability into such failures.

Let's take a look at IIS/Http.Sys's approach to such failures:

In the case of Http.Sys, from https://learning.oreilly.com/library/view/internet-information-services/9780735624412/ch15s06.html, "HTTPERR log records all errors that are not handed off to a valid worker process, typically responses to clients, connection time-outs, and orphaned requests." In other words, requests that are not serviced by the application/rejected at the Http.Sys layer are logged.

Schema of a typical Http.Sys error log looks like this : https://docs.microsoft.com/en-us/troubleshoot/aspnet/error-logging-http-apis#format-of-the-http-api-error-logs

Similarly, in IIS, it is possible to configure FailedRequestTracing to log requests that fail in IIS layer and never reach the application middleware : https://www.jeremymorgan.com/blog/iis-devops/failed-request-tracing-iis/

A concrete example of where such logging is useful:

When Kestrel receives an authority form request where the request start line and host header are mismatched, Kestrel throws the following exception and returns a 400.0 (Stacktrace shows "google.com because that's the Host header I sent, point is valid for any mismatch). This request does not enter the application middleware and there is no observability for this error (unless Trace logging is turned on).

Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Invalid Host header: 'google.com'
   at Microsoft.AspNetCore.Server.Kestrel.Core.KestrelBadHttpRequestException.Throw(RequestRejectionReason reason, String detail)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1Connection.ValidateNonOriginHostHeader(String hostText)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1Connection.EnsureHostHeaderExists()
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1Connection.TryParseRequest(ReadResult result, Boolean& endConnection)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequestsAsync[TContext](IHttpApplication`1 application)

Specifically, this particular error is coming from this line : https://github.com/dotnet/aspnetcore/blob/be5ba495b4872fa055385463da655ae4c4318898/src/Servers/Kestrel/Core/src/Internal/Http/Http1Connection.cs#L606-607

Repro:

Download ASP.NET Core sample app and fire up Kestrel in Visual Studio.
Use any client (I used wrk2), send a authority form request to Kestrel with mismatched HostHeader and request start line as shown below.
Your client will observe a 400.0 Status Code, but on the Server side, since the request does not enter the application middleware, there is no logging of this error.

The above is just one example, there are other examples (please let me know if you'd want me to provide more examples) where requests are terminated/rejected within Kestrel code and not handed off to application middleware.

The ask is to have functionality, where requests that are not serviced by the application middleware are logged efficiently. This telemetry will be hugely beneficial to critical large-scale distributed services.

The text was updated successfully, but these errors were encountered:

joaocpaiva · 2021-06-30T01:30:20Z

This feature would be useful for multiple services. Having visibility in the logs pipeline to requests that are rejected by Kestrel, is very important for telemetry, debuggability and diagnostics.

While migrating to Kestrel, we found a few regressions (e.g. headers unsupported encoding) that were not detected ahead of time because there were no signs of them in the logs.

halter73 · 2021-06-30T21:12:53Z

Do you want an EventSource event(s), specific log event(s) (or category) or something else? Would we want events for anything that looks like requests but don't get to middleware for whatever reason (e.g. invalid host header, unexpected character when parsing, etc...)? Do you want something specific for just invalid host headers?

We do currently have the ConnectionBadRequest log for all invalid requests on an otherwise healthy connection (Note: Middleware might still have run for a something logged as ConnectionBadRequest if there was an invalid request body. For anything earlier, middleware is not run.), and RequestProcessingError for I/O errors during request processing. Starting in .NET 6, both will be under the new Microsoft.AspNetCore.Server.Kestrel.BadRequests logger category.

We could consider trying to formalize Kestrel's internal RequestRejectionReasons as part of this but I'm hesitant given we've gotten by without doing so this far. A lot of the reasons are specific to a certain versions of HTTP and not something you'd ever expect to see from non-buggy HTTP clients. We've found that logging a message describing the problem is sufficient for most use cases. I don't know why any app-level logic would care about things like whether the client sent a BadChunkSuffix or BadChunkSizeData. This might be interesting for diagnostic purposes, but not much else.

In any case, what details would you want for these events? The less specific the event, the harder it will be to always provide details, but maybe some of the details could be optional.

avparuch · 2021-06-30T23:23:25Z

@halter73 EventSource event or a callback would be great. Regarding the schema for these error events, the Http.sys error log schema is a pretty good starting point. https://docs.microsoft.com/en-us/troubleshoot/aspnet/error-logging-http-apis#format-of-the-http-api-error-logs

Any requests that are not handed off to the worker process (for various reasons) are all logged in the above schema. Most of the schema in the above link is self-explanatory, except for "Reason phrase". (possible values for "Reason phrase" are listed below, (not exhaustive)) :

Verb
Connection_Dropped
Timer_ConnectionIdle
BadRequest
NotFound
Timer_EntityBody
Timer_MinBytesPerSecond
URL
Header
Client_Reset
Connection_Abandoned_By_ReqQueue
Hostname
RequestLength
Forbidden
Timer_HeaderWait
URL_Length
LengthRequired
Version_N/S
Timer_SslRenegotiation
N/A
Request_Cancelled
N/I
Disabled
Number
Connection_Dropped_List_Full
QueueFull

Link to reasons and the explanation : https://docs.microsoft.com/en-us/troubleshoot/aspnet/error-logging-http-apis#kinds-of-errors-that-the-http-api-logs

The Http.Sys error schema does not translate as-is to Kestrel obviously, but I believe the below schema would contain what I'm looking for in a error log for requests/connections that are not handed off to the application middleware. As you mentioned, for some of the requests (connections), these could be well be optional/empty.

TimeStamp, ClientIP, Client Port, ServerIP, Server Port, Host Header, Verb, Protocol, Uri, StatusCode, Reason.

shirhatti · 2021-07-01T00:50:38Z

Based on our offline discussion:

Let's look at emitting new events from KestrelEventSource for failure conditions. We'll introduce a new DTO for logging that'll have the fields based on @avparuch's suggestion

TimeStamp, ClientIP, Client Port, ServerIP, Server Port, Server Name (from SNI), Host Header, Verb, Protocol, Uri, StatusCode, Reason.

Reason phrase is probably just the stringified version of the RequestRejectionReason enum
Fields in the DTO may be null (or have a default/sentinel value).
We'll re-emit all the fields we emit in the ConnectionStart event.
We may have duplication between Events raised here and those observed by middleware. We're not making any efforts to de-dupe.

Open questions:

Is adding an extra parameter to the WriteEvent call considered breaking? We'd rely on this behavior to add more info to logs in the future
How would the W3CLogger consume this? Via a custom EventListener?

Given that this doesn't require new API surface, we can consider doing this in 6.0 after the preview7 release.

halter73 · 2021-07-01T01:01:00Z

What do we want to name the event? RequestRejected?

We may have duplication between Events raised here and those observed by middleware. We're not making any efforts to de-dupe.

I think this is okay, but I also think that only raising these invents if we don't invoke middleware is okay. The only difference is whether we raise the event for request body errors. Only raising the event for requests that are rejected before reaching middleware might make it easier to consume from something like W3CLogger.

JamesNK · 2021-07-01T01:05:44Z

What does DTO mean here? EventSource only supports primitive types. We will just have a list of primitive type arguments for the event source method.

Reason phrase is probably just the stringified version of the RequestRejectionReason enum

As long as it has enough information to communicate to the log reader about why the request was rejected. Might need a field for additional detail (which I'm sure will be ripe for abuse over the years from now 😄).

JamesNK · 2021-07-01T01:09:40Z

If a request makes it to middleware, does that mean it is accepted? So would that mean this event would only fire for requests rejected before middleware?

Also, if a request makes it to middleware then it is simple enough to write some middleware that catches any exceptions and logs them. An event like this isn't as important once the request has made it that far.

Tratcher · 2021-07-01T15:49:32Z

What do we want to name the event? RequestRejected?

Or more general BadRequest?

If a request makes it to middleware, does that mean it is accepted? So would that mean this event would only fire for requests rejected before middleware?

Also, if a request makes it to middleware then it is simple enough to write some middleware that catches any exceptions and logs them. An event like this isn't as important once the request has made it that far.

A request can make it to middleware and still hit a few bad request conditions like bad chunked formatting in the body. Yes those should be reported to the application as an Exception from Stream.ReadAsync, though intercepting that at all points in the app might require wrapping the stream.

BrennanConroy · 2021-07-07T20:50:18Z

Triage: We can add a rejected reason phrase with failure details to a new feature and then pass the feature through a DiagnosticSource

Tratcher added area-runtime partner Partner ask feature-kestrel labels Jun 29, 2021

BrennanConroy added this to the 6.0-rc1 milestone Jul 7, 2021

BrennanConroy assigned shirhatti Jul 7, 2021

adityamandaleeka assigned adityamandaleeka and unassigned shirhatti Jul 14, 2021

adityamandaleeka mentioned this issue Jul 28, 2021

Add feature and DiagnosticSource event for rejected request info #34783

Merged

adityamandaleeka closed this as completed in #34783 Aug 10, 2021

ghost locked as resolved and limited conversation to collaborators Sep 9, 2021

amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Efficient telemetry for requests that are rejected at Kestrel layer. #33949

Efficient telemetry for requests that are rejected at Kestrel layer. #33949

avparuch commented Jun 29, 2021 •

edited

Loading

joaocpaiva commented Jun 30, 2021

halter73 commented Jun 30, 2021 •

edited

Loading

avparuch commented Jun 30, 2021 •

edited

Loading

shirhatti commented Jul 1, 2021

halter73 commented Jul 1, 2021

JamesNK commented Jul 1, 2021 •

edited

Loading

JamesNK commented Jul 1, 2021 •

edited

Loading

Tratcher commented Jul 1, 2021

BrennanConroy commented Jul 7, 2021

Efficient telemetry for requests that are rejected at Kestrel layer. #33949

Efficient telemetry for requests that are rejected at Kestrel layer. #33949

Comments

avparuch commented Jun 29, 2021 • edited Loading

joaocpaiva commented Jun 30, 2021

halter73 commented Jun 30, 2021 • edited Loading

avparuch commented Jun 30, 2021 • edited Loading

shirhatti commented Jul 1, 2021

halter73 commented Jul 1, 2021

JamesNK commented Jul 1, 2021 • edited Loading

JamesNK commented Jul 1, 2021 • edited Loading

Tratcher commented Jul 1, 2021

BrennanConroy commented Jul 7, 2021

avparuch commented Jun 29, 2021 •

edited

Loading

halter73 commented Jun 30, 2021 •

edited

Loading

avparuch commented Jun 30, 2021 •

edited

Loading

JamesNK commented Jul 1, 2021 •

edited

Loading

JamesNK commented Jul 1, 2021 •

edited

Loading