Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error handling and logging in HTTP::Server #9115

Merged
merged 16 commits into from
Apr 23, 2020

Conversation

waj
Copy link
Member

@waj waj commented Apr 17, 2020

Currently HTTP::Server logs (or at least it tries: #9067) every error, including those produced by a sudden client disconnection. These changes aim to:

  • Log everything through Log api
  • Ignore errors raised when a client connection is lost
  • Don't write back headers (with status 500) or error messages when there was some content sent already

Changes are made to HTTP::Server::RequestProcessor, HTTP::ErrorHandler and HTTP::LogHandler. The last one used to receive an optional IO to customize where to write the logs, so this could be seen as a breaking change. Although not so big, I think.

fixes #9067

@waj waj added kind:bug A bug in the code. Does not apply to documentation, specs, etc. kind:feature topic:stdlib:networking topic:stdlib:log labels Apr 17, 2020
bcardiff
bcardiff previously approved these changes Apr 17, 2020
Copy link
Member

@bcardiff bcardiff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A typo and an optional comment. LGTM

spec/std/http/server/response_spec.cr Outdated Show resolved Hide resolved
src/http/server/handlers/log_handler.cr Show resolved Hide resolved
@waj waj force-pushed the fix/http-server-error-handling branch from 1f01dd3 to ee5213f Compare April 17, 2020 19:55
spec/std/http/server/request_processor_spec.cr Outdated Show resolved Hide resolved
spec/std/http/server/request_processor_spec.cr Outdated Show resolved Hide resolved
spec/std/http/server/request_processor_spec.cr Outdated Show resolved Hide resolved
spec/std/http/server/handlers/error_handler_spec.cr Outdated Show resolved Hide resolved
spec/std/http/server/handlers/error_handler_spec.cr Outdated Show resolved Hide resolved
src/http/server/handlers/log_handler.cr Outdated Show resolved Hide resolved
error.puts "Unhandled exception on HTTP::Handler"
ex.inspect_with_backtrace(error)
unless response.closed?
Log.error(exception: ex) { "Unhandled exception on HTTP::Handler" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this play with HTTP::ErrorHandler? In case the error handler triggers, there won't be a log entry, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, HTTP::ErrorHandler swallows all the exceptions already. This is used when HTTP::Server is used without handlers or when something more severe happens.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if you only use ErrorHandler you get less than without it (with respect to error logging) and you need to include LogHandler to go back to the behaviour that was already provided by plain RequestProcessor. This seems a bit awkward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's currently explained in the ErrorHandler doc. Maybe we could remove the logging from RequestProcessor so it doesn't get less 😆

The way I see it, the handler at RequestProcessor is there as last resort when no middleware are used. Would you combine ErrorHandler and LogHandler in the same class? Or remove it altogether and have always the same behavior at RequestProcessor?

Anyway, I didn't change that in this PR and these classes are getting as old as Crystal I think. Let's open a RFC or a separate PR to modify this stack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove ErrorHandler?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ErrorHandler is responsible for sending 500 to the client in case of an unhandled exception and, if desired, send also the backtrace. Right now is sending just plain text, but we could make a nice Crystal branded page displaying the error and some other context information by default. Web frameworks probably override this or create their own error handlers but I think it's good to have something usable in the std for applications not using one of those frameworks.

Actually I think that the problem that @straight-shoota describes could be solved if we use LogHandler as the first handler and let ErrorHandler to let the exception bubble up. So, if LogHandler is not being used, the error is still handled by RequestProcessor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I did was sneak peek how rack solves this issue and do something similar. Effectively, LogHandler must be the first handler to work properly and it doesn't handle or logs any error. It just logs requests. The ErrorHandler, if used, will send appropriate responses to the client if possible, and also log the errors.

So now, if you only use LogHandler the requests are logged, and the errors are still handled by the RequestProcessor (but not sent to the client).

If you only use ErrorHandler the errors are sent to the client and logged, just like the RequestProcessor do, although we might want to make it more detailed in the future. Another reason I found to log errors from here is that I think it's more clear that the exception is logged before the request as it happens when LogHander is used alone.

@io.puts "#{context.request.method} #{context.request.resource} - Unhandled exception:"
e.inspect_with_backtrace(@io)
unless context.response.closed?
Log.error(exception: e) { "#{context.request.method} #{context.request.resource} - Unhandled exception:" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly different story but also worth fixing: In case Log.info or elapsed_text fails, this rescue block would be triggered. I don't think that's expected because it's not an error related to the HTTP application, but the internal logging setup. The rescue should just cover call_next(context), not even Time.measure (which is also an internal detail and failing that is not relevant to the HTTP protocol).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I also noticed that but I was trying to focus just on the logging and not handling client disconnections for this PR.
I'll make the change anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, solved on this PR

src/http/server/handlers/log_handler.cr Show resolved Hide resolved
@waj
Copy link
Member Author

waj commented Apr 17, 2020

There should be an initializer accepting a Log instance so users can use a custom configured one

Why? That's the point of the new logging framework, that provides flexibility to change the backend for each source. Otherwise, would you provide flexibility to change any logger in the std?

@straight-shoota
Copy link
Member

straight-shoota commented Apr 17, 2020

You can easily have different HTTP::Server instances running side by side. And I would like to treat the log messages from a server that serves say an internal JSON API differently than from a server that serves public HTML requests. I don't want all log entries mangled at http.server but separate log sources at myapp.public_http and myapp.json_api.
This could technically be implemented as a context-based filter. But using a different log source is more approriate. Context-based filtering could handle different vhosts for example.

Such a flexibility is certainly not necessary for typical loggers. But LogHandler is an explicit tool to configure log handling for an HTTP application and it should offer some options for customizability.
This handler is for logging interactions of an HTTP application, not the HTTP::Server implementation.

@waj waj dismissed bcardiff’s stale review April 18, 2020 05:08

Going back to make some changes

@waj waj force-pushed the fix/http-server-error-handling branch from 325f38d to 390f106 Compare April 22, 2020 22:26
@waj
Copy link
Member Author

waj commented Apr 22, 2020

I think I covered all the comments and changes requested. I tested it a lot with a "real" app in different scenarios and the results feel reasonable to me.
I also added the remote address and HTTP version to the log entries to make it more similar to other frameworks.

Copy link
Member

@straight-shoota straight-shoota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't have three (!) different locations where status information about a request gets logged. That's a total mess.

Maybe we need to consider the whole approach to these handlers, see my comment on log_handler_spec.cr

src/http/server/handlers/log_handler.cr Outdated Show resolved Hide resolved
end
(log_io.to_s =~ %r(GET / - Unhandled exception:)).should be_truthy
match_logs(logs,
{:info, %r(^- - GET / HTTP/1.1 - 200 \(\d+(\.\d+)?[mµn]s\)$)}
Copy link
Member

@straight-shoota straight-shoota Apr 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logged status code is plainly wrong here. The server actually responds with 500, not 200 and the log must reflect that.
I'm not sure what's the best way to make this accurate. The LogHandler probably needs to differentiate whether the handler chain was successfully completed or an exception was raised. The exception part would then need to duplicate some of the logic from RequestProcessor to tell the expected response status.

Another option would be to integrate LogHandler directly into RequestProcessor. This would avoid duplication of the error handling and centralize logging to a single place. It would obviously need to be configurable.

The purpose of ErrorHandler should also be considered in this regard. It seems quite useless when its only effect (with default verbose: false) is to change this:

I, [2020-04-22T15:21:41.548787000Z #5279]   INFO -- crystal-run-server.tmp:http.server: 127.0.0.1:38650 - GET / HTTP/1.1 - 200 (74.30µs)
E, [2020-04-22T15:21:41.548973000Z #5279]  ERROR -- crystal-run-server.tmp:http.server: Unhandled exception on HTTP::Handler -- Exception: foo

To this:

E, [2020-04-22T15:31:56.224456000Z #5412]  ERROR -- crystal-run-server.tmp:http.server: Unhandled exception -- Exception: foo
I, [2020-04-22T15:31:56.224721000Z #5412]   INFO -- crystal-run-server.tmp:http.server: 127.0.0.1:50044 - GET / HTTP/1.1 - 500 (376.90µs)

This continues the thread from #9115 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also realized the problem with the status codes. That happens when only LogHandler is used. In Rack the log handler writes logs without handling errors so when this happens the request is not even logged. I think it's better (not ideal though) to log the wrong status code than not logging anything at all.

The purpose of ErrorHandler is handling errors, showing them to the client and writing the logs. I think the issue is the placement of the LogHandler and the fact that both handlers must be used at the same time to work properly. Rack also has the same "problem" but it setups both handlers automatically by default. Maybe we could do that but probably this could be better if the design is changed a little bit. Maybe LogHandler cannot be a regular HTTP::Handler and it should be a strategy instead, plugged in the RequestProcessor.

@waj
Copy link
Member Author

waj commented Apr 23, 2020

We can't have three (!) different locations where status information about a request gets logged. That's a total mess.

Well, I agree just 50%. It's not actually status information that it's logged in three places. Request status is logged just in one place: LogHandler. Is the error logging that it's duplicated between LogHandler and ErrorHandler

Maybe we need to consider the whole approach to these handlers, see my comment on log_handler_spec.cr

I agree and I've been discussing this for more than 1h with @bcardiff at the phone and I've been moving around the code several times without settling for any option. I was trying to keep the Rack design but it's not possible to "make it right" this way. As I said above, the alternative is make the LogHandler not a regular Handler but something that's plugged to the RequestProcessor.

Anyway, I think this is an improvement, and actually a fix, over the design that we currently have. That again, it's not "our" design, but a translation of the Rack design. I'll open a RFC with a proposal to make this different but for now we can merge this that solves issues we currently have without changing the design too much.

@waj waj force-pushed the fix/http-server-error-handling branch from 24e6fa8 to 5c9e79c Compare April 23, 2020 14:20
@waj waj merged commit bd9d5a2 into crystal-lang:master Apr 23, 2020
@waj waj deleted the fix/http-server-error-handling branch April 23, 2020 18:43
@RX14
Copy link
Contributor

RX14 commented Apr 24, 2020

I'd prefer to have the logging built into the server, and not a handler. We already have existing ways to configure logging levels, and I think a handler ends up duplicating that.

@waj
Copy link
Member Author

waj commented Apr 25, 2020

Yes, maybe not embedded because I’d like to keep it configurable in case anyone want to log differently, but definitely not as a handler. I’ll return to this once I close more priority things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change kind:bug A bug in the code. Does not apply to documentation, specs, etc. kind:feature topic:stdlib:log topic:stdlib:networking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HTTP::Server errors are silently ignored
4 participants