Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add server config option to spawn handlers detached from client disconnect behavior #701

Merged
merged 13 commits into from
Jun 15, 2023

Conversation

jgallagher
Copy link
Contributor

Adds a new default_handler_disposition field to both DropshotConfig and ServerConfig. The default value for this (in DropshotConfig's Default impl) is HandlerDisposition::CancelOnDisconnect, which matches the current behavior of dropshot: if a client disconnects, the future returned by the endpoint handler is dropped (and therefore cancelled). If the config is instead HandlerDisposition::DetachFromClient, all endpoint handler futures are tokio::spawn()'d, and they will therefore always run to completion.

I phrased this field as default_handler_disposition under the assumption that we'd (eventually) provide a way for individual endpoints to override this choice, but am very much not wed to the naming here - any suggestions are welcome.

@jgallagher jgallagher requested review from davepacheco and ahl June 12, 2023 15:26
Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a change like this, I think it'd be useful to test-integrate it into Omicron before we land it here. What do you think? (I'm afraid you're going to wind up undoing some changes I did recently to remove ..Default::default() from a whole bunch of ConfigDropshot structures. Sorry about that.)

For the record: I confirmed that the current behavior is expected from hyper. I wanted to confirm that because it does seem counter-intuitive to a bunch of us and if we're going so far out of our way to do things differently, I wanted to make sure we weren't just holding it wrong. (I'm not sure why it's the behavior and I remain frustrated that cancellation appears to be undocumented in the canonical Rust and async/await references.)

dropshot/src/config.rs Outdated Show resolved Hide resolved
dropshot/src/config.rs Outdated Show resolved Hide resolved
dropshot/src/config.rs Outdated Show resolved Hide resolved
dropshot/src/api_description.rs Show resolved Hide resolved
// Ignore errors from `send()`: if it fails, it's because the
// outer `http_request_handle` future was cancelled while
// waiting for this result.
_ = tx.send(result);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a silly question, but why not use the Future's output as provided by awaiting on the spawn handle?

Given that we're not, is it at least worth warning in this case? (The answer might be "no", but it seems potentially useful to learn this.)

Come to think of it: I feel like it would be useful to to get a "warn"-level log message any time a client disconnects while we're running a request handler. What do you think? (That can definitely be deferred, too.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a silly question, but why not use the Future's output as provided by awaiting on the spawn handle?

I... have no idea. I think it's just habit to use oneshot channels like this; normally when spawning I think I'm not in a position where I can await the future directly.

I do think adding a warning here would be handy, and I think easy? I'll give it a shot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added logging (both here and around the .unwrap() below) in efef6f9.

dropshot/src/server.rs Outdated Show resolved Hide resolved
Comment on lines 438 to 441
// We weren't cancelled: mem::forget() drop_counter so its drop impl
// doesn't run. This leaks a reference to our drop_count, but we're in a
// unit test so won't worry about it.
mem::forget(drop_counter);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two new tests feel more complicated than they need to be (caveat: I'm not sure about that!). Part of it is that feels a little weird to test Drop so explicitly when it's really a proxy for some other behavior that's what we really care about. Especially given that we're using this out to bypass the normal Drop behavior.

One implementation would be to just panic at this point. Of course, that wouldn't let you test that when not cancelled it runs to completion, so I guess we could be worried that it's not panicking by accident.

How about this: what if you had a handler with two explicit counters: one bumped before receiving the message from the main task and one bumped after. I think you could use the same handler and ServerContext. And I think you could drive them the same way? You could have a helper function that's given a value for the config, constructs a server with this handler, starts it, runs one request to normal completion, then starts another request but cancels it, and then shuts down the server and waits for it to shut down. Then it could return the final counter values. Besides hopefully being simpler, an advantage would be that we know we're testing exactly the same thing in both cases.

I haven't fleshed this out any more than that so it's definitely possible there's something I'm missing. The code here looks correct. If you think what's here is a better path, that's fine with me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a stab at simplifying them in 8946d44 - I didn't go with the two counters idea, but I think what I did makes what is being tested much more obvious. LMK what you think.

dropshot/tests/test_config.rs Outdated Show resolved Hide resolved
dropshot/tests/test_config.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I mentioned above but I think got lost: I think it could be quite valuable if, when using CancelOnDisconnect we emitted a warning message when the Future was being dropped while in a request handler. This can definitely be a follow-on change (and not asking you to do that) but I figured I'd mention it. The easiest way I can think to do this is to store a bit (AtomicBool?) that starts false, gets set to true after we finish the request handler, and then create something in the same scope with a Drop handler (maybe scope_guard can help here?) that checks if the boolean is false? Seems kind of cheesy though.

CHANGELOG.adoc Outdated Show resolved Hide resolved
Comment on lines 895 to 896
error!(request_log, "handler panicked");
panic!("handler panicked before sending response");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to handle this. It'd be nice to report the error in these cases (or else is it lost completely?). I'm not clear that we can because we don't have anything usefully printable.

This example is interesting:
https://docs.rs/tokio/latest/tokio/task/struct.JoinError.html#examples-1

Maybe that's the way to go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, that looks great. It's a little more work to get the error out; added in 204c111

@jgallagher
Copy link
Contributor Author

One thing I mentioned above but I think got lost: I think it could be quite valuable if, when using CancelOnDisconnect we emitted a warning message when the Future was being dropped while in a request handler. This can definitely be a follow-on change (and not asking you to do that) but I figured I'd mention it. The easiest way I can think to do this is to store a bit (AtomicBool?) that starts false, gets set to true after we finish the request handler, and then create something in the same scope with a Drop handler (maybe scope_guard can help here?) that checks if the boolean is false? Seems kind of cheesy though.

I added a TODO to this effect, but yeah, I'm not sure how best to do this either at the moment. With the default changing it doesn't seem super urgent at the moment.

It occurred to me as I was writing this that this might be the first runtime dependency on tokio in Dropshot. I think when we wrote the bulk of Dropshot, we just assumed consumers were using tokio, but I don't know if we depended in any way on tokio being the executor. I'm not sure it's possible to use a different executor today. I don't think it's worth spending much time on this now but at some point we should probably figure out what our dependency really is and document that better.

(Lifting this up since github resolved the conversation when I took your CHANGELOG edit) I don't think that's true: we're using at least tokio::net::{TcpListener, TcpStream} and tokio_rustls; I assume both of those require a tokio executor?

@davepacheco
Copy link
Collaborator

(Lifting this up since github resolved the conversation when I took your CHANGELOG edit) I don't think that's true: we're using at least tokio::net::{TcpListener, TcpStream} and tokio_rustls; I assume both of those require a tokio executor?

That's possible. It's not obvious to me that they would.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants