
Buffer should destructure to Vec when single-referenced #1064

Merged: 3 commits, Dec 22, 2023

Conversation

pfmooney (Contributor)

Until now, Buffer offered no means of extracting the internal Vec<u8> when, say, a Volume::read() operation had completed. This made it impossible for Crucible consumers to reuse the Vec buffer, forcing an otherwise unnecessary allocation.

There are a few other changes I believe need to be made before consumers like Propolis can start reusing their buffer Vec, rather than reallocating after every request, but I believe this is a step in the right direction for that.

Comment on lines +39 to +40
// Dropping a BlockRes without issuing a completion would mean the
// associated waiter would be stuck waiting forever for a result.
Contributor

I like this check! But: instead of asserting, would a send_err be appropriate?

Contributor (Author)

It could be. I'll leave that decision up to y'all. Not dealing with a BlockRes before it's dropped is definitely a programmer error (vs operational error), but since it's not being enforced by the compiler, I can appreciate not wanting an assertion. That said, having an assert helped me track down where such drops were occurring.

Contributor (Author)

Some additional context: Propolis has been using a similar pattern for its block `Request` abstraction.

Contributor

Sorry for my ignorance here: who is dropping the BlockRes in this situation? Is it the crucible upstairs side that would have done the drop?

Contributor (Author)

Yeah, it would be upstairs which could drop a BlockRes rather than properly "disposing" of it by calling send_ok or send_err.

Contributor

I like panic for programmer errors (and, yeah, I benefit so often by catching myself).

upstairs/src/lib.rs (resolved)
Comment on lines 2506 to 2509
// Any BlockReqs or GuestWork which remain pending on this Guest when
// it is dropped should be issued an error completion. This avoids
// dropping BlockRes instances prior to completion (which
// results in a panic).
Contributor

💯

upstairs/src/upstairs.rs (resolved)
@jmpesp (Contributor) left a comment

I deployed and tested this, it looks good! Thanks @pfmooney

@pfmooney pfmooney merged commit fc7a1ad into oxidecomputer:main Dec 22, 2023
18 checks passed
@pfmooney pfmooney deleted the buffer-bits branch December 22, 2023 19:56
mkeeter added a commit that referenced this pull request Jan 4, 2024
#1064 added new logic to the
`BlockReq`: if the notification channel was dropped without a reply having been
sent, Crucible would panic.

After a bunch of discussion, I no longer think this extra complexity is worth
it, and would like to remove it.

## Why did we do this in the first place?
In general, we expect a `BlockReq` to _always_ be answered by Crucible.  If one
is dropped without a reply being sent, it indicates a programmer error.  This
behavior was inspired by similar logic in Propolis
(https://github.com/oxidecomputer/propolis/blob/85dc6db36709981c1d35b0651a18e20caee0b73f/lib/propolis/src/block/mod.rs#L176-L182).

## Propolis and Crucible are different
The `Request` in Propolis is built around
a `device::TrackingMarker`.  If it is dropped without a reply, then the
listening side will **wait forever**; this is a potential liveness issue, so
turning it into a panic is sensible.

The `BlockReq` in Crucible is built around a `tokio::sync::oneshot` channel.  If
it is dropped without a reply, then the `oneshot::Receiver` will return
immediately with an error.  In fact, we handle that case in our
`BlockReqWaiter`, translating this error into `CrucibleError::RecvDisconnected`
(https://github.com/oxidecomputer/crucible/blob/e813da3f/upstairs/src/block_req.rs#L67).
In other words, we're _already_ handling the case of a `BlockReq` being dropped
without a reply (by reporting an error).  There is no potential liveness issue.

(This also means that the block comment in `BlockRes::drop`,
https://github.com/oxidecomputer/crucible/blob/e813da3f/upstairs/src/block_req.rs#L39-L40
is incorrect)

It's arguably better to panic upon programmer error, versus going through normal
error-handling paths (which defer handling to the caller of the `Guest` API).
However...

## False positives during program exit
There is one time when a `BlockReq` can be dropped without a reply: during
program exit (or scope exit, for unit tests).

As far as I can tell, this panic **hasn't caught any true programmer errors**.
Instead, it's found a bunch of cases where `BlockReq` are dropped during exit,
and we've had to add `Drop` implementations to prevent it from firing.  The
original PR added `impl Drop for Guest`
(https://github.com/oxidecomputer/crucible/blob/e813da3f/upstairs/src/lib.rs#L2424-L2446),
but it missed a sneaky case where `BlockReq` were stored in the `UpstairsState`;
a later PR added `impl Drop for Upstairs`
(https://github.com/oxidecomputer/crucible/blob/e813da3f/upstairs/src/upstairs.rs#L194-L207)
to avoid panics during certain unit tests.

## Future false positives
This check also makes it easy to accidentally create false positives when
writing new code.  The `offload-write-encryption` branch
(#1066) moves some
`BlockReq`s into a thread pool to perform encryption outside of the main task.
However, because we didn't add special handling for program exit, this caused a
panic at shutdown.

## This is hard to do by accident
Remember what we're actually trying to protect against: someone dropping a
`BlockReq` without replying.  Because we've annotated relevant functions with
`#[must_use]`, this would require either ignoring warnings or explicitly
discarding the `BlockReq`.  It's hard to do by accident!

## Summary
- This `panic!` guards against something that's hard to do by accident
- If we **do** make this mistake, it's reported as a distinct error to the
  caller
  - Unlike Propolis, there's no liveness concern
- So far, 100% of issues it discovered are false positives at program exit
  - It is easy to introduce similar false positives when writing new code
  - New false positives will manifest as panics during program exit
  - If we introduce new false positives, we will not discover the issue until
    runtime (and it may be rare, because Tokio does not destroy tasks in a
    deterministic order)

## Proposal
- Remove the `panic!`
- Remove the `Drop` implementations which work around it at exit
- Add logging for this case:
  - Fire a DTrace probe if a `BlockReq` is dropped without a reply
  - Log when we see a `CrucibleError::RecvError`