Read-only region clone attempting to contact expunged downstairs #7209
The clone source is selected here: `omicron/nexus/src/app/sagas/region_snapshot_replacement_start.rs`, lines 402 to 440 (at commit 37c7f18).
When performing region snapshot replacement, the associated start saga chose the request's region snapshot as the clone source, but if that region snapshot was backed by an expunged dataset then it may be gone. This commit adds logic to choose another clone source: either another region snapshot from the same snapshot, or one of the read-only regions for that snapshot. Basic sanity tests were added to ensure that region replacements and region snapshot replacements resulting from expungement can occur. It was an oversight not to originally include these! Fixes oxidecomputer#7209
Is it possible that the region replacements that failed are using up disk space in Crucible on rack2? We've noticed some of the Crucible used bytes getting higher in recent weeks, e.g., on sled 11.
Looking at the crucible zone that corresponds to that sled, the actual usage of non-bogus regions should be around 1.6 TiB if I sum up all of the entries in "created" state.
If the failed regions are indeed eating up disk space, we may need to find a way to clean them up. (This issue affects rack2 only.)
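For reference, the accounting above amounts to summing block_size × extent_size × extent_count over every region the agent reports as "created"; the "failed" regions are the suspected extra space. A minimal sketch, assuming a simplified stand-in for the agent's region record (not the actual Crucible type):

```rust
/// Hypothetical, simplified stand-in for a Crucible agent region
/// record -- not the actual type.
struct AgentRegion {
    state: String, // "requested", "created", "failed", ...
    block_size: u64,
    extent_size: u64, // blocks per extent
    extent_count: u64,
}

/// Sum the bytes consumed by regions the agent reports as "created".
/// Comparing this figure against the dataset's actual usage exposes
/// space held by regions in other states (e.g. "failed").
fn created_bytes(regions: &[AgentRegion]) -> u64 {
    regions
        .iter()
        .filter(|r| r.state == "created")
        .map(|r| r.block_size * r.extent_size * r.extent_count)
        .sum()
}
```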
I think you're on to something. Looking at the same gimlet in rack2:
One of the common sharp edges of sagas is that the compensating action of a node does _not_ run if the forward action fails. Said another way, for this node:

    EXAMPLE -> "output" {
      + forward_action
      - forward_action_undo
    }

if `forward_action` fails, `forward_action_undo` is never executed. Forward actions are therefore required to be atomic, in that they either fully apply or don't apply at all. Sagas with nodes that ensure multiple regions exist cannot be atomic, because they can partially fail (for example: what if only 2 out of 3 ensures succeed?). In order for the compensating action to be run, it must exist as a separate node that has a no-op forward action:

    EXAMPLE_UNDO -> "not_used" {
      + noop
      - forward_action_undo
    }

    EXAMPLE -> "output" {
      + forward_action
    }

The region snapshot replacement start saga will only ever ensure that a single region exists, so one might think it could get away with a single node that combines the forward and compensating actions - you'd be mistaken! The Crucible agent's region ensure is not atomic in all cases: if the region fails to create, it enters the `failed` state but is not deleted. Nexus must clean these up. Fixes an issue that Angela saw where failed regions were taking up disk space in rack2 (oxidecomputer#7209). This commit stops the accumulation, and also includes an omdb command for finding these orphaned regions and optionally cleaning them up.
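For concreteness, here is what the split-node pattern described above looks like when spelled out in the style of omicron's `declare_saga_actions!` macro. The saga, node, and function names below are illustrative, not the actual ones from the start saga:

```rust
declare_saga_actions! {
    example_replacement;

    // The undo node comes first. Its forward action does nothing and
    // always succeeds, so its undo action is registered before the
    // ensure runs: if the ensure (or any later node) fails, this undo
    // fires and deletes whatever the ensure left behind, including a
    // region the agent parked in the `failed` state.
    NEW_REGION_ENSURE_UNDO -> "unused" {
        + example_noop
        - example_new_region_ensure_undo
    }

    // The ensure node itself carries no undo: it no longer needs to
    // be atomic, because cleanup lives in the node above.
    NEW_REGION_ENSURE -> "new_region" {
        + example_new_region_ensure
    }
}
```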
When performing region snapshot replacement, the associated start saga chose the request's region snapshot as the clone source, but if that region snapshot was backed by an expunged dataset then it may be gone. This commit adds logic to choose another clone source: either another region snapshot from the same snapshot, or one of the read-only regions for that snapshot. Basic sanity tests were added to ensure that region replacements and region snapshot replacements resulting from expungement can occur. It was an oversight not to originally include these! In order to support these new sanity tests, the simulated Pantry has to fake activating volumes in the background. This commit also refactors the simulated Pantry to have one Mutex around an "inner" struct instead of many Mutexes. Fixes oxidecomputer#7209
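The single-Mutex refactor mentioned above is a standard Rust pattern. A minimal sketch with illustrative field types (not the actual simulated Pantry's):

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

// Before: separate locks mean two related fields cannot be updated
// atomically, and lock ordering becomes a hazard.
struct PantryBefore {
    volumes: Mutex<BTreeMap<String, String>>,
    jobs: Mutex<BTreeMap<String, bool>>,
}

// After: a single lock guards one "inner" struct, so related state
// is always read and written together.
#[derive(Default)]
struct Inner {
    volumes: BTreeMap<String, String>,
    jobs: BTreeMap<String, bool>,
}

struct Pantry {
    inner: Mutex<Inner>,
}

impl Pantry {
    fn attach(&self, volume_id: String, request: String) {
        let mut inner = self.inner.lock().unwrap();
        // Both maps are updated under the same lock, keeping them
        // consistent with each other.
        inner.jobs.insert(volume_id.clone(), false);
        inner.volumes.insert(volume_id, request);
    }
}
```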
The following region snapshot replacement is not making forward progress:
Looking at a corresponding `region-snapshot-replacement-start` saga invocation shows that the `new_region_ensure` node is failing. This step is responsible for creating a new cloned read-only region that will be used to replace the region snapshot. It uses the region snapshot's region as the source of the clone, but if that region is on an expunged dataset then this clone will never succeed. Tracing through to the crucible agent logs shows this:
This sled is gone:
The fix is to choose as the clone source any other region that shares the snapshot ID and is not expunged. Each region snapshot sharing that snapshot ID is exactly the same, so any of them can be used as a clone source.
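A minimal sketch of that selection, using simplified stand-in types rather than the actual omicron datastore API:

```rust
type DatasetId = u64; // stand-in for a real dataset ID type

/// Hypothetical, simplified stand-ins for the records involved.
struct RegionSnapshot { dataset_id: DatasetId }
struct Region { dataset_id: DatasetId }

enum CloneSource {
    RegionSnapshot(RegionSnapshot),
    Region(Region),
}

/// Pick a clone source that is still reachable: prefer another region
/// snapshot belonging to the same snapshot, then fall back to any
/// read-only region already created for that snapshot.
fn choose_clone_source(
    region_snapshots: Vec<RegionSnapshot>,
    read_only_regions: Vec<Region>,
    dataset_is_expunged: impl Fn(DatasetId) -> bool,
) -> Option<CloneSource> {
    region_snapshots
        .into_iter()
        .find(|rs| !dataset_is_expunged(rs.dataset_id))
        .map(CloneSource::RegionSnapshot)
        .or_else(|| {
            read_only_regions
                .into_iter()
                .find(|r| !dataset_is_expunged(r.dataset_id))
                .map(CloneSource::Region)
        })
}
```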