Commit
Let start saga handle unwinding from sled agent instance PUT errors (#4682)

Remove `Nexus::handle_instance_put_result`. In its place, make Nexus instance routines that invoke sled agent instance PUT endpoints decide how to handle their own errors, and be more explicit about the specific kinds of errors these operations can produce.

Use this flexibility to allow the instance start and migrate sagas to handle failure to start a new instance (or to start a migration target) by unwinding, instead of having to reckon with callee-defined side effects of failing a call to sled agent. Other callers continue to do what `handle_instance_put_result` did.

Improve some tests:

- Add a test variation to reproduce #4662. To support this, teach the simulated sled agent to let callers inject failure into calls to ensure an instance's state.
- Fix up a bit of simulated sled agent logic that was unfaithful to the real sled agent's behavior and that caused the new test to pass when it should have failed.
- Make sure that start saga tests that unwind explicitly verify that unwinding the saga doesn't leak provisioning counters.

Tests: Cargo tests including the new start saga variation; smoke tested instance start/stop/reboot on a dev cluster.

Fixes #4662.
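The pattern described above can be illustrated with a small, self-contained sketch. It is not the code from this change: `InstancePutError`, `SimSledAgent`, `inject_put_failure`, `Provisioning`, and `start_saga` are hypothetical names invented for the example, and the real saga machinery is far richer. The sketch only shows the idea that the callee returns an explicit error without side effects, and the caller (here a toy start "saga") decides to unwind its own earlier step so that provisioning counters are not leaked.

```rust
/// Hypothetical error type: the specific ways an instance PUT to a sled
/// agent might fail. The real error kinds in Nexus may differ.
#[derive(Debug, Clone, PartialEq)]
enum InstancePutError {
    /// The sled agent could not be reached.
    Unreachable,
    /// The sled agent answered, but refused the request.
    Rejected(String),
}

/// Hypothetical stand-in for a simulated sled agent that lets a test
/// inject a failure into the next "ensure instance state" call.
#[derive(Default)]
struct SimSledAgent {
    fail_next_put: Option<InstancePutError>,
    running: bool,
}

impl SimSledAgent {
    /// Arrange for the next PUT to fail with `err`.
    fn inject_put_failure(&mut self, err: InstancePutError) {
        self.fail_next_put = Some(err);
    }

    /// Report success or a specific error; never touch caller-owned state.
    fn instance_put_state(&mut self) -> Result<(), InstancePutError> {
        if let Some(err) = self.fail_next_put.take() {
            return Err(err);
        }
        self.running = true;
        Ok(())
    }
}

/// Hypothetical provisioning counter that must not leak on failure.
#[derive(Default)]
struct Provisioning {
    cpus: u32,
}

/// Toy start "saga": reserve resources, then ask the sled agent to start
/// the instance. On failure, the saga itself undoes the reservation.
fn start_saga(
    agent: &mut SimSledAgent,
    counters: &mut Provisioning,
    cpus: u32,
) -> Result<(), InstancePutError> {
    // Step 1: account for the resources the instance will use.
    counters.cpus += cpus;

    // Step 2: call the sled agent. The caller owns the error handling.
    if let Err(err) = agent.instance_put_state() {
        // Unwind step 1 so the counter returns to its prior value.
        counters.cpus -= cpus;
        return Err(err);
    }
    Ok(())
}
```

A test in the spirit of the new start saga variation would inject a failure, run the saga, and then check both that the error surfaced and that the counter is back where it started.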