Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[nexus] Add tests for instance_create unwind safety #1843

Merged
merged 25 commits into from
Nov 2, 2022

Conversation

smklein
Copy link
Collaborator

@smklein smklein commented Oct 20, 2022

Builds on #1835 and oxidecomputer/steno#67

Adds a test for the "instance create" saga's unwind safety by:

  • Using a saga which attempts to create a network interface, allocate an external IP, and attach a disk
  • Failing after every single intermediate node in the saga
  • Verifying that no intermediate resources remain after any of the attempted executions

@@ -25,7 +25,7 @@ serde_json = "1.0"
serde_with = "2.0.1"
slog = { version = "2.5", features = [ "max_level_trace", "release_max_level_debug" ] }
smf = "0.2"
steno = "0.2"
steno = { git = "https://github.com/oxidecomputer/steno", branch = "node-access" }
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, I'd use a new rev of steno instead before merging. Same elsewhere in this PR

let params = new_test_params(&opctx, project_id);
let dag = create_saga_dag::<SagaInstanceCreate>(params).unwrap();

for i in 0..dag.get_node_count() {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the "more automated" version of what I tried doing in #1835 - I can definitely consolidate the testing here, and create a "saga unwind tester", if that would be useful.

I basically see three phases:

  1. Setup the saga. Create any orgs / projects / disks / etc needed to run the saga.
  2. Iterate over every single node in the DAG, injecting failures at each one. Log which ones we're working on, to make it clear what went wrong if an error occurs. Run the saga.
  3. Assert that no detritus remains, and that the system is in the same state before / after saga execution.

Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

Copy link
Contributor

@leftwo leftwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this looks great. I look forward to this going back so I can do the same things
with the volume sagas.

Not actually. There is definitely still a dependency, but the "test
setup" part can be re-used with Nexus tests without causing any real
circular dependency issues.

See #1835 (comment)
for an example of one such issue.

Frankly the rest of nexus-test-utils probably *should* continue to be
refactored such that there is no real dependency on Nexus; doing so
will reduce the risk of "mis-matched versions of Nexus" classes of
errors in the future.

This required the following changes:

- Move `nexus`'s config information into `common/src/nexus_config.rs`,
  so that it can be used by `nexus-test-utils`.
- Make `nexus-test-utils` operate on a generic version of `NexusServer`,
  exposed by a new crate named `nexus-test-interface`. This crate can
  contain any information used to operate Nexus during shared test
  setup, without actually requiring a full dependency on `nexus`.
- Note, adding this generic required making `ControlPlaneTestContext`
  generic, which ended up bubbling up to a lot of test setup functions,
  including the one used in the `#[nexus_test]` macro...
- ... So, make it possible to customize which version of `NexusServer`
  is used by `#[nexus_test]`. This means callers can supply their *own*
  version of `NexusServer`, either from their own crate (see: Nexus
  unit tests) or the version from the `omicron_nexus` crate (see: Nexus
  integration tests).
Base automatically changed from disk-create-idempotency to main November 2, 2022 17:21
@smklein smklein merged commit 25ffded into main Nov 2, 2022
@smklein smklein deleted the instance-create-saga-tests branch November 2, 2022 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants