Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add methods for inspecting Saga DAG #67

Merged
merged 7 commits into from
Oct 30, 2022
Merged

Add methods for inspecting Saga DAG #67

merged 7 commits into from
Oct 30, 2022

Conversation

smklein
Copy link
Contributor

@smklein smklein commented Oct 20, 2022

This PR adds some methods for making DAG access easier.

For example, to test "injecting failures on each node in a saga", one could iterate on nodes from 0..dag.get_node_count(), and invoke the SEC's saga_inject_error method.

Additionally, to query for the name / label information of that node, a caller could invoke dag.get_node_name(...) or dag.get_node_label(...).

I've validated that these methods provide useful info in oxidecomputer/omicron#1843 - let me know if you want me to add more specific tests for anything.

Cargo.lock Outdated Show resolved Hide resolved
src/dag.rs Outdated
//
// This constant helps callers iterate over the saga DAG while safely
// ignoring these indices.
const START_AND_END_NODES: usize = 2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little worried this abstraction exposes implementation details that could reasonably change in the future. Dumb examples: we don't have to have "start" and "end" nodes, and we could choose to insert other kinds of internal nodes.

What about providing an iterator over the nodes instead? We could have the iterator skip various internal nodes if we want (e.g., "start", "end").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, this constant is not pub - I just wanted to be really explicit about the magic "2" number.

If you'd prefer for me to expose an iterator, I can, but it seems kinda equivalent to say "you can access indices [0, N)" (where "N" is what we expose for node count") vs "here is an iterator that returns the integers [0, N)". Either way, the thing exposed in the pub interface does not expose the start / end nodes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call about the const not being pub. That does help.

It still seems a little brittle to me. I think this change creates an implicit dependency that saga_dag.start_node == saga_dag.graph.node_count() - 2 and saga_dag.end_node == saga_dag.graph.node_count() - 1, as well as that there are no other internal nodes that we might want to hide from users. Previously we've avoided the assumption about the node indexes -- that's why saga_dag have start_node and end_node fields (rather than assuming what indexes those nodes would be). The indexes these nodes get depends on what order you add them to the graph, which I think changed relatively recently.

Relatedly, I suspect we probably should hide SubsagaStart nodes from users: they're not actions and they don't produce any output (and I don't think it makes sense for them to fail). This has the advantage that the name wouldn't need to be an Option. All of the user-visible node types have names.

It feels like an iterator would be a lot cleaner but maybe I'm missing a downside?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, in cf98df1 I implemented this as an iterator and updated the test. LMK what you think!

Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot!

InternalNode is only sort of exposed today. I don't think it's supposed to be pub and it doesn't appear exported by src/lib.rs so I think a consumer couldn't name the type produced by this iterator. That seems not ideal. We also don't really want to expose it -- "internal" here does reflect "internal to Steno", as opposed to Node.

I'm not sure what'd be better though, short of a whole other "node" type here.

@smklein
Copy link
Contributor Author

smklein commented Oct 28, 2022

I like this a lot!

InternalNode is only sort of exposed today. I don't think it's supposed to be pub and it doesn't appear exported by src/lib.rs so I think a consumer couldn't name the type produced by this iterator. That seems not ideal. We also don't really want to expose it -- "internal" here does reflect "internal to Steno", as opposed to Node.

I'm not sure what'd be better though, short of a whole other "node" type here.

  • I went ahead and made InternalNode pub(crate), so it shouldn't be exposed outside steno.
  • I went ahead and made a NodeEntry type, which just wraps a &InternalNode, and exposes similar methods.

@smklein smklein merged commit 6661416 into main Oct 30, 2022
@smklein smklein deleted the node-access branch October 30, 2022 16:03
smklein added a commit that referenced this pull request Nov 1, 2022
smklein added a commit that referenced this pull request Nov 2, 2022
smklein added a commit to oxidecomputer/omicron that referenced this pull request Nov 2, 2022
Builds on #1835 and oxidecomputer/steno#67

Adds a test for the "instance create" saga's unwind safety by:
- Using a saga which attempts to create a network interface, allocate an external IP, and attach a disk
- Failing after *every single intermediate node* in the saga
- Verifying that no intermediate resources remain after any of the attempted executions
smklein added a commit that referenced this pull request Dec 28, 2022
Provides a simple API for instructing a node to "execute twice".

This provides a "bare-minimum" helper API for testing idempotency within a saga. When combined with #67 - which was used to test unwind safety - it should be possible to test that all actions / undo actions within a saga are idempotent, at least across being called twice.

Part of #31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants