Add methods for inspecting Saga DAG #67

smklein · 2022-10-20T21:44:10Z

This PR adds some methods for making DAG access easier.

For example, to test "injecting failures on each node in a saga", one could iterate on nodes from 0..dag.get_node_count(), and invoke the SEC's saga_inject_error method.

Additionally, to query for the name / label information of that node, a caller could invoke dag.get_node_name(...) or dag.get_node_label(...).

I've validated that these methods provide useful info in oxidecomputer/omicron#1843 - let me know if you want me to add more specific tests for anything.

Cargo.lock

davepacheco · 2022-10-20T22:45:19Z

src/dag.rs

+//
+// This constant helps callers iterate over the saga DAG while safely
+// ignoring these indices.
+const START_AND_END_NODES: usize = 2;


I'm a little worried this abstraction exposes implementation details that could reasonably change in the future. Dumb examples: we don't have to have "start" and "end" nodes, and we could choose to insert other kinds of internal nodes.

What about providing an iterator over the nodes instead? We could have the iterator skip various internal nodes if we want (e.g., "start", "end").

To be clear, this constant is not pub - I just wanted to be really explicit about the magic "2" number.

If you'd prefer for me to expose an iterator, I can, but it seems kinda equivalent to say "you can access indices [0, N)" (where "N" is what we expose for node count") vs "here is an iterator that returns the integers [0, N)". Either way, the thing exposed in the pub interface does not expose the start / end nodes.

Good call about the const not being pub. That does help.

It still seems a little brittle to me. I think this change creates an implicit dependency that saga_dag.start_node == saga_dag.graph.node_count() - 2 and saga_dag.end_node == saga_dag.graph.node_count() - 1, as well as that there are no other internal nodes that we might want to hide from users. Previously we've avoided the assumption about the node indexes -- that's why saga_dag have start_node and end_node fields (rather than assuming what indexes those nodes would be). The indexes these nodes get depends on what order you add them to the graph, which I think changed relatively recently.

Relatedly, I suspect we probably should hide SubsagaStart nodes from users: they're not actions and they don't produce any output (and I don't think it makes sense for them to fail). This has the advantage that the name wouldn't need to be an Option. All of the user-visible node types have names.

It feels like an iterator would be a lot cleaner but maybe I'm missing a downside?

Okay, in cf98df1 I implemented this as an iterator and updated the test. LMK what you think!

davepacheco

I like this a lot!

InternalNode is only sort of exposed today. I don't think it's supposed to be pub and it doesn't appear exported by src/lib.rs so I think a consumer couldn't name the type produced by this iterator. That seems not ideal. We also don't really want to expose it -- "internal" here does reflect "internal to Steno", as opposed to Node.

I'm not sure what'd be better though, short of a whole other "node" type here.

smklein · 2022-10-28T20:14:13Z

I like this a lot!

InternalNode is only sort of exposed today. I don't think it's supposed to be pub and it doesn't appear exported by src/lib.rs so I think a consumer couldn't name the type produced by this iterator. That seems not ideal. We also don't really want to expose it -- "internal" here does reflect "internal to Steno", as opposed to Node.

I'm not sure what'd be better though, short of a whole other "node" type here.

I went ahead and made InternalNode pub(crate), so it shouldn't be exposed outside steno.
I went ahead and made a NodeEntry type, which just wraps a &InternalNode, and exposes similar methods.

Builds on #1835 and oxidecomputer/steno#67 Adds a test for the "instance create" saga's unwind safety by: - Using a saga which attempts to create a network interface, allocate an external IP, and attach a disk - Failing after *every single intermediate node* in the saga - Verifying that no intermediate resources remain after any of the attempted executions

Provides a simple API for instructing a node to "execute twice". This provides a "bare-minimum" helper API for testing idempotency within a saga. When combined with #67 - which was used to test unwind safety - it should be possible to test that all actions / undo actions within a saga are idempotent, at least across being called twice. Part of #31

smklein added 3 commits October 20, 2022 17:33

Add methods for inspecting Saga DAG

6183f1c

Add basic test

48eb4da

fmt

19cdbd6

smklein requested a review from davepacheco October 20, 2022 21:44

smklein mentioned this pull request Oct 20, 2022

[nexus] Add tests for instance_create unwind safety oxidecomputer/omicron#1843

Merged

davepacheco reviewed Oct 20, 2022

View reviewed changes

smklein added 3 commits October 21, 2022 11:06

Leave the lockfile alone

249e8a0

Use iterator

cf98df1

Make clippy happy

d5db5a9

davepacheco approved these changes Oct 28, 2022

View reviewed changes

Do not expose InternalNode

1bb1fd1

smklein merged commit 6661416 into main Oct 30, 2022

smklein deleted the node-access branch October 30, 2022 16:03

smklein added a commit that referenced this pull request Nov 1, 2022

A minor extention to #67, expose indices

6e4946b

smklein added a commit that referenced this pull request Nov 2, 2022

A minor extention to #67, expose indices (#73)

4cebb99

smklein mentioned this pull request Dec 26, 2022

Add API for injecting 'repetitions' into saga executor #88

Merged

smklein mentioned this pull request Dec 29, 2022

[nexus] Make instance creation actions/undo actions idempotent oxidecomputer/omicron#2095

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add methods for inspecting Saga DAG #67

Add methods for inspecting Saga DAG #67

smklein commented Oct 20, 2022 •

edited

Loading

davepacheco Oct 20, 2022

smklein Oct 21, 2022

davepacheco Oct 21, 2022

smklein Oct 26, 2022

davepacheco left a comment

smklein commented Oct 28, 2022

Add methods for inspecting Saga DAG #67

Add methods for inspecting Saga DAG #67

Conversation

smklein commented Oct 20, 2022 • edited Loading

davepacheco Oct 20, 2022

Choose a reason for hiding this comment

smklein Oct 21, 2022

Choose a reason for hiding this comment

davepacheco Oct 21, 2022

Choose a reason for hiding this comment

smklein Oct 26, 2022

Choose a reason for hiding this comment

davepacheco left a comment

Choose a reason for hiding this comment

smklein commented Oct 28, 2022

smklein commented Oct 20, 2022 •

edited

Loading