Dynamic sagas + working subsagas #29

andrewjstone · 2022-07-08T06:13:09Z

This PR substantially changes the existing behavior. DAGs for a
given saga are no longer statically defined at build time through
SagaTemplateBuilders. Instead, DAGs can be constructed dynamically at runtime
through a DagBuilder, which allows a given saga operation to produce DAGs of
different shapes depending on user input.
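
To illustrate the idea of a DAG whose shape depends on runtime input, here is a minimal sketch. Everything in it (`ToyNode`, `ToyDagBuilder`, `build_provision_dag`, the action names) is a hypothetical stand-in written for this example; it is not Steno's actual API and only mimics the `append` / `append_parallel` vocabulary used in this PR.

```rust
// Toy stand-ins for illustration only; these are NOT Steno's real types.
#[derive(Debug, Clone, PartialEq)]
struct ToyNode {
    name: String,
    action: String,
}

fn node(name: &str, action: &str) -> ToyNode {
    ToyNode { name: name.to_string(), action: action.to_string() }
}

/// Each inner Vec is one stage; nodes within a stage may run in parallel.
#[derive(Debug, Default)]
struct ToyDagBuilder {
    stages: Vec<Vec<ToyNode>>,
}

impl ToyDagBuilder {
    /// Add a single node that runs after everything appended so far.
    fn append(&mut self, n: ToyNode) {
        self.stages.push(vec![n]);
    }

    /// Add a group of nodes that may run concurrently with each other.
    /// An empty group is rejected instead of being accepted silently.
    fn append_parallel(&mut self, nodes: Vec<ToyNode>) -> Result<(), String> {
        if nodes.is_empty() {
            return Err("append_parallel requires at least one node".to_string());
        }
        self.stages.push(nodes);
        Ok(())
    }

    fn build(self) -> Vec<Vec<ToyNode>> {
        self.stages
    }
}

/// The DAG's shape depends on a runtime input (here, a requested disk count).
fn build_provision_dag(disk_count: usize) -> Vec<Vec<ToyNode>> {
    let mut b = ToyDagBuilder::default();
    b.append(node("instance_id", "generate_id"));
    if disk_count > 0 {
        let disks: Vec<ToyNode> = (0..disk_count)
            .map(|i| node(&format!("disk{}", i), "create_disk"))
            .collect();
        b.append_parallel(disks)
            .expect("disk_count > 0, so the group is non-empty");
    }
    b.append(node("instance", "create_instance"));
    b.build()
}
```

With a static template, the number of disk nodes would have to be fixed ahead of time; here `build_provision_dag(0)` and `build_provision_dag(3)` simply produce different graphs.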

Additionally, the implementation of subsagas has changed. Subsagas are no
longer defined by templates and launched as separate sagas by a node of the
parent saga. That approach was challenging to make idempotent and, in its
current state, was unsound. Instead, subsagas are now added directly as nodes
into the dynamic DAG via the DagBuilder. Subsagas themselves are constructed as
Dags and can be added alongside other Nodes via DagBuilder::append and
DagBuilder::append_parallel. Subsaga parameters come from the output of a
parameter node in the parent saga. Once the top-level saga has been
constructed, it is packaged up into a SagaDag.
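
As a rough sketch of "a subsaga is just another node in the parent DAG", the toy model below embeds a child DAG inside a parent node and names the parent node whose output supplies the child's parameters. The types and helper functions (`ToyNode`, `ToyDag`, `flatten`, `params_nodes_defined`) are hypothetical and are not Steno's real data structures; the DAG is kept sequential for brevity.

```rust
// Toy model for illustration only; these are NOT Steno's real types.
#[derive(Debug, Clone)]
enum ToyNode {
    /// An ordinary action node.
    Action { name: String },
    /// A whole sub-DAG embedded as one node of its parent. Its parameters
    /// are the output of the parent node named `params_node`.
    Subsaga { name: String, params_node: String, dag: ToyDag },
}

#[derive(Debug, Clone, Default)]
struct ToyDag {
    nodes: Vec<ToyNode>, // sequential, for simplicity
}

/// List node names in execution order, descending into subsagas and
/// prefixing child names with the subsaga's node name.
fn flatten(dag: &ToyDag, prefix: &str, out: &mut Vec<String>) {
    for n in &dag.nodes {
        match n {
            ToyNode::Action { name } => out.push(format!("{}{}", prefix, name)),
            ToyNode::Subsaga { name, dag, .. } => {
                flatten(dag, &format!("{}{}/", prefix, name), out)
            }
        }
    }
}

/// Check that every subsaga's parameter node appears earlier in the same DAG.
fn params_nodes_defined(dag: &ToyDag) -> bool {
    let mut seen: Vec<&str> = Vec::new();
    for n in &dag.nodes {
        match n {
            ToyNode::Action { name } => seen.push(name),
            ToyNode::Subsaga { name, params_node, dag } => {
                if !seen.contains(&params_node.as_str()) || !params_nodes_defined(dag) {
                    return false;
                }
                seen.push(name);
            }
        }
    }
    true
}
```

Because the child lives inside the parent's graph, there is no separate saga to launch from within an action, which is the part that was hard to make idempotent in the old design.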

To enable fully dynamic sagas and subsagas, an ActionRegistry was
created, in which all actions across all sagas are registered. Dags refer to
these actions by ActionName. This allows a SagaDag to be serialized and,
when deserialized, to run arbitrary Rust action code without coupling
the structure of the DAG to that Rust code, as the prior template-driven
design did.
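
The name-based indirection can be sketched roughly as follows. This is a toy, std-only model: `ToyRegistry`, `ToyAction`, and `run` are hypothetical names, the "DAG" is reduced to an ordered list of action names, and actions are simplified to functions on `u64`. It is not how Steno's ActionRegistry is actually defined.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Toy model for illustration only; these are NOT Steno's real types.
type ToyAction = Arc<dyn Fn(u64) -> Result<u64, String> + Send + Sync>;

#[derive(Default)]
struct ToyRegistry {
    actions: HashMap<String, ToyAction>,
}

impl ToyRegistry {
    /// Register an action under a name; names must be unique.
    fn register(&mut self, name: &str, action: ToyAction) -> Result<(), String> {
        if self.actions.contains_key(name) {
            return Err(format!("action {:?} already registered", name));
        }
        self.actions.insert(name.to_string(), action);
        Ok(())
    }

    /// Look up the Rust code behind a name found in a (recovered) DAG.
    fn get(&self, name: &str) -> Result<&ToyAction, String> {
        self.actions
            .get(name)
            .ok_or_else(|| format!("action {:?} not registered", name))
    }
}

/// The "DAG" here is plain data (action names only), so it could be written
/// to a log and read back without carrying any function pointers.
fn run(registry: &ToyRegistry, dag: &[String], input: u64) -> Result<u64, String> {
    let mut value = input;
    for name in dag {
        let action = registry.get(name)?;
        value = action(value)?;
    }
    Ok(value)
}
```

The important property is that the stored graph contains only names; a process that recovers it needs the same names registered, and an unknown name surfaces as an error rather than undefined behavior.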

There are many other goodies sprinkled throughout, including better tests and
validation. See the comments on this PR for details.

Unfortunately the instance id strategy doesn't provide enough information for the outer saga to identify both its own saga and the nested subsagas it needs :( We're going to need some other mechanism for this. It may be as simple as adding another intermediate node with metadata to inform the following nodes what to do. Or maybe instance_id should instead become a list of key inputs required by the following nodes? Need to think more about this.


davepacheco

Nice! The structure of this looks great. It makes sense where we had to make changes and where things were able to stay the same.

If I'm understanding right, you can summarize this change in three pieces:

1. SagaTemplate -> Dag
   - DAGs are created when a saga is created, rather than templates created at startup time
   - big changes to the way things are constructed
   - associated changes to recovery: the DAG is stored persistently and actions are associated by name using the registry
2. lookup() / instance_id changes: this is small in code but feels conceptually bigger. I don't fully grok this yet but I'm going to poke at it more.
3. Removal of saga_params type because it's harder (maybe impossible) to have an action registry if they have different saga params.

I feel like some of my suggestions are a little vague (in my head as well) so if you don't mind I'd like to prototype a few thoughts (e.g., separating out the builder layers that I mentioned). I hope I'll be able to do this before you're back @ajs so that it won't slow this down.

davepacheco · 2022-07-14T23:31:06Z

src/dag.rs

+//
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct Dag {
+    pub(crate) name: SagaName,


If I'm understanding right, this is now a human-readable description. (Before, it was load-bearing -- the name of the template found in the saga log was used at recovery time to find the corresponding in-memory template.) Given the significance of the other names, maybe it'd be clearer to call this "label" or "description"?

Sure. That makes sense.

I've changed my mind on this one. I prefer the term "name". I'm not sure that implies load-bearing, but to me it does imply uniqueness to some degree. Label doesn't imply uniqueness and seems more like a "tag". These are just my own preferences, but I'm going to leave it for now if that's ok.

I don't believe Steno does assume this value is unique. The consumer might? In the case of Omicron, I imagine this will be something like "instance-provision". It doesn't uniquely identify the execution or the DAG, though it might uniquely identify the purpose or the subsystem that created it? Maybe some metrics or tooling will assume these values mean something, but I don't think Steno does.

Anyway I don't mind keeping this called "name". I think we've cleared up the confusion by having newtypes and calling things saga_name vs. node_name vs. action_name.

> I don't believe Steno does assume this value is unique. The consumer might? In the case of Omicron, I imagine this will be something like "instance-provision". It doesn't uniquely identify the execution or the DAG, though it might uniquely identify the purpose or the subsystem that created it? Maybe some metrics or tooling will assume these values mean something, but I don't think Steno does.

That's a good point.


davepacheco · 2022-08-02T03:56:14Z

@andrewjstone and I have been iterating on this and it's close to ready. The change to update Omicron to use this is oxidecomputer/omicron#1532.

Other goodies that wound up in this change:

- First-class notion of a saga's output. Every saga must end with exactly one node. That node's output is the saga's output.
- The error reporting is much better. When a node produces an error, the "message" in that error includes the name of the node.
- The pretty-printer is much prettier (and will soon be much better tested)
- The "dot" output has more useful information (like action name, subsaga structure, etc.)
- "Constant" nodes -- when building the DAG, you can insert a node that just emits a value that's known already. This is convenient with subsagas or other cases where you've got a generic node that accepts an input from another node, but you happen to already know the right value when you're building the DAG.
- We do more validation of the saga graph as it's built. You can't create nodes with duplicate names, for example.
- A bunch of "name" types are better-typed using newtypes: SagaName, NodeName, ActionName.
- A bunch of doc improvements
- Update of GitHub Actions Mac CI image to macos-12 since macos-10.15 is being deprecated. Also removed the vestigial windows-debug job.
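
For readers who want a feel for three of the items above (name newtypes, "constant" nodes, and duplicate-name validation at build time), here is a small self-contained sketch. The `NodeName` and `ActionName` definitions below are stand-ins that only borrow the names from this PR, and `ToyNode` / `ToyBuilder` are hypothetical; none of this is Steno's actual implementation.

```rust
use std::collections::HashSet;

// Stand-in newtypes for illustration; NOT Steno's real definitions.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct NodeName(String);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ActionName(String);

#[derive(Debug, Clone, PartialEq)]
enum ToyNode {
    /// Runs a registered action.
    Action { name: NodeName, action: ActionName },
    /// Emits a value already known when the DAG is built.
    Constant { name: NodeName, value: String },
}

impl ToyNode {
    fn name(&self) -> &NodeName {
        match self {
            ToyNode::Action { name, .. } | ToyNode::Constant { name, .. } => name,
        }
    }
}

#[derive(Debug, Default)]
struct ToyBuilder {
    nodes: Vec<ToyNode>,
    seen: HashSet<NodeName>,
}

impl ToyBuilder {
    /// Append a node, rejecting a name that is already in the graph.
    fn append(&mut self, node: ToyNode) -> Result<(), String> {
        if !self.seen.insert(node.name().clone()) {
            return Err(format!("duplicate node name: {:?}", node.name().0));
        }
        self.nodes.push(node);
        Ok(())
    }
}
```

Because `NodeName` and `ActionName` are distinct types, passing one where the other is expected is a compile error, which is the kind of mix-up the newtypes are meant to rule out.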

davepacheco · 2022-08-03T21:10:28Z

I wanted to document some of the breaking changes here in case we need to refer back to it:

- Saga templates are no more. Instead, the saga DAG is constructed each time you run a given saga. (Saga templates used to be the way we mapped a recovered saga back to a DAG + Rust code. Now, the DAG itself is serialized in the recovery state. The mapping to Rust code happens using names for actions.)
- Actions have names and must be registered. This is how Steno maps actions in a recovered saga DAG to Rust implementations.
- There are a few more mechanical changes (e.g., the newtypes used for names, changing some function names) as well.

andrewjstone added 15 commits June 27, 2022 17:51

Add diagram for sec

6e2807a

wip

8f1976c

wip

e184727

wip

5915678

wip

4f6c629

wip

ba3cbc6

wip - it builds

e8e3c9c

working example

c285c76

Restore demo-provision functionality

237eb56

restore trip demo

039a648

restore tests

8c294eb

remove warnings

d286d87

fix subsaga example

9647068

remove saga_template

81a51c5

andrewjstone marked this pull request as draft July 8, 2022 06:14

andrewjstone added 4 commits July 8, 2022 15:22

Hang instance_id off ActionContext

a57a67d

comments

7debd2e

SubsagaSpec, now just plain SagaSpec

8e70679

comment fixup

17ebf76

andrewjstone commented Jul 12, 2022

andrewjstone changed the title ~~WIP: Dynamic sagas + working subsagas~~ Dynamic sagas + working subsagas Jul 12, 2022

andrewjstone marked this pull request as ready for review July 12, 2022 20:17

andrewjstone requested review from davepacheco and ahl July 12, 2022 20:17

andrewjstone mentioned this pull request Jul 12, 2022

Implement disk creation during instance creation oxidecomputer/omicron#812

Closed

Remove reference to SagaTemplateBuilder

86fbaa9

davepacheco reviewed Jul 15, 2022

davepacheco mentioned this pull request Jul 16, 2022

playing with the steno consumer experience with dynamic DAGs #30

Merged

ahl reviewed Jul 18, 2022

andrewjstone and others added 8 commits July 30, 2022 17:37

it works I think

5ac0853

some doc comment fixups

e87abfb

Add end node to print order and a first test

ae9ded4

another test

a65e5a2

test with nested subsagas and parallel nodes

05cb854

Fix up tests for print format change

c738050

DagBuilderError message could include saga name

f933566

saga name should be available to SecStore

bb5d060

davepacheco mentioned this pull request Aug 2, 2022

update to steno 0.2.0 oxidecomputer/omicron#1532

Merged

5 tasks

andrewjstone added 7 commits August 2, 2022 00:53

wip

fc90a6d

dag generation works

c5a7fba

Add a simple property

6637db1

Working property based test

eb4d97d

An extra check for SubsagaStart nodes

9afd95b

Put property test in its own module

e186f4e

fix up comments

891057e

andrewjstone mentioned this pull request Aug 2, 2022

Add explicit parallel tracking to PrintOrderer #33

Open

davepacheco and others added 4 commits August 3, 2022 09:04

add test for unregistered action

f13d039

validate sagas better

e0223b3

fix typo

5de8ebd

trip example pub changes weren't necessary

28cc754

davepacheco added 2 commits August 3, 2022 14:13

update to macos-12

6ec8569

remove windows-debug

24bfaaa

davepacheco mentioned this pull request Aug 3, 2022

panic executing saga that used append_parallel with empty Vec #25

Closed

kick GitHub actions for change in required checks

b2f60a9

andrewjstone enabled auto-merge (squash) August 3, 2022 21:27

andrewjstone merged commit faa20da into main Aug 3, 2022

andrewjstone deleted the ajs-experiments branch August 3, 2022 21:28
