Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic sagas + working subsagas #29

Merged
merged 76 commits into from
Aug 3, 2022
Merged

Dynamic sagas + working subsagas #29

merged 76 commits into from
Aug 3, 2022

Conversation

andrewjstone
Copy link
Contributor

@andrewjstone andrewjstone commented Jul 8, 2022

This code is a substantial change of the existing behavior. DAGs for a
given saga are no longer statically defined at build time through the use of
SagaTemplateBuilders. Instead, DAGs can be dynamically constructed at runtime
through a DagBuilder, which enables DAGs of different shapes for a given Saga
operation depending upon user input.

Additionally, the implementation of subsagas has changed. Subsagas are no
longer defined by templates, and launched as separate sagas by a node of the
parent saga. This was challenging to make idempotent, and in its current state
was unsound. Instead, subsagas are now added directly as nodes into the dynamic
DAG via the DagBuilder. Subsagas themselves are constructed as Dags and can
get added with other Nodes via DagBuilder::append and
DagBuilder::append_parallel. Subsaga parameters come from a parameter node
that is output from the parent saga. When the top level saga is done being constructed
it is packaged up into a SagaDag.

In order to enable fully dynamic sagas and subsagas, an ActionRegistry was
created, where all actions across all sagas are registered. Dags refer to
these actions by ActionName. This allows a SagaDag to be serialized,
and when deserialized, run arbitrary Rust action code without having to couple
the structure of the DAG to that rust code as in the prior template driven
design.

There are many other goodies sprinkled throughout, including better tests and
validation. See the comments on this PR for details.

Unfortunately the instance id strategy doesn't provide enough information
for the outer saga to identify both its own saga and the nested subsagas it needs :(

We're going to need some other mechanism for this. It may be as simple as
adding another intermediate node with metadata to inform the following nodes
what to do. Or maybe instance_id should instead become a list of key inputs
required  by the following nodes?

Need to think more about this.
@andrewjstone andrewjstone marked this pull request as draft July 8, 2022 06:14
src/dag.rs Outdated Show resolved Hide resolved
src/dag.rs Outdated Show resolved Hide resolved
src/dag.rs Outdated Show resolved Hide resolved
src/example_provision.rs Outdated Show resolved Hide resolved
src/saga_action_generic.rs Show resolved Hide resolved
src/saga_exec.rs Outdated Show resolved Hide resolved
src/saga_exec.rs Outdated Show resolved Hide resolved
src/saga_exec.rs Outdated Show resolved Hide resolved
@andrewjstone andrewjstone changed the title WIP: Dynamic sagas + working subsagas Dynamic sagas + working subsagas Jul 12, 2022
@andrewjstone andrewjstone marked this pull request as ready for review July 12, 2022 20:17
@andrewjstone andrewjstone requested review from davepacheco and ahl July 12, 2022 20:17
Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! The structure of this looks great. It makes sense where we had to make changes and where things were able to stay the same.

If I'm understanding right, you can summarize this change in three pieces:

  1. SagaTemplate -> Dag
    • DAGs are created when a saga is created, rather than templates created at startup time
    • big changes to the way things are constructed
    • associated changes to recovery: the DAG is stored persistently and actions are associated by name using the registry
  2. lookup() / instance_id changes: this is small in code but feels conceptually bigger. I don't fully grok this yet but I'm going to poke at it more.
  3. Removal of saga_params type because it's harder (maybe impossible) to have an action registry if they have different saga params.

I feel like some of my suggestions are a little vague (in my head as well) so if you don't mind I'd like to prototype a few thoughts (e.g., separating out the builder layers that I mentioned). I hope I'll be able to do this before you're back @ajs so that it won't slow this down.

src/dag.rs Show resolved Hide resolved
src/dag.rs Outdated Show resolved Hide resolved
src/dag.rs Show resolved Hide resolved
src/dag.rs Outdated Show resolved Hide resolved
src/dag.rs Outdated
//
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Dag {
pub(crate) name: SagaName,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding right, this is now a human-readable description. (Before, it was load-bearing -- the name of the template found in the saga log was used at recovery time to find the corresponding in-memory template.) Given the significance of the other names, maybe it'd be clearer to call this "label" or "description"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. That makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed my mind on this one. I prefer the term "name". I'm not sure that implies load-bearing, but it does imply uniqueness to some degree to me. Label doesn't apply uniqueness and seems more like a "tag". These are just my own preferences, but I'm going to leave it for now if that's ok.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe Steno does assume this value is unique. The consumer might? In the case of Omicron, I imagine this will be something like "instance-provision". It doesn't uniquely identify the execution or the DAG, though it might uniquely identify the purpose or the subsystem that created it? Maybe some metrics or tooling will assume these values mean something, but I don't think Steno does.

Anyway I don't mind keeping this called "name". I think we've cleared up the confusion by having newtypes and calling things saga_name vs. node_name vs. action_name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe Steno does assume this value is unique. The consumer might? In the case of Omicron, I imagine this will be something like "instance-provision". It doesn't uniquely identify the execution or the DAG, though it might uniquely identify the purpose or the subsystem that created it? Maybe some metrics or tooling will assume these values mean something, but I don't think Steno does.

That's a good point.

Cargo.toml Show resolved Hide resolved
src/saga_exec.rs Show resolved Hide resolved
src/sec.rs Show resolved Hide resolved
examples/trip.rs Outdated Show resolved Hide resolved
src/saga_exec.rs Outdated Show resolved Hide resolved
src/dag.rs Outdated Show resolved Hide resolved
@davepacheco
Copy link
Collaborator

davepacheco commented Aug 2, 2022

@andrewjstone and I have been iterating on this and it's close to ready. The change to update Omicron to use this is oxidecomputer/omicron#1532.

Other goodies that wound up in this change:

  • First-class notion of a saga's output. Every saga must end with exactly one node. That node's output is the saga's output.
  • The error reporting is much better. When a node produces an error, the "message" in that error includes the name of the node.
  • The pretty-printer is much prettier (and will soon be much better tested)
  • The "dot" output has more useful information (like action name, subsaga structure, etc.)
  • "Constant" nodes -- when building the DAG, you can insert a node that just emits a value that's known already. This is convenient with subsagas or other cases where you've got a generic node that accepts an input from another node, but you happen to already know the right value when you're building the DAG.
  • We do more validation of the saga graph as it's built. You can't create nodes with duplicate names, for example.
  • A bunch of "name" types are better-typed using newtypes: SagaName, NodeName, ActionName.
  • A bunch of doc improvements
  • Update of GitHub Actions Mac CI image to macos-12 since macos-10.15 is being deprecated. Also removed the vestigial windows-debug job.

@davepacheco
Copy link
Collaborator

I wanted to document some of the breaking changes here in case we need to refer back to it:

  • Saga templates are no more. Instead, the saga DAG is constructed each time you run a given saga. (Saga templates used to be the way we mapped a recovered saga back to a DAG + Rust code. Now, the DAG itself is serialized in the recovery state. The mapping to Rust code happens using names for actions.)
  • Actions have names and must be registered. This is how Steno maps actions in a recovered saga DAG to Rust implementations.
  • There are a few more mechanical changes (e.g., the newtypes used for names, changing some function names) as well.

@andrewjstone andrewjstone enabled auto-merge (squash) August 3, 2022 21:27
@andrewjstone andrewjstone merged commit faa20da into main Aug 3, 2022
@andrewjstone andrewjstone deleted the ajs-experiments branch August 3, 2022 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants