Dynamic sagas + working subsagas #29
Unfortunately the instance id strategy doesn't provide enough information for the outer saga to identify both its own saga and the nested subsagas it needs :( We're going to need some other mechanism for this. It may be as simple as adding another intermediate node with metadata to inform the following nodes what to do. Or maybe instance_id should instead become a list of key inputs required by the following nodes? Need to think more about this.
Nice! The structure of this looks great. It makes sense where we had to make changes and where things were able to stay the same.
If I'm understanding right, you can summarize this change in three pieces:
- `SagaTemplate` -> `Dag`
  - DAGs are created when a saga is created, rather than templates created at startup time
  - big changes to the way things are constructed
  - associated changes to recovery: the DAG is stored persistently and actions are associated by name using the registry
- `lookup()` / instance_id changes: this is small in code but feels conceptually bigger. I don't fully grok this yet but I'm going to poke at it more.
- Removal of saga_params type because it's harder (maybe impossible) to have an action registry if they have different saga params.
I feel like some of my suggestions are a little vague (in my head as well) so if you don't mind I'd like to prototype a few thoughts (e.g., separating out the builder layers that I mentioned). I hope I'll be able to do this before you're back @ajs so that it won't slow this down.
src/dag.rs
//
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Dag {
    pub(crate) name: SagaName,
If I'm understanding right, this is now a human-readable description. (Before, it was load-bearing -- the name of the template found in the saga log was used at recovery time to find the corresponding in-memory template.) Given the significance of the other names, maybe it'd be clearer to call this "label" or "description"?
Sure. That makes sense.
I've changed my mind on this one. I prefer the term "name". I'm not sure that implies load-bearing, but to me it does imply uniqueness to some degree. "Label" doesn't imply uniqueness and seems more like a "tag". These are just my own preferences, but I'm going to leave it for now if that's ok.
I don't believe Steno does assume this value is unique. The consumer might? In the case of Omicron, I imagine this will be something like "instance-provision". It doesn't uniquely identify the execution or the DAG, though it might uniquely identify the purpose or the subsystem that created it? Maybe some metrics or tooling will assume these values mean something, but I don't think Steno does.
Anyway I don't mind keeping this called "name". I think we've cleared up the confusion by having newtypes and calling things saga_name vs. node_name vs. action_name.
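To illustrate the newtype point, here is a minimal sketch of how distinct wrapper types keep the three kinds of names from being mixed up. These definitions are assumptions for illustration, not Steno's actual code; the `new` constructors, `describe_node`, and the node and action names used with it are hypothetical.

```rust
/// Hypothetical newtypes: each wraps a String but is a distinct type,
/// so the compiler rejects passing one kind of name where another is expected.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SagaName(String);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct NodeName(String);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ActionName(String);

impl SagaName {
    pub fn new(s: &str) -> Self {
        SagaName(s.to_string())
    }
}

impl NodeName {
    pub fn new(s: &str) -> Self {
        NodeName(s.to_string())
    }
}

impl ActionName {
    pub fn new(s: &str) -> Self {
        ActionName(s.to_string())
    }
}

/// Swapping any two arguments is a compile-time type error,
/// even though all three are strings underneath.
pub fn describe_node(saga: &SagaName, node: &NodeName, action: &ActionName) -> String {
    format!("saga={} node={} action={}", saga.0, node.0, action.0)
}
```

Nothing here makes a `SagaName` unique; the type only records which role a string plays.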
> I don't believe Steno does assume this value is unique. The consumer might? In the case of Omicron, I imagine this will be something like "instance-provision". It doesn't uniquely identify the execution or the DAG, though it might uniquely identify the purpose or the subsystem that created it? Maybe some metrics or tooling will assume these values mean something, but I don't think Steno does.
That's a good point.
@andrewjstone and I have been iterating on this and it's close to ready. The change to update Omicron to use this is oxidecomputer/omicron#1532. Other goodies that wound up in this change:
I wanted to document some of the breaking changes here in case we need to refer back to them:
This code is a substantial change of the existing behavior. DAGs for a given saga are no longer statically defined at build time through the use of `SagaTemplateBuilder`s. Instead, DAGs can be dynamically constructed at runtime through a `DagBuilder`, which enables DAGs of different shapes for a given saga operation depending upon user input.
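As a rough sketch of what "different shapes depending upon user input" can look like, the toy builder below produces a DAG whose middle stage exists only when the caller asks for disks. `ToyDag`, `ToyDagBuilder`, `build_provision_dag`, and the node names are all hypothetical and much simpler than the real `DagBuilder`.

```rust
/// Toy stand-in for a DAG: an ordered list of stages, where the nodes
/// within one stage are allowed to run in parallel.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyDag {
    pub name: String,
    pub stages: Vec<Vec<String>>,
}

/// Hypothetical builder; it only mirrors the append / append_parallel idea.
pub struct ToyDagBuilder {
    name: String,
    stages: Vec<Vec<String>>,
}

impl ToyDagBuilder {
    pub fn new(name: &str) -> Self {
        ToyDagBuilder { name: name.to_string(), stages: Vec::new() }
    }

    /// Add a single node that runs after everything appended so far.
    pub fn append(&mut self, node: &str) {
        self.stages.push(vec![node.to_string()]);
    }

    /// Add a group of nodes that may run concurrently with each other.
    pub fn append_parallel(&mut self, nodes: &[&str]) {
        self.stages.push(nodes.iter().map(|n| n.to_string()).collect());
    }

    pub fn build(self) -> ToyDag {
        ToyDag { name: self.name, stages: self.stages }
    }
}

/// The DAG is built per request: one "attach_disk_N" node per requested
/// disk, and no attach stage at all when no disks are requested.
pub fn build_provision_dag(num_disks: usize) -> ToyDag {
    let mut builder = ToyDagBuilder::new("instance-provision");
    builder.append("create_instance_record");
    if num_disks > 0 {
        let disks: Vec<String> =
            (0..num_disks).map(|i| format!("attach_disk_{}", i)).collect();
        let refs: Vec<&str> = disks.iter().map(|s| s.as_str()).collect();
        builder.append_parallel(&refs);
    }
    builder.append("start_instance");
    builder.build()
}
```

With a template fixed at startup, the number of disk nodes would have to be decided before any request arrived; here it is decided when the saga is created.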
Additionally, the implementation of subsagas has changed. Subsagas are no longer defined by templates and launched as separate sagas by a node of the parent saga. That approach was challenging to make idempotent, and in its current state was unsound. Instead, subsagas are now added directly as nodes into the dynamic DAG via the `DagBuilder`. Subsagas themselves are constructed as `Dag`s and can be added alongside other `Node`s via `DagBuilder::append` and `DagBuilder::append_parallel`. Subsaga parameters come from a parameter node that is output from the parent saga. When the top-level saga is done being constructed, it is packaged up into a `SagaDag`.
In order to enable fully dynamic sagas and subsagas, an `ActionRegistry` was created, where all actions across all sagas are registered. `Dag`s refer to these actions by `ActionName`. This allows a `SagaDag` to be serialized and, when deserialized, to run arbitrary Rust action code without having to couple the structure of the DAG to that Rust code as in the prior template-driven design.
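The decoupling described above can be sketched with a toy registry: the DAG stores only action names, so it can round-trip through a string and still find its code by lookup afterwards. Everything below is an assumption for illustration. `ToyRegistry`, `NamedDag`, `run`, and the comma-separated wire format are hypothetical, and the real `Dag` derives `Serialize`/`Deserialize` rather than using a hand-rolled format.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Toy action: takes a value and returns a new one.
type ToyAction = Arc<dyn Fn(i64) -> i64 + Send + Sync>;

/// Hypothetical registry mapping action names to executable code.
#[derive(Default)]
pub struct ToyRegistry {
    actions: HashMap<String, ToyAction>,
}

impl ToyRegistry {
    pub fn register(&mut self, name: &str, action: ToyAction) {
        self.actions.insert(name.to_string(), action);
    }

    pub fn get(&self, name: &str) -> Option<ToyAction> {
        self.actions.get(name).cloned()
    }
}

/// A DAG reduced to an ordered list of action names. It holds no code,
/// which is what makes it trivially serializable.
#[derive(Debug, Clone, PartialEq)]
pub struct NamedDag {
    pub action_names: Vec<String>,
}

impl NamedDag {
    /// Illustrative wire format only: names joined by commas.
    pub fn serialize(&self) -> String {
        self.action_names.join(",")
    }

    pub fn deserialize(s: &str) -> NamedDag {
        NamedDag {
            action_names: s
                .split(',')
                .filter(|part| !part.is_empty())
                .map(String::from)
                .collect(),
        }
    }
}

/// Run each named action in order, resolving code through the registry.
/// A name with no registered action is reported as an error.
pub fn run(dag: &NamedDag, registry: &ToyRegistry, input: i64) -> Result<i64, String> {
    let mut value = input;
    for name in &dag.action_names {
        let action = registry
            .get(name)
            .ok_or_else(|| format!("unregistered action: {}", name))?;
        value = action(value);
    }
    Ok(value)
}

/// Registry with two made-up actions for the example.
pub fn example_registry() -> ToyRegistry {
    let mut registry = ToyRegistry::default();
    registry.register("add_one", Arc::new(|x: i64| x + 1));
    registry.register("double", Arc::new(|x: i64| x * 2));
    registry
}
```

The same lookup is what recovery relies on in this sketch: a DAG read back from storage is useful only if every name it mentions is present in the registry of the process that loads it.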
There are many other goodies sprinkled throughout, including better tests and
validation. See the comments on this PR for details.