Explaining the state machine used by FlytePropeller and Flyte #903
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,78 @@ | ||||||
.. _divedeep-state-machine: | ||||||
|
||||||
################################################# | ||||||
Understanding the state transition of a workflow | ||||||
################################################# | ||||||
|
||||||
High-level state transition of a Workflow | ||||||
========================================== | ||||||
|
||||||
.. image:: https://mermaid.ink/img/eyJjb2RlIjoic3RhdGVEaWFncmFtLXYyXG4gICAgWypdIC0tPiBSZWFkeVxuICAgIFJlYWR5IC0tPiBSdW5uaW5nXG4gICAgUnVubmluZyAtLT4gU3VjY2Vzc1xuXG4gICAgc3RhdGUgUnVubmluZyB7XG4gICAgICBbKl0gLS0-IE5vZGVRdWV1ZWRcbiAgICAgIE5vZGVRdWV1ZWQgLS0-IE5vZGVSdW5uaW5nXG4gICAgICBOb2RlUnVubmluZyAtLT4gTm9kZVN1Y2Nlc3NcblxuICAgICAgc3RhdGUgTm9kZVJ1bm5pbmcge1xuICAgICAgICBbKl0gLS0-IFRhc2tRdWV1ZWRcbiAgICAgICAgVGFza1F1ZXVlZCAtLT4gVGFza1J1bm5pbmdcbiAgICAgICAgVGFza1J1bm5pbmcgLS0-IFRhc2tTdWNjZXNzXG4gICAgICB9XG4gICAgfVxuXG4iLCJtZXJtYWlkIjp7fSwidXBkYXRlRWRpdG9yIjpmYWxzZX0 | ||||||
:alt: Happy case for a workflow with one node and one task. | ||||||
|
||||||
This state diagram gives a deliberately simplified, high-level view of the state transitions that a workflow with a single node and one task goes through on its way to success, as seen by an observer. | ||||||
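The nesting in the happy-path diagram can be read as a flat sequence of observed states. The following is a minimal sketch, assuming the state names embedded in the diagram source; the ``HAPPY_PATH`` structure and the ``flatten`` helper are hypothetical and are not part of FlytePropeller.

```python
# Hypothetical encoding of the happy-path diagram: each level lists its
# states in order, and "nested" maps a composite state to its inner machine.
HAPPY_PATH = {
    "states": ["Ready", "Running", "Success"],
    "nested": {
        "Running": {
            "states": ["NodeQueued", "NodeRunning", "NodeSuccess"],
            "nested": {
                "NodeRunning": {
                    "states": ["TaskQueued", "TaskRunning", "TaskSuccess"],
                    "nested": {},
                }
            },
        }
    },
}


def flatten(machine):
    """Return the states in the order an observer would see them."""
    order = []
    for state in machine["states"]:
        order.append(state)
        inner = machine["nested"].get(state)
        if inner is not None:
            # A composite state runs its inner machine before moving on.
            order.extend(flatten(inner))
    return order


if __name__ == "__main__":
    print(" -> ".join(flatten(HAPPY_PATH)))
```

The task reaches its success state before the node does, and the node before the workflow, which is the ordering the diagram implies.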
|
||||||
The following sections explain in detail the observable (and some hidden) states and state transitions for workflows, nodes, and tasks. | ||||||
|
||||||
|
||||||
Workflow States | ||||||
================ | ||||||
|
||||||
.. image:: https://mermaid.ink/img/eyJjb2RlIjoic3RhdGVEaWFncmFtLXYyXG4gICAgWypdIC0tPiBBYm9ydGVkIDogT24gc3lzdGVtIGVycm9ycyBtb3JlIHRoYW4gdGhyZXNob2xkXG4gICAgWypdIC0tPiBSZWFkeVxuICAgIFJlYWR5IC0tPiBSdW5uaW5nIDogV3JpdGUgaW5wdXRzIHRvIHdvcmtmbG93XG4gICAgUnVubmluZyAtLT4gUnVubmluZyA6IE9uIHN5c3RlbSBlcnJvclxuICAgIFJ1bm5pbmcgLS0-IFN1Y2NlZWRpbmcgOiBPbiBhbGwgTm9kZXMgU3VjY2Vzc1xuICAgIFN1Y2NlZWRpbmcgLS0-IFN1Y2NlZWRlZCA6IE9uIHN1Y2Nlc3NmdWwgZXZlbnQgc2VuZCB0byBBZG1pblxuICAgIFN1Y2NlZWRpbmcgLS0-IFN1Y2NlZWRpbmcgOiBPbiBzeXN0ZW0gZXJyb3JcbiAgICBSZWFkeSAtLT4gRmFpbGluZyA6IE9uIHByZWNvbmRpdGlvbiBmYWlsdXJlXG4gICAgUnVubmluZyAtLT4gRmFpbGluZyA6IE9uIGFueSBOb2RlIEZhaWx1cmVcbiAgICBSZWFkeSAtLT4gQWJvcnRlZCA6IE9uIHVzZXIgaW5pdGlhdGVkIGFib3J0XG4gICAgUnVubmluZyAtLT4gQWJvcnRlZCA6IE9uIHVzZXIgaW5pdGlhdGVkIGFib3J0XG4gICAgU3VjY2VlZGluZyAtLT4gQWJvcnRlZCA6IE9uIHVzZXIgaW5pdGlhdGVkIGFib3J0XG5cbiAgICBGYWlsaW5nIC0tPiBIYW5kbGVGYWlsdXJlTm9kZSA6IElmIEZhaWx1cmUgbm9kZSBleGlzdHNcbiAgICBGYWlsaW5nIC0tPiBBYm9ydGVkIDogT24gdXNlciBpbml0aWF0ZWQgYWJvcnRcbiAgICBIYW5kbGVGYWlsdXJlTm9kZSAtLT4gRmFpbGVkIDogT24gY29tcGxldGluZyBmYWlsdXJlIG5vZGVcbiAgICBIYW5kbGVGYWlsdXJlTm9kZSAtLT4gQWJvcnRlZCA6IE9uIHVzZXIgaW5pdGlhdGVkIGFib3J0XG4gICAgRmFpbGluZyAtLT4gRmFpbGVkIDogT24gc3VjY2Vzc2Z1bCBzZW5kIG9mIEZhaWx1cmUgbm9kZVxuICAgICIsIm1lcm1haWQiOnt9LCJ1cGRhdGVFZGl0b3IiOmZhbHNlfQ | ||||||
:alt: State diagram of the states through which a workflow transitions. This is the core finite state machine of a workflow. | ||||||
|
||||||
The state diagram above illustrates the states through which a workflow transitions. This is the core finite state machine of a workflow. | ||||||
|
||||||
A workflow always starts in the Ready state and ends in the Failed, Succeeded, or Aborted state. | ||||||
A system error within a state causes that state to be retried. These retries are capped by the system retry limit; once the limit is exceeded, the workflow moves to the Aborted state. | ||||||
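The transitions in the workflow diagram can be written down as a lookup table. This is a minimal sketch, assuming the edges in the diagram source; the event names, the ``next_phase`` helper, and the default retry cap are hypothetical and do not mirror FlytePropeller's actual code or configuration.

```python
from enum import Enum


class WorkflowPhase(Enum):
    READY = "Ready"
    RUNNING = "Running"
    SUCCEEDING = "Succeeding"
    SUCCEEDED = "Succeeded"
    FAILING = "Failing"
    HANDLE_FAILURE_NODE = "HandleFailureNode"
    FAILED = "Failed"
    ABORTED = "Aborted"


P = WorkflowPhase
TERMINAL = {P.SUCCEEDED, P.FAILED, P.ABORTED}

# (current phase, event) -> next phase. Event names are made up for this sketch.
TRANSITIONS = {
    (P.READY, "inputs_written"): P.RUNNING,
    (P.READY, "precondition_failure"): P.FAILING,
    (P.RUNNING, "all_nodes_succeeded"): P.SUCCEEDING,
    (P.RUNNING, "node_failed"): P.FAILING,
    (P.SUCCEEDING, "event_sent_to_admin"): P.SUCCEEDED,
    (P.FAILING, "failure_node_exists"): P.HANDLE_FAILURE_NODE,
    (P.FAILING, "failure_event_sent"): P.FAILED,
    (P.HANDLE_FAILURE_NODE, "failure_node_completed"): P.FAILED,
}


def next_phase(phase, event, system_error_count=0, max_system_retries=3):
    """Return the phase that follows `phase` when `event` is observed."""
    if phase in TERMINAL:
        raise ValueError(f"{phase.value} is terminal")
    if event == "user_abort":
        # A user-initiated abort is allowed from every non-terminal phase.
        return P.ABORTED
    if event == "system_error":
        # A system error retries the same phase until the cap is exceeded.
        if system_error_count > max_system_retries:
            return P.ABORTED
        return phase
    try:
        return TRANSITIONS[(phase, event)]
    except KeyError:
        raise ValueError(f"no transition from {phase.value} on {event!r}") from None
```

Keeping the edges in a table makes it easy to compare the sketch against the diagram edge by edge.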
|
||||||
Every transition between states is recorded in FlyteAdmin using :std:ref:`gen/pb-protodoc/flyteidl/event/event.proto:flyteidl.event.workflowexecutionevent`. | ||||||
|
||||||
The phases in the state diagram above are stored in the Admin database as specified in :std:ref:`api_enum_flyteidl.core.workflowexecution.phase` and are sent as part of the execution event. | ||||||
|
||||||
The state machine specification for the illustration can be found `here <https://mermaid-js.github.io/mermaid-live-editor/#/edit/eyJjb2RlIjoic3RhdGVEaWFncmFtLXYyXG4gICAgWypdIC0tPiBBYm9ydGVkIDogT24gc3lzdGVtIGVycm9ycyBtb3JlIHRoYW4gdGhyZXNob2xkXG4gICAgWypdIC0tPiBSZWFkeVxuICAgIFJlYWR5IC0tPiBSdW5uaW5nIDogV3JpdGUgaW5wdXRzIHRvIHdvcmtmbG93XG4gICAgUnVubmluZyAtLT4gUnVubmluZyA6IE9uIHN5c3RlbSBlcnJvclxuICAgIFJ1bm5pbmcgLS0-IFN1Y2NlZWRpbmcgOiBPbiBhbGwgTm9kZXMgU3VjY2Vzc1xuICAgIFN1Y2NlZWRpbmcgLS0-IFN1Y2NlZWRlZCA6IE9uIHN1Y2Nlc3NmdWwgZXZlbnQgc2VuZCB0byBBZG1pblxuICAgIFN1Y2NlZWRpbmcgLS0-IFN1Y2NlZWRpbmcgOiBPbiBzeXN0ZW0gZXJyb3JcbiAgICBSZWFkeSAtLT4gRmFpbGluZyA6IE9uIHByZWNvbmRpdGlvbiBmYWlsdXJlXG4gICAgUnVubmluZyAtLT4gRmFpbGluZyA6IE9uIGFueSBOb2RlIEZhaWx1cmVcbiAgICBSZWFkeSAtLT4gQWJvcnRlZCA6IE9uIHVzZXIgaW5pdGlhdGVkIGFib3J0XG4gICAgUnVubmluZyAtLT4gQWJvcnRlZCA6IE9uIHVzZXIgaW5pdGlhdGVkIGFib3J0XG4gICAgU3VjY2VlZGluZyAtLT4gQWJvcnRlZCA6IE9uIHVzZXIgaW5pdGlhdGVkIGFib3J0XG5cbiAgICBGYWlsaW5nIC0tPiBIYW5kbGVGYWlsdXJlTm9kZSA6IElmIEZhaWx1cmUgbm9kZSBleGlzdHNcbiAgICBGYWlsaW5nIC0tPiBBYm9ydGVkIDogT24gdXNlciBpbml0aWF0ZWQgYWJvcnRcbiAgICBIYW5kbGVGYWlsdXJlTm9kZSAtLT4gRmFpbGVkIDogT24gY29tcGxldGluZyBmYWlsdXJlIG5vZGVcbiAgICBIYW5kbGVGYWlsdXJlTm9kZSAtLT4gQWJvcnRlZCA6IE9uIHVzZXIgaW5pdGlhdGVkIGFib3J0XG4gICAgRmFpbGluZyAtLT4gRmFpbGVkIDogT24gc3VjY2Vzc2Z1bCBzZW5kIG9mIEZhaWx1cmUgbm9kZVxuICAgICIsIm1lcm1haWQiOnt9LCJ1cGRhdGVFZGl0b3IiOmZhbHNlfQ>`_ | ||||||
|
||||||
|
||||||
Node States | ||||||
================ | ||||||
|
||||||
.. image:: https://mermaid.ink/img/eyJjb2RlIjoic3RhdGVEaWFncmFtLXYyXG4gICAgWypdIC0tPiBOb3RZZXRTdGFydGVkXG4gICAgWypdIC0tPiBBYm9ydGVkIDogV2lsbCBzdG9wIHRoZSBub2RlIGV4ZWN1dGlvblxuICAgIE5vdFlldFN0YXJ0ZWQgLS0-IFF1ZXVlZCA6IElmIGFsbCB1cHN0cmVhbSBub2RlcyBhcmUgcmVhZHkgaS5lLCBpbnB1dHMgYXJlIHJlYWR5XG4gICAgTm90WWV0U3RhcnRlZCAtLT4gU2tpcHBlZCA6IElmIHRoZSBicmFuY2ggd2FzIG5vdCB0YWtlblxuICAgIFF1ZXVlZCAtLT4gUnVubmluZyA6IFN0YXJ0IHRhc2sgZXhlY3V0aW9uIC0gYXR0ZW1wdCAwXG4gICAgUnVubmluZyAtLT4gVGltaW5nT3V0IDogSWYgdGFzayB0aW1lb3V0IGhhcyBlbGFwc2VkIGFuZCByZXRyeV9hdHRlbXB0cyA-PSBtYXhfcmV0cmllc1xuICAgIFRpbWluZ091dCAtLT4gVGltZWRPdXQgOiBJdCB0b3RhbCBub2RlIHRpbWVvdXQgaGFzIGVsYXBzZWRcbiAgICBSdW5uaW5nIC0tPiBSZXRyeWFibGVGYWlsdXJlIDogSWYgcmV0cnlfYXR0ZW1wdHMgPCBtYXhfcmV0cmllcyBhbmQgZmFpbHVyZSBvciB0aW1lb3V0XG4gICAgUmV0cnlhYmxlRmFpbHVyZSAtLT4gUnVubmluZyA6IEFsd2F5c1xuICAgIFJldHJ5YWJsZUZhaWx1cmUgLS0-IEZhaWxpbmcgOiByZXRyeV9hdHRlbXB0cyA-PSBtYXhfcmV0cmllc1xuICAgIEZhaWxpbmcgLS0-IEZhaWxlZFxuICAgIFJ1bm5pbmcgLS0-IFN1Y2NlZWRpbmcgOiBJbnRlcm5hbCBzdGF0ZVxuICAgIFN1Y2NlZWRpbmcgLS0-IFN1Y2NlZWRlZCA6IFVzZXIgb2JzZXJ2ZXMgdGhlIHRhc2sgYXMgc3VjY2VlZGVkXG4gICAgU3VjY2VlZGVkIC0tPiBbKl1cbiAgICBGYWlsZWQgLS0-IFsqXVxuIiwibWVybWFpZCI6e30sInVwZGF0ZUVkaXRvciI6ZmFsc2V9 | ||||||
actually perhaps not swapped, but I'm confused by the 'Always': shouldn't that only be the case when retry_attempts + 1 < max_attempts? |
I agree. I think I can improve this diagram. |
||||||
:alt: State diagram of the states through which a node transitions. This is the core FSM for a node. | ||||||
|
||||||
The state diagram above illustrates the states through which a node transitions. This is the core FSM for a node. | ||||||
From a user's point of view, a workflow simply consists of a sequence of tasks. Internally, however, Flyte creates a meta entity called a :std:ref:`api_msg_flyteidl.core.workflownode`. | ||||||
|
||||||
Once a workflow enters the ``Running`` state, it triggers the phantom ``start node`` of the workflow. The start node is always the entry node of a workflow. From the start node, execution proceeds recursively through all child nodes using a modified depth-first search algorithm. | ||||||
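The text does not say how the depth-first search is modified. The following is a sketch under one assumption: a node is only handled once all of its upstream nodes are done, which matches the ``NotYetStarted`` to ``Queued`` condition in the node diagram. The graph layout, the node names, and the ``traverse`` helper are hypothetical, and every node is treated as finishing immediately.

```python
def traverse(graph, start="start-node"):
    """Depth-first walk that handles a node only after all its parents.

    `graph` maps each node name to the list of its child nodes.
    Returns the order in which nodes were handled.
    """
    # Invert the edges so each node knows its upstream nodes.
    upstream = {node: set() for node in graph}
    for parent, children in graph.items():
        for child in children:
            upstream.setdefault(child, set()).add(parent)

    order = []
    done = set()

    def visit(node):
        if node in done:
            return
        if not upstream.get(node, set()) <= done:
            # An upstream node is still pending; this node is reached
            # again later through that parent.
            return
        order.append(node)
        done.add(node)
        for child in graph.get(node, []):
            visit(child)

    visit(start)
    return order


if __name__ == "__main__":
    diamond = {
        "start-node": ["a", "b"],
        "a": ["c"],
        "b": ["c"],
        "c": ["end-node"],
        "end-node": [],
    }
    print(traverse(diamond))
```

In the diamond-shaped example, ``c`` is skipped when first reached through ``a`` and handled only after ``b`` has finished.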
|
||||||
Nodes can be of the following types, but all nodes go through the same transitions: | ||||||
|
||||||
#. Start node - only exists during execution and is not modeled in the core spec | ||||||
#. :std:ref:`gen/pb-protodoc/flyteidl/core/workflow.proto:flyteidl.core.tasknode` | ||||||
#. :std:ref:`gen/pb-protodoc/flyteidl/core/workflow.proto:flyteidl.core.branchnode` | ||||||
#. :std:ref:`gen/pb-protodoc/flyteidl/core/workflow.proto:flyteidl.core.workflownode` | ||||||
#. Dynamic node - a task node that returns futures instead of outputs | ||||||
#. End node - only exists during execution and is not modeled in the core spec | ||||||
|
||||||
Every transition between states is recorded in FlyteAdmin using :std:ref:`gen/pb-protodoc/flyteidl/event/event.proto:flyteidl.event.nodeexecutionevent`. | ||||||
|
||||||
Every NodeExecutionEvent carries one of the phases defined in :std:ref:`api_enum_flyteidl.core.nodeexecution.phase`. | ||||||
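The retry path in the node diagram is the part reviewers found ambiguous. The sketch below encodes one reading of it: a failed attempt moves the node to ``RetryableFailure`` and back to ``Running`` while ``retry_attempts < max_retries``, and to ``Failing`` and then ``Failed`` otherwise. The ``run_node`` helper and its arguments are hypothetical; timeouts, skips, and aborts are left out.

```python
def run_node(outcomes, max_retries):
    """Return the phases a node passes through.

    `outcomes` lists the result of each attempt in order:
    True for success, False for a failure.
    """
    phases = ["NotYetStarted", "Queued", "Running"]
    retry_attempts = 0
    for succeeded in outcomes:
        if succeeded:
            phases += ["Succeeding", "Succeeded"]
            return phases
        if retry_attempts < max_retries:
            # Retries remain: record the failure and start the next attempt.
            retry_attempts += 1
            phases += ["RetryableFailure", "Running"]
        else:
            # Retries exhausted: the node fails permanently.
            phases += ["Failing", "Failed"]
            return phases
    # No more outcomes supplied: the node is still Running.
    return phases


if __name__ == "__main__":
    print(run_node([False, True], max_retries=1))
    print(run_node([False, False], max_retries=1))
```

With ``max_retries=1``, one failure followed by a success ends in ``Succeeded``, while two failures end in ``Failed``.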
|
||||||
.. note:: TODO add explanation for each phase | ||||||
|
||||||
The state machine specification for the illustration can be found `here <https://mermaid-js.github.io/mermaid-live-editor/#/edit/eyJjb2RlIjoic3RhdGVEaWFncmFtLXYyXG4gICAgWypdIC0tPiBOb3RZZXRTdGFydGVkXG4gICAgWypdIC0tPiBBYm9ydGVkIDogV2lsbCBzdG9wIHRoZSBub2RlIGV4ZWN1dGlvblxuICAgIE5vdFlldFN0YXJ0ZWQgLS0-IFF1ZXVlZCA6IElmIGFsbCB1cHN0cmVhbSBub2RlcyBhcmUgcmVhZHkgaS5lLCBpbnB1dHMgYXJlIHJlYWR5XG4gICAgTm90WWV0U3RhcnRlZCAtLT4gU2tpcHBlZCA6IElmIHRoZSBicmFuY2ggd2FzIG5vdCB0YWtlblxuICAgIFF1ZXVlZCAtLT4gUnVubmluZyA6IFN0YXJ0IHRhc2sgZXhlY3V0aW9uIC0gYXR0ZW1wdCAwXG4gICAgUnVubmluZyAtLT4gVGltaW5nT3V0IDogSWYgdGFzayB0aW1lb3V0IGhhcyBlbGFwc2VkIGFuZCByZXRyeV9hdHRlbXB0cyA-PSBtYXhfcmV0cmllc1xuICAgIFRpbWluZ091dCAtLT4gVGltZWRPdXQgOiBJdCB0b3RhbCBub2RlIHRpbWVvdXQgaGFzIGVsYXBzZWRcbiAgICBSdW5uaW5nIC0tPiBSZXRyeWFibGVGYWlsdXJlIDogSWYgcmV0cnlfYXR0ZW1wdHMgPCBtYXhfcmV0cmllcyBhbmQgZmFpbHVyZSBvciB0aW1lb3V0XG4gICAgUmV0cnlhYmxlRmFpbHVyZSAtLT4gUnVubmluZyA6IEFsd2F5c1xuICAgIFJldHJ5YWJsZUZhaWx1cmUgLS0-IEZhaWxpbmcgOiByZXRyeV9hdHRlbXB0cyA-PSBtYXhfcmV0cmllc1xuICAgIEZhaWxpbmcgLS0-IEZhaWxlZFxuICAgIFJ1bm5pbmcgLS0-IFN1Y2NlZWRpbmcgOiBJbnRlcm5hbCBzdGF0ZVxuICAgIFN1Y2NlZWRpbmcgLS0-IFN1Y2NlZWRlZCA6IFVzZXIgb2JzZXJ2ZXMgdGhlIHRhc2sgYXMgc3VjY2VlZGVkXG4gICAgU3VjY2VlZGVkIC0tPiBbKl1cbiAgICBGYWlsZWQgLS0-IFsqXVxuIiwibWVybWFpZCI6e30sInVwZGF0ZUVkaXRvciI6ZmFsc2V9>`_ | ||||||
|
||||||
Task States | ||||||
================ | ||||||
|
||||||
.. image:: https://mermaid.ink/img/eyJjb2RlIjoic3RhdGVEaWFncmFtLXYyXG4gICAgWypdIC0tPiBOb3RSZWFkeVxuICAgIFsqXSAtLT4gQWJvcnRlZCA6IEFib3J0ZWQgYnkgTm9kZUhhbmRsZXIgLSB0aW1lb3V0cywgZXh0cmVuYWwgYWJvcnQsIGV0Y1xuICAgIE5vdFJlYWR5IC0tPiBXYWl0aW5nRm9yUmVzb3VyY2VzIDogQmxvY2tlZCBvbiByZXNvdXJjZSBxdW90YSBvciByZXNvdXJjZSBwb29sIChvcHRpb25hbClcbiAgICBXYWl0aW5nRm9yUmVzb3VyY2VzIC0tPiBRdWV1ZWQgOiBIYXMgYmVlbiBzdWJtaXR0ZWQsIGJ1dCBoYXMgbm90IHN0YXJ0ZWQgKG9wdGlvbmFsKVxuICAgIFF1ZXVlZCAtLT4gSW5pdGlhbGl6aW5nIDogUHJlc3RhcnQgaW5pdGlhbGl6YXRpb24gKG9wdGlvbmFsKVxuICAgIEluaXRpYWxpemluZyAtLT4gUnVubmluZyA6IEFjdHVhbCBleGVjdXRpb24gb2YgdXNlciBjb2RlIGhhcyBzdGFydGVkXG4gICAgUnVubmluZyAtLT4gU3VjY2VzcyA6IFN1Y2Nlc3NmdWwgZXhlY3V0aW9uXG4gICAgUnVubmluZyAtLT4gUmV0cnlhYmxlRmFpbHVyZSA6IEZhaWxlZCB3aXRoIGEgcmV0cnlhYmxlIGVycm9yXG4gICAgUnVubmluZyAtLT4gUGVybWFuZW50RmFpbHVyZSA6IFVucmVjb3ZlcmFibGUgZmFpbHVyZSwgd2lsbCBzdG9wIGFsbCBleGVjdXRpb25cbiAgICBTdWNjZXNzIC0tPiBbKl1cbiAgICBSZXRyeWFibGVGYWlsdXJlIC0tPiBbKl1cbiAgICBQZXJtYW5lbnRGYWlsdXJlIC0tPiBbKl1cbiIsIm1lcm1haWQiOnt9LCJ1cGRhdGVFZGl0b3IiOmZhbHNlfQ | ||||||
:alt: State diagram of the states through which a task transitions. This is the core FSM for any task in Flyte. | ||||||
|
||||||
The State diagram above illustrates the various states through which a Task transitions. | ||||||
|
||||||
Every transition between states is recorded in FlyteAdmin using :std:ref:`gen/pb-protodoc/flyteidl/event/event.proto:flyteidl.event.taskexecutionevent`. | ||||||
|
||||||
Every TaskExecutionEvent carries one of the phases defined in :std:ref:`api_enum_flyteidl.core.taskexecution.phase`. | ||||||
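The task diagram marks several phases as optional. A minimal sketch of the resulting phase sequence, assuming the phase names in the diagram source; the ``task_phase_path`` helper and its ``reported`` argument are hypothetical, and the ``Aborted`` path is not modeled.

```python
# Phases the diagram labels "(optional)", in the order they would occur.
OPTIONAL_PHASES = ("WaitingForResources", "Queued", "Initializing")
# Phases from which the diagram leads to the end state.
FINAL_PHASES = ("Success", "RetryableFailure", "PermanentFailure")


def task_phase_path(outcome, reported=OPTIONAL_PHASES):
    """Return the phases a task passes through for a given final outcome.

    `reported` names the optional phases that are actually observed.
    """
    if outcome not in FINAL_PHASES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    unknown = set(reported) - set(OPTIONAL_PHASES)
    if unknown:
        raise ValueError(f"not optional phases: {sorted(unknown)}")
    path = ["NotReady"]
    # Keep the diagram's ordering regardless of the order in `reported`.
    path += [phase for phase in OPTIONAL_PHASES if phase in reported]
    path += ["Running", outcome]
    return path


if __name__ == "__main__":
    print(task_phase_path("Success"))
    print(task_phase_path("RetryableFailure", reported=("Queued",)))
```

A task that reports none of the optional phases goes straight from ``NotReady`` to ``Running``.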
|
||||||
|
||||||
.. note:: TODO add explanation for each phase | ||||||
|
||||||
The state machine specification for the illustration can be found `here <https://mermaid-js.github.io/mermaid-live-editor/#/edit/eyJjb2RlIjoic3RhdGVEaWFncmFtLXYyXG4gICAgWypdIC0tPiBOb3RSZWFkeVxuICAgIFsqXSAtLT4gQWJvcnRlZCA6IEFib3J0ZWQgYnkgTm9kZUhhbmRsZXIgLSB0aW1lb3V0cywgZXh0cmVuYWwgYWJvcnQsIGV0Y1xuICAgIE5vdFJlYWR5IC0tPiBXYWl0aW5nRm9yUmVzb3VyY2VzIDogQmxvY2tlZCBvbiByZXNvdXJjZSBxdW90YSBvciByZXNvdXJjZSBwb29sIChvcHRpb25hbClcbiAgICBXYWl0aW5nRm9yUmVzb3VyY2VzIC0tPiBRdWV1ZWQgOiBIYXMgYmVlbiBzdWJtaXR0ZWQsIGJ1dCBoYXMgbm90IHN0YXJ0ZWQgKG9wdGlvbmFsKVxuICAgIFF1ZXVlZCAtLT4gSW5pdGlhbGl6aW5nIDogUHJlc3RhcnQgaW5pdGlhbGl6YXRpb24gKG9wdGlvbmFsKVxuICAgIEluaXRpYWxpemluZyAtLT4gUnVubmluZyA6IEFjdHVhbCBleGVjdXRpb24gb2YgdXNlciBjb2RlIGhhcyBzdGFydGVkXG4gICAgUnVubmluZyAtLT4gU3VjY2VzcyA6IFN1Y2Nlc3NmdWwgZXhlY3V0aW9uXG4gICAgUnVubmluZyAtLT4gUmV0cnlhYmxlRmFpbHVyZSA6IEZhaWxlZCB3aXRoIGEgcmV0cnlhYmxlIGVycm9yXG4gICAgUnVubmluZyAtLT4gUGVybWFuZW50RmFpbHVyZSA6IFVucmVjb3ZlcmFibGUgZmFpbHVyZSwgd2lsbCBzdG9wIGFsbCBleGVjdXRpb25cbiAgICBTdWNjZXNzIC0tPiBbKl1cbiAgICBSZXRyeWFibGVGYWlsdXJlIC0tPiBbKl1cbiAgICBQZXJtYW5lbnRGYWlsdXJlIC0tPiBbKl1cbiIsIm1lcm1haWQiOnt9LCJ1cGRhdGVFZGl0b3IiOmZhbHNlfQ>`_ |
@samhita-alla that is the case - if retry-attempts is less than max, it actually causes a retryable failure and automatically retries