From 4b2f6808ead0517cca9426ef9e671ece468cbd21 Mon Sep 17 00:00:00 2001
From: Ketan Umare <16888709+kumare3@users.noreply.github.com>
Date: Mon, 12 Apr 2021 11:50:11 -0700
Subject: [PATCH] Explaining the state machine used by FlytePropeller and Flyte (#903)

* Explaining the state machine used by FlytePropeller and Flyte

- this document helps to explain the various states a workflow, node and task transitions through.
- TODO add a small table that helps users understand what a state in the UI represents

Signed-off-by: Ketan Umare

* updated docs (addressed comments)

Signed-off-by: Ketan Umare

* comments addressed

- image for nodes improved

Signed-off-by: Ketan Umare
---
 rsts/dive_deep/index.rst         |  1 +
 rsts/dive_deep/state_machine.rst | 78 ++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)
 create mode 100644 rsts/dive_deep/state_machine.rst

diff --git a/rsts/dive_deep/index.rst b/rsts/dive_deep/index.rst
index 6e01c1dcff..9efea9094f 100644
--- a/rsts/dive_deep/index.rst
+++ b/rsts/dive_deep/index.rst
@@ -40,6 +40,7 @@ Execution Time Details
    :maxdepth: 1
 
    executions
+   state_machine
    execution_timeline
    observability
    dynamic_spec
diff --git a/rsts/dive_deep/state_machine.rst b/rsts/dive_deep/state_machine.rst
new file mode 100644
index 0000000000..e04872db39
--- /dev/null
+++ b/rsts/dive_deep/state_machine.rst
@@ -0,0 +1,78 @@
+.. _divedeep-state-machine:
+
+#################################################
+Understanding the State Transition in a workflow
+#################################################
+
+High Level Overview of how a Workflow progresses to Success
+============================================================
+
+.. image:: https://mermaid.ink/img/eyJjb2RlIjoic3RhdGVEaWFncmFtLXYyXG4gICAgWypdIC0tPiBSZWFkeVxuICAgIFJlYWR5IC0tPiBSdW5uaW5nXG4gICAgUnVubmluZyAtLT4gU3VjY2Vzc1xuXG4gICAgc3RhdGUgUnVubmluZyB7XG4gICAgICBbKl0gLS0-IE5vZGVRdWV1ZWRcbiAgICAgIE5vZGVRdWV1ZWQgLS0-IE5vZGVSdW5uaW5nXG4gICAgICBOb2RlUnVubmluZyAtLT4gTm9kZVN1Y2Nlc3NcblxuICAgICAgc3RhdGUgTm9kZVJ1bm5pbmcge1xuICAgICAgICBbKl0gLS0-IFRhc2tRdWV1ZWRcbiAgICAgICAgVGFza1F1ZXVlZCAtLT4gVGFza1J1bm5pbmdcbiAgICAgICAgVGFza1J1bm5pbmcgLS0-IFRhc2tTdWNjZXNzXG4gICAgICB9XG4gICAgfVxuXG4iLCJtZXJtYWlkIjp7fSwidXBkYXRlRWRpdG9yIjpmYWxzZX0
+    :alt: Happy case for a workflow with one node and one task.
+
+This state diagram gives a deliberately simplified, high-level view of the state transitions that a Workflow with a single node and a single task goes through on its way to success.
+
+The following sections explain in detail the observable (and some hidden) states, and the transitions between them, for workflows, nodes, and tasks.
+
+
+Workflow States
+================
+
+.. image:: https://mermaid.ink/img/eyJjb2RlIjoic3RhdGVEaWFncmFtLXYyXG4gICAgWypdIC0tPiBBYm9ydGVkIDogT24gc3lzdGVtIGVycm9ycyBtb3JlIHRoYW4gdGhyZXNob2xkXG4gICAgWypdIC0tPiBSZWFkeVxuICAgIFJlYWR5IC0tPiBSdW5uaW5nIDogV3JpdGUgaW5wdXRzIHRvIHdvcmtmbG93XG4gICAgUnVubmluZyAtLT4gUnVubmluZyA6IE9uIHN5c3RlbSBlcnJvclxuICAgIFJ1bm5pbmcgLS0-IFN1Y2NlZWRpbmcgOiBPbiBhbGwgTm9kZXMgU3VjY2Vzc1xuICAgIFN1Y2NlZWRpbmcgLS0-IFN1Y2NlZWRlZCA6IE9uIHN1Y2Nlc3NmdWwgZXZlbnQgc2VuZCB0byBBZG1pblxuICAgIFN1Y2NlZWRpbmcgLS0-IFN1Y2NlZWRpbmcgOiBPbiBzeXN0ZW0gZXJyb3JcbiAgICBSZWFkeSAtLT4gRmFpbGluZyA6IE9uIHByZWNvbmRpdGlvbiBmYWlsdXJlXG4gICAgUnVubmluZyAtLT4gRmFpbGluZyA6IE9uIGFueSBOb2RlIEZhaWx1cmVcbiAgICBSZWFkeSAtLT4gQWJvcnRlZCA6IE9uIHVzZXIgaW5pdGlhdGVkIGFib3J0XG4gICAgUnVubmluZyAtLT4gQWJvcnRlZCA6IE9uIHVzZXIgaW5pdGlhdGVkIGFib3J0XG4gICAgU3VjY2VlZGluZyAtLT4gQWJvcnRlZCA6IE9uIHVzZXIgaW5pdGlhdGVkIGFib3J0XG5cbiAgICBGYWlsaW5nIC0tPiBIYW5kbGVGYWlsdXJlTm9kZSA6IElmIEZhaWx1cmUgbm9kZSBleGlzdHNcbiAgICBGYWlsaW5nIC0tPiBBYm9ydGVkIDogT24gdXNlciBpbml0aWF0ZWQgYWJvcnRcbiAgICBIYW5kbGVGYWlsdXJlTm9kZSAtLT4gRmFpbGVkIDogT24gY29tcGxldGluZyBmYWlsdXJlIG5vZGVcbiAgICBIYW5kbGVGYWlsdXJlTm9kZSAtLT4gQWJvcnRlZCA6IE9uIHVzZXIgaW5pdGlhdGVkIGFib3J0XG4gICAgRmFpbGluZyAtLT4gRmFpbGVkIDogT24gc3VjY2Vzc2Z1bCBzZW5kIG9mIEZhaWx1cmUgbm9kZVxuICAgICIsIm1lcm1haWQiOnt9LCJ1cGRhdGVFZGl0b3IiOmZhbHNlfQ
+    :alt: State diagram of the various states through which a Workflow transitions. This is the core finite state machine (FSM) of a Workflow.
+
+The state diagram above illustrates the various states through which a Workflow transitions. This is the core finite state machine (FSM) of a Workflow.
+
+A Workflow always starts in the ``Ready`` state and ends in one of the ``Failed``, ``Succeeded``, or ``Aborted`` states.
+Any system error within a state causes a retry of that state. These retries are capped by a system retry threshold; once it is exceeded, the Workflow moves to the ``Aborted`` state.
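As a rough illustration, the edges of the workflow diagram can be written down as a lookup table. The Python sketch below is transcribed from the Mermaid source encoded in the diagram URL. It is not FlytePropeller's implementation, and the names ``WORKFLOW_TRANSITIONS``, ``TERMINAL_STATES``, and ``can_transition`` are hypothetical.

```python
# Illustrative sketch only: a lookup table transcribed from the workflow
# state diagram. The names below are hypothetical, not part of any Flyte API.

# Self-loops (Running -> Running, Succeeding -> Succeeding) stand for the
# retry of a state after a system error. The diagram's direct entry edge to
# Aborted (system errors above the threshold) is not modeled here.
WORKFLOW_TRANSITIONS = {
    "Ready": {"Running", "Failing", "Aborted"},
    "Running": {"Running", "Succeeding", "Failing", "Aborted"},
    "Succeeding": {"Succeeding", "Succeeded", "Aborted"},
    "Failing": {"HandleFailureNode", "Failed", "Aborted"},
    "HandleFailureNode": {"Failed", "Aborted"},
}

# States with no outgoing edges in the diagram.
TERMINAL_STATES = {"Succeeded", "Failed", "Aborted"}


def can_transition(current: str, target: str) -> bool:
    """Return True if the diagram has an edge from `current` to `target`."""
    return target in WORKFLOW_TRANSITIONS.get(current, set())


if __name__ == "__main__":
    for src, dst in [("Ready", "Running"), ("Succeeded", "Running")]:
        print(src, "->", dst, can_transition(src, dst))
```

A table like this makes it easy to check, for example, that a user-initiated abort is reachable from every non-terminal state.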
+
+Every transition between states is recorded in FlyteAdmin using :std:ref:`gen/pb-protodoc/flyteidl/event/event.proto:flyteidl.event.workflowexecutionevent`.
+
+The phases in the above state diagram are captured in the Admin database as specified in :std:ref:`api_enum_flyteidl.core.workflowexecution.phase` and are sent as part of the Execution Event.
+
+The state machine specification for the illustration can be found `here `_
+
+
+Node States
+================
+
+.. image:: https://mermaid.ink/img/eyJjb2RlIjoic3RhdGVEaWFncmFtLXYyXG4gICAgWypdIC0tPiBOb3RZZXRTdGFydGVkXG4gICAgWypdIC0tPiBBYm9ydGVkIDogV2lsbCBzdG9wIHRoZSBub2RlIGV4ZWN1dGlvblxuICAgIE5vdFlldFN0YXJ0ZWQgLS0-IFF1ZXVlZCA6IElmIGFsbCB1cHN0cmVhbSBub2RlcyBhcmUgcmVhZHkgaS5lLCBpbnB1dHMgYXJlIHJlYWR5XG4gICAgTm90WWV0U3RhcnRlZCAtLT4gU2tpcHBlZCA6IElmIHRoZSBicmFuY2ggd2FzIG5vdCB0YWtlblxuICAgIFF1ZXVlZCAtLT4gUnVubmluZyA6IFN0YXJ0IHRhc2sgZXhlY3V0aW9uIC0gYXR0ZW1wdCAwXG4gICAgUnVubmluZyAtLT4gVGltaW5nT3V0IDogSWYgdGFzayB0aW1lb3V0IGhhcyBlbGFwc2VkIGFuZCByZXRyeV9hdHRlbXB0cyA-PSBtYXhfcmV0cmllc1xuICAgIFRpbWluZ091dCAtLT4gVGltZWRPdXQgOiBJdCB0b3RhbCBub2RlIHRpbWVvdXQgaGFzIGVsYXBzZWRcbiAgICBSdW5uaW5nIC0tPiBSZXRyeWFibGVGYWlsdXJlIDogb24gcmV0cnlhYmxlIGZhaWx1cmVcbiAgICBSZXRyeWFibGVGYWlsdXJlIC0tPiBSdW5uaW5nIDogaWYgcmV0cnlfYXR0ZW1wdHMgPCBtYXhfcmV0cmllc1xuICAgIFJldHJ5YWJsZUZhaWx1cmUgLS0-IEZhaWxpbmcgOiByZXRyeV9hdHRlbXB0cyA-PSBtYXhfcmV0cmllc1xuICAgIEZhaWxpbmcgLS0-IEZhaWxlZFxuICAgIFJ1bm5pbmcgLS0-IFN1Y2NlZWRpbmcgOiBJbnRlcm5hbCBzdGF0ZVxuICAgIFN1Y2NlZWRpbmcgLS0-IFN1Y2NlZWRlZCA6IFVzZXIgb2JzZXJ2ZXMgdGhlIHRhc2sgYXMgc3VjY2VlZGVkXG4gICAgU3VjY2VlZGVkIC0tPiBbKl1cbiAgICBGYWlsZWQgLS0-IFsqXVxuIiwibWVybWFpZCI6e30sInVwZGF0ZUVkaXRvciI6ZmFsc2V9
+    :alt: State diagram of the various states through which a Node transitions. This is the core FSM for a Node.
+
+The state diagram above illustrates the various states through which a Node transitions. This is the core FSM for a Node.
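The two edges leaving ``RetryableFailure`` in the node diagram can be sketched as a small decision function. This is a minimal Python sketch that assumes the diagram labels are the whole rule. The function name is hypothetical, while ``retry_attempts`` and ``max_retries`` are the labels used in the diagram itself.

```python
# Illustrative sketch only: the retry decision drawn in the node diagram.
# The function name is hypothetical and not part of any Flyte API.

def next_phase_after_retryable_failure(retry_attempts: int, max_retries: int) -> str:
    """Pick the phase that follows RetryableFailure, per the diagram labels.

    RetryableFailure -> Running  if retry_attempts <  max_retries
    RetryableFailure -> Failing  if retry_attempts >= max_retries
    """
    if retry_attempts < 0 or max_retries < 0:
        raise ValueError("retry counters must be non-negative")
    if retry_attempts < max_retries:
        return "Running"
    return "Failing"


if __name__ == "__main__":
    for attempts in range(4):
        print(attempts, next_phase_after_retryable_failure(attempts, max_retries=3))
```

With ``max_retries`` set to zero, the first retryable failure already leads to ``Failing``, which the diagram then resolves to ``Failed``.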
+From a user's point of view, a Workflow simply consists of a sequence of tasks. But to Flyte, a Workflow internally creates a meta entity called a
+
+Once a Workflow enters the ``Running`` state, it triggers the phantom ``start node`` of the Workflow. The start node is always the entry node of any Workflow. The start node recursively executes all of its child nodes using a modified depth-first search algorithm.
+
+Nodes can be of the following types, but all nodes go through the same transitions:
+
+#. Start node - exists only during execution and is not modeled in the core spec
+#. :std:ref:`gen/pb-protodoc/flyteidl/core/workflow.proto:flyteidl.core.tasknode`
+#. :std:ref:`gen/pb-protodoc/flyteidl/core/workflow.proto:flyteidl.core.branchnode`
+#. :std:ref:`gen/pb-protodoc/flyteidl/core/workflow.proto:flyteidl.core.workflownode`
+#. Dynamic node - a task node that returns futures rather than outputs
+#. End node - exists only during execution and is not modeled in the core spec
+
+Every transition between states is recorded in FlyteAdmin using :std:ref:`gen/pb-protodoc/flyteidl/event/event.proto:flyteidl.event.nodeexecutionevent`.
+
+Every NodeExecutionEvent can have one of the phases listed in :std:ref:`api_enum_flyteidl.core.nodeexecution.phase`.
+
+.. note:: TODO add explanation for each phase
+
+The state machine specification for the illustration can be found `here `_
+
+Task States
+================
+
+.. image:: https://mermaid.ink/img/eyJjb2RlIjoic3RhdGVEaWFncmFtLXYyXG4gICAgWypdIC0tPiBOb3RSZWFkeVxuICAgIFsqXSAtLT4gQWJvcnRlZCA6IEFib3J0ZWQgYnkgTm9kZUhhbmRsZXIgLSB0aW1lb3V0cywgZXh0cmVuYWwgYWJvcnQsIGV0Y1xuICAgIE5vdFJlYWR5IC0tPiBXYWl0aW5nRm9yUmVzb3VyY2VzIDogQmxvY2tlZCBvbiByZXNvdXJjZSBxdW90YSBvciByZXNvdXJjZSBwb29sIChvcHRpb25hbClcbiAgICBXYWl0aW5nRm9yUmVzb3VyY2VzIC0tPiBRdWV1ZWQgOiBIYXMgYmVlbiBzdWJtaXR0ZWQsIGJ1dCBoYXMgbm90IHN0YXJ0ZWQgKG9wdGlvbmFsKVxuICAgIFF1ZXVlZCAtLT4gSW5pdGlhbGl6aW5nIDogUHJlc3RhcnQgaW5pdGlhbGl6YXRpb24gKG9wdGlvbmFsKVxuICAgIEluaXRpYWxpemluZyAtLT4gUnVubmluZyA6IEFjdHVhbCBleGVjdXRpb24gb2YgdXNlciBjb2RlIGhhcyBzdGFydGVkXG4gICAgUnVubmluZyAtLT4gU3VjY2VzcyA6IFN1Y2Nlc3NmdWwgZXhlY3V0aW9uXG4gICAgUnVubmluZyAtLT4gUmV0cnlhYmxlRmFpbHVyZSA6IEZhaWxlZCB3aXRoIGEgcmV0cnlhYmxlIGVycm9yXG4gICAgUnVubmluZyAtLT4gUGVybWFuZW50RmFpbHVyZSA6IFVucmVjb3ZlcmFibGUgZmFpbHVyZSwgd2lsbCBzdG9wIGFsbCBleGVjdXRpb25cbiAgICBTdWNjZXNzIC0tPiBbKl1cbiAgICBSZXRyeWFibGVGYWlsdXJlIC0tPiBbKl1cbiAgICBQZXJtYW5lbnRGYWlsdXJlIC0tPiBbKl1cbiIsIm1lcm1haWQiOnt9LCJ1cGRhdGVFZGl0b3IiOmZhbHNlfQ
+    :alt: State diagram of the various states through which a Task transitions. This is the core FSM for any Task in Flyte.
+
+The state diagram above illustrates the various states through which a Task transitions.
+
+Every transition between states is recorded in FlyteAdmin using :std:ref:`gen/pb-protodoc/flyteidl/event/event.proto:flyteidl.event.taskexecutionevent`.
+
+Every TaskExecutionEvent can have one of the phases listed in :std:ref:`api_enum_flyteidl.core.taskexecution.phase`.
+
+.. note:: TODO add explanation for each phase
+
+The state machine specification for the illustration can be found `here `_
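To make the task diagram concrete, the sketch below checks whether a recorded sequence of task phases follows the edges drawn in it. It is a minimal Python illustration transcribed from the Mermaid source in the diagram URL, under two stated assumptions: edges are followed literally, so phases marked optional in the diagram may not be skipped, and the direct entry into ``Aborted`` is not modeled. ``TASK_TRANSITIONS``, ``TASK_END_STATES``, and ``is_valid_trace`` are hypothetical names.

```python
# Illustrative sketch only: validates a list of task phases against the
# edges of the task state diagram. Names are hypothetical, not Flyte APIs.
from typing import List

TASK_TRANSITIONS = {
    "NotReady": {"WaitingForResources"},
    "WaitingForResources": {"Queued"},
    "Queued": {"Initializing"},
    "Initializing": {"Running"},
    "Running": {"Success", "RetryableFailure", "PermanentFailure"},
}

# Phases that lead to the end marker in the diagram.
TASK_END_STATES = {"Success", "RetryableFailure", "PermanentFailure"}


def is_valid_trace(phases: List[str]) -> bool:
    """Return True if `phases` starts at NotReady, follows only drawn
    edges, and finishes in one of the diagram's end states."""
    if not phases or phases[0] != "NotReady":
        return False
    for current, following in zip(phases, phases[1:]):
        if following not in TASK_TRANSITIONS.get(current, set()):
            return False
    return phases[-1] in TASK_END_STATES


if __name__ == "__main__":
    happy_path = [
        "NotReady", "WaitingForResources", "Queued",
        "Initializing", "Running", "Success",
    ]
    print(is_valid_trace(happy_path))
```

A plugin that skips optional phases would need extra edges in the table; they are left out here because the diagram does not draw them.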