diff --git a/rsts/concepts/admin.rst b/rsts/concepts/admin.rst index ced73ccc76..6c8692aab4 100644 --- a/rsts/concepts/admin.rst +++ b/rsts/concepts/admin.rst @@ -7,10 +7,10 @@ FlyteAdmin Admin Structure =============== -FlyteAdmin serves as the main Flyte API to process all client requests to the system. Clients include the Flyte Console, which calls: +FlyteAdmin serves as the main Flyte API to process all client requests to the system. Clients include the FlyteConsole, which calls: 1. FlyteAdmin to list the workflows, get execution details, etc. -2. Flytekit that in turn calls FlyteAdmin to register, launch workflows, etc. +2. Flytekit, which in turn calls FlyteAdmin to register, launch workflows, etc. Below, we'll dive into each component defined in admin in more detail. @@ -19,7 +19,7 @@ RPC --- FlyteAdmin uses the `grpc-gateway `__ library to serve incoming gRPC and HTTP requests with identical handlers. -See the admin service :std:ref:`definition ` for a more detailed API overview, including request and response entities. +Refer to the admin service :std:ref:`definition ` for a detailed API overview, including request and response entities. The RPC handlers are thin shims that enforce request structure validation and call out to the appropriate :ref:`manager ` methods to process requests. You can find a detailed explanation of the service in the :ref:`admin service ` page. @@ -66,7 +66,7 @@ implementation. You can find the actual code for issuing queries with gorm in t Models ++++++ -Database models are defined in the `models `__ directory and correspond 1:1 with database tables [0]_. +Database models are defined in the `models `__ directory and correspond 1:1 with the database tables [0]_. The full set of database tables includes: @@ -99,7 +99,7 @@ Asynchronous Components Notifications and schedules are handled by async routines that are responsible for enqueuing and subsequently processing dequeued messages. 
FlyteAdmin uses the `gizmo toolkit `__ to abstract queueing implementation. Gizmo's -`pubsub `__ library offers implementations for Amazon SNS/SQS, Google's Pubsub, Kafka topics, and publishing over HTTP. +`pubsub `__ library offers implementations for Amazon SNS/SQS, Google Pubsub, Kafka topics, and publishing over HTTP. For the sandbox development, no-op implementations of the notifications and schedule handlers are used to remove external cloud dependencies. @@ -132,7 +132,7 @@ Values specific to the FlyteAdmin application, including task, workflow registra Workflow engine ---------------- -This directory contains interfaces to build and execute workflows leveraging FlytePropeller compiler and client components. +This directory contains the interfaces to build and execute workflows leveraging FlytePropeller compiler and client components. .. [0] Given the unique naming constraints, some models are redefined in `migration_models `__ to guarantee unique index values. @@ -170,7 +170,7 @@ that consists of a project, domain, name, and version specification. These entit version must be re-registered with a unique and new version identifier attribute. One caveat is that the launch plan can toggle between :std:ref:`ACTIVE and INACTIVE ` states. -At a given point in time, only one launch plan version across a shared project, domain and name specification can be active. The state affects the scheduled launch plans only. +At a given point in time, only one launch plan version across a shared {Project, Domain, Name} specification can be active. The state affects the scheduled launch plans only. An inactive launch plan can be used to launch individual executions. However, only an active launch plan runs on a schedule (given it has a schedule defined). 
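The one-active-version rule described above can be sketched as a small in-memory registry. This is an illustrative sketch only, not FlyteAdmin's actual schema or API; the `NamedEntityID` and `Registry` types and the `Activate` helper are hypothetical.

```go
package main

import "fmt"

// NamedEntityID is a hypothetical key: launch plans are scoped by
// {Project, Domain, Name}, with many registered versions per key.
type NamedEntityID struct{ Project, Domain, Name string }

// Registry tracks the single ACTIVE launch plan version per named
// entity (a sketch, not FlyteAdmin's database model).
type Registry struct{ active map[NamedEntityID]string }

func NewRegistry() *Registry {
	return &Registry{active: map[NamedEntityID]string{}}
}

// Activate marks a version active, implicitly deactivating whichever
// version was previously active for the same {Project, Domain, Name}.
func (r *Registry) Activate(id NamedEntityID, version string) (deactivated string) {
	deactivated = r.active[id]
	r.active[id] = version
	return deactivated
}

// Active returns the currently active version, if any.
func (r *Registry) Active(id NamedEntityID) string { return r.active[id] }

func main() {
	r := NewRegistry()
	id := NamedEntityID{"flytesnacks", "development", "daily_report"}
	r.Activate(id, "v1")
	prev := r.Activate(id, "v2") // v1 is implicitly deactivated
	fmt.Println("deactivated:", prev, "active:", r.Active(id))
}
```

Activating a new version returns (and deactivates) the previously active one, mirroring the constraint that only the active version runs on a schedule while inactive versions can still launch individual executions.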
@@ -184,8 +184,8 @@ The named entity also includes metadata, which are mutable attributes about the This metadata includes: -- Description: a human-readable description for the Named Entity collection -- State (workflows only): this determines whether the workflow is shown on the overview list of workflows scoped by project and domain +- Description: a human-readable description for the Named Entity collection. +- State (workflows only): this determines whether the workflow is shown on the overview list of workflows scoped by project and domain. Permitted operations include: @@ -210,7 +210,7 @@ Permitted operations include: - Get - List -After an execution begins, FlytePropeller monitors the execution and sends events which admin uses to update the above executions. +After an execution begins, FlytePropeller monitors the execution and sends events, which admin uses to update the above executions. These :std:ref:`events ` include @@ -218,19 +218,18 @@ These :std:ref:`events ` inc - NodeExecutionEvent - TaskExecutionEvent -and include information about respective phase transitions, phase transition time and optional output data if the event concerns a terminal phase change. +and contain information about respective phase transitions, phase transition time, and optional output data if the event concerns a terminal phase change. -These events are the **only** way to update an execution. No raw Update endpoint exists. +These events provide the **only** way to update an execution. No raw update endpoint exists. -To track the lifecycle of an execution admin, store attributes such as duration, timestamp at which an execution transitioned to running, and end time. +To track the lifecycle of an execution, admin stores attributes such as the duration, the timestamp at which the execution transitioned to running, and the end time. 
-For debug purposes admin also stores Workflow and Node execution events in its database, but does not currently expose them through an API. Because array tasks can yield very many executions, -admin does **not** store TaskExecutionEvents. +For debugging purposes, admin also stores Workflow and Node execution events in its database, but does not currently expose them through an API. Because array tasks can yield many executions, admin does **not** store TaskExecutionEvents. Platform entities +++++++++++++++++ -Projects: like named entities, projects have mutable metadata such as human-readable names and descriptions, in addition to their unique string ids. +Projects: Like named entities, projects have mutable metadata such as human-readable names and descriptions, in addition to their unique string ids. Permitted project operations include: @@ -292,7 +291,7 @@ For example, multiple filters would be appended to an http request like:: ?filters=ne(version, TheWorst)+eq(workflow.name, workflow) -Timestamp fields use the ``RFC3339Nano`` spec (e.g., "2006-01-02T15:04:05.999999999Z07:00") +Timestamp fields use the ``RFC3339Nano`` spec (for example, "2006-01-02T15:04:05.999999999Z07:00") The fully supported set of filter functions are @@ -339,12 +338,12 @@ Filterable fields vary based on entity types: - created_at - updated_at - workflows.{any workflow field above} (for example: workflow.domain) - - state (you must use the integer enum, e.g., 1) + - state (you must use the integer enum, for example: 1) - States are defined in :std:ref:`launchplanstate `. - Named Entity Metadata - - state (you must use the integer enum, e.g., 1) + - state (you must use the integer enum, for example: 1) - States are defined in :std:ref:`namedentitystate `. 
- Executions (Workflow executions) @@ -354,12 +353,12 @@ Filterable fields vary based on entity types: - name - workflow.{any workflow field above} (for example: workflow.domain) - launch_plan.{any launch plan field above} (for example: launch_plan.name) - - phase (you must use the upper-cased string name, e.g., ``RUNNING``) + - phase (you must use the upper-cased string name, for example: ``RUNNING``) - Phases are defined in :std:ref:`workflowexecution.phase `. - execution_created_at - execution_updated_at - duration (in seconds) - - mode (you must use the integer enum e.g., 1) + - mode (you must use the integer enum, for example: 1) - Modes are defined in :std:ref:`executionmode `. - user (authenticated user or role from flytekit config) @@ -367,7 +366,7 @@ Filterable fields vary based on entity types: - node_id - execution.{any execution field above} (for example: execution.domain) - - phase (you must use the upper-cased string name e.g., ``QUEUED``) + - phase (you must use the upper-cased string name, for example: ``QUEUED``) - Phases are defined in :std:ref:`nodeexecution.phase `. - started_at - node_execution_created_at @@ -380,7 +379,7 @@ Filterable fields vary based on entity types: - task.{any task field above} (for example: task.version) - execution.{any execution field above} (for example: execution.domain) - node_execution.{any node execution field above} (for example: node_execution.phase) - - phase (you must use the upper-cased string name e.g., ``SUCCEEDED``) + - phase (you must use the upper-cased string name, for example: ``SUCCEEDED``) - Phases are defined in :std:ref:`taskexecution.phase `. 
- started_at - task_execution_created_at @@ -390,7 +389,7 @@ Filterable fields vary based on entity types: Putting It All Together ----------------------- -If you wish to query specific executions that were launched using a specific launch plan for a workflow with specific attributes, you coul something like: +If you wish to query specific executions that were launched using a specific launch plan for a workflow with specific attributes, use: :: @@ -440,7 +439,7 @@ Only a subset of fields are supported for sorting list queries. The explicit lis - version - created_at - updated_at - - state (you must use the integer enum e.g., 1) + - state (you must use the integer enum, for example: 1) - States are defined in :std:ref:`launchplanstate `. - ListWorkflowIds @@ -453,19 +452,19 @@ Only a subset of fields are supported for sorting list queries. The explicit lis - project - domain - name - - phase (you must use the upper-cased string name e.g., ``RUNNING``) + - phase (you must use the upper-cased string name, for example: ``RUNNING``) - Phases are defined in :std:ref:`workflowexecution.phase `. - execution_created_at - execution_updated_at - duration (in seconds) - - mode (you must use the integer enum e.g., 1) + - mode (you must use the integer enum, for example: 1) - Modes are defined :std:ref:`execution.proto `. - ListNodeExecutions - node_id - retry_attempt - - phase (you must use the upper-cased string name e.g., ``QUEUED``) + - phase (you must use the upper-cased string name, for example: ``QUEUED``) - Phases are defined in :std:ref:`nodeexecution.phase `. - started_at - node_execution_created_at @@ -475,7 +474,7 @@ Only a subset of fields are supported for sorting list queries. The explicit lis - ListTaskExecutions - retry_attempt - - phase (you must use the upper-cased string name e.g., ``SUCCEEDED``) + - phase (you must use the upper-cased string name, for example: ``SUCCEEDED``) - Phases are defined in :std:ref:`taskexecution.phase `. 
- started_at - task_execution_created_at @@ -485,7 +484,7 @@ Only a subset of fields are supported for sorting list queries. The explicit lis Sorting syntax -------------- -Adding sorting to a request requires specifying the ``key``, e.g., the attribute you wish to sort on. Sorting can also optionally specify the direction (one of ``ASCENDING`` or ``DESCENDING``) where ``DESCENDING`` is the default. +Adding sorting to a request requires specifying the ``key``, that is, the attribute you wish to sort on. Sorting can also optionally specify the direction (one of ``ASCENDING`` or ``DESCENDING``), where ``DESCENDING`` is the default. Example sorting HTTP parameter: diff --git a/rsts/concepts/architecture.rst b/rsts/concepts/architecture.rst index 92e6c67ac9..edbeca132c 100644 --- a/rsts/concepts/architecture.rst +++ b/rsts/concepts/architecture.rst @@ -4,16 +4,16 @@ Component Architecture ###################### -This document aims to demystify how Flyte's major components ``FlyteIDL``, ``FlyteKit``, ``FlyteCLI``, ``FlyteConsole``, ``FlyteAdmin``, ``FlytePropeller``, and ``FlytePlugins`` fit together at a high level. +This document aims to demystify how Flyte's major components ``Flyteidl``, ``Flytekit``, ``Flytectl``, ``FlyteConsole``, ``FlyteAdmin``, ``FlytePropeller``, and ``FlytePlugins`` fit together at a high level. FlyteIDL ======== -In Flyte, entities like "Workflows", "Tasks", "Launch Plans", and "Schedules" are recognized by multiple system components. In order for components to communicate effectively, they need a shared understanding about the structure of these entities. +In Flyte, entities like "Workflows", "Tasks", "Launch Plans", and "Schedules" are recognized by multiple system components. For components to communicate effectively, they need a shared understanding about the structure of these entities. -The Flyte IDL (Interface Definition Language) is where shared Flyte entities are defined. 
This IDL also defines the RPC service definition for the :std:ref:`core Flyte API `. +Flyteidl (Interface Definition Language) is where shared Flyte entities are defined. It also defines the RPC service definition for the :std:ref:`core Flyte API `. -FlyteIDL uses the `protobuf `_ schema to describe entities. Clients are generated for Python, Golang, and JavaScript and imported by Flyte components. +Flyteidl uses the `protobuf `_ schema to describe entities. Clients are generated for Python, Golang, and JavaScript and imported by Flyte components. Planes @@ -23,7 +23,7 @@ Flyte components are separated into 3 logical planes. The planes are summarized +-------------------+---------------------------------------------------------------------------------------------------------------+ | **User Plane** | The User Plane consists of all user tools that assist in interacting with the core Flyte API. | -| | These tools include the FlyteConsole, FlyteKit, and FlyteCLI. | +| | These tools include the FlyteConsole, Flytekit, and Flytectl. | +-------------------+---------------------------------------------------------------------------------------------------------------+ | **Control Plane** | The Control Plane implements the core Flyte API. | | | It serves all client requests coming from the User Plane. | @@ -39,30 +39,30 @@ Flyte components are separated into 3 logical planes. The planes are summarized User Plane ---------- -In Flyte, workflows are represented as a Directed Acyclic Graph (DAG) of tasks. While this representation is logical for services, managing workflow DAGs in this format is a tedious exercise for humans. The Flyte User Plane provides tools to create, manage, and visualize workflows in a format that is easily digestible to users. +In Flyte, workflows are represented as a Directed Acyclic Graph (DAG) of tasks. While this representation is logical for services, managing workflow DAGs in this format is a tedious exercise for humans. 
The Flyte User Plane provides tools to create, manage, and visualize workflows in a format that is easily digestible to users. These tools include: -FlyteKit - FlyteKit is an SDK that helps users design new workflows using the Python programming language. FlyteKit can parse the python code, compile it into a valid Workflow DAG, and submit it to Flyte to be executed. +Flytekit + Flytekit is an SDK that helps users design new workflows using the Python programming language. It can parse the Python code, compile it into a valid Workflow DAG, and submit it to Flyte for execution. FlyteConsole - Flyte console provides the Web interface for Flyte. Users and administrators can use the console to view workflows, launch plans, schedules, tasks, and individual task executions. The console provides tools to visualize workflows, and surfaces relevant logs for debugging failed tasks. + FlyteConsole provides the Web interface for Flyte. Users and administrators can use the console to view workflows, launch plans, schedules, tasks, and individual task executions. The console provides tools to visualize workflows, and surfaces relevant logs for debugging failed tasks. -FlyteCLI - Flyte Command Line Interface provides interactive access to Flyte to launch and access Flyte workflows via terminal. +Flytectl + Flytectl provides interactive access to Flyte to launch and access workflows via the terminal. Control Plane ------------- -The Control Plane supports the core REST/gRPC API defined in FlyteIDL. User Plane tools like FlyteConsole and FlyteKit contact the control plane on behalf of users to store and retrieve information. +The Control Plane supports the core REST/gRPC API defined in Flyteidl. User Plane tools like FlyteConsole and Flytekit contact the control plane on behalf of users to store and retrieve information. Currently, the entire control plane is handled by a single service called **FlyteAdmin**. -FlyteAdmin is stateless. 
It processes requests to create entities like Tasks, Workflows, and Schedules by persisting data in a relational database. +FlyteAdmin is stateless. It processes requests to create entities like tasks, workflows, and schedules by persisting data in a relational database. -While FlyteAdmin serves the Workflow Exeuction API, it does not, itself, execute workflows. To launch workflow executions, FlyteAdmin sends the workflow DAG off to the DataPlane. For added scalability and fault-tolerance, FlyteAdmin can be configured to load-balance workflows across multiple isolated data-plane clusters. +While FlyteAdmin serves the Workflow Execution API, it does not itself execute workflows. To launch workflow executions, FlyteAdmin sends the workflow DAG to the DataPlane. For added scalability and fault-tolerance, FlyteAdmin can be configured to load-balance workflows across multiple isolated data-plane clusters. Data Plane @@ -70,26 +70,26 @@ The Data Plane is the engine that accepts DAGs, and fulfills workflow executions by launching tasks in the order defined by the graph. Requests to the Data Plane generally come via the control plane, and not from end-users. -In order to support compute-intensive workflows at massive scale, the Data Plane needs to launch containers on a cluster of machines. The current implementation leverages `kubernetes `_ for cluster management. +In order to support compute-intensive workflows at massive scale, the Data Plane needs to launch containers on a cluster of machines. The current implementation leverages `Kubernetes `_ for cluster management. -Unlike the user-facing control-plane, the Data Plane does not expose a traditional REST/gRPC API. To launch an execution in the Data Plane, you create a “flyteworkflow” resource in kubernetes. -A “flyteworkflow” is a kubernetes `Custom Resource `_ (CRD) created by our team. This custom resource represents the flyte workflow DAG. 
+Unlike the user-facing Control Plane, the Data Plane does not expose a traditional REST/gRPC API. To launch an execution in the Data Plane, you create a “flyteworkflow” resource in Kubernetes. +A “flyteworkflow” is a Kubernetes `Custom Resource `_ (CRD) created by our team. This custom resource represents the Flyte workflow DAG. -The core state machine that processes flyteworkflows is worker we call **FlytePropeller**. +The core state machine that processes flyteworkflows is the worker known as **FlytePropeller**. -FlytePropeller leverages the kubernetes `operator pattern `_. It polls the kubernetes API, looking for newly created flyteworkflow resources. FlytePropeller understands the workflow DAG, and launches the appropriate kubernetes pods as needed to complete tasks. It periodically checks for completed tasks, launching downstream tasks until the workflow is complete. +FlytePropeller leverages the Kubernetes `operator pattern `_. It polls the Kubernetes API, looking for newly created flyteworkflow resources. FlytePropeller understands the workflow DAG, and launches the appropriate Kubernetes pods as needed to complete tasks. It periodically checks for completed tasks, launching downstream tasks until the workflow is complete. **Plugins** Each task in a flyteworkflow DAG has a specified **type**. The logic for fulfilling a task is determined by its task type. -In the most basic case, FlytePropeller launches a single kubernetes pod to fulfill a task. -More complex task types require workloads to be distributed across hundreds of pods. +In the basic case, FlytePropeller launches a single Kubernetes pod to fulfill a task. +Complex task types require workloads to be distributed across hundreds of pods. -The type-specific task logic is separated into isolated code modules that we call **plugins**. +The type-specific task logic is separated into isolated code modules known as **plugins**. 
Each task type has an associated plugin that is responsible for handling tasks of its type. For each task in a workflow, FlytePropeller activates the appropriate plugin based on the task type in order to fulfill the task. -The Flyte team has pre-built plugins for Hive, Spark, and AWS Batch, and more. +The Flyte team has pre-built plugins for Hive, Spark, AWS Batch, and :ref:`more `. To support new use-cases, developers can create their own plugins and bundle them in their FlytePropeller deployment. Component Code Architecture diff --git a/rsts/concepts/catalog.rst b/rsts/concepts/catalog.rst index d8ad168ce6..9c84a70184 100644 --- a/rsts/concepts/catalog.rst +++ b/rsts/concepts/catalog.rst @@ -3,13 +3,13 @@ What is Data Catalog? ===================== -`Data Catalog `__ is a service for indexing parameterized, strongly-typed data artifacts across revisions. It allows for clients to query artifacts based on meta information and tags. +`DataCatalog `__ is a service to index parameterized, strongly-typed data artifacts across revisions. It allows clients to query artifacts based on meta information and tags. How Flyte Memoizes Task Executions on Data Catalog -------------------------------------------------- -Flyte ``memoizes task executions`` by creating artifacts in Data Catalog and associating meta information regarding the execution with the artifact. Let's walk through what happens when a task execution is cached on Data Catalog. +Flyte memoizes task executions by creating artifacts in DataCatalog and associating meta information regarding the execution with the artifact. Let's walk through what happens when a task execution is cached on DataCatalog. Every task instance is represented as a DataSet: @@ -38,7 +38,7 @@ Every task execution is represented as an Artifact in the Dataset above: value: } -To retrieve the Artifact, we tag the Artifact with a hash of the input values for the memoized task execution. 
+To retrieve the Artifact, tag the Artifact with a hash of the input values for the memoized task execution: .. code-block:: javascript @@ -52,10 +52,10 @@ When caching an execution, FlytePropeller will: 2. Create an artifact that represents the execution, along with the artifact data that represents the execution output. 3. Tag the artifact with a unique hash of the input values. -When checking to see if the task execution is memoized, Flyte Propeller will: +To check whether a task execution has been memoized, FlytePropeller will: 1. Compute the tag by computing the hash of the input. 2. Check if a tagged artifact exists with that hash. - a. If it does, we have a cache hit and the Propeller can skip the task execution. - b. If an artifact is not associated with the tag, Flyte Propeller needs to run the task. + - If it exists, we have a cache hit and the Propeller can skip the task execution. + - If an artifact is not associated with the tag, Propeller needs to run the task. diff --git a/rsts/concepts/component_architecture/flytepropeller_architecture.rst b/rsts/concepts/component_architecture/flytepropeller_architecture.rst index 34e676e939..a21b29d5fc 100644 --- a/rsts/concepts/component_architecture/flytepropeller_architecture.rst +++ b/rsts/concepts/component_architecture/flytepropeller_architecture.rst @@ -4,14 +4,21 @@ FlytePropeller Architecture ########################### -Note: In the frame of this document we use the term “workflow” to describe a single execution of a workflow definition. +.. note:: + In this document, we use the term “workflow” to describe a single execution of a workflow definition. Introduction ============ -Flyte workflows are represented as a Directed Acyclic Graph (DAG) of interconnected Nodes. Flyte supports a robust collection of Node types to ensure diverse functionality. TaskNodes support a plugin system to externally add system integrations. 
Control flow can be altered during runtime using BranchNodes, which prune downstream evaluation paths based on input, and DynamicNodes, which add nodes to the DAG. WorkflowNodes allow embedding workflows within each other. +A Flyte :ref:`workflow ` is represented as a Directed Acyclic Graph (DAG) of interconnected Nodes. Flyte supports a robust collection of Node types to ensure diverse functionality. + +- ``TaskNodes`` support a plugin system to externally add system integrations. +- Control flow can be altered during runtime using ``BranchNodes``, which prune downstream evaluation paths based on input. +- ``DynamicNodes`` add nodes to the DAG. +- ``WorkflowNodes`` allow embedding workflows within each other. -FlytePropeller is responsible for scheduling and tracking execution of Flyte workflows. It is implemented using a k8s controller and adheres to established k8s design principles. In this scheme, resources are periodically evaluated and the goal is transition from the observed to a requested state. In our case, workflows are the resource and they are iteratively evaluated to transition from the current state to success. During each loop, the current workflow state is established as the phase of workflow nodes and subsequent tasks, and FlytePropeller performs operations to transition this state to success. The operations may include scheduling (or rescheduling) node executions, evaluating dynamic or branch nodes, etc. These design decisions ensure FlytePropeller can scale to manage a large number of concurrent workflows without performance degradation. +FlytePropeller is responsible for scheduling and tracking execution of Flyte workflows. It is implemented using a K8s controller and adheres to the established K8s design principles. In this scheme, resources are periodically evaluated and the goal is to transition from the observed state to a requested state. 
+ +In our case, workflows are the resources and they are iteratively evaluated to transition from the current state to success. During each loop, the current workflow state is established as the phase of workflow nodes and subsequent tasks, and FlytePropeller performs operations to transition this state to success. The operations may include scheduling (or rescheduling) node executions, evaluating dynamic or branch nodes, etc. These design decisions ensure that FlytePropeller can scale to manage a large number of concurrent workflows without performance degradation. This document attempts to break down the FlytePropeller architecture by tracking workflow life cycle through each internal component. Below is a high-level illustration of the FlytePropeller architecture and a flow chart of each component's responsibilities during FlyteWorkflow execution. @@ -20,48 +27,48 @@ This document attempts to break down the FlytePropeller architecture by tracking Components ========== -FlyteWorkflow CRD / k8s Integration +FlyteWorkflow CRD / K8s Integration ----------------------------------- -Workflows in Flyte are maintained as Custom Resource Definitions (CRDs) in Kubernetes, which are stored in the backing etcd cluster. Each execution of a workflow definition results in the creation of a new FlyteWorkflow CRD which maintains state for the entirety of processing. CRDs provide variable definitions to describe both resource specifications (spec) and status' (status). The FlyteWorkflow CRD uses the spec subsection to detail the workflow DAG, embodying node dependencies, etc. The status subsection tracks workflow metadata including overall workflow status, node / task phases, status / phase transition timestamps, etc. +Workflows in Flyte are maintained as Custom Resource Definitions (CRDs) in Kubernetes, which are stored in the backing etcd cluster. 
Each execution of a workflow definition results in the creation of a new FlyteWorkflow CRD which maintains state for the entirety of processing. CRDs provide variable definitions to describe both resource specifications (spec) and statuses (status). The FlyteWorkflow CRD uses the spec subsection to detail the workflow DAG, embodying node dependencies, etc. The status subsection tracks workflow metadata including overall workflow status, node/task phases, status/phase transition timestamps, etc. -K8s exposes a powerful controller / operator API enabling entities to track creation / updates over a specific resource type. FlytePropeller uses this API to track FlyteWorkflows, meaning every time an instance of the FlyteWorkflow CRD is created or updated the FlytePropeller instance is notified. FlyteAdmin is the common entry point, where initialization of FlyteWorkflow CRDs may be triggered by user workflow definition executions, automatic relaunches, or periodically scheduled workflow definition executions. However, it is conceivable to manually create FlyteWorkflow CRDs, but this will have limited visibility and usability. +K8s exposes a powerful controller/operator API that enables entities to track creation/updates over a specific resource type. FlytePropeller uses this API to track FlyteWorkflows, meaning every time an instance of the FlyteWorkflow CRD is created/updated, the FlytePropeller instance is notified. FlyteAdmin is the common entry point, where initialization of FlyteWorkflow CRDs may be triggered by user workflow definition executions, automatic relaunches, or periodically scheduled workflow definition executions. It is also possible to create FlyteWorkflow CRDs manually, but such workflows will have limited visibility and usability. -WorkQueue / WorkerPool +WorkQueue/WorkerPool ---------------------- FlytePropeller supports concurrent execution of multiple, unique workflows using a WorkQueue and WorkerPool. 
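The WorkQueue/WorkerPool pattern described in this section can be sketched with a channel standing in for the FIFO queue and one goroutine per worker. This is a simplified, hypothetical sketch, not FlytePropeller's implementation: the CRD lookup and DAG evaluation a real worker performs are reduced to a per-workflow counter.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers drains a FIFO workqueue of workflow ID strings using a
// pool of worker goroutines. In the real system each worker would look
// up the FlyteWorkflow CRD by ID (to get up-to-date status) and then
// evaluate the workflow; here we just count evaluations per ID.
func runWorkers(workqueue <-chan string, workers int) map[string]int {
	var mu sync.Mutex
	counts := map[string]int{}
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range workqueue {
				// Placeholder for: lookup CRD, evaluate nodes, update status.
				mu.Lock()
				counts[id]++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return counts
}

func main() {
	queue := make(chan string, 8)
	// A workflow ID may be enqueued repeatedly (updates, resyncs, errors).
	for _, id := range []string{"wf-a", "wf-b", "wf-a"} {
		queue <- id
	}
	close(queue)
	counts := runWorkers(queue, 4)
	fmt.Println(counts["wf-a"], counts["wf-b"])
}
```

Because workers are goroutines rather than threads or processes, a pool of this shape stays cheap even at thousands of workers, which is the property the section attributes to FlytePropeller.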
-The WorkQueue is a FIFO queue storing workflow ID strings, which then require a lookup to retrieve the FlyteWorkflow CRD to ensure up-to-date status. A workflow may be added to the queue in a variety of circumstances: +The WorkQueue is a FIFO queue storing workflow ID strings that require a lookup to retrieve the FlyteWorkflow CRD to ensure up-to-date status. A workflow may be added to the queue in a variety of circumstances: #. A new FlyteWorkflow CRD is created or an existing instance is updated -#. The k8s Informer resyncs the FlyteWorkflow periodically (necessary to detect workflow timeouts and ensure liveness) +#. The K8s Informer resyncs the FlyteWorkflow periodically (necessary to detect workflow timeouts and ensure liveness) #. A FlytePropeller worker experiences an error during a processing loop #. The WorkflowExecutor observes a completed downstream node -#. A NodeHandler observes state change and explicitly enqueues its owner (e.x. k8s pod informer observes completion of a task) +#. A NodeHandler observes a state change and explicitly enqueues its owner (for example, a K8s pod informer observes completion of a task) -The WorkerPool is implemented as a collection of goroutines, one for each worker. Using this lightweight construct FlytePropeller can scale to 1000s of workers on a single CPU. Workers continually poll the WorkQueue for workflows. On success, the workflow is executed (passed to WorkflowExecutor). +The WorkerPool is implemented as a collection of goroutines, one for each worker. Using this lightweight construct, FlytePropeller can scale to 1000s of workers on a single CPU. Workers continually poll the WorkQueue for workflows. On success, the workflow is executed (passed to WorkflowExecutor). WorkflowExecutor ---------------- -The WorkflowExecutor is unsurprisingly responsible for handling high-level workflow operations. This includes maintaining the workflow phase (e.x. 
running, failing, succeeded, etc) according to the underlying node phases and administering pending cleanup operations. For example, aborting existing node evaluations during workflow failures or removing FlyteWorkflow CRD finalizers on completion to ensure the CRD may be deleted. Additionally, at the conclusion of each evaluation round the WorkflowExecutor updates the FlyteWorkflow CRD with updated metadata fields to track status between evaluation iterations. +The WorkflowExecutor is responsible for handling high-level workflow operations. This includes maintaining the workflow phase (for example: running, failing, succeeded, etc.) according to the underlying node phases and administering pending cleanup operations. For example, aborting existing node evaluations during workflow failures or removing FlyteWorkflow CRD finalizers on completion to ensure the CRD can be deleted. Additionally, at the conclusion of each evaluation round, the WorkflowExecutor updates the FlyteWorkflow CRD with updated metadata fields to track the status between evaluation iterations. NodeExecutor ------------ The NodeExecutor is executed on a single node, beginning with the workflow's start node. It traverses the workflow using a visitor pattern with a modified depth-first search (DFS), evaluating each node along the path. A few examples of node evaluation based on phase: successful nodes are skipped, unevaluated nodes are queued for processing, and failed nodes may be reattempted up to a configurable threshold. There are many configurable parameters to tune evaluation criteria including max parallelism which restricts the number of nodes which may be scheduled concurrently. Additionally, nodes may be retried to ensure recoverability on failure. -The NodeExecutor is also responsible for linking data readers / writers to facilitate data transfer between node executions.
The data transfer process occurs automatically within Flyte, using efficient k8s events rather than a polling listener pattern which incurs more overhead. Relatively small data may be passed between nodes inline, but it is more common to pass data URLs to backing storage. A component of this is writing to and checking the data cache, which facilitates the reuse of previously completed evaluations. +The NodeExecutor is also responsible for linking data readers/writers to facilitate data transfer between node executions. The data transfer process occurs automatically within Flyte, using efficient K8s events rather than a polling listener pattern which incurs more overhead. Relatively small amounts of data may be passed between nodes inline, but it is more common to pass data URLs to backing storage. A component of this is writing to and checking the data cache, which facilitates the reuse of previously completed evaluations. NodeHandlers ------------ FlytePropeller includes a robust collection of NodeHandlers to support diverse evaluation of the workflow DAG: -* **TaskHandler (Plugins)**: These are responsible for executing plugin specific tasks. This may include contacting FlyteAdmin to schedule k8s pod to perform work, calling a web API to begin / track evaluation, and much more. The plugin paradigm exposes a very extensible interface for adding functionality to Flyte workflows. -* **DynamicHandler**: Flyte workflow CRDs are initialized using a DAG compiled during the registration process. The numerous benefits of this approach are beyond the scope of this document. However, there are situations where the complete DAG is unknown at compile time. For example, when executing a task on each value of an input list. Using Dynamic nodes a new DAG subgraph may be dynamically compiled during runtime and linked to the existing FlyteWorkflow CRD. -* **WorkflowHandler**: This handler allows embedding workflows within another workflow definition. 
The API exposes this functionality using either (1) an inline execution, where the workflow function is invoked directly resulting in a single FlyteWorkflow CRD with an appended sub-workflow, or (2) a launch plan, which uses a TODO to create a separate sub-workflow FlyteWorkflow CRD whose execution state is linked to the parent FlyteWorkflow CRD. +* **TaskHandler (Plugins)**: These are responsible for executing plugin-specific tasks. This may include contacting FlyteAdmin to schedule a K8s pod to perform work, calling a web API to begin/track evaluation, and much more. The plugin paradigm exposes an extensible interface for adding functionality to Flyte workflows. +* **DynamicHandler**: Flyte workflow CRDs are initialized using a DAG compiled during the registration process. The numerous benefits of this approach are beyond the scope of this document. However, there are situations where the complete DAG is unknown at compile time. For example, when executing a task on each value of an input list. Using Dynamic nodes, a new DAG subgraph may be dynamically compiled during runtime and linked to the existing FlyteWorkflow CRD. +* **WorkflowHandler**: This handler allows embedding workflows within another workflow definition. The API exposes this functionality using either (1) an inline execution, where the workflow function is invoked directly resulting in a single FlyteWorkflow CRD with an appended sub-workflow, or (2) a launch plan, which uses a TODO to create a separate sub-FlyteWorkflow CRD whose execution state is linked to the parent FlyteWorkflow CRD. * **BranchHandler**: The branch handler allows the DAG to follow a specific control path based on input (or computed) values. * **Start / End Handlers**: These are dummy handlers which process input and output data and in turn transition start and end nodes to success.
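As a rough illustration of the handler dispatch described in this list, the following Python sketch routes nodes to handlers by kind. All class, key, and function names here are hypothetical; the real handlers are Go interfaces driven by node phase:

```python
# Hypothetical, heavily simplified node dispatch; the real FlytePropeller
# NodeHandler interface is phase-driven and far richer than this sketch.
class TaskHandler:
    def handle(self, node):
        return f"launched task for node {node['name']}"

class BranchHandler:
    def handle(self, node):
        return f"selected branch for node {node['name']}"

# Registry mapping a node kind to the handler responsible for it.
HANDLERS = {
    "task": TaskHandler(),
    "branch": BranchHandler(),
}

def handle_node(node):
    """Route a node to the handler registered for its kind."""
    handler = HANDLERS.get(node["kind"])
    if handler is None:
        raise ValueError(f"no handler registered for node kind {node['kind']!r}")
    return handler.handle(node)
```

The registry style mirrors the extensibility point the text describes: adding a new node flavor means registering a new handler, without touching the executor loop.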
diff --git a/rsts/concepts/component_architecture/native_scheduler_architecture.rst b/rsts/concepts/component_architecture/native_scheduler_architecture.rst index 8f07007173..2c6d0f6608 100644 --- a/rsts/concepts/component_architecture/native_scheduler_architecture.rst +++ b/rsts/concepts/component_architecture/native_scheduler_architecture.rst @@ -6,7 +6,7 @@ Flyte Native Scheduler Architecture Introduction ============ -Any workflow engine needs functionality to support scheduled executions. Flyte fulfills this need using an in-built native scheduler, which allows the scheduling of fixed rate as well as cron based schedules. The workflow author specifies the schedule during the `launchplan creation `__ and `activates or deactivates `__ the schedule using the `admin API's `__ exposed for the launchplan. +Any workflow engine needs functionality to support scheduled executions. Flyte fulfills this using an in-built native scheduler, which supports both fixed-rate and cron-based schedules. The workflow author specifies the schedule during the `launchplan creation `__ and `activates or deactivates `__ the schedule using the `admin APIs `__ exposed for the launch plan. Characteristics =============== @@ -15,7 +15,7 @@ Characteristics #. Standard `cron `__ support #. Independently scalable #. Small memory footprint -#. Schedules run as lightweight go routines +#. Schedules run as lightweight goroutines #. Fault tolerant and available #. Support in sandbox environment @@ -26,43 +26,43 @@ Components Schedule Management ------------------- -This component supports creation/activation and deactivation of schedules. Each schedule is tied to a launchplan and is versioned in a similar manner. The schedule is created or its state is changed to activated/deactivated whenever the `admin API `__ is invoked for it with `ACTIVE/INACTIVE state `__. This is done either through `flytectl `__ or through any other client calling the GRPC API.
-The API is similar to that of a launchplan, which makes sure one schedule at most is active for a given launchplan. +This component supports creation/activation and deactivation of schedules. Each schedule is tied to a launch plan and is versioned in a similar manner. The schedule is created or its state is changed to activated/deactivated whenever the `admin API `__ is invoked for it with `ACTIVE/INACTIVE state `__. This is done either through `flytectl `__ or through any other client that calls the GRPC API. +The API is similar to that of a launch plan, ensuring that at most one schedule is active for a given launch plan. Scheduler --------- -This component is a singleton and is responsible for reading the schedules from the DB and running them at the cadence defined by the schedule. The lowest granularity supported is minutes for scheduling through both cron and fixed rate schedulers. The scheduler would be running in one replica, two at the most during redeployment. Multiple replicas will just duplicate the work, since each execution for a scheduleTime will have a unique identifier derived from the schedule name and the time of the schedule. The idempotency aspect of the admin for the same identifier prevents duplication on the admin side. The scheduler runs continuously in a loop reading the updated schedule entries in the data store and adding or removing the schedules. Removing a schedule will not alter in-flight go-routines launched by the scheduler. Thus the behavior of these executions is undefined. +This component is a singleton and is responsible for reading the schedules from the DB and running them at the cadence defined by the schedule. The lowest granularity supported is ``minutes`` for scheduling through both cron and fixed rate schedulers. The scheduler can run in one replica, two at the most during redeployment.
Multiple replicas will only duplicate the work, since each execution for a ``scheduleTime`` will have a unique identifier derived from the schedule name and the time of the schedule. The idempotency aspect of the admin for the same identifier prevents duplication on the admin side. The scheduler runs continuously in a loop reading the updated schedule entries in the data store and adding or removing the schedules. Removing a schedule will not alter the in-flight goroutines launched by the scheduler. Thus, the behavior of these executions is undefined. Snapshoter ********** -This component is responsible for writing the snapshot state of all schedules at a regular cadence to a persistent store. It uses a DB to store the GOB format of the snapshot, which is versioned. The snapshot is a map[string]time.Time, which stores a map of schedule names to their last execution times. During bootup the snapshot is bootstraped from the data store and loaded in the memory. The Scheduler uses this snapshot to schedule any missed schedules. +This component is responsible for writing the snapshot state of all schedules at a regular cadence to a persistent store. It uses a DB to store the GOB format of the snapshot, which is versioned. The snapshot is a ``map[string]time.Time``, which stores a map of schedule names to their last execution times. During bootup, the snapshot is bootstrapped from the data store and loaded into memory. The Scheduler uses this snapshot to schedule any missed schedules. CatchupAll-System ***************** -This component runs at bootup and catches up all the schedules to the current time.Now(). New runs for the schedules are also sent to the admin in parallel. -But any failure in catching up is considered to be a hard failure and stops the scheduler. The rerun tries to catchup from the last snapshotted data. +This component runs at bootup and catches up all the schedules to the current time, i.e., ``time.Now()``.
New runs for the schedules are sent to the admin in parallel. +Any failure in catching up is considered a hard failure and stops the scheduler. The rerun tries to catch up from the last snapshotted data. GOCronWrapper ************* -This component is responsible for locking in the time for the scheduled job to be invoked and adding those to the cron scheduler. It is a wrapper around the `following framework `__ for fixed rate and cron schedules and creates in-memory representation of the scheduled job functions. The scheduler provides the ability to schedule a function with scheduleTime parameters. This is useful to know once the scheduled function is invoked as to what scheduled time this invocation is for. This scheduler supports standard cron scheduling which has 5 `fields `__. It requires 5 entries representing: minute, hour, day of month, month and day of week, in that order. +This component is responsible for locking in the time for the scheduled job to be invoked and adding those to the cron scheduler. It is a wrapper around `this framework `__ for fixed-rate and cron schedules that creates an in-memory representation of the scheduled job functions. The scheduler schedules a function with ``scheduleTime`` parameters. When this scheduled function is invoked, the ``scheduleTime`` parameters provide the current schedule time used by the scheduler. This scheduler supports standard cron scheduling which has 5 `fields `__. It requires 5 entries representing ``minute``, ``hour``, ``day of month``, ``month`` and ``day of week``, in that order. Job Executor ************ -This component is responsible for sending the scheduled executions to flyteadmin. The job function accepts the scheduleTime and the schedule which is used for creating an execution request to the admin. Each job function is tied to the schedule, which is executed in separate go routine according the schedule cadence. +The job executor component is responsible for sending the scheduled executions to FlyteAdmin.
The job function accepts ``scheduleTime`` and the schedule which is used to create an execution request to the admin. Each job function is tied to the schedule which is executed in a separate goroutine in accordance with the schedule cadence. Monitoring ---------- -The following metrics are published by the native scheduler for easier monitoring of the health of the system: +To monitor the system health, the following metrics are published by the native scheduler: -#. JobFuncPanicCounter : count of crashes of the job functions executed by the scheduler -#. JobScheduledFailedCounter : count of scheduling failures by the scheduler -#. CatchupErrCounter : count of unsuccessful attempts to catchup on the schedules -#. FailedExecutionCounter : count of unsuccessful attempts to fire executions of a schedule -#. SuccessfulExecutionCounter : count of successful attempts to fire executions of a schedule +#. JobFuncPanicCounter: count of crashes of the job functions executed by the scheduler. +#. JobScheduledFailedCounter: count of scheduling failures by the scheduler. +#. CatchupErrCounter: count of unsuccessful attempts to catch up on the schedules. +#. FailedExecutionCounter: count of unsuccessful attempts to fire executions of a schedule. +#. SuccessfulExecutionCounter: count of successful attempts to fire executions of a schedule. diff --git a/rsts/concepts/console.rst b/rsts/concepts/console.rst index d1302e3cdb..61eb4bb17d 100644 --- a/rsts/concepts/console.rst +++ b/rsts/concepts/console.rst @@ -1,21 +1,21 @@ .. _divedeep-console: ############# -Flyte Console +FlyteConsole ############# -This is the web UI for the Flyte platform. The results of running Flyte Console are displayed in this graph, explained below: +FlyteConsole is the web UI for the Flyte platform. Here's a video that dives into the graph UX: ..
youtube:: 7YSc-QHk_Ec ********************* -Running flyteconsole +Running FlyteConsole ********************* ===================== Install Dependencies ===================== -Running flyteconsole locally requires `NodeJS `_ and +Running FlyteConsole locally requires `NodeJS `_ and `yarn `_. Once these are installed, all of the dependencies can be installed by running ``yarn`` in the project directory. @@ -26,7 +26,7 @@ Before we can run the server, we need to set up an environment variable or two. ``ADMIN_API_URL`` (default: `window.location.origin `_) -The Flyte console displays information fetched from the FlyteAdmin API. This +FlyteConsole displays information fetched from the FlyteAdmin API. This environment variable specifies the host prefix used in constructing API requests. .. NOTE:: This value will be combined with a suffix (such as ``/api/v1``) to construct the final URL used in an API request. -*Default Behavior* +**Default Behavior** -In most cases, ``flyteconsole`` will be hosted in the same cluster as the Admin +In most cases, ``FlyteConsole`` is hosted in the same cluster as the Admin API, meaning that the domain used to access the console is the same as that used to access the API. For this reason, if no value is set for ``ADMIN_API_URL``, the default behavior is to use the value of `window.location.origin`. -``BASE_URL`` (default: ``undefined``) +**BASE_URL** (default: ``undefined``) This allows running the console at a prefix on the target host. This is necessary when hosting the API and console on the same domain (with prefixes of @@ -52,7 +52,7 @@ usually not needed, so the default behavior is to run without a prefix.
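The ``ADMIN_API_URL`` defaulting described above (fall back to the page origin when the variable is unset, then append an API suffix such as ``/api/v1``) can be sketched as follows. This is a Python illustration only; the console itself is TypeScript, and the helper name is made up:

```python
import os

def admin_api_url(page_origin, suffix="/api/v1"):
    """Resolve the FlyteAdmin API base URL per the documented defaulting:
    prefer ADMIN_API_URL if set, otherwise fall back to the page origin
    (window.location.origin in the browser), then append the API suffix.
    """
    base = os.environ.get("ADMIN_API_URL") or page_origin
    return base.rstrip("/") + suffix
```

For example, with ``ADMIN_API_URL`` unset and a page origin of ``https://flyte.example.com``, requests would target ``https://flyte.example.com/api/v1``.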
-``CORS_PROXY_PREFIX`` (default: ``/cors_proxy``) +**CORS_PROXY_PREFIX** (default: ``/cors_proxy``) Sets the local endpoint for `CORS request proxying `_. @@ -74,9 +74,9 @@ Development Storybook ========== -This project has support for `Storybook `_. -Component stories live next to the components they test, in a ``__stories__`` -directory, with the filename pattern ``{Component}.stories.tsx``. +FlyteConsole uses `Storybook `__. +Component stories live next to the components they test in the ``__stories__`` +directory with the filename pattern ``{Component}.stories.tsx``. You can run storybook with ``npm run storybook``, and view the stories at http://localhost:9001. @@ -86,8 +86,8 @@ Protobuf and the Network tab Communication with the FlyteAdmin API is done using Protobuf as the request/response format. Protobuf is a binary format, which means looking at -responses in the Network tab won't be very helpful. To make debugging easier, -each network request is logged to the console with it's URL followed by the +responses in the Network tab won't be helpful. To make debugging easier, +each network request is logged to the console with its URL, followed by the decoded Protobuf payload. You must have debug output enabled (on by default in development) to see these messages. @@ -111,13 +111,12 @@ Admin API requests). CORS Proxying ============== -In the common hosting arrangement, all API requests will be to the same origin +In the common hosting arrangement, all API requests are made to the same origin serving the client application, making CORS unnecessary. For any requests which do not share the same ``origin`` value, the client application will route requests through a special endpoint on the NodeJS server. One example would be -hosting the Admin API on a different domain than the console. Another example is -when fetching execution data from external storage such as S3.
This is done to -minimize the amount of extra configuration required for ingress to the Admin API +hosting the Admin API on a different domain than the console. Another example is fetching execution data from external storage such as S3. This is done to +minimize the extra configuration required for ingress to the Admin API and data storage, as well as to simplify local development of the console without the need to grant CORS access to ``localhost``. diff --git a/rsts/concepts/data_management.rst b/rsts/concepts/data_management.rst index e5a343757f..74520ddf63 100644 --- a/rsts/concepts/data_management.rst +++ b/rsts/concepts/data_management.rst @@ -38,7 +38,7 @@ All of them from Flyte's point of view are ``data``. The difference lies in how Flyte stores and passes each of these data items. For every task that receives input, Flyte sends an **Inputs Metadata** object, which contains all the primitive or simple scalar values inlined, but in the case of -complex, large objects, they are offloaded and the `Metadata` simply stores a reference to the object. In our example, ``m``, and ``n`` are inlined while +complex, large objects, they are offloaded and the `Metadata` simply stores a reference to the object. In our example, ``m`` and ``n`` are inlined while ``o`` and the output ``pd.DataFrame`` are offloaded to an object store, and their reference is captured in the metadata. `Flytekit TypeTransformers` make it possible to use complex objects as if they are available locally - just like persistent filehandles. But Flyte backend only deals with @@ -52,12 +52,12 @@ but can be accessed by users's container/tasks. Raw Data Prefix ~~~~~~~~~~~~~~~ -Every task can read/write its own data files. If ``FlyteFile``, or any natively supported type like ``pandas.DataFrame`` is used, Flyte will automatically offload and download +Every task can read/write its own data files. 
If ``FlyteFile`` or any natively supported type like ``pandas.DataFrame`` is used, Flyte will automatically offload and download data from the configured object-store paths. These paths are completely customizable per `LaunchPlan` or `Execution`. - The default RawOutput path (prefix in an object store like S3/GCS) can be configured during registration as shown in :std:ref:`flytectl_register_files`. The argument ``--outputLocationPrefix`` allows us to set the destination directory for all the raw data produced. Flyte will create randomized folders in this path to store the data. -- To override the ``RawOutput`` path (prefix in an object store like S3/GCS), we can specify an alternate location when invoking a Flyte execution, as shown in the following screenshot of the LaunchForm in FlyteConsole: +- To override the ``RawOutput`` path (prefix in an object store like S3/GCS), you can specify an alternate location when invoking a Flyte execution, as shown in the following screenshot of the LaunchForm in FlyteConsole: .. image:: https://raw.githubusercontent.com/flyteorg/static-resources/main/flyte/concepts/data_movement/launch_raw_output.png @@ -69,7 +69,7 @@ Metadata Metadata in Flyte is critical to enable the passing of data between tasks. It allows performing in-memory computations for branches, sending partial outputs from one task to another, or composing outputs from multiple tasks into one input to be sent to a task. -Thus, metadata is restricted due to its omnipresence. Each `meta output` / `input` cannot be larger than 1MB. If you have `List[int]`, it cannot be larger than 1MB, considering the other input entities. In scenarios where large lists or strings need to be sent between tasks, file abstraction is preferred. +Thus, metadata is restricted due to its omnipresence. Each `meta output`/`input` cannot be larger than 1MB. If you have `List[int]`, it cannot be larger than 1MB, considering other input entities.
In scenarios where large lists or strings need to be sent between tasks, file abstraction is preferred. ``LiteralType`` & Literals ~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -149,7 +149,7 @@ One implementation of Flyte is the current workflow engine. The workflow engine is responsible for moving data from a previous task to the next task. As explained previously, Flyte only deals with Metadata and not the actual Raw data. The illustration below explains how data flows from engine to the task and how that is transferred between tasks. The medium to transfer the data can change, and will change in the future. -We could use faster metadata stores to speed up data movement or exploit locality. +We could use fast metadata stores to speed up data movement or exploit locality. Between Flytepropeller and Tasks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/rsts/concepts/domains.rst b/rsts/concepts/domains.rst index 6d151ca995..da2c22a35f 100644 --- a/rsts/concepts/domains.rst +++ b/rsts/concepts/domains.rst @@ -6,6 +6,6 @@ Domains Domains provide an abstraction to isolate resources and feature configuration for different deployment environments. -For example: At Lyft, we develop and deploy Flyte workflows in development, staging, and production. We configure Flyte domains with those names, and specify lower resource limits on the development and staging domains than production domains. +For example: We develop and deploy Flyte workflows in development, staging, and production. We configure Flyte domains with those names, and specify lower resource limits on the development and staging domains than production domains. We also use domains to disable launch plans and schedules from development and staging domains, since those features are typically meant for production deployments. 
\ No newline at end of file diff --git a/rsts/concepts/dynamic_spec.rst b/rsts/concepts/dynamic_spec.rst index 2bd7b84c7d..9621347e5a 100644 --- a/rsts/concepts/dynamic_spec.rst +++ b/rsts/concepts/dynamic_spec.rst @@ -3,8 +3,7 @@ Dynamic Job Spec ================ -A dynamic job spec is a subset of the full workflow spec that defines a set of tasks, workflows as well as -nodes and output bindindgs that control how the job should assemble its outputs. +A dynamic job spec is a subset of the entire workflow spec that defines a set of tasks, workflows, nodes, and output bindings that control how the job should assemble its outputs. This spec is currently only supported as an intermediate step in running Dynamic Tasks. diff --git a/rsts/concepts/execution_timeline.rst b/rsts/concepts/execution_timeline.rst index 7db32f9b52..eae6884d00 100644 --- a/rsts/concepts/execution_timeline.rst +++ b/rsts/concepts/execution_timeline.rst @@ -21,18 +21,18 @@ The illustration above refers to a simple workflow, with 2 nodes N1 & N2. This c Acceptance Latency ==================== -Every workflow starts in the ``Acceptance`` phase. Acceptance refers to the time between FlyteAdmin receiving an execution request and Flyte Propeller evaluating the first round of workflow. +Every workflow starts in the ``Acceptance`` phase. Acceptance refers to the time between FlyteAdmin receiving an execution request and FlytePropeller evaluating the first round of workflow. Usually, within this phase, the K8s queuing latency is the largest contributor to latency where the overall acceptance latency of <5s is desirable. Transition Latency =================== -Transition latency refers to the time between successive node executions, i.e., between N1 & N2. For the first node ``N1`` this latency also encapsulates executing the start node. +Transition latency refers to the time between successive node executions, that is, between ``N1`` and ``N2``. 
For the first node ``N1``, this latency also encapsulates executing the start node. -Similarly, the last node also encapsulates executing end node as well. ``Start Node`` and ``End Node`` are capstones inserted to mark the beginning and end of the DAG. +Similarly, the last node also encapsulates executing the end node. ``Start Node`` and ``End Node`` are capstones inserted to mark the beginning and end of the DAG. The latency involves time consumed to: -#. Gather outputs for a node after the node completes. +#. Gather outputs for a node after the node completes execution. #. Send an observation event to FlyteAdmin. Failing to do so will be regarded as an error and will be tried until it succeeds or system max retries are exhausted (the number of max system retries is configured to be 30 by default and can be altered per deployment). #. Persist data to Kubernetes. #. Receive the persisted object back from Kubernetes (as this process is eventually consistent using informer caches). diff --git a/rsts/concepts/executions.rst b/rsts/concepts/executions.rst index af727d91fc..712693bde1 100644 --- a/rsts/concepts/executions.rst +++ b/rsts/concepts/executions.rst @@ -5,12 +5,10 @@ Executions ########## **Executions** are instances of workflows, nodes or tasks created in the system as a result of a user-requested execution or a scheduled execution. -Typical Flow Using FlyteCTL +Typical Flow Using Flytectl --------------------------- -* When an execution of a workflow is triggered using UI/FlyteCTL/other stateless systems, the system first calls the - ``getLaunchPlan`` endpoint and retrieves a launch plan matching the given version. - The launch plan definition includes definitions of all input variables declared for the workflow. +* When an execution of a workflow is triggered using UI/Flytectl/other stateless systems, the system first calls the ``getLaunchPlan`` endpoint and retrieves a launch plan matching the given version.
The launch plan definition includes definitions of all input variables declared for the workflow. * The user-side component then ensures that all the required inputs are supplied and requests the FlyteAdmin service for an execution. * The FlyteAdmin service validates the inputs, ensuring that they are all specified and, if required, within the declared bounds. * FlyteAdmin then fetches the previously validated and compiled workflow closure and translates it to an executable format with all the inputs. diff --git a/rsts/concepts/flyte_console.rst b/rsts/concepts/flyte_console.rst index b805b8091d..adef153bb1 100644 --- a/rsts/concepts/flyte_console.rst +++ b/rsts/concepts/flyte_console.rst @@ -7,8 +7,8 @@ Flyte UI is a web-based user interface for Flyte. It helps interact with Flyte o With Flyte UI, you can: -* Launch Workflows -* Launch Tasks +* Launch tasks +* Launch workflows * View Versioned Tasks and Workflows * Trigger Versioned Tasks and Workflows * Inspect Executions through Inputs, Outputs, Logs, and Graphs @@ -17,7 +17,7 @@ With Flyte UI, you can: * Recover Executions .. note:: - `Flyte Console `__ hosts the Flyte user interface code. + `FlyteConsole `__ hosts the Flyte user interface code. Launching Workflows ------------------- @@ -74,8 +74,7 @@ The UI should be accessible at http://localhost:30081/console. | -A pop-up window appears with input fields that the task requires and the role with which the task has to run -on clicking the **Launch Task** button. +On clicking the **Launch Task** button, a pop-up window appears with the input fields that the task requires and the role with which the task has to run. | @@ -148,7 +147,7 @@ Logs are accessible as well. Every execution has two views: Nodes and Graph. -A node in the nodes view encapsulates an instance of a task, but it can also contain an entire subworkflow or trigger a child workflow.
+A node in the nodes view encapsulates an instance of a task, but it can also contain an entire subworkflow or trigger an external workflow. More about nodes can be found in :std:ref:`divedeep-nodes`. | @@ -174,7 +173,7 @@ Graph view showcases a static DAG. Cloning Executions ------------------ -An execution in the RUNNING state can be cloned. +An execution in the ``RUNNING`` state can be cloned. Click on the ellipsis on the top right corner of the UI. diff --git a/rsts/concepts/launchplans.rst b/rsts/concepts/launchplans.rst index d60e176fc5..fb0d41397b 100644 --- a/rsts/concepts/launchplans.rst +++ b/rsts/concepts/launchplans.rst @@ -15,7 +15,7 @@ See `here `, but it can also contain an entire subworkflow or Nodes can have inputs and outputs, which are used to coordinate task inputs and outputs. Moreover, node outputs can be used as inputs to other nodes within a workflow. -Tasks are always encapsulated within a node, however, like tasks, nodes can come in a variety of flavors determined by their *target*. +Tasks are always encapsulated within a node. Like tasks, nodes can come in a variety of flavors determined by their *target*. These targets include :ref:`task nodes `, :ref:`workflow nodes `, and :ref:`branch nodes `. .. _divedeep-task-nodes: diff --git a/rsts/concepts/projects.rst b/rsts/concepts/projects.rst index 83f94fa861..db7073e554 100644 --- a/rsts/concepts/projects.rst +++ b/rsts/concepts/projects.rst @@ -2,7 +2,7 @@ Projects ======== -A project in Flyte is a grouping of :ref:`workflows ` and :ref:`tasks ` to achieve a particular goal. +A project in Flyte is a group of :ref:`workflows ` and :ref:`tasks ` tied together to achieve a goal. A Flyte project can map to an engineering project or everything that's owned by a team or an individual. There cannot be multiple projects with the same name in Flyte. 
diff --git a/rsts/concepts/schedules.rst b/rsts/concepts/schedules.rst index 9a8370cf2d..8d7dd363de 100644 --- a/rsts/concepts/schedules.rst +++ b/rsts/concepts/schedules.rst @@ -4,9 +4,9 @@ Schedules ========== Workflows can be run automatically using :ref:`schedules ` associated with launch plans. -At most, only one launch plan version for a given {Project, Domain, Name} combination can be active, which means, at most, only one schedule can be active for a launch plan. This is because only one active schedule can exist across all versions of the launch plan. +Only one launch plan version for a given {Project, Domain, Name} combination can be active, which means only one schedule can be active for a launch plan. This is because a single active schedule can exist across all versions of the launch plan. -To clarify, a :ref:`workflow ` version can have multiple schedules associated with it, given that these schedules exist as versions of different launch plans. +A :ref:`workflow ` version can have multiple schedules associated with it, given that these schedules exist as versions of different launch plans. Creating a new schedule creates a new version of the launch plan. If you wish to change a schedule, you will have to create a new version of that launch plan since a **schedule cannot be edited**. diff --git a/rsts/concepts/state_machine.rst b/rsts/concepts/state_machine.rst index c25a8e5bb5..e8b28f765f 100644 --- a/rsts/concepts/state_machine.rst +++ b/rsts/concepts/state_machine.rst @@ -28,7 +28,7 @@ High Level Overview of How a Workflow Progresses to Success NodeSuccess --> Success -This state diagram illustrates a very high-level, simplistic view of the state transitions that a workflow with a single task and node would go through as the observer observes success. +This state diagram illustrates a high-level, simplistic view of the state transitions that a workflow with a single task and node would go through as the user observes success. 
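The success path in the diagram above can be sketched as a tiny transition table that a trace of observed states is validated against. This is a deliberately simplified, hypothetical model — the ``Running`` intermediate state and the exact transition set are assumptions for illustration, not FlytePropeller's real state machine:

```python
# Hypothetical, simplified workflow state machine for illustration only.
# A workflow starts in Ready and must end in a terminal state; the Running
# and NodeSuccess transitions below are assumptions based on the high-level
# diagram, not FlytePropeller's actual transition table.
TRANSITIONS = {
    "Ready": {"Running"},
    "Running": {"NodeSuccess", "Failed", "Aborting"},
    "NodeSuccess": {"Success"},
    "Aborting": {"Aborted"},  # on successful event send to Admin
}
TERMINAL = {"Success", "Failed", "Aborted"}

def validate_trace(trace):
    """Check that a sequence of states is a legal path ending in a terminal state."""
    if trace[0] != "Ready" or trace[-1] not in TERMINAL:
        return False
    return all(b in TRANSITIONS.get(a, set()) for a, b in zip(trace, trace[1:]))

print(validate_trace(["Ready", "Running", "NodeSuccess", "Success"]))  # True
```

A trace that tries to jump straight from ``Ready`` to a terminal state would be rejected, mirroring the fact that every transition is recorded and must follow the defined edges.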
The following sections explain the various observable (and some hidden) states for workflow, node, and task state transitions. @@ -58,26 +58,7 @@ Workflow States Aborting--> |On successful event send to Admin| Aborted A workflow always starts in the ``Ready`` state and ends either in ``Failed``, ``Succeeded``, or ``Aborted`` state. -Any system error within a state causes a retry on that state. These retries are capped by **system retries** which eventually lead to an ``Aborted`` state if the failure persists. - -.. note:: - System retry can be of two types: - - - **Downstream System Retry**: When a downstream system (or service) fails, or remote service is not contactable, - the failure is retried against the number of retries set - `here `__. - This performs end-to-end system retry against the node whenever the task fails with a system error. This is useful when the downstream - service throws a 500 error, abrupt network failure, etc. - - **Transient Failure Retry**: This retry mechanism offers resiliency against transient failures, which are opaque to the user. - It is tracked across the entire duration of execution. It helps Flyte entities and the additional services - connected to Flyte like S3, to continue operating despite a system failure. Indeed, all transient failures are handled gracefully - by Flyte! Moreover, in case of a transient failure retry, Flyte does not necessarily retry the entire task. “Retrying an entire - task” means that the entire pod associated with Flyte task would be rerun with a clean slate; instead, it just retries the atomic operation. - For example, Flyte tries to persist the state until it can, exhausts the max retries, and backs off. To set a transient failure - retry: - - - Update `MaxWorkflowRetries `__ in the propeller configuration - - Or update `max-workflow-retries `__ in helm +Any system error within a state causes a retry on that state. 
These retries are capped by :ref:`system retries ` which eventually lead to an ``Aborted`` state if the failure persists. Every transition between states is recorded in FlyteAdmin using :std:ref:`workflowexecutionevent `. @@ -85,6 +66,7 @@ The phases in the above state diagram are captured in the admin database as spec The state machine specification for the illustration can be found `here `__. + Node States =========== diff --git a/rsts/concepts/tasks.rst b/rsts/concepts/tasks.rst index de9a0ad77e..9bc4772dda 100644 --- a/rsts/concepts/tasks.rst +++ b/rsts/concepts/tasks.rst @@ -9,12 +9,12 @@ They are the fundamental building blocks and extension points that encapsulate t Characteristics --------------- -In general, a Flyte task is characterized by: +A Flyte task is characterized by: -1. A combination of :ref:`divedeep-projects` and :ref:`divedeep-domains`, +1. A combination of :ref:`projects ` and :ref:`domains `, 2. A unique unicode name (we recommend it not to exceed 32 characters), 3. A version string, and/or -4. *Optional* Task interface definition +4. *Optional* Task interface definition. For tasks to exchange data with each other, a task can define a signature (much like a function/method signature in programming languages). A task interface defines the input and output variables — @@ -24,15 +24,12 @@ In general, a Flyte task is characterized by: Can "X" Be a Flyte Task? ------------------------- -When deciding whether or not a unit of execution constitutes a Flyte task, consider the following: +When deciding if a unit of execution constitutes a Flyte task, consider these questions: -- Is there a well-defined graceful/successful exit criteria for the task? A task is expected to exit after finishing processing - its inputs. -- Is it repeatable? Under certain circumstances, a task might be retried, rerun, etc. with the same inputs. It's expected - to produce the same outputs every single time. 
For example, avoid using random number generators with current clock as seed - and instead use a system-provided clock as the seed. -- Is it a pure function, i.e., does it have side effects that are not known to the system (e.g. calls a web-service)? It's strongly - advisable to avoid side-effects in tasks. When side-effects are required, ensure that those operations are idempotent. +- Is there a well-defined graceful/successful exit criteria for the task? A task is expected to exit after completion of input processing. +- Is it repeatable? Under certain circumstances, a task might be retried, rerun, etc. with the same inputs. It is expected + to produce the same output every single time. For example, avoid using random number generators with current clock as seed. Use a system-provided clock as the seed instead. +- Is it a pure function, i.e., does it have side effects that are unknown to the system (e.g., calls a web service)? It is recommended to avoid side effects in tasks. When side effects are required, ensure that the operations are idempotent. Dynamic Tasks -------------- @@ -40,7 +37,7 @@ "Dynamic tasks" is a misnomer. Flyte is a one-of-a-kind workflow engine that ships with the concept of truly `Dynamic Workflows `__! Users can generate workflows in reaction to user inputs or computed values at runtime. -These executions are evaluated to generate a static graph, before execution. +These executions are evaluated to generate a static graph before execution. Extending Task --------------- @@ -48,25 +45,25 @@ Plugins ^^^^^^^ -Flyte language exposes an extensible model to express tasks in an execution-independent language. -It contains first-class task plugins (e.g. `Papermill `__, -`Great Expectations `__, etc.) -that take care of executing the Flyte tasks. -Almost any action can be implemented and introduced into Flyte as a "Plugin". +Flyte exposes an extensible model to express tasks in an execution-independent language.
+It contains first-class task plugins (for example: `Papermill `__, +`Great Expectations `__, and :ref:`more `.) +that execute the Flyte tasks. +Almost any action can be implemented and introduced into Flyte as a "Plugin", which includes: - Tasks that run queries on distributed data warehouses like Redshift, Hive, Snowflake, etc. - Tasks that run executions on compute engines like Spark, Flink, AWS Sagemaker, AWS Batch, Kubernetes pods, jobs, etc. - Tasks that call web services. -Flyte ships with some defaults; for example, running a simple Python function does not need any hosted service. Flyte knows how to -execute these kinds of tasks on Kubernetes. It turns out these are the vast majority of tasks in ML, and Flyte is adept at -handling an enormous scale on Kubernetes; this is achieved by implementing a unique scheduler on top of Kubernetes. +Flyte ships with certain defaults, for example, running a simple Python function does not need any hosted service. Flyte knows how to +execute these kinds of tasks on Kubernetes. It turns out these are the vast majority of tasks in machine learning, and Flyte is adept at +handling an enormous scale on Kubernetes. This is achieved by implementing a unique scheduler on Kubernetes. Types ^^^^^ -It is impossible to define the unit of execution of a task in the same way for all kinds of tasks. Hence, Flyte allows different task -types in the system. Flyte comes with a set of defined, battle-tested task types. It also allows for a very flexible model to +It is impossible to define the unit of execution of a task in the same way for all tasks. Hence, Flyte allows for different task +types in the system. Flyte has a set of defined, battle-tested task types. It allows for a flexible model to :std:ref:`define new types `. Inherent Features @@ -79,13 +76,41 @@ In any distributed system, failure is inevitable. 
Allowing users to design a fault-tolerant system (e.g., a workflow) is an inherent part of Flyte. At a high level, tasks offer two parameters to achieve fault tolerance: **Retries** - Tasks can define a retry strategy to let the system know how to handle failures (example: retry 3 times on any kind of error). + +Tasks can define a retry strategy to let the system know how to handle failures (for example, retry 3 times on any kind of error). + +There are two kinds of retries: + +1. System retry: A system-defined retry that handles recoverable system failures. The number of retries is validated against the number of system retries. + +.. _system-retry: + +System retry can be of two types: + +- **Downstream System Retry**: When a downstream system (or service) fails, or remote service is not contactable, the failure is retried against the number of retries set `here `__. This performs end-to-end system retry against the node whenever the task fails with a system error. This is useful when the downstream service throws a 500 error, abrupt network failure, etc. + +- **Transient Failure Retry**: This retry mechanism offers resiliency against transient failures, which are opaque to the user. It is tracked across the entire duration of execution. It helps Flyte entities and the additional services connected to Flyte, like S3, to continue operating despite a system failure. Indeed, all transient failures are handled gracefully by Flyte! Moreover, in case of a transient failure retry, Flyte does not necessarily retry the entire task. “Retrying an entire task” means that the entire pod associated with the Flyte task would be rerun with a clean slate; instead, it just retries the atomic operation. For example, Flyte tries to persist the state until it can, exhausts the max retries, and backs off. + + To set a transient failure retry: + + - Update `MaxWorkflowRetries `__ in the propeller configuration. + + - Or update `max-workflow-retries `__ in helm. + +2.
User retry: If a task fails to execute, it is retried a specific number of times, and this number is set by the user in `TaskMetadata `__. The number of retries must be less than or equal to 10. + +.. note:: + + Recoverable vs. Non-Recoverable failures: Recoverable failures will be retried and counted against the task’s retry count. Non-recoverable failures will just fail, i.e., the task isn’t retried irrespective of user/system retry configurations. All user exceptions are considered non-recoverable unless the exception is a subclass of FlyteRecoverableException. + **Timeouts** - For the system to ensure it is always making progress, tasks must be guaranteed to end gracefully/successfully. The system defines a default timeout period for the tasks. It is also possible for task authors to define a timeout period, after which the task gets marked as failure. Note that a timed-out task will be retried if it has a retry strategy defined. + +To ensure that the system is always making progress, tasks must be guaranteed to end gracefully/successfully. The system defines a default timeout period for the tasks. It is possible for task authors to define a timeout period, after which the task is marked as ``failure``. Note that a timed-out task will be retried if it has a retry strategy defined. The timeout can be set in the `TaskMetadata `__. + -Memoization -^^^^^^^^^^^ +Caching/Memoization +^^^^^^^^^^^^^^^^^^^ -Flyte supports memoization of task outputs to ensure that identical invocations of a task don't get executed repeatedly, wasting compute resources. -For more information on memoization, please refer to the :std:ref:`Caching Example `. +Flyte supports memoization of task outputs to ensure that identical invocations of a task are not executed repeatedly, thereby saving compute resources and execution time. For example, if you wish to run the same piece of code multiple times, you can re-use the output instead of re-computing it.
+For more information on memoization, refer to the :std:ref:`Caching Example `. diff --git a/rsts/concepts/versioning.rst b/rsts/concepts/versioning.rst index 0c1b7781b7..1289296c8c 100644 --- a/rsts/concepts/versioning.rst +++ b/rsts/concepts/versioning.rst @@ -4,46 +4,44 @@ Versions ======== One of the most important features and reasons for certain design decisions in Flyte is the need for machine learning and data practitioners to experiment. -When users experiment, they usually work in isolation and try multiple iterations. +When users experiment, they do so in isolation and try multiple iterations. Unlike traditional software, the users must conduct multiple experiments concurrently with different environments, algorithms, etc. -This may happen when different data scientists simultaneously iterate on the same workflow/pipeline. +This may happen when multiple data scientists simultaneously iterate on the same workflow/pipeline. -The cost of creating an independent infrastructure for each version is enormous and not desirable. -Moreover, it is desirable to share the same centralized infrastructure, where the burden of maintaining the infrastructure is with a central infrastructure team, -while users can use it independently. This also improves the cost of operation, since it is possible to reuse the same infrastructure for multiple teams. +The cost of creating an independent infrastructure for each version is enormous and undesirable. +It is beneficial to share the same centralized infrastructure, where the burden of maintaining the infrastructure is with a central infrastructure team, +while the users can use it independently. This improves the cost of operation since the same infrastructure can be reused by multiple teams. -Moreover, versioned workflows help users quickly reproduce prior results or identify the source of previous successful experiments. 
+Versioned workflows help users quickly reproduce prior results or identify the source of previous successful experiments. Why Do You Need Versioning? --------------------------- Versioning is required to: -- Work on the same project concurrently yet identify the version/experiment that was successful. -- Capture the environment for a version and independently launch this environment. +- Work on the same project concurrently and identify the version/experiment that was successful. +- Capture the environment for a version and independently launch it. - Visualize prior runs and tie them to experiment results. -- Easily and cleanly roll-back production deployments in case of failures. +- Roll back production deployments with ease in case of failures. - Execute multiple experiments in production, which may use different training or data processing algorithms. - Understand how a specific system evolved and answer questions related to the effectiveness of a specific strategy. Operational Benefits of Completely Versioned Workflows/Pipelines ------------------------------------------------------------------- -Since the entire workflow in Flyte is completely versioned and all tasks and entities are immutable, it is possible to completely change -the structure of a workflow between versions, without worrying about consequences for the pipelines in production. This hermetic property makes it extremely -easy to manage and deploy new workflow versions. This is especially important for workflows that are long-running. Flyte guarantees, that if a workflow execution is in progress -and even if a new workflow version has been activated the execution using the old version, will continue unhindered. +The entire workflow in Flyte is versioned and all tasks and entities are immutable, which makes it possible to completely change the structure of a workflow between versions, without worrying about the consequences for the pipelines in production.
+This hermetic property makes it effortless to manage and deploy new workflow versions and is important for workflows that are long-running. +If a workflow execution is in progress and a new workflow version has been activated, Flyte guarantees that the execution of the old version continues unhindered. -The astute may question, but what if, I had a bug in the previous version and I want to just fix the bug and run all previous executions. -Before we understand how Flyte tackles this, let us analyze the problem further - fixing a bug will need a code change and it is possible -that the bug may actually affect the structure of the workflow. Simply fixing the bug in the task may not solve the problem. +Consider a scenario where a bug in a previous version needs to be fixed and the previous executions need to be rerun. +Simply fixing the bug in the task may not solve the problem. +Moreover, fixing bugs involves code changes, which may affect the workflow structure. +Flyte addresses this using two properties: -Flyte solves the above problem using 2 properties: +1. Since the entire workflow is versioned, changing the structure has no impact on the existing execution, and the workflow state won't be corrupted. +2. Flyte provides caching/memoization of outputs. As long as the tasks and their behavior have not changed, it is possible to move them around and still recover their previous outputs, without having to rerun the tasks. This strategy will work even if the workflow changes are in a task. -1. Since the workflow is completely versioned, changing the structure has no impact on an existing execution, and the workflow state will not be corrupted.
- -Let us take an example of a workflow: +Let us take a sample workflow: .. mermaid:: @@ -70,29 +68,28 @@ The same ``cache=True`` will handle this complicated situation as well. Why Is Versioning Hard? ----------------------- -Git has become the defacto-standard in version control for code. Git makes it extremely easy to work on branches, merge them, and revert unwanted changes. -But achieving this for a live (running) algorithm usually needs the entire infrastructure to be associated and potentially re-created for every execution. +Git has become the defacto-standard in version control for code, making it easy to work on branches, merge them, and revert unwanted changes. +But achieving this for a live (running) algorithm usually requires the entire infrastructure to be associated and potentially re-created for every execution. How Is Versioning Tied to Reproducibility? ------------------------------------------ -Reproducibility is possible without explicit versioning within the workflow system. -To reproduce a past experiment, users need to identify the source code, and resurrect any dependencies that this code may have used (for example, TensorFlow 1.x instead of TensorFlow 2.x, or specific Python libraries). -It is also necessary to instantiate any infrastructure that the previous version may have used and, if not already recorded, ensure that the previously used dataset (say) can be reconstructed. -From the first principles, if reproducibility is considered to be one of the most important concerns, then one would capture all these variables and provide them in an easy-to-use method. +Workflows can be reproduced without explicit versioning within the system. +To reproduce a past experiment, users need to identify the source code and resurrect any dependencies that the code may have used (for example, TensorFlow 1.x instead of TensorFlow 2.x, or specific Python libraries). 
+It is also required to instantiate the infrastructure that the previous version may have used. If not recorded, you'll have to ensure that the previously used dataset (say) can be reconstructed. This is exactly how Flyte was conceived! -Every task is versioned, and Flyte precisely captures its dependency set. For external tasks, it is highly encouraged to use -memoization so that the constructed dataset is cached on the Flyte side, and hence, one can comfortably guarantee reproducible behavior from the external systems. +In Flyte, every task is versioned, and it precisely captures the dependency set. For external tasks, memoization is recommended so that the constructed dataset can be cached on the Flyte side. This way, one can guarantee reproducible behavior from the external systems. + Moreover, every piece of code is registered with the version of the code that was used to create the instance. -Users can therefore easily construct the lineage for all the parts of the workflow. +Therefore, users can easily construct the data lineage for all the parts of the workflow. What Is the Cost of Versioning & Reproducibility? ------------------------------------------------- One of the costs of versioning and allowing on-demand reproducibility is the need to re-instantiate the infrastructure from scratch. -This may sometimes cause additional overhead. However, the advent of Docker containers and Kubernetes has made it possible to build a platform to achieve these goals. +This may sometimes result in additional overhead. However, the advent of Docker containers and Kubernetes has made it possible to build a platform to achieve these goals. .. admonition:: Coming soon! diff --git a/rsts/concepts/workflows.rst b/rsts/concepts/workflows.rst index 886a43ef9d..71acbb4158 100644 --- a/rsts/concepts/workflows.rst +++ b/rsts/concepts/workflows.rst @@ -4,13 +4,13 @@ Workflows ========= A workflow is a directed acyclic graph (DAG) of units of work encapsulated by :ref:`nodes `. 
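A workflow DAG of this kind can be sketched as a set of nodes that each execute as soon as all of their upstream outputs are available. This is a conceptual sketch — the node names and helper function are invented for illustration, not FlytePropeller's execution engine:

```python
# Conceptual sketch: a workflow as a DAG of nodes, where each node runs as
# soon as all of its upstream inputs are available. Not FlytePropeller.
from collections import deque

def execute_workflow(nodes, deps):
    """nodes: {name: fn(*upstream_outputs)}; deps: {name: [upstream names]}."""
    indegree = {n: len(deps.get(n, [])) for n in nodes}
    ready = deque(n for n, d in indegree.items() if d == 0)
    outputs, order = {}, []
    while ready:
        n = ready.popleft()
        # Run the node with its upstream outputs as inputs.
        outputs[n] = nodes[n](*(outputs[u] for u in deps.get(n, [])))
        order.append(n)
        for m in nodes:  # unblock downstream nodes whose inputs are now ready
            if n in deps.get(m, []):
                indegree[m] -= 1
                if indegree[m] == 0:
                    ready.append(m)
    return outputs, order

nodes = {
    "fetch": lambda: 10,
    "double": lambda x: x * 2,
    "add_one": lambda x: x + 1,
    "combine": lambda a, b: a + b,
}
deps = {"double": ["fetch"], "add_one": ["fetch"], "combine": ["double", "add_one"]}
outputs, order = execute_workflow(nodes, deps)
print(outputs["combine"])  # 31
```

Here ``double`` and ``add_one`` both become runnable the moment ``fetch`` completes, which mirrors the statement that node executions trigger as soon as their inputs are available.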
-Specific instantiations of a workflow (commonly with bound input arguments) are referred to as **workflow executions**, +Specific instantiations of a workflow (commonly bound with input arguments) are referred to as **workflow executions**, or just executions. In other words, a workflow is a template for an ordered task execution. -Flyte workflows are defined in ``protobuf`` and the Flytekit SDK facilitates writing workflows. Users can define workflows as a collection of nodes. -Nodes within a workflow can produce outputs that subsequent nodes consume as inputs. These dependencies dictate the workflow structure. +Flyte workflows are defined in ``protobuf`` and the Flytekit SDK facilitates writing workflows. Users can define workflows as a collection of nodes. +Nodes within a workflow can produce outputs that subsequent nodes consume as inputs. These dependencies dictate the structure of the workflow. -Workflows written using the SDK do not need to explicitly define nodes to enclose execution units (tasks, sub-workflows, launch plans); +Workflows written using the SDK don't need to explicitly define nodes to enclose execution units (tasks, sub-workflows, launch plans); they will be injected by the SDK and captured at registration time. Structure @@ -25,7 +25,7 @@ Workflow structure is flexible because: - A single workflow can contain any combination of task types. - A workflow can contain a single functional node. - A workflow can contain multiple nodes in all sorts of arrangements. -- A workflow can launch other workflows as well. +- A workflow can launch other workflows. At execution time, node executions are triggered as soon as their inputs are available. @@ -37,7 +37,7 @@ Flyte-Specific Structure ^^^^^^^^^^^^^^^^^^^^^^^^ During :ref:`registration `, Flyte validates the workflow structure and saves the workflow. -The registration process updates the workflow graph too. +The registration process updates the workflow graph.
A compiled workflow will always have a start and end node injected into the workflow graph. In addition, a failure handler will catch and process execution failures. @@ -45,5 +45,5 @@ Versioning ---------- Like :ref:`tasks `, workflows are versioned too. Registered workflows are immutable, i.e., an instance of a -workflow defined by a specific project-domain-name-version combination can't be updated. +workflow defined by a specific {Project, Domain, Name, Version} combination can't be updated. Tasks referenced in a workflow version are immutable and are tied to specific tasks' versions. \ No newline at end of file
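The immutability rule above — a workflow registered under a specific {Project, Domain, Name, Version} combination can never be updated — can be sketched as a registry that rejects re-registration of an existing version. This is a hypothetical sketch (the class and names are invented for illustration, not FlyteAdmin's API):

```python
# Hypothetical sketch of registration immutability: once a workflow version
# is registered under a {Project, Domain, Name, Version} key, any attempt to
# register a different definition under the same key is rejected.
class WorkflowRegistry:
    def __init__(self):
        self._workflows = {}

    def register(self, project, domain, name, version, definition):
        key = (project, domain, name, version)
        if key in self._workflows:
            raise ValueError(f"{key} is immutable; register a new version instead")
        self._workflows[key] = definition

registry = WorkflowRegistry()
registry.register("flyteexamples", "development", "my_wf", "v1", "dag-v1")
try:
    # Re-registering the same version, even with a changed definition, fails.
    registry.register("flyteexamples", "development", "my_wf", "v1", "dag-v1-patched")
except ValueError as e:
    print("rejected:", e)
registry.register("flyteexamples", "development", "my_wf", "v2", "dag-v2")  # allowed
```

Publishing changes therefore always means registering a new version, which is what keeps in-flight executions of older versions safe.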