
Cleanup Concepts page #2409

Merged · 8 commits merged into master from restructure-getting-started on May 4, 2022
Conversation

@SmritiSatyanV (Contributor) commented Apr 25, 2022:

- Changed Flyte Console to FlyteConsole, FlyteKit to Flytekit, FlyteCLI to Flytecli, and Flyte Propeller to FlytePropeller
- Restructured statements
- Added directives and redirected links to internal files
- Fixed files where rendering was off
Signed-off-by: SmritiSatyanV <[email protected]>

@cosmicBboy (Contributor) commented: we should rename this to "clean up concepts"; the changes are on the Concepts pages.

@SmritiSatyanV changed the title from Restructure/Cleanup getting started to Restructure/Cleanup Concepts page on Apr 25, 2022
@SmritiSatyanV changed the title from Restructure/Cleanup Concepts page to Cleanup Concepts page on Apr 25, 2022
@SmritiSatyanV marked this pull request as ready for review on April 25, 2022 16:24

These :std:ref:`events <protos/docs/event/event:flyteidl/event/event.proto>` include

- WorkflowExecutionEvent
- NodeExecutionEvent
- TaskExecutionEvent

and contain information about respective phase transitions, phase transition time and optional output data if the event concerns a terminal phase change.

These events are the **only** way to update an execution. No raw Update endpoint exists.

Suggested change
These events are the **only** way to update an execution. No raw Update endpoint exists.
These events provide the **only** way to update an execution. No raw update endpoint exists.



To track the lifecycle of an execution admin, store attributes such as `duration`, and `timestamp` at which an execution transitioned to running, and end time.

Suggested change
To track the lifecycle of an execution admin, store attributes such as `duration`, and `timestamp` at which an execution transitioned to running, and end time.
To track the lifecycle of an execution, admin stores attributes such as `duration`, the `timestamp` at which the execution transitioned to running, and the end time.



Planes
======

Flyte components are separated into 3 logical planes. The planes are summarized and explained in detail below. The goal is that these planes can be replaced by alternate implementations.
Flyte components are separated into 3 logical planes. The planes are summarized and explained in detail below. These planes can be replaced by alternate implementations too.
Contributor: Could you revert to the previous version? That's clearer.

@@ -4,14 +4,17 @@
FlytePropeller Architecture
###########################

.. note::
   In the frame of this document we use the term “workflow” to describe a single execution of a workflow definition.

Suggested change
In the frame of this document we use the term “workflow” to describe a single execution of a workflow definition.
In the frame of this document, we use the term “workflow” to describe the single execution of a workflow definition.


Introduction
============

Flyte :ref:`workflows <divedeep-workflows>` are represented as a Directed Acyclic Graph (DAG) of interconnected Nodes. Flyte supports a robust collection of Node types to ensure diverse functionality. TaskNodes support a plugin system to externally add system integrations. Control flow can be altered during runtime using BranchNodes, which prune downstream evaluation paths based on input, and DynamicNodes, which add nodes to the DAG. WorkflowNodes allow embedding workflows within each other.

Suggested change
Flyte :ref:`workflows <divedeep-workflows>` are represented as a Directed Acyclic Graph (DAG) of interconnected Nodes. Flyte supports a robust collection of Node types to ensure diverse functionality. TaskNodes support a plugin system to externally add system integrations. Control flow can be altered during runtime using BranchNodes, which prune downstream evaluation paths based on input, and DynamicNodes, which add nodes to the DAG. WorkflowNodes allow embedding workflows within each other.
A Flyte :ref:`workflow <divedeep-workflows>` is represented as a Directed Acyclic Graph (DAG) of interconnected Nodes. Flyte supports a robust collection of Node types to ensure diverse functionality.
- ``TaskNodes`` support a plugin system to externally add system integrations.
- Control flow can be altered during runtime using ``BranchNodes``, which prune downstream evaluation paths based on input.
- ``DynamicNodes`` add nodes to the DAG.
- ``WorkflowNodes`` allow embedding workflows within each other.

FlytePropeller is responsible for scheduling and tracking execution of Flyte workflows. It is implemented using a K8s controller and adheres to the established K8s design principles. In this scheme, resources are periodically evaluated and the goal is to transition from the observed state to a requested state.

In our case, workflows are the resource and they are iteratively evaluated to transition from the current state to success. During each loop, the current workflow state is established as the phase of workflow nodes and subsequent tasks, and the FlytePropeller performs operations to transition this state to success. The operations may include scheduling (or rescheduling) node executions, evaluating dynamic or branch nodes, etc. These design decisions ensure that FlytePropeller can scale to manage a large number of concurrent workflows without performance degradation.

Suggested change
In our case, workflows are the resource and they are iteratively evaluated to transition from the current state to success. During each loop, the current workflow state is established as the phase of workflow nodes and subsequent tasks, and the FlytePropeller performs operations to transition this state to success. The operations may include scheduling (or rescheduling) node executions, evaluating dynamic or branch nodes, etc. These design decisions ensure that FlytePropeller can scale to manage a large number of concurrent workflows without performance degradation.
In our case, workflows are the resources and they are iteratively evaluated to transition from the current state to success. During each loop, the current workflow state is established as the phase of workflow nodes and subsequent tasks, and FlytePropeller performs operations to transition this state to success. The operations may include scheduling (or rescheduling) node executions, evaluating dynamic or branch nodes, etc. These design decisions ensure that FlytePropeller can scale to manage a large number of concurrent workflows without performance degradation.

-----------------------------------

Workflows in Flyte are maintained as Custom Resource Definitions (CRDs) in Kubernetes, which are stored in the backing etcd cluster. Each execution of a workflow definition results in the creation of a new FlyteWorkflow CRD which maintains a state for the entirety of processing. CRDs provide variable definitions to describe both resource specifications (spec) and status' (status). The FlyteWorkflow CRD uses the spec subsection to detail the workflow DAG, embodying node dependencies, etc. The status subsection tracks workflow metadata including overall workflow status, node / task phases, status / phase transition timestamps, etc.

Suggested change
Workflows in Flyte are maintained as Custom Resource Definitions (CRDs) in Kubernetes, which are stored in the backing etcd cluster. Each execution of a workflow definition results in the creation of a new FlyteWorkflow CRD which maintains a state for the entirety of processing. CRDs provide variable definitions to describe both resource specifications (spec) and status' (status). The FlyteWorkflow CRD uses the spec subsection to detail the workflow DAG, embodying node dependencies, etc. The status subsection tracks workflow metadata including overall workflow status, node / task phases, status / phase transition timestamps, etc.
Workflows in Flyte are maintained as Custom Resource Definitions (CRDs) in Kubernetes, which are stored in the backing etcd cluster. Each execution of a workflow definition results in the creation of a new FlyteWorkflow CRD which maintains a state for the entirety of processing. CRDs provide variable definitions to describe both resource specifications (spec) and status' (status). The FlyteWorkflow CRD uses the spec subsection to detail the workflow DAG, embodying node dependencies, etc. The status subsection tracks workflow metadata including overall workflow status, node/task phases, status/phase transition timestamps, etc.
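For context, the spec/status split described above can be pictured with a heavily simplified, hypothetical sketch of a FlyteWorkflow custom resource, written as a Python dict purely for illustration (field names are indicative and do not match the real CRD schema exactly):

```python
flyte_workflow_cr = {
    "apiVersion": "flyte.lyft.com/v1alpha1",  # group/version may differ by release
    "kind": "FlyteWorkflow",
    "metadata": {"name": "example-execution-abc123"},
    "spec": {
        # the compiled workflow DAG: nodes and their dependencies
        "nodes": {
            "n0": {"task": "tasks.preprocess"},
            "n1": {"task": "tasks.train", "upstream": ["n0"]},
        },
    },
    "status": {
        # runtime tracking: overall phase plus per-node phases and timestamps
        "phase": "Running",
        "nodeStatus": {"n0": {"phase": "Succeeded"}, "n1": {"phase": "Running"}},
    },
}
```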


K8s exposes a powerful controller/operator API that enables entities to track creation/updates over a specific resource type. FlytePropeller uses this API to track FlyteWorkflows, meaning every time an instance of the FlyteWorkflow CRD is created/updated, the FlytePropeller instance is notified. FlyteAdmin is the common entry point, where initialization of FlyteWorkflow CRDs may be triggered by user workflow definition executions, automatic relaunches, or periodically scheduled workflow definition executions. However, it is conceivable to manually create FlyteWorkflow CRDs, but this will have limited visibility and usability.

Suggested change
K8s exposes a powerful controller/operator API that enables entities to track creation/updates over a specific resource type. FlytePropeller uses this API to track FlyteWorkflows, meaning every time an instance of the FlyteWorkflow CRD is created/updated, the FlytePropeller instance is notified. FlyteAdmin is the common entry point, where initialization of FlyteWorkflow CRDs may be triggered by user workflow definition executions, automatic relaunches, or periodically scheduled workflow definition executions. However, it is conceivable to manually create FlyteWorkflow CRDs, but this will have limited visibility and usability.
K8s exposes a powerful controller/operator API that enables entities to track creation/updates over a specific resource type. FlytePropeller uses this API to track ``FlyteWorkflow``s, meaning every time an instance of the FlyteWorkflow CRD is created/updated, the FlytePropeller instance is notified. FlyteAdmin is the common entry point, where initialization of FlyteWorkflow CRDs may be triggered by user workflow definition executions, automatic relaunches, or periodically scheduled workflow definition executions. However, it is conceivable to manually create FlyteWorkflow CRDs, but this will have limited visibility and usability.


The WorkerPool is implemented as a collection of goroutines, one for each worker. Using this lightweight construct FlytePropeller can scale to 1000s of workers on a single CPU. Workers continually poll the WorkQueue for workflows. On success, the workflow is executed (passed to WorkflowExecutor).
The WorkerPool is implemented as a collection of Go routines, one for each worker. Using this lightweight construct, FlytePropeller can scale to 1000s of workers on a single CPU. Workers continually poll the WorkQueue for workflows. On success, the workflow is executed (passed to WorkflowExecutor).
Contributor: goroutines is correct, so we can revert to the previous version.
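
As an aside, the WorkerPool/WorkQueue split described above is the classic worker-pool pattern. Here is a minimal, language-agnostic sketch (in Python for brevity; the real implementation uses Go goroutines, and names like `execute_workflow` are stand-ins, not actual FlytePropeller APIs):

```python
import queue
import threading

def execute_workflow(workflow_id):
    # Stand-in for handing the workflow to the WorkflowExecutor.
    print(f"evaluating {workflow_id}")

def worker(work_queue):
    # Each worker continually polls the shared queue for workflows.
    while True:
        workflow_id = work_queue.get()
        try:
            if workflow_id is None:  # sentinel value: shut this worker down
                return
            execute_workflow(workflow_id)
        finally:
            work_queue.task_done()

work_queue = queue.Queue()
workers = [threading.Thread(target=worker, args=(work_queue,)) for _ in range(4)]
for t in workers:
    t.start()

for wf in ["wf-a", "wf-b", "wf-c"]:
    work_queue.put(wf)
work_queue.join()          # wait until all queued workflows are processed
for _ in workers:
    work_queue.put(None)   # stop the workers
```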


WorkflowExecutor
----------------

The WorkflowExecutor is responsible for handling high-level workflow operations. This includes maintaining the workflow phase (For example: running, failing, succeeded, etc.) according to the underlying node phases and administering pending cleanup operations. For example, aborting existing node evaluations during workflow failures or removing FlyteWorkflow CRD finalizers on completion to ensure the CRD is deleted. Additionally, at the conclusion of each evaluation round, the WorkflowExecutor updates the FlyteWorkflow CRD with updated metadata fields to track status between evaluation iterations.
@samhita-alla (Contributor) commented Apr 27, 2022:

Suggested change
The WorkflowExecutor is responsible for handling high-level workflow operations. This includes maintaining the workflow phase (For example: running, failing, succeeded, etc.) according to the underlying node phases and administering pending cleanup operations. For example, aborting existing node evaluations during workflow failures or removing FlyteWorkflow CRD finalizers on completion to ensure the CRD is deleted. Additionally, at the conclusion of each evaluation round, the WorkflowExecutor updates the FlyteWorkflow CRD with updated metadata fields to track status between evaluation iterations.
The WorkflowExecutor is responsible for handling high-level workflow operations. This includes maintaining the workflow phase (for example: running, failing, succeeded, etc.) according to the underlying node phases and administering pending cleanup operations. For example, aborting existing node evaluations during workflow failures or removing FlyteWorkflow CRD finalizers on completion to ensure the CRD is deleted. Additionally, at the conclusion of each evaluation round, the WorkflowExecutor updates the FlyteWorkflow CRD with updated metadata fields to track the status between evaluation iterations.

* **WorkflowHandler**: This handler allows embedding workflows within another workflow definition. The API exposes this functionality using either (1) an inline execution, where the workflow function is invoked directly resulting in a single FlyteWorkflow CRD with an appended sub-workflow, or (2) a launch plan, which uses a TODO to create a separate sub-FlyteWorkflow CRD whose execution state is linked to the parent FlyteWorkflow CRD.
* **TaskHandler (Plugins)**: These are responsible for executing plugin specific tasks. This may include contacting FlyteAdmin to schedule K8s pod to perform work, calling a web API to begin/track evaluation, and much more. The plugin paradigm exposes an extensible interface for adding functionality to Flyte workflows.
* **DynamicHandler**: Flyte workflow CRDs are initialized using a DAG compiled during the registration process. The numerous benefits of this approach are beyond the scope of this document. However, there are situations where the complete DAG is unknown at compile time. For example, when executing a task on each value of an input list. Using Dynamic nodes, a new DAG subgraph may be dynamically compiled during runtime and linked to the existing FlyteWorkflow CRD.
Contributor: @SmritiSatyanV, could you ask what TODO here is?

@@ -15,7 +15,7 @@ Characteristics
#. Standard `cron <https://en.wikipedia.org/wiki/Cron#CRON_expression>`__ support
#. Independently scalable
#. Small memory footprint
#. Schedules run as lightweight Go routines

Suggested change
#. Schedules run as lightweight Go routines
#. Schedules run as lightweight goroutines



Scheduler
---------

This component is a singleton and is responsible for reading the schedules from the DB and running them at the cadence defined by the schedule. The lowest granularity supported is `minutes` for scheduling through both cron and fixed rate schedulers. The scheduler can run in one replica, two at the most during redeployment. Multiple replicas will only duplicate the work, since each execution for a scheduleTime will have a unique identifier derived from the schedule name and the time of the schedule. The idempotency aspect of the admin for the same identifier prevents duplication on the admin side. The scheduler runs continuously in a loop reading the updated schedule entries in the data store and adding or removing the schedules. Removing a schedule will not alter the in-flight Go routines launched by the scheduler. Thus, the behavior of these executions is undefined.

Suggested change
This component is a singleton and is responsible for reading the schedules from the DB and running them at the cadence defined by the schedule. The lowest granularity supported is `minutes` for scheduling through both cron and fixed rate schedulers. The scheduler can run in one replica, two at the most during redeployment. Multiple replicas will only duplicate the work, since each execution for a scheduleTime will have a unique identifier derived from the schedule name and the time of the schedule. The idempotency aspect of the admin for the same identifier prevents duplication on the admin side. The scheduler runs continuously in a loop reading the updated schedule entries in the data store and adding or removing the schedules. Removing a schedule will not alter the in-flight Go routines launched by the scheduler. Thus, the behavior of these executions is undefined.
This component is a singleton and is responsible for reading the schedules from the DB and running them at the cadence defined by the schedule. The lowest granularity supported is `minutes` for scheduling through both cron and fixed rate schedulers. The scheduler can run in one replica, two at the most during redeployment. Multiple replicas will only duplicate the work, since each execution for a scheduleTime will have a unique identifier derived from the schedule name and the time of the schedule. The idempotency aspect of the admin for the same identifier prevents duplication on the admin side. The scheduler runs continuously in a loop reading the updated schedule entries in the data store and adding or removing the schedules. Removing a schedule will not alter the in-flight goroutines launched by the scheduler. Thus, the behavior of these executions is undefined.


GOCronWrapper
*************

This component is responsible for locking in the time for the scheduled job to be invoked and adding those to the cron scheduler. It is a wrapper around `this framework <https://github.com/robfig/cron/v3>`__ for fixed rate and cron schedules that creates an in-memory representation of the scheduled job functions. The scheduler schedules a function with scheduleTime parameters. When this scheduled function is invoked, the scheduleTime parameters provide the current schedule time used by the scheduler. This scheduler supports standard cron scheduling which has 5 `fields <https://en.wikipedia.org/wiki/Cron>`__. It requires 5 entries representing `minute`, `hour`, `day of month`, `month` and `day of week`, in that order.
Contributor: double backticks?
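
For reference, a hedged flytekit sketch of attaching a five-field cron schedule to a workflow via a launch plan (assuming a recent flytekit release; the `schedule` keyword replaced the older `cron_expression` and may differ across versions):

```python
from flytekit import CronSchedule, LaunchPlan, task, workflow

@task
def say_hello() -> str:
    return "hello"

@workflow
def hello_wf() -> str:
    return say_hello()

# "0 8 * * *" = minute 0, hour 8, any day of month, any month, any day of week,
# i.e. run daily at 08:00.
daily_lp = LaunchPlan.get_or_create(
    workflow=hello_wf,
    name="hello_wf_daily",
    schedule=CronSchedule(schedule="0 8 * * *"),
)
```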


Job Executor
************

This component is responsible in sending the scheduled executions to FlyteAdmin. The job function accepts the scheduleTime and the schedule used to create an execution requests the admin. Each job function is tied to the schedule, which is executed in separate Go routine in accordance to the schedule cadence.

Suggested change
This component is responsible in sending the scheduled executions to FlyteAdmin. The job function accepts the scheduleTime and the schedule used to create an execution requests the admin. Each job function is tied to the schedule, which is executed in separate Go routine in accordance to the schedule cadence.
The job executor component is responsible for sending the scheduled executions to FlyteAdmin. The job function accepts ``scheduleTime`` and the schedule which is used to create an execution request to the admin. Each job function is tied to the schedule which is executed in a separate goroutine in accordance with the schedule cadence.

#############

This is the web UI for the Flyte platform. The results of running FlyteConsole are displayed in this graph, explained below:

Suggested change
This is the web UI for the Flyte platform. The results of running FlyteConsole are displayed in this graph, explained below:
FlyteConsole is the web UI for the Flyte platform. Here's a video that dives into the graph UX:

Comment on lines 77 to 79
This project supports `Storybook <https://storybook.js.org/>`_.
Component stories live next to the components they test, in a ``__stories__``
directory, with the filename pattern ``{Component}.stories.tsx``.

Suggested change
This project supports `Storybook <https://storybook.js.org/>`_.
Component stories live next to the components they test, in a ``__stories__``
directory, with the filename pattern ``{Component}.stories.tsx``.
FlyteConsole uses `Storybook <https://storybook.js.org/>`__.
Component stories live next to the components they test in the ``__stories__``
directory with the filename pattern ``{Component}.stories.tsx``.

@@ -3,8 +3,7 @@
Dynamic Job Spec
================

A dynamic job spec is a subset of the entire workflow spec that defines a set of tasks, workflows as well as nodes and output bindindgs that control how the job should assemble its outputs.

Suggested change
A dynamic job spec is a subset of the entire workflow spec that defines a set of tasks, workflows as well as nodes and output bindindgs that control how the job should assemble its outputs.
A dynamic job spec is a subset of the entire workflow spec that defines a set of tasks, workflows, nodes, and output bindings that control how the job should assemble its outputs.

@@ -7,8 +7,7 @@ Flyte UI is a web-based user interface for Flyte. It helps interact with Flyte o

With Flyte UI, you can:

* Launch Workflows
* Launch Tasks
* Launch tasks and workflows
Contributor: We explain these separately, so can you revert to the previous version?

@@ -8,7 +8,7 @@ a :ref:`task <divedeep-tasks>`, but it can also contain an entire subworkflow or
Nodes can have inputs and outputs, which are used to coordinate task inputs and outputs.
Moreover, node outputs can be used as inputs to other nodes within a workflow.

Tasks are always encapsulated within a node. However, like tasks, nodes can come in a variety of flavors determined by their *target*.

Suggested change
Tasks are always encapsulated within a node. However, like tasks, nodes can come in a variety of flavors determined by their *target*.
Tasks are always encapsulated within a node. Like tasks, nodes can come in a variety of flavors determined by their *target*.


Dynamic Tasks
--------------

"Dynamic tasks" is a misnomer.
Flyte is one-of-a-kind workflow engine that ships with the concept of truly `Dynamic Workflows <https://blog.flyte.org/dynamic-workflows-in-flyte>`__!
Users can generate workflows in reaction to user inputs or computed values at runtime.
These executions are evaluated to generate a static graph before execution commences. Such static graphs are shareable, and reproducible without any external infrastructure.
Contributor: Can you remove this line?
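
For illustration, a small flytekit sketch of a dynamic workflow: the DAG subgraph is compiled at runtime once `n` is known, then executed as a static graph:

```python
from typing import List

from flytekit import dynamic, task, workflow

@task
def square(x: int) -> int:
    return x * x

@dynamic
def squares_up_to(n: int) -> List[int]:
    # The loop is evaluated at runtime to compile a static subgraph
    # with one `square` node per input value.
    return [square(x=i) for i in range(n)]

@workflow
def wf(n: int = 3) -> List[int]:
    return squares_up_to(n=n)
```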

that take care of executing the Flyte tasks.
Almost any action can be implemented and introduced into Flyte as a "Plugin".
Flyte exposes an extensible model to express tasks in an execution-independent language.
It contains first-class task plugins (For example: `Papermill <https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-papermill/flytekitplugins/papermill/task.py>`__,

Suggested change
It contains first-class task plugins (For example: `Papermill <https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-papermill/flytekitplugins/papermill/task.py>`__,
It contains first-class task plugins (for example: `Papermill <https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-papermill/flytekitplugins/papermill/task.py>`__,

It contains first-class task plugins (For example: `Papermill <https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-papermill/flytekitplugins/papermill/task.py>`__,
`Great Expectations <https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-greatexpectations/flytekitplugins/great_expectations/task.py>`__, and :ref:`more <integrations>`.)
that execute the Flyte tasks.
Almost any action can be implemented and introduced into Flyte as a "Plugin", that includes.

Suggested change
Almost any action can be implemented and introduced into Flyte as a "Plugin", that includes.
Almost any action can be implemented and introduced into Flyte as a "Plugin", which includes:
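As a concrete example of the plugin model, here is a hedged sketch using the Papermill plugin's `NotebookTask` (requires `flytekitplugins-papermill`; the notebook path and parameter names below are hypothetical placeholders):

```python
from flytekit import kwtypes
from flytekitplugins.papermill import NotebookTask

# "analysis.ipynb" and the input/output names are placeholders.
run_notebook = NotebookTask(
    name="example_notebook_task",
    notebook_path="./analysis.ipynb",
    inputs=kwtypes(n=int),
    outputs=kwtypes(result=float),
)
```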


**Timeouts**

To ensure that the system is always making progress, tasks must be guaranteed to end gracefully/successfully. The system defines a default timeout period for the tasks. It is possible for task authors to define a timeout period, after which the task is marked as ``failure``. Note that a timed-out task will be retried if it has a retry strategy defined. The timeout mechanism is handled `TaskMetadata <https://docs.flyte.org/projects/flytekit/en/latest/generated/flytekit.TaskMetadata.html?highlight=retries#flytekit.TaskMetadata>`__.

Suggested change
To ensure that the system is always making progress, tasks must be guaranteed to end gracefully/successfully. The system defines a default timeout period for the tasks. It is possible for task authors to define a timeout period, after which the task is marked as ``failure``. Note that a timed-out task will be retried if it has a retry strategy defined. The timeout mechanism is handled `TaskMetadata <https://docs.flyte.org/projects/flytekit/en/latest/generated/flytekit.TaskMetadata.html?highlight=retries#flytekit.TaskMetadata>`__.
To ensure that the system is always making progress, tasks must be guaranteed to end gracefully/successfully. The system defines a default timeout period for the tasks. It is possible for task authors to define a timeout period, after which the task is marked as ``failure``. Note that a timed-out task will be retried if it has a retry strategy defined. The timeout can be handled in the `TaskMetadata <https://docs.flyte.org/projects/flytekit/en/latest/generated/flytekit.TaskMetadata.html?highlight=retries#flytekit.TaskMetadata>`__.
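
For illustration, a minimal flytekit sketch of a task-level timeout paired with a retry strategy (both settings are surfaced through the task decorator and map onto the task's `TaskMetadata`):

```python
from datetime import timedelta

from flytekit import task

@task(retries=3, timeout=timedelta(minutes=10))
def flaky_step(x: int) -> int:
    # Marked as failed if it runs longer than 10 minutes; because a
    # retry strategy is defined, a timed-out attempt is retried.
    return x + 1
```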


Flyte supports memoization of task outputs to ensure that identical invocations of a task are not executed repeatedly, thereby saving compute resources and execution time. For example: If you are debugging your code and wish to run it multiple times, you can re-use the output instead of re-computing it.

Suggested change
Flyte supports memoization of task outputs to ensure that identical invocations of a task are not executed repeatedly, thereby saving compute resources and execution time. For example: If you are debugging your code and wish to run it multiple times, you can re-use the output instead of re-computing it.
Flyte supports memoization of task outputs to ensure that identical invocations of a task are not executed repeatedly, thereby saving compute resources and execution time. For example, if you wish to run the same piece of code multiple times, you can re-use the output instead of re-computing it.
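For illustration, a minimal flytekit sketch of enabling memoization on a task; identical inputs with the same `cache_version` reuse the stored output:

```python
from flytekit import task

@task(cache=True, cache_version="1.0")
def expensive_computation(x: int) -> int:
    # A second invocation with the same `x` (and cache_version) returns
    # the memoized result instead of re-running the body.
    return x ** 2
```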

@@ -3,47 +3,45 @@
Versions
========

One of the most important features and reasons for design decisions in Flyte is the need for machine learning and data practitioners to experiment.
When users experiment, they usually work in isolation and try multiple iterations.

Suggested change
One of the most important features and reasons for design decisions in Flyte is the need for machine learning and data practitioners to experiment.
One of the most important features and reasons for certain design decisions in Flyte is the need for machine learning and data practitioners to experiment.

The cost of creating an independent infrastructure for each version is enormous but undesirable.

Suggested change
The cost of creating an independent infrastructure for each version is enormous but undesirable.
The cost of creating an independent infrastructure for each version is enormous and undesirable.

It is beneficial to share the same centralized infrastructure, where the burden of maintaining the infrastructure is with a central infrastructure team,
whereas the users can use it independently. This improves the cost of operation, since the same infrastructure can be reused by multiple teams.

Suggested change
whereas the users can use it independently. This improves the cost of operation, since the same infrastructure can be reused by multiple teams.
while the users can use it independently. This improves the cost of operation since the same infrastructure can be reused by multiple teams.

- Work on the same project concurrently and identify the version/experiment that was successful.
- Capture the environment for a version and independently launch its environment.

Suggested change
- Capture the environment for a version and independently launch its environment.
- Capture the environment for a version and independently launch it.

Comment on lines 32 to 34
The entire workflow in Flyte is versioned and all tasks and entities are immutable which makes it possible to completely change
the structure of a workflow between versions, without worrying about the consequences for the pipelines in production. This hermetic property makes it effortless to manage and deploy new workflow versions. This is important for workflows that are long-running. Flyte guarantees that if a workflow execution is in progress
and another new workflow version has been activated, the execution of the old version continues unhindered.

Suggested change
The entire workflow in Flyte is versioned and all tasks and entities are immutable which makes it possible to completely change
the structure of a workflow between versions, without worrying about the consequences for the pipelines in production. This hermetic property makes it effortless to manage and deploy new workflow versions. This is important for workflows that are long-running. Flyte guarantees that if a workflow execution is in progress
and another new workflow version has been activated, the execution of the old version continues unhindered.
The entire workflow in Flyte is versioned and all tasks and entities are immutable which makes it possible to completely change the structure of a workflow between versions, without worrying about the consequences for the pipelines in production.
This hermetic property makes it effortless to manage and deploy new workflow versions and is important for workflows that are long-running.
If a workflow execution is in progress and another new workflow version has been activated, Flyte guarantees that the execution of the old version continues unhindered.

Another questions we address here is: What if there was a bug in the previous version that needs to be fixed, and run the previous executions?
@samhita-alla (Contributor) commented Apr 27, 2022:
Suggested change
Another questions we address here is: What if there was a bug in the previous version that needs to be fixed, and run the previous executions?
Now consider the scenario where there's a requirement to run all the previous executions if there's a bug that needs to be fixed.

Fixing bugs involves code changes and this may affect the workflow structure.

Suggested change
Fixing bugs involves code changes and this may affect the workflow structure.
Fixing bugs involves code changes, which may affect the workflow structure. Simply fixing the bug in the task may not solve the problem.


Flyte addresses this using 2 properties:

Suggested change
Flyte addresses this using 2 properties:
Flyte addresses this using two properties:

1. Since the entire workflow is versioned, changing the structure has no impact on the existing execution, and the workflow state won't be corrupted.
2. Flyte provides caching/memoization of outputs. As long as the tasks and their behavior have not changed, it is possible to move them around and still recover their previous outputs, without having to rerun these tasks. This strategy will work even ff the workflow changes were only in a task.

Suggested change
2. Flyte provides caching/memoization of outputs. As long as the tasks and their behavior have not changed, it is possible to move them around and still recover their previous outputs, without having to rerun these tasks. This strategy will work even ff the workflow changes were only in a task.
2. Flyte provides caching/memoization of outputs. As long as the tasks and their behavior have not changed, it is possible to move them around and still recover their previous outputs, without having to rerun the tasks. This strategy will work even if the workflow changes are in a task.


How Is Versioning Associated to Reproducibility?
------------------------------------------------
Contributor: I think "tied to" sounds better. WDYT?

Workflows can be reproduced without explicit versioning within the system.
To reproduce a past experiment, users need to identify the source code, and resurrect any dependencies that the code may have used (For example: TensorFlow 1.x instead of TensorFlow 2.x, or specific Python libraries).

Suggested change
To reproduce a past experiment, users need to identify the source code, and resurrect any dependencies that the code may have used (For example: TensorFlow 1.x instead of TensorFlow 2.x, or specific Python libraries).
To reproduce a past experiment, users need to identify the source code and resurrect any dependencies that the code may have used (for example, TensorFlow 1.x instead of TensorFlow 2.x, or specific Python libraries).

From the first principles, if reproducibility is considered to be one of the most important concerns, then one would capture all these variables and provide them in an easy-to-use method.
It is also required to instantiate the infrastructure that the previous version may have used. If not recorded, ensure that the previously used dataset (say) can be reconstructed.

Suggested change
It is also required to instantiate the infrastructure that the previous version may have used. If not recorded, ensure that the previously used dataset (say) can be reconstructed.
It is also required to instantiate the infrastructure that the previous version may have used. If not recorded, you'll have to ensure that the previously used dataset (say) can be reconstructed.


This is exactly how Flyte was conceived!

In Flyte, every task is versioned, and it precisely captures the dependency set. For external tasks, memoization is recommended so that the constructed dataset can BE cached on the Flyte side. This way, one can guarantee reproducible behaviour from the external systems.

Suggested change
In Flyte, every task is versioned, and it precisely captures the dependency set. For external tasks, memoization is recommended so that the constructed dataset can BE cached on the Flyte side. This way, one can guarantee reproducible behaviour from the external systems.
In Flyte, every task is versioned, and it precisely captures the dependency set. For external tasks, memoization is recommended so that the constructed dataset can be cached on the Flyte side. This way, one can guarantee reproducible behavior from the external systems.

Signed-off-by: SmritiSatyanV <[email protected]>
Comment on lines 36 to 37
Now consider the scenario where there's a requirement to run all the previous executions if there's a bug that needs to be fixed.
Fixing bugs involves code changes, which may affect the workflow structure. Simply fixing the bug in the task may not solve the problem.

Suggested change
Now consider the scenario where there's a requirement to run all the previous executions if there's a bug that needs to be fixed.
Fixing bugs involves code changes, which may affect the workflow structure. Simply fixing the bug in the task may not solve the problem.
Consider a scenario where there's a requirement to run all the previous executions if there's a bug that needs to be fixed.
Simply fixing the bug in the task may not solve the problem.
Moreover, fixing bugs involves code changes, which may affect the workflow structure.

Signed-off-by: SmritiSatyanV <[email protected]>
@samhita-alla merged commit ae0a26c into master May 4, 2022
@samhita-alla deleted the restructure-getting-started branch May 4, 2022 16:41
yindia pushed a commit that referenced this pull request May 4, 2022
* Updated index.rst

Signed-off-by: SmritiSatyanV <[email protected]>

* Cleanup

Signed-off-by: SmritiSatyanV <[email protected]>

* Changes based on review

Signed-off-by: SmritiSatyanV <[email protected]>

* Updated flytepropeller

Signed-off-by: SmritiSatyanV <[email protected]>

* removed redundant line

Signed-off-by: SmritiSatyanV <[email protected]>

* Updated versioning.rst

Signed-off-by: SmritiSatyanV <[email protected]>
Signed-off-by: Yuvraj <[email protected]>