Cleanup Concepts page #2409
Conversation
Signed-off-by: SmritiSatyanV <[email protected]>
we should rename this to "clean up concepts", the changes are on the concepts pages
rsts/concepts/admin.rst
Outdated
These :std:ref:`events <protos/docs/event/event:flyteidl/event/event.proto>` include

- WorkflowExecutionEvent
- NodeExecutionEvent
- TaskExecutionEvent

and include information about respective phase transitions, phase transition time and optional output data if the event concerns a terminal phase change.
and contain information about respective phase transitions, phase transition time and optional output data if the event concerns a terminal phase change.

These events are the **only** way to update an execution. No raw Update endpoint exists.

Suggested change:
These events are the **only** way to update an execution. No raw Update endpoint exists.
These events provide the **only** way to update an execution. No raw update endpoint exists.
rsts/concepts/admin.rst
Outdated
These events are the **only** way to update an execution. No raw Update endpoint exists.

To track the lifecycle of an execution admin, store attributes such as duration, timestamp at which an execution transitioned to running, and end time.
To track the lifecycle of an execution admin, store attributes such as `duration`, and `timestamp` at which an execution transitioned to running, and end time.

Suggested change:
To track the lifecycle of an execution admin, store attributes such as `duration`, and `timestamp` at which an execution transitioned to running, and end time.
To track the lifecycle of an execution, admin stores attributes such as `duration`, the `timestamp` at which an execution transitioned to running, and the end time.
rsts/concepts/architecture.rst
Outdated
Planes
======

Flyte components are separated into 3 logical planes. The planes are summarized and explained in detail below. The goal is that these planes can be replaced by alternate implementations.
Flyte components are separated into 3 logical planes. The planes are summarized and explained in detail below. These planes can be replaced by alternate implementations too.

Could you revert to the previous version? That's clearer.
@@ -4,14 +4,17 @@
FlytePropeller Architecture
###########################

Note: In the frame of this document we use the term “workflow” to describe a single execution of a workflow definition.
.. note::
   In the frame of this document we use the term “workflow” to describe a single execution of a workflow definition.

Suggested change:
In the frame of this document we use the term “workflow” to describe a single execution of a workflow definition.
In the frame of this document, we use the term “workflow” to describe the single execution of a workflow definition.

Introduction
============

Flyte workflows are represented as a Directed Acyclic Graph (DAG) of interconnected Nodes. Flyte supports a robust collection of Node types to ensure diverse functionality. TaskNodes support a plugin system to externally add system integrations. Control flow can be altered during runtime using BranchNodes, which prune downstream evaluation paths based on input, and DynamicNodes, which add nodes to the DAG. WorkflowNodes allow embedding workflows within each other.
Flyte :ref:`workflows <divedeep-workflows>` are represented as a Directed Acyclic Graph (DAG) of interconnected Nodes. Flyte supports a robust collection of Node types to ensure diverse functionality. TaskNodes support a plugin system to externally add system integrations. Control flow can be altered during runtime using BranchNodes, which prune downstream evaluation paths based on input, and DynamicNodes, which add nodes to the DAG. WorkflowNodes allow embedding workflows within each other.

Suggested change:
Flyte :ref:`workflows <divedeep-workflows>` are represented as a Directed Acyclic Graph (DAG) of interconnected Nodes. Flyte supports a robust collection of Node types to ensure diverse functionality. TaskNodes support a plugin system to externally add system integrations. Control flow can be altered during runtime using BranchNodes, which prune downstream evaluation paths based on input, and DynamicNodes, which add nodes to the DAG. WorkflowNodes allow embedding workflows within each other.
A Flyte :ref:`workflow <divedeep-workflows>` is represented as a Directed Acyclic Graph (DAG) of interconnected Nodes. Flyte supports a robust collection of Node types to ensure diverse functionality.
- ``TaskNodes`` support a plugin system to externally add system integrations.
- Control flow can be altered during runtime using ``BranchNodes``, which prune downstream evaluation paths based on input.
- ``DynamicNodes`` add nodes to the DAG.
- ``WorkflowNodes`` allow embedding workflows within each other.

FlytePropeller is responsible for scheduling and tracking execution of Flyte workflows. It is implemented using a k8s controller and adheres to established k8s design principles. In this scheme, resources are periodically evaluated and the goal is transition from the observed to a requested state. In our case, workflows are the resource and they are iteratively evaluated to transition from the current state to success. During each loop, the current workflow state is established as the phase of workflow nodes and subsequent tasks, and FlytePropeller performs operations to transition this state to success. The operations may include scheduling (or rescheduling) node executions, evaluating dynamic or branch nodes, etc. These design decisions ensure FlytePropeller can scale to manage a large number of concurrent workflows without performance degradation.
FlytePropeller is responsible for scheduling and tracking execution of Flyte workflows. It is implemented using a K8s controller and adheres to the established K8s design principles. In this scheme, resources are periodically evaluated and the goal is to transition from the observed state to a requested state.

In our case, workflows are the resource and they are iteratively evaluated to transition from the current state to success. During each loop, the current workflow state is established as the phase of workflow nodes and subsequent tasks, and the FlytePropeller performs operations to transition this state to success. The operations may include scheduling (or rescheduling) node executions, evaluating dynamic or branch nodes, etc. These design decisions ensure that FlytePropeller can scale to manage a large number of concurrent workflows without performance degradation.

Suggested change:
In our case, workflows are the resource and they are iteratively evaluated to transition from the current state to success. During each loop, the current workflow state is established as the phase of workflow nodes and subsequent tasks, and the FlytePropeller performs operations to transition this state to success. The operations may include scheduling (or rescheduling) node executions, evaluating dynamic or branch nodes, etc. These design decisions ensure that FlytePropeller can scale to manage a large number of concurrent workflows without performance degradation.
In our case, workflows are the resources and they are iteratively evaluated to transition from the current state to success. During each loop, the current workflow state is established as the phase of workflow nodes and subsequent tasks, and FlytePropeller performs operations to transition this state to success. The operations may include scheduling (or rescheduling) node executions, evaluating dynamic or branch nodes, etc. These design decisions ensure that FlytePropeller can scale to manage a large number of concurrent workflows without performance degradation.

-----------------------------------

Workflows in Flyte are maintained as Custom Resource Definitions (CRDs) in Kubernetes, which are stored in the backing etcd cluster. Each execution of a workflow definition results in the creation of a new FlyteWorkflow CRD which maintains state for the entirety of processing. CRDs provide variable definitions to describe both resource specifications (spec) and status' (status). The FlyteWorkflow CRD uses the spec subsection to detail the workflow DAG, embodying node dependencies, etc. The status subsection tracks workflow metadata including overall workflow status, node / task phases, status / phase transition timestamps, etc.
Workflows in Flyte are maintained as Custom Resource Definitions (CRDs) in Kubernetes, which are stored in the backing etcd cluster. Each execution of a workflow definition results in the creation of a new FlyteWorkflow CRD which maintains a state for the entirety of processing. CRDs provide variable definitions to describe both resource specifications (spec) and status' (status). The FlyteWorkflow CRD uses the spec subsection to detail the workflow DAG, embodying node dependencies, etc. The status subsection tracks workflow metadata including overall workflow status, node / task phases, status / phase transition timestamps, etc.

Suggested change:
Workflows in Flyte are maintained as Custom Resource Definitions (CRDs) in Kubernetes, which are stored in the backing etcd cluster. Each execution of a workflow definition results in the creation of a new FlyteWorkflow CRD which maintains a state for the entirety of processing. CRDs provide variable definitions to describe both resource specifications (spec) and status' (status). The FlyteWorkflow CRD uses the spec subsection to detail the workflow DAG, embodying node dependencies, etc. The status subsection tracks workflow metadata including overall workflow status, node / task phases, status / phase transition timestamps, etc.
Workflows in Flyte are maintained as Custom Resource Definitions (CRDs) in Kubernetes, which are stored in the backing etcd cluster. Each execution of a workflow definition results in the creation of a new FlyteWorkflow CRD which maintains a state for the entirety of processing. CRDs provide variable definitions to describe both resource specifications (spec) and status' (status). The FlyteWorkflow CRD uses the spec subsection to detail the workflow DAG, embodying node dependencies, etc. The status subsection tracks workflow metadata including overall workflow status, node/task phases, status/phase transition timestamps, etc.

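For a concrete mental model of the spec/status split described above, here is a rough, purely illustrative sketch of the two subsections a FlyteWorkflow CRD carries (field names are simplified and hypothetical, not the actual CRD schema):

```python
# Illustrative only: a simplified picture of the two CRD subsections described above.
flyte_workflow_crd = {
    "spec": {  # the compiled DAG: nodes and their dependencies
        "nodes": {
            "n0": {"task": "fetch_data", "upstream": []},
            "n1": {"task": "train_model", "upstream": ["n0"]},
        },
    },
    "status": {  # runtime metadata tracked between evaluation loops
        "phase": "RUNNING",
        "nodeStatus": {
            "n0": {"phase": "SUCCEEDED", "startedAt": "2022-04-01T08:00:00Z"},
            "n1": {"phase": "RUNNING", "startedAt": "2022-04-01T08:05:00Z"},
        },
    },
}
```
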
K8s exposes a powerful controller / operator API enabling entities to track creation / updates over a specific resource type. FlytePropeller uses this API to track FlyteWorkflows, meaning every time an instance of the FlyteWorkflow CRD is created or updated the FlytePropeller instance is notified. FlyteAdmin is the common entry point, where initialization of FlyteWorkflow CRDs may be triggered by user workflow definition executions, automatic relaunches, or periodically scheduled workflow definition executions. However, it is conceivable to manually create FlyteWorkflow CRDs, but this will have limited visibility and usability.
K8s exposes a powerful controller/operator API that enables entities to track creation/updates over a specific resource type. FlytePropeller uses this API to track FlyteWorkflows, meaning every time an instance of the FlyteWorkflow CRD is created/updated, the FlytePropeller instance is notified. FlyteAdmin is the common entry point, where initialization of FlyteWorkflow CRDs may be triggered by user workflow definition executions, automatic relaunches, or periodically scheduled workflow definition executions. However, it is conceivable to manually create FlyteWorkflow CRDs, but this will have limited visibility and usability.

Suggested change:
K8s exposes a powerful controller/operator API that enables entities to track creation/updates over a specific resource type. FlytePropeller uses this API to track FlyteWorkflows, meaning every time an instance of the FlyteWorkflow CRD is created/updated, the FlytePropeller instance is notified. FlyteAdmin is the common entry point, where initialization of FlyteWorkflow CRDs may be triggered by user workflow definition executions, automatic relaunches, or periodically scheduled workflow definition executions. However, it is conceivable to manually create FlyteWorkflow CRDs, but this will have limited visibility and usability.
K8s exposes a powerful controller/operator API that enables entities to track creation/updates over a specific resource type. FlytePropeller uses this API to track ``FlyteWorkflow``s, meaning every time an instance of the FlyteWorkflow CRD is created/updated, the FlytePropeller instance is notified. FlyteAdmin is the common entry point, where initialization of FlyteWorkflow CRDs may be triggered by user workflow definition executions, automatic relaunches, or periodically scheduled workflow definition executions. However, it is conceivable to manually create FlyteWorkflow CRDs, but this will have limited visibility and usability.

The WorkerPool is implemented as a collection of goroutines, one for each worker. Using this lightweight construct FlytePropeller can scale to 1000s of workers on a single CPU. Workers continually poll the WorkQueue for workflows. On success, the workflow is executed (passed to WorkflowExecutor).
The WorkerPool is implemented as a collection of Go routines, one for each worker. Using this lightweight construct, FlytePropeller can scale to 1000s of workers on a single CPU. Workers continually poll the WorkQueue for workflows. On success, the workflow is executed (passed to WorkflowExecutor).

`goroutines` is correct, so we can revert to the previous version.

WorkflowExecutor
----------------

The WorkflowExecutor is unsurprisingly responsible for handling high-level workflow operations. This includes maintaining the workflow phase (e.x. running, failing, succeeded, etc) according to the underlying node phases and administering pending cleanup operations. For example, aborting existing node evaluations during workflow failures or removing FlyteWorkflow CRD finalizers on completion to ensure the CRD may be deleted. Additionally, at the conclusion of each evaluation round the WorkflowExecutor updates the FlyteWorkflow CRD with updated metadata fields to track status between evaluation iterations.
The WorkflowExecutor is responsible for handling high-level workflow operations. This includes maintaining the workflow phase (For example: running, failing, succeeded, etc.) according to the underlying node phases and administering pending cleanup operations. For example, aborting existing node evaluations during workflow failures or removing FlyteWorkflow CRD finalizers on completion to ensure the CRD is deleted. Additionally, at the conclusion of each evaluation round, the WorkflowExecutor updates the FlyteWorkflow CRD with updated metadata fields to track status between evaluation iterations.

Suggested change:
The WorkflowExecutor is responsible for handling high-level workflow operations. This includes maintaining the workflow phase (For example: running, failing, succeeded, etc.) according to the underlying node phases and administering pending cleanup operations. For example, aborting existing node evaluations during workflow failures or removing FlyteWorkflow CRD finalizers on completion to ensure the CRD is deleted. Additionally, at the conclusion of each evaluation round, the WorkflowExecutor updates the FlyteWorkflow CRD with updated metadata fields to track status between evaluation iterations.
The WorkflowExecutor is responsible for handling high-level workflow operations. This includes maintaining the workflow phase (for example: running, failing, succeeded, etc.) according to the underlying node phases and administering pending cleanup operations. For example, aborting existing node evaluations during workflow failures or removing FlyteWorkflow CRD finalizers on completion to ensure the CRD is deleted. Additionally, at the conclusion of each evaluation round, the WorkflowExecutor updates the FlyteWorkflow CRD with updated metadata fields to track the status between evaluation iterations.

* **WorkflowHandler**: This handler allows embedding workflows within another workflow definition. The API exposes this functionality using either (1) an inline execution, where the workflow function is invoked directly resulting in a single FlyteWorkflow CRD with an appended sub-workflow, or (2) a launch plan, which uses a TODO to create a separate sub-workflow FlyteWorkflow CRD whose execution state is linked to the parent FlyteWorkflow CRD.
* **TaskHandler (Plugins)**: These are responsible for executing plugin specific tasks. This may include contacting FlyteAdmin to schedule K8s pod to perform work, calling a web API to begin/track evaluation, and much more. The plugin paradigm exposes an extensible interface for adding functionality to Flyte workflows.
* **DynamicHandler**: Flyte workflow CRDs are initialized using a DAG compiled during the registration process. The numerous benefits of this approach are beyond the scope of this document. However, there are situations where the complete DAG is unknown at compile time. For example, when executing a task on each value of an input list. Using Dynamic nodes, a new DAG subgraph may be dynamically compiled during runtime and linked to the existing FlyteWorkflow CRD.
* **WorkflowHandler**: This handler allows embedding workflows within another workflow definition. The API exposes this functionality using either (1) an inline execution, where the workflow function is invoked directly resulting in a single FlyteWorkflow CRD with an appended sub-workflow, or (2) a launch plan, which uses a TODO to create a separate sub-FlyteWorkflow CRD whose execution state is linked to the parent FlyteWorkflow CRD.

@SmritiSatyanV, could you ask what TODO here is?
@@ -15,7 +15,7 @@ Characteristics
#. Standard `cron <https://en.wikipedia.org/wiki/Cron#CRON_expression>`__ support
#. Independently scalable
#. Small memory footprint
#. Schedules run as lightweight go routines
#. Schedules run as lightweight Go routines

Suggested change:
#. Schedules run as lightweight Go routines
#. Schedules run as lightweight goroutines

Scheduler
---------

This component is a singleton and is responsible for reading the schedules from the DB and running them at the cadence defined by the schedule. The lowest granularity supported is minutes for scheduling through both cron and fixed rate schedulers. The scheduler would be running in one replica, two at the most during redeployment. Multiple replicas will just duplicate the work, since each execution for a scheduleTime will have a unique identifier derived from the schedule name and the time of the schedule. The idempotency aspect of the admin for the same identifier prevents duplication on the admin side. The scheduler runs continuously in a loop reading the updated schedule entries in the data store and adding or removing the schedules. Removing a schedule will not alter in-flight go-routines launched by the scheduler. Thus the behavior of these executions is undefined.
This component is a singleton and is responsible for reading the schedules from the DB and running them at the cadence defined by the schedule. The lowest granularity supported is `minutes` for scheduling through both cron and fixed rate schedulers. The scheduler can run in one replica, two at the most during redeployment. Multiple replicas will only duplicate the work, since each execution for a scheduleTime will have a unique identifier derived from the schedule name and the time of the schedule. The idempotency aspect of the admin for the same identifier prevents duplication on the admin side. The scheduler runs continuously in a loop reading the updated schedule entries in the data store and adding or removing the schedules. Removing a schedule will not alter the in-flight Go routines launched by the scheduler. Thus, the behavior of these executions is undefined.

Suggested change:
This component is a singleton and is responsible for reading the schedules from the DB and running them at the cadence defined by the schedule. The lowest granularity supported is `minutes` for scheduling through both cron and fixed rate schedulers. The scheduler can run in one replica, two at the most during redeployment. Multiple replicas will only duplicate the work, since each execution for a scheduleTime will have a unique identifier derived from the schedule name and the time of the schedule. The idempotency aspect of the admin for the same identifier prevents duplication on the admin side. The scheduler runs continuously in a loop reading the updated schedule entries in the data store and adding or removing the schedules. Removing a schedule will not alter the in-flight Go routines launched by the scheduler. Thus, the behavior of these executions is undefined.
This component is a singleton and is responsible for reading the schedules from the DB and running them at the cadence defined by the schedule. The lowest granularity supported is `minutes` for scheduling through both cron and fixed rate schedulers. The scheduler can run in one replica, two at the most during redeployment. Multiple replicas will only duplicate the work, since each execution for a scheduleTime will have a unique identifier derived from the schedule name and the time of the schedule. The idempotency aspect of the admin for the same identifier prevents duplication on the admin side. The scheduler runs continuously in a loop reading the updated schedule entries in the data store and adding or removing the schedules. Removing a schedule will not alter the in-flight goroutines launched by the scheduler. Thus, the behavior of these executions is undefined.

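The text above notes that each execution for a scheduleTime gets a unique identifier derived from the schedule name and the schedule time, which is what makes duplicate launches idempotent on the admin side. A minimal, illustrative sketch of that idea (hypothetical helper, not FlyteAdmin's actual code):

```python
import hashlib
from datetime import datetime, timezone

def derive_execution_name(schedule_name: str, schedule_time: datetime) -> str:
    """Build a deterministic execution name so repeated launches for the same
    schedule tick collide on the admin side instead of creating duplicates."""
    key = f"{schedule_name}:{schedule_time.astimezone(timezone.utc).isoformat()}"
    digest = hashlib.sha256(key.encode()).hexdigest()[:16]
    return f"{schedule_name}-{digest}"

# Two launches for the same tick produce the same name, so the second one is a no-op.
tick = datetime(2022, 4, 1, 8, 0, tzinfo=timezone.utc)
assert derive_execution_name("daily-report", tick) == derive_execution_name("daily-report", tick)
```
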
GOCronWrapper
*************

This component is responsible for locking in the time for the scheduled job to be invoked and adding those to the cron scheduler. It is a wrapper around the `following framework <https://github.com/robfig/cron/v3>`__ for fixed rate and cron schedules and creates in-memory representation of the scheduled job functions. The scheduler provides the ability to schedule a function with scheduleTime parameters. This is useful to know once the scheduled function is invoked as to what scheduled time this invocation is for. This scheduler supports standard cron scheduling which has 5 `fields <https://en.wikipedia.org/wiki/Cron>`__. It requires 5 entries representing: minute, hour, day of month, month and day of week, in that order.
This component is responsible for locking in the time for the scheduled job to be invoked and adding those to the cron scheduler. It is a wrapper around `this framework <https://github.com/robfig/cron/v3>`__ for fixed rate and cron schedules that creates in-memory representation of the scheduled job functions. The scheduler schedules a function with scheduleTime parameters. When this scheduled function is invoked, the scheduleTime parameters provide the current schedule time used by the scheduler. This scheduler supports standard cron scheduling which has 5 `fields <https://en.wikipedia.org/wiki/Cron>`__. It requires 5 entries representing `minute`, `hour`, `day of month`, `month` and `day of week`, in that order.

double backticks?
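For readers unfamiliar with the 5-field format, a minimal, illustrative flytekit sketch of attaching such a cron expression to a launch plan follows (the `schedule=` keyword of `CronSchedule` is an assumption based on recent flytekit versions; the workflow and launch plan names are hypothetical):

```python
from flytekit import CronSchedule, LaunchPlan, task, workflow

@task
def build_report() -> str:
    return "report"

@workflow
def report_wf() -> str:
    return build_report()

# 5-field cron expression: minute, hour, day of month, month, day of week (in that order).
# "0 8 * * 1-5" fires at 08:00 on weekdays.
report_lp = LaunchPlan.get_or_create(
    workflow=report_wf,
    name="report_wf_weekday_8am",
    schedule=CronSchedule(schedule="0 8 * * 1-5"),  # keyword assumed; older releases used cron_expression
)
```
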
Job Executor
************

This component is responsible for sending the scheduled executions to flyteadmin. The job function accepts the scheduleTime and the schedule which is used for creating an execution request to the admin. Each job function is tied to the schedule, which is executed in separate go routine according the schedule cadence.
This component is responsible in sending the scheduled executions to FlyteAdmin. The job function accepts the scheduleTime and the schedule used to create an execution requests the admin. Each job function is tied to the schedule, which is executed in separate Go routine in accordance to the schedule cadence.

Suggested change:
This component is responsible in sending the scheduled executions to FlyteAdmin. The job function accepts the scheduleTime and the schedule used to create an execution requests the admin. Each job function is tied to the schedule, which is executed in separate Go routine in accordance to the schedule cadence.
The job executor component is responsible for sending the scheduled executions to FlyteAdmin. The job function accepts ``scheduleTime`` and the schedule which is used to create an execution request to the admin. Each job function is tied to the schedule which is executed in a separate goroutine in accordance with the schedule cadence.
rsts/concepts/console.rst
Outdated
#############

This is the web UI for the Flyte platform. The results of running Flyte Console are displayed in this graph, explained below:
This is the web UI for the Flyte platform. The results of running FlyteConsole are displayed in this graph, explained below:

Suggested change:
This is the web UI for the Flyte platform. The results of running FlyteConsole are displayed in this graph, explained below:
FlyteConsole is the web UI for the Flyte platform. Here's a video that dives into the graph UX:
rsts/concepts/console.rst
Outdated
This project supports `Storybook <https://storybook.js.org/>`_.
Component stories live next to the components they test, in a ``__stories__``
directory, with the filename pattern ``{Component}.stories.tsx``.

Suggested change:
This project supports `Storybook <https://storybook.js.org/>`_.
Component stories live next to the components they test, in a ``__stories__``
directory, with the filename pattern ``{Component}.stories.tsx``.
FlyteConsole uses `Storybook <https://storybook.js.org/>`__.
Component stories live next to the components they test in the ``__stories__``
directory with the filename pattern ``{Component}.stories.tsx``.
rsts/concepts/dynamic_spec.rst
Outdated
@@ -3,8 +3,7 @@
Dynamic Job Spec
================

A dynamic job spec is a subset of the full workflow spec that defines a set of tasks, workflows as well as
nodes and output bindindgs that control how the job should assemble its outputs.
A dynamic job spec is a subset of the entire workflow spec that defines a set of tasks, workflows as well as nodes and output bindindgs that control how the job should assemble its outputs.

Suggested change:
A dynamic job spec is a subset of the entire workflow spec that defines a set of tasks, workflows as well as nodes and output bindindgs that control how the job should assemble its outputs.
A dynamic job spec is a subset of the entire workflow spec that defines a set of tasks, workflows, nodes, and output bindings that control how the job should assemble its outputs.
rsts/concepts/flyte_console.rst
Outdated
@@ -7,8 +7,7 @@ Flyte UI is a web-based user interface for Flyte. It helps interact with Flyte o

With Flyte UI, you can:

* Launch Workflows
* Launch Tasks
* Launch tasks and workflows

We explain these separately, so can you revert to the previous version?
rsts/concepts/nodes.rst
Outdated
@@ -8,7 +8,7 @@ a :ref:`task <divedeep-tasks>`, but it can also contain an entire subworkflow or
Nodes can have inputs and outputs, which are used to coordinate task inputs and outputs.
Moreover, node outputs can be used as inputs to other nodes within a workflow.

Tasks are always encapsulated within a node, however, like tasks, nodes can come in a variety of flavors determined by their *target*.
Tasks are always encapsulated within a node. However, like tasks, nodes can come in a variety of flavors determined by their *target*.

Suggested change:
Tasks are always encapsulated within a node. However, like tasks, nodes can come in a variety of flavors determined by their *target*.
Tasks are always encapsulated within a node. Like tasks, nodes can come in a variety of flavors determined by their *target*.
rsts/concepts/tasks.rst
Outdated
Dynamic Tasks
--------------

"Dynamic tasks" is a misnomer.
Flyte is one-of-a-kind workflow engine that ships with the concept of truly `Dynamic Workflows <https://blog.flyte.org/dynamic-workflows-in-flyte>`__!
Users can generate workflows in reaction to user inputs or computed values at runtime.
These executions are evaluated to generate a static graph, before execution.
These executions are evaluated to generate a static graph before execution commences. Such static graphs are shareable, and reproducible without any external infrastructure.

Can you remove this line?
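As context for the dynamic workflows mentioned above, a minimal flytekit sketch of generating graph nodes from a runtime input (the task and workflow names here are hypothetical):

```python
from typing import List
from flytekit import dynamic, task

@task
def square(x: int) -> int:
    return x * x

# The shape of the subgraph depends on the runtime value of `xs`: one `square`
# node is added per element, and the subgraph is compiled into a static graph
# before it executes.
@dynamic
def square_all(xs: List[int]) -> List[int]:
    return [square(x=x) for x in xs]
```
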
rsts/concepts/tasks.rst
Outdated
that take care of executing the Flyte tasks.
Almost any action can be implemented and introduced into Flyte as a "Plugin".
Flyte exposes an extensible model to express tasks in an execution-independent language.
It contains first-class task plugins (For example: `Papermill <https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-papermill/flytekitplugins/papermill/task.py>`__,

Suggested change:
It contains first-class task plugins (For example: `Papermill <https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-papermill/flytekitplugins/papermill/task.py>`__,
It contains first-class task plugins (for example: `Papermill <https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-papermill/flytekitplugins/papermill/task.py>`__,
rsts/concepts/tasks.rst
Outdated
It contains first-class task plugins (For example: `Papermill <https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-papermill/flytekitplugins/papermill/task.py>`__,
`Great Expectations <https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-greatexpectations/flytekitplugins/great_expectations/task.py>`__, and :ref:`more <integrations>`.)
that execute the Flyte tasks.
Almost any action can be implemented and introduced into Flyte as a "Plugin", that includes.

Suggested change:
Almost any action can be implemented and introduced into Flyte as a "Plugin", that includes.
Almost any action can be implemented and introduced into Flyte as a "Plugin", which includes:
rsts/concepts/tasks.rst
Outdated
**Timeouts**
For the system to ensure it is always making progress, tasks must be guaranteed to end gracefully/successfully. The system defines a default timeout period for the tasks. It is also possible for task authors to define a timeout period, after which the task gets marked as failure. Note that a timed-out task will be retried if it has a retry strategy defined.

To ensure that the system is always making progress, tasks must be guaranteed to end gracefully/successfully. The system defines a default timeout period for the tasks. It is possible for task authors to define a timeout period, after which the task is marked as ``failure``. Note that a timed-out task will be retried if it has a retry strategy defined. The timeout mechanism is handled `TaskMetadata <https://docs.flyte.org/projects/flytekit/en/latest/generated/flytekit.TaskMetadata.html?highlight=retries#flytekit.TaskMetadata>`__.

Suggested change:
To ensure that the system is always making progress, tasks must be guaranteed to end gracefully/successfully. The system defines a default timeout period for the tasks. It is possible for task authors to define a timeout period, after which the task is marked as ``failure``. Note that a timed-out task will be retried if it has a retry strategy defined. The timeout mechanism is handled `TaskMetadata <https://docs.flyte.org/projects/flytekit/en/latest/generated/flytekit.TaskMetadata.html?highlight=retries#flytekit.TaskMetadata>`__.
To ensure that the system is always making progress, tasks must be guaranteed to end gracefully/successfully. The system defines a default timeout period for the tasks. It is possible for task authors to define a timeout period, after which the task is marked as ``failure``. Note that a timed-out task will be retried if it has a retry strategy defined. The timeout can be handled in the `TaskMetadata <https://docs.flyte.org/projects/flytekit/en/latest/generated/flytekit.TaskMetadata.html?highlight=retries#flytekit.TaskMetadata>`__.
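For reference, a minimal flytekit sketch of setting a timeout together with a retry strategy on a task (the task name and body are hypothetical; the `timeout` and `retries` arguments feed into the task's `TaskMetadata`):

```python
from datetime import timedelta
from flytekit import task

# If the task runs longer than 10 minutes it is marked as a failure; because a
# retry strategy is defined, a timed-out attempt is retried up to 3 times.
@task(timeout=timedelta(minutes=10), retries=3)
def train_model(epochs: int) -> float:
    # placeholder computation standing in for a long-running training loop
    return 0.95 - 0.01 * epochs
```
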
rsts/concepts/tasks.rst
Outdated
Flyte supports memoization of task outputs to ensure that identical invocations of a task don't get executed repeatedly, wasting compute resources.
For more information on memoization, please refer to the :std:ref:`Caching Example <cookbook:sphx_glr_auto_core_flyte_basics_task_cache.py>`.
Flyte supports memoization of task outputs to ensure that identical invocations of a task are not executed repeatedly, thereby saving compute resources and execution time. For example: If you are debugging your code and wish to run it multiple times, you can re-use the output instead of re-computing it.

Suggested change:
Flyte supports memoization of task outputs to ensure that identical invocations of a task are not executed repeatedly, thereby saving compute resources and execution time. For example: If you are debugging your code and wish to run it multiple times, you can re-use the output instead of re-computing it.
Flyte supports memoization of task outputs to ensure that identical invocations of a task are not executed repeatedly, thereby saving compute resources and execution time. For example, if you wish to run the same piece of code multiple times, you can re-use the output instead of re-computing it.
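A minimal flytekit sketch of the memoization described above (the task name is hypothetical): enabling `cache` with a `cache_version` lets identical invocations with the same inputs reuse the stored output.

```python
from flytekit import task

# Re-running this task with the same `n` and the same cache_version returns the
# cached output instead of recomputing it; bump cache_version to invalidate.
@task(cache=True, cache_version="1.0")
def expensive_sum(n: int) -> int:
    return sum(range(n))
```
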
rsts/concepts/versioning.rst
Outdated
@@ -3,47 +3,45 @@
Versions
========

One of the most important features and reasons for certain design decisions in Flyte is the need for machine learning and data practitioners to experiment.
When users experiment, they usually work in isolation and try multiple iterations.
One of the most important features and reasons for design decisions in Flyte is the need for machine learning and data practitioners to experiment.

Suggested change:
One of the most important features and reasons for design decisions in Flyte is the need for machine learning and data practitioners to experiment.
One of the most important features and reasons for certain design decisions in Flyte is the need for machine learning and data practitioners to experiment.
rsts/concepts/versioning.rst
Outdated
The cost of creating an independent infrastructure for each version is enormous and not desirable.
Moreover, it is desirable to share the same centralized infrastructure, where the burden of maintaining the infrastructure is with a central infrastructure team,
while users can use it independently. This also improves the cost of operation, since it is possible to reuse the same infrastructure for multiple teams.
The cost of creating an independent infrastructure for each version is enormous but undesirable.

Suggested change:
The cost of creating an independent infrastructure for each version is enormous but undesirable.
The cost of creating an independent infrastructure for each version is enormous and undesirable.
rsts/concepts/versioning.rst
Outdated
while users can use it independently. This also improves the cost of operation, since it is possible to reuse the same infrastructure for multiple teams.
The cost of creating an independent infrastructure for each version is enormous but undesirable.
It is beneficial to share the same centralized infrastructure, where the burden of maintaining the infrastructure is with a central infrastructure team,
whereas the users can use it independently. This improves the cost of operation, since the same infrastructure can be reused by multiple teams.

Suggested change:
whereas the users can use it independently. This improves the cost of operation, since the same infrastructure can be reused by multiple teams.
while the users can use it independently. This improves the cost of operation since the same infrastructure can be reused by multiple teams.
rsts/concepts/versioning.rst
Outdated
- Work on the same project concurrently yet identify the version/experiment that was successful.
- Capture the environment for a version and independently launch this environment.
- Work on the same project concurrently and identify the version/experiment that was successful.
- Capture the environment for a version and independently launch its environment.

Suggested change:
- Capture the environment for a version and independently launch its environment.
- Capture the environment for a version and independently launch it.
rsts/concepts/versioning.rst
Outdated
The entire workflow in Flyte is versioned and all tasks and entities are immutable which makes it possible to completely change
the structure of a workflow between versions, without worrying about the consequences for the pipelines in production. This hermetic property makes it effortless to manage and deploy new workflow versions. This is important for workflows that are long-running. Flyte guarantees that if a workflow execution is in progress
and another new workflow version has been activated, the execution of the old version continues unhindered.

Suggested change:
The entire workflow in Flyte is versioned and all tasks and entities are immutable which makes it possible to completely change
the structure of a workflow between versions, without worrying about the consequences for the pipelines in production. This hermetic property makes it effortless to manage and deploy new workflow versions. This is important for workflows that are long-running. Flyte guarantees that if a workflow execution is in progress
and another new workflow version has been activated, the execution of the old version continues unhindered.
The entire workflow in Flyte is versioned and all tasks and entities are immutable which makes it possible to completely change the structure of a workflow between versions, without worrying about the consequences for the pipelines in production.
This hermetic property makes it effortless to manage and deploy new workflow versions and is important for workflows that are long-running.
If a workflow execution is in progress and another new workflow version has been activated, Flyte guarantees that the execution of the old version continues unhindered.
rsts/concepts/versioning.rst
Outdated
The astute may question, but what if, I had a bug in the previous version and I want to just fix the bug and run all previous executions.
Before we understand how Flyte tackles this, let us analyze the problem further - fixing a bug will need a code change and it is possible
that the bug may actually affect the structure of the workflow. Simply fixing the bug in the task may not solve the problem.
Another questions we address here is: What if there was a bug in the previous version that needs to be fixed, and run the previous executions?

Suggested change:
Another questions we address here is: What if there was a bug in the previous version that needs to be fixed, and run the previous executions?
Now consider the scenario where there's a requirement to run all the previous executions if there's a bug that needs to be fixed.
rsts/concepts/versioning.rst
Outdated
Before we understand how Flyte tackles this, let us analyze the problem further - fixing a bug will need a code change and it is possible
that the bug may actually affect the structure of the workflow. Simply fixing the bug in the task may not solve the problem.
Another questions we address here is: What if there was a bug in the previous version that needs to be fixed, and run the previous executions?
Fixing bugs involves code changes and this may affect the workflow structure.

Suggested change:
Fixing bugs involves code changes and this may affect the workflow structure.
Fixing bugs involves code changes, which may affect the workflow structure. Simply fixing the bug in the task may not solve the problem.
rsts/concepts/versioning.rst
Outdated
Flyte solves the above problem using 2 properties:
Flyte addresses this using 2 properties:

Suggested change:
Flyte addresses this using 2 properties:
Flyte addresses this using two properties:
rsts/concepts/versioning.rst
Outdated
1. Since the workflow is completely versioned, changing the structure has no impact on an existing execution, and the workflow state will not be corrupted.
2. Flyte provides a concept of memoization. As long as the tasks have not changed and their behavior has not changed, it is possible to move them around and their previous outputs will be recovered, without having to rerun these tasks. And if the workflow changes were simply in a task this strategy will still work.
1. Since the entire workflow is versioned, changing the structure has no impact on the existing execution, and the workflow state won't be corrupted.
2. Flyte provides caching/memoization of outputs. As long as the tasks and their behavior have not changed, it is possible to move them around and still recover their previous outputs, without having to rerun these tasks. This strategy will work even ff the workflow changes were only in a task.

Suggested change:
2. Flyte provides caching/memoization of outputs. As long as the tasks and their behavior have not changed, it is possible to move them around and still recover their previous outputs, without having to rerun these tasks. This strategy will work even ff the workflow changes were only in a task.
2. Flyte provides caching/memoization of outputs. As long as the tasks and their behavior have not changed, it is possible to move them around and still recover their previous outputs, without having to rerun the tasks. This strategy will work even if the workflow changes are in a task.
rsts/concepts/versioning.rst
Outdated
How Is Versioning Tied to Reproducibility?
------------------------------------------
How Is Versioning Associated to Reproducibility?

I think "tied to" sounds better. WDYT?
rsts/concepts/versioning.rst
Outdated
It is also necessary to instantiate any infrastructure that the previous version may have used and, if not already recorded, ensure that the previously used dataset (say) can be reconstructed.
From the first principles, if reproducibility is considered to be one of the most important concerns, then one would capture all these variables and provide them in an easy-to-use method.
Workflows can be reproduced without explicit versioning within the system.
To reproduce a past experiment, users need to identify the source code, and resurrect any dependencies that the code may have used (For example: TensorFlow 1.x instead of TensorFlow 2.x, or specific Python libraries).

Suggested change:
To reproduce a past experiment, users need to identify the source code, and resurrect any dependencies that the code may have used (For example: TensorFlow 1.x instead of TensorFlow 2.x, or specific Python libraries).
To reproduce a past experiment, users need to identify the source code and resurrect any dependencies that the code may have used (for example, TensorFlow 1.x instead of TensorFlow 2.x, or specific Python libraries).
rsts/concepts/versioning.rst
Outdated
From the first principles, if reproducibility is considered to be one of the most important concerns, then one would capture all these variables and provide them in an easy-to-use method.
Workflows can be reproduced without explicit versioning within the system.
To reproduce a past experiment, users need to identify the source code, and resurrect any dependencies that the code may have used (For example: TensorFlow 1.x instead of TensorFlow 2.x, or specific Python libraries).
It is also required to instantiate the infrastructure that the previous version may have used. If not recorded, ensure that the previously used dataset (say) can be reconstructed.

Suggested change:
It is also required to instantiate the infrastructure that the previous version may have used. If not recorded, ensure that the previously used dataset (say) can be reconstructed.
It is also required to instantiate the infrastructure that the previous version may have used. If not recorded, you'll have to ensure that the previously used dataset (say) can be reconstructed.
rsts/concepts/versioning.rst
Outdated
This is exactly how Flyte was conceived!

Every task is versioned, and Flyte precisely captures its dependency set. For external tasks, it is highly encouraged to use
memoization so that the constructed dataset is cached on the Flyte side, and hence, one can comfortably guarantee reproducible behavior from the external systems.
In Flyte, every task is versioned, and it precisely captures the dependency set. For external tasks, memoization is recommended so that the constructed dataset can BE cached on the Flyte side. This way, one can guarantee reproducible behaviour from the external systems.

Suggested change:
In Flyte, every task is versioned, and it precisely captures the dependency set. For external tasks, memoization is recommended so that the constructed dataset can BE cached on the Flyte side. This way, one can guarantee reproducible behaviour from the external systems.
In Flyte, every task is versioned, and it precisely captures the dependency set. For external tasks, memoization is recommended so that the constructed dataset can be cached on the Flyte side. This way, one can guarantee reproducible behavior from the external systems.
Signed-off-by: SmritiSatyanV <[email protected]>
…eorg/flyte into restructure-getting-started
Signed-off-by: SmritiSatyanV <[email protected]>
Signed-off-by: SmritiSatyanV <[email protected]>
rsts/concepts/versioning.rst
Outdated
Now consider the scenario where there's a requirement to run all the previous executions if there's a bug that needs to be fixed.
Fixing bugs involves code changes, which may affect the workflow structure. Simply fixing the bug in the task may not solve the problem.

Suggested change:
Now consider the scenario where there's a requirement to run all the previous executions if there's a bug that needs to be fixed.
Fixing bugs involves code changes, which may affect the workflow structure. Simply fixing the bug in the task may not solve the problem.
Consider a scenario where there's a requirement to run all the previous executions if there's a bug that needs to be fixed.
Simply fixing the bug in the task may not solve the problem.
Moreover, fixing bugs involves code changes, which may affect the workflow structure.
Signed-off-by: SmritiSatyanV <[email protected]>
* Updated index.rst Signed-off-by: SmritiSatyanV <[email protected]>
* Cleanup Signed-off-by: SmritiSatyanV <[email protected]>
* Changes based on review Signed-off-by: SmritiSatyanV <[email protected]>
* Updated flytepropeller Signed-off-by: SmritiSatyanV <[email protected]>
* removed redundant line Signed-off-by: SmritiSatyanV <[email protected]>
* Updated versioning.rst Signed-off-by: SmritiSatyanV <[email protected]>

Signed-off-by: Yuvraj <[email protected]>
Flyte Console changed to FlyteConsole, FlyteKit to Flytekit, FlyteCLI to Flytecli, Flyte Propeller to FlytePropeller
Restructured statements
Added directives and redirected links to internal files
Fixed files where rendering was off
Signed-off-by: SmritiSatyanV <[email protected]>