
Cleanup Concepts page #2409

Merged · 8 commits merged into master from restructure-getting-started on May 4, 2022
Conversation

@SmritiSatyanV (Contributor) commented Apr 25, 2022:

- Changed Flyte Console to FlyteConsole, FlyteKit to Flytekit, FlyteCLI to Flytecli, and Flyte Propeller to FlytePropeller
- Restructured statements
- Added directives and redirected links to internal files
- Fixed files where rendering was off
Signed-off-by: SmritiSatyanV <[email protected]>

@cosmicBboy (Contributor) commented: we should rename this to "clean up concepts"; the changes are on the Concepts pages.

@SmritiSatyanV changed the title from Restructure/Cleanup getting started to Restructure/Cleanup Concepts page on Apr 25, 2022
@SmritiSatyanV changed the title from Restructure/Cleanup Concepts page to Cleanup Concepts page on Apr 25, 2022
@SmritiSatyanV marked this pull request as ready for review on April 25, 2022 16:24

These :std:ref:`events <protos/docs/event/event:flyteidl/event/event.proto>` include

- WorkflowExecutionEvent
- NodeExecutionEvent
- TaskExecutionEvent

and contain information about respective phase transitions, phase transition time and optional output data if the event concerns a terminal phase change.

These events are the **only** way to update an execution. No raw Update endpoint exists.

Suggested change
These events are the **only** way to update an execution. No raw Update endpoint exists.
These events provide the **only** way to update an execution. No raw update endpoint exists.



To track the lifecycle of an execution admin, store attributes such as `duration`, and `timestamp` at which an execution transitioned to running, and end time.

Suggested change
To track the lifecycle of an execution admin, store attributes such as `duration`, and `timestamp` at which an execution transitioned to running, and end time.
To track the lifecycle of an execution, admin stores attributes such as `duration`, the `timestamp` at which the execution transitioned to running, and the end time.



Planes
======

Flyte components are separated into 3 logical planes. The planes are summarized and explained in detail below. The goal is that these planes can be replaced by alternate implementations.
Flyte components are separated into 3 logical planes. The planes are summarized and explained in detail below. These planes can be replaced by alternate implementations too.
Contributor: Could you revert to the previous version? That's clearer.

@@ -4,14 +4,17 @@
FlytePropeller Architecture
###########################

.. note::
   In the frame of this document we use the term “workflow” to describe a single execution of a workflow definition.

Suggested change
In the frame of this document we use the term “workflow” to describe a single execution of a workflow definition.
In the frame of this document, we use the term “workflow” to describe the single execution of a workflow definition.


Introduction
============

Flyte :ref:`workflows <divedeep-workflows>` are represented as a Directed Acyclic Graph (DAG) of interconnected Nodes. Flyte supports a robust collection of Node types to ensure diverse functionality. TaskNodes support a plugin system to externally add system integrations. Control flow can be altered during runtime using BranchNodes, which prune downstream evaluation paths based on input, and DynamicNodes, which add nodes to the DAG. WorkflowNodes allow embedding workflows within each other.

Suggested change
Flyte :ref:`workflows <divedeep-workflows>` are represented as a Directed Acyclic Graph (DAG) of interconnected Nodes. Flyte supports a robust collection of Node types to ensure diverse functionality. TaskNodes support a plugin system to externally add system integrations. Control flow can be altered during runtime using BranchNodes, which prune downstream evaluation paths based on input, and DynamicNodes, which add nodes to the DAG. WorkflowNodes allow embedding workflows within each other.
A Flyte :ref:`workflow <divedeep-workflows>` is represented as a Directed Acyclic Graph (DAG) of interconnected Nodes. Flyte supports a robust collection of Node types to ensure diverse functionality.
- ``TaskNodes`` support a plugin system to externally add system integrations.
- Control flow can be altered during runtime using ``BranchNodes``, which prune downstream evaluation paths based on input.
- ``DynamicNodes`` add nodes to the DAG.
- ``WorkflowNodes`` allow embedding workflows within each other.

FlytePropeller is responsible for scheduling and tracking execution of Flyte workflows. It is implemented using a K8s controller and adheres to the established K8s design principles. In this scheme, resources are periodically evaluated and the goal is to transition from the observed state to a requested state.

In our case, workflows are the resource and they are iteratively evaluated to transition from the current state to success. During each loop, the current workflow state is established as the phase of workflow nodes and subsequent tasks, and the FlytePropeller performs operations to transition this state to success. The operations may include scheduling (or rescheduling) node executions, evaluating dynamic or branch nodes, etc. These design decisions ensure that FlytePropeller can scale to manage a large number of concurrent workflows without performance degradation.

Suggested change
In our case, workflows are the resource and they are iteratively evaluated to transition from the current state to success. During each loop, the current workflow state is established as the phase of workflow nodes and subsequent tasks, and the FlytePropeller performs operations to transition this state to success. The operations may include scheduling (or rescheduling) node executions, evaluating dynamic or branch nodes, etc. These design decisions ensure that FlytePropeller can scale to manage a large number of concurrent workflows without performance degradation.
In our case, workflows are the resources and they are iteratively evaluated to transition from the current state to success. During each loop, the current workflow state is established as the phase of workflow nodes and subsequent tasks, and FlytePropeller performs operations to transition this state to success. The operations may include scheduling (or rescheduling) node executions, evaluating dynamic or branch nodes, etc. These design decisions ensure that FlytePropeller can scale to manage a large number of concurrent workflows without performance degradation.

-----------------------------------

Workflows in Flyte are maintained as Custom Resource Definitions (CRDs) in Kubernetes, which are stored in the backing etcd cluster. Each execution of a workflow definition results in the creation of a new FlyteWorkflow CRD which maintains a state for the entirety of processing. CRDs provide variable definitions to describe both resource specifications (spec) and status' (status). The FlyteWorkflow CRD uses the spec subsection to detail the workflow DAG, embodying node dependencies, etc. The status subsection tracks workflow metadata including overall workflow status, node / task phases, status / phase transition timestamps, etc.

Suggested change
Workflows in Flyte are maintained as Custom Resource Definitions (CRDs) in Kubernetes, which are stored in the backing etcd cluster. Each execution of a workflow definition results in the creation of a new FlyteWorkflow CRD which maintains a state for the entirety of processing. CRDs provide variable definitions to describe both resource specifications (spec) and status' (status). The FlyteWorkflow CRD uses the spec subsection to detail the workflow DAG, embodying node dependencies, etc. The status subsection tracks workflow metadata including overall workflow status, node / task phases, status / phase transition timestamps, etc.
Workflows in Flyte are maintained as Custom Resource Definitions (CRDs) in Kubernetes, which are stored in the backing etcd cluster. Each execution of a workflow definition results in the creation of a new FlyteWorkflow CRD which maintains a state for the entirety of processing. CRDs provide variable definitions to describe both resource specifications (spec) and status' (status). The FlyteWorkflow CRD uses the spec subsection to detail the workflow DAG, embodying node dependencies, etc. The status subsection tracks workflow metadata including overall workflow status, node/task phases, status/phase transition timestamps, etc.
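For context, the spec/status split described above can be pictured with a heavily simplified, hypothetical sketch of a FlyteWorkflow custom resource, written as a Python dict purely for illustration (field names are indicative and do not match the real CRD schema exactly):

```python
flyte_workflow_cr = {
    "apiVersion": "flyte.lyft.com/v1alpha1",  # group/version may differ by release
    "kind": "FlyteWorkflow",
    "metadata": {"name": "example-execution-abc123"},
    "spec": {
        # the compiled workflow DAG: nodes and their dependencies
        "nodes": {
            "n0": {"task": "tasks.preprocess"},
            "n1": {"task": "tasks.train", "upstream": ["n0"]},
        },
    },
    "status": {
        # runtime tracking: overall phase plus per-node phases and timestamps
        "phase": "Running",
        "nodeStatus": {"n0": {"phase": "Succeeded"}, "n1": {"phase": "Running"}},
    },
}
```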


K8s exposes a powerful controller/operator API that enables entities to track creation/updates over a specific resource type. FlytePropeller uses this API to track FlyteWorkflows, meaning every time an instance of the FlyteWorkflow CRD is created/updated, the FlytePropeller instance is notified. FlyteAdmin is the common entry point, where initialization of FlyteWorkflow CRDs may be triggered by user workflow definition executions, automatic relaunches, or periodically scheduled workflow definition executions. However, it is conceivable to manually create FlyteWorkflow CRDs, but this will have limited visibility and usability.

Suggested change
K8s exposes a powerful controller/operator API that enables entities to track creation/updates over a specific resource type. FlytePropeller uses this API to track FlyteWorkflows, meaning every time an instance of the FlyteWorkflow CRD is created/updated, the FlytePropeller instance is notified. FlyteAdmin is the common entry point, where initialization of FlyteWorkflow CRDs may be triggered by user workflow definition executions, automatic relaunches, or periodically scheduled workflow definition executions. However, it is conceivable to manually create FlyteWorkflow CRDs, but this will have limited visibility and usability.
K8s exposes a powerful controller/operator API that enables entities to track creation/updates over a specific resource type. FlytePropeller uses this API to track ``FlyteWorkflow``s, meaning every time an instance of the FlyteWorkflow CRD is created/updated, the FlytePropeller instance is notified. FlyteAdmin is the common entry point, where initialization of FlyteWorkflow CRDs may be triggered by user workflow definition executions, automatic relaunches, or periodically scheduled workflow definition executions. However, it is conceivable to manually create FlyteWorkflow CRDs, but this will have limited visibility and usability.


The WorkerPool is implemented as a collection of goroutines, one for each worker. Using this lightweight construct FlytePropeller can scale to 1000s of workers on a single CPU. Workers continually poll the WorkQueue for workflows. On success, the workflow is executed (passed to WorkflowExecutor).
The WorkerPool is implemented as a collection of Go routines, one for each worker. Using this lightweight construct, FlytePropeller can scale to 1000s of workers on a single CPU. Workers continually poll the WorkQueue for workflows. On success, the workflow is executed (passed to WorkflowExecutor).
Contributor: goroutines is correct, so we can revert to the previous version.
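
As an aside, the WorkerPool/WorkQueue split described above is the classic worker-pool pattern. Here is a minimal, language-agnostic sketch (in Python for brevity; the real implementation uses Go goroutines, and names like `execute_workflow` are stand-ins, not actual FlytePropeller APIs):

```python
import queue
import threading

def execute_workflow(workflow_id):
    # Stand-in for handing the workflow to the WorkflowExecutor.
    print(f"evaluating {workflow_id}")

def worker(work_queue):
    # Each worker continually polls the shared queue for workflows.
    while True:
        workflow_id = work_queue.get()
        try:
            if workflow_id is None:  # sentinel value: shut this worker down
                return
            execute_workflow(workflow_id)
        finally:
            work_queue.task_done()

work_queue = queue.Queue()
workers = [threading.Thread(target=worker, args=(work_queue,)) for _ in range(4)]
for t in workers:
    t.start()

for wf in ["wf-a", "wf-b", "wf-c"]:
    work_queue.put(wf)
work_queue.join()          # wait until all queued workflows are processed
for _ in workers:
    work_queue.put(None)   # stop the workers
```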


WorkflowExecutor
----------------

The WorkflowExecutor is responsible for handling high-level workflow operations. This includes maintaining the workflow phase (For example: running, failing, succeeded, etc.) according to the underlying node phases and administering pending cleanup operations. For example, aborting existing node evaluations during workflow failures or removing FlyteWorkflow CRD finalizers on completion to ensure the CRD is deleted. Additionally, at the conclusion of each evaluation round, the WorkflowExecutor updates the FlyteWorkflow CRD with updated metadata fields to track status between evaluation iterations.
@samhita-alla (Contributor) commented Apr 27, 2022:

Suggested change
The WorkflowExecutor is responsible for handling high-level workflow operations. This includes maintaining the workflow phase (For example: running, failing, succeeded, etc.) according to the underlying node phases and administering pending cleanup operations. For example, aborting existing node evaluations during workflow failures or removing FlyteWorkflow CRD finalizers on completion to ensure the CRD is deleted. Additionally, at the conclusion of each evaluation round, the WorkflowExecutor updates the FlyteWorkflow CRD with updated metadata fields to track status between evaluation iterations.
The WorkflowExecutor is responsible for handling high-level workflow operations. This includes maintaining the workflow phase (for example: running, failing, succeeded, etc.) according to the underlying node phases and administering pending cleanup operations. For example, aborting existing node evaluations during workflow failures or removing FlyteWorkflow CRD finalizers on completion to ensure the CRD is deleted. Additionally, at the conclusion of each evaluation round, the WorkflowExecutor updates the FlyteWorkflow CRD with updated metadata fields to track the status between evaluation iterations.

* **WorkflowHandler**: This handler allows embedding workflows within another workflow definition. The API exposes this functionality using either (1) an inline execution, where the workflow function is invoked directly resulting in a single FlyteWorkflow CRD with an appended sub-workflow, or (2) a launch plan, which uses a TODO to create a separate sub-FlyteWorkflow CRD whose execution state is linked to the parent FlyteWorkflow CRD.
* **TaskHandler (Plugins)**: These are responsible for executing plugin specific tasks. This may include contacting FlyteAdmin to schedule K8s pod to perform work, calling a web API to begin/track evaluation, and much more. The plugin paradigm exposes an extensible interface for adding functionality to Flyte workflows.
* **DynamicHandler**: Flyte workflow CRDs are initialized using a DAG compiled during the registration process. The numerous benefits of this approach are beyond the scope of this document. However, there are situations where the complete DAG is unknown at compile time. For example, when executing a task on each value of an input list. Using Dynamic nodes, a new DAG subgraph may be dynamically compiled during runtime and linked to the existing FlyteWorkflow CRD.
Contributor: @SmritiSatyanV, could you ask what TODO here is?

@@ -15,7 +15,7 @@ Characteristics
#. Standard `cron <https://en.wikipedia.org/wiki/Cron#CRON_expression>`__ support
#. Independently scalable
#. Small memory footprint
#. Schedules run as lightweight Go routines

Suggested change
#. Schedules run as lightweight Go routines
#. Schedules run as lightweight goroutines



Scheduler
---------

This component is a singleton and is responsible for reading the schedules from the DB and running them at the cadence defined by the schedule. The lowest granularity supported is `minutes` for scheduling through both cron and fixed rate schedulers. The scheduler can run in one replica, two at the most during redeployment. Multiple replicas will only duplicate the work, since each execution for a scheduleTime will have a unique identifier derived from the schedule name and the time of the schedule. The idempotency aspect of the admin for the same identifier prevents duplication on the admin side. The scheduler runs continuously in a loop reading the updated schedule entries in the data store and adding or removing the schedules. Removing a schedule will not alter the in-flight Go routines launched by the scheduler. Thus, the behavior of these executions is undefined.

Suggested change
This component is a singleton and is responsible for reading the schedules from the DB and running them at the cadence defined by the schedule. The lowest granularity supported is `minutes` for scheduling through both cron and fixed rate schedulers. The scheduler can run in one replica, two at the most during redeployment. Multiple replicas will only duplicate the work, since each execution for a scheduleTime will have a unique identifier derived from the schedule name and the time of the schedule. The idempotency aspect of the admin for the same identifier prevents duplication on the admin side. The scheduler runs continuously in a loop reading the updated schedule entries in the data store and adding or removing the schedules. Removing a schedule will not alter the in-flight Go routines launched by the scheduler. Thus, the behavior of these executions is undefined.
This component is a singleton and is responsible for reading the schedules from the DB and running them at the cadence defined by the schedule. The lowest granularity supported is `minutes` for scheduling through both cron and fixed rate schedulers. The scheduler can run in one replica, two at the most during redeployment. Multiple replicas will only duplicate the work, since each execution for a scheduleTime will have a unique identifier derived from the schedule name and the time of the schedule. The idempotency aspect of the admin for the same identifier prevents duplication on the admin side. The scheduler runs continuously in a loop reading the updated schedule entries in the data store and adding or removing the schedules. Removing a schedule will not alter the in-flight goroutines launched by the scheduler. Thus, the behavior of these executions is undefined.


GOCronWrapper
*************

This component is responsible for locking in the time for the scheduled job to be invoked and adding those to the cron scheduler. It is a wrapper around `this framework <https://github.com/robfig/cron/v3>`__ for fixed rate and cron schedules that creates an in-memory representation of the scheduled job functions. The scheduler schedules a function with scheduleTime parameters. When this scheduled function is invoked, the scheduleTime parameters provide the current schedule time used by the scheduler. This scheduler supports standard cron scheduling which has 5 `fields <https://en.wikipedia.org/wiki/Cron>`__. It requires 5 entries representing `minute`, `hour`, `day of month`, `month` and `day of week`, in that order.
Contributor: double backticks?
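
For reference, a hedged flytekit sketch of attaching a five-field cron schedule to a workflow via a launch plan (assuming a recent flytekit release; the `schedule` keyword replaced the older `cron_expression` and may differ across versions):

```python
from flytekit import CronSchedule, LaunchPlan, task, workflow

@task
def say_hello() -> str:
    return "hello"

@workflow
def hello_wf() -> str:
    return say_hello()

# "0 8 * * *" = minute 0, hour 8, any day of month, any month, any day of week,
# i.e. run daily at 08:00.
daily_lp = LaunchPlan.get_or_create(
    workflow=hello_wf,
    name="hello_wf_daily",
    schedule=CronSchedule(schedule="0 8 * * *"),
)
```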


Job Executor
************

This component is responsible in sending the scheduled executions to FlyteAdmin. The job function accepts the scheduleTime and the schedule used to create an execution requests the admin. Each job function is tied to the schedule, which is executed in separate Go routine in accordance to the schedule cadence.

Suggested change
This component is responsible in sending the scheduled executions to FlyteAdmin. The job function accepts the scheduleTime and the schedule used to create an execution requests the admin. Each job function is tied to the schedule, which is executed in separate Go routine in accordance to the schedule cadence.
The job executor component is responsible for sending the scheduled executions to FlyteAdmin. The job function accepts ``scheduleTime`` and the schedule which is used to create an execution request to the admin. Each job function is tied to the schedule which is executed in a separate goroutine in accordance with the schedule cadence.

#############

This is the web UI for the Flyte platform. The results of running FlyteConsole are displayed in this graph, explained below:

Suggested change
This is the web UI for the Flyte platform. The results of running FlyteConsole are displayed in this graph, explained below:
FlyteConsole is the web UI for the Flyte platform. Here's a video that dives into the graph UX:

Comment on lines 77 to 79
This project supports `Storybook <https://storybook.js.org/>`_.
Component stories live next to the components they test, in a ``__stories__``
directory, with the filename pattern ``{Component}.stories.tsx``.

Suggested change
This project supports `Storybook <https://storybook.js.org/>`_.
Component stories live next to the components they test, in a ``__stories__``
directory, with the filename pattern ``{Component}.stories.tsx``.
FlyteConsole uses `Storybook <https://storybook.js.org/>`__.
Component stories live next to the components they test in the ``__stories__``
directory with the filename pattern ``{Component}.stories.tsx``.

@@ -3,8 +3,7 @@
Dynamic Job Spec
================

A dynamic job spec is a subset of the entire workflow spec that defines a set of tasks, workflows as well as nodes and output bindindgs that control how the job should assemble its outputs.

Suggested change
A dynamic job spec is a subset of the entire workflow spec that defines a set of tasks, workflows as well as nodes and output bindindgs that control how the job should assemble its outputs.
A dynamic job spec is a subset of the entire workflow spec that defines a set of tasks, workflows, nodes, and output bindings that control how the job should assemble its outputs.

@@ -7,8 +7,7 @@ Flyte UI is a web-based user interface for Flyte. It helps interact with Flyte o

With Flyte UI, you can:

* Launch Workflows
* Launch Tasks
* Launch tasks and workflows
Contributor: We explain these separately, so can you revert to the previous version?

@@ -8,7 +8,7 @@ a :ref:`task <divedeep-tasks>`, but it can also contain an entire subworkflow or
Nodes can have inputs and outputs, which are used to coordinate task inputs and outputs.
Moreover, node outputs can be used as inputs to other nodes within a workflow.

Tasks are always encapsulated within a node. However, like tasks, nodes can come in a variety of flavors determined by their *target*.

Suggested change
Tasks are always encapsulated within a node. However, like tasks, nodes can come in a variety of flavors determined by their *target*.
Tasks are always encapsulated within a node. Like tasks, nodes can come in a variety of flavors determined by their *target*.


Dynamic Tasks
--------------

"Dynamic tasks" is a misnomer.
Flyte is one-of-a-kind workflow engine that ships with the concept of truly `Dynamic Workflows <https://blog.flyte.org/dynamic-workflows-in-flyte>`__!
Users can generate workflows in reaction to user inputs or computed values at runtime.
These executions are evaluated to generate a static graph before execution commences. Such static graphs are shareable, and reproducible without any external infrastructure.
Contributor: Can you remove this line?
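
For illustration, a small flytekit sketch of a dynamic workflow: the DAG subgraph is compiled at runtime once `n` is known, then executed as a static graph:

```python
from typing import List

from flytekit import dynamic, task, workflow

@task
def square(x: int) -> int:
    return x * x

@dynamic
def squares_up_to(n: int) -> List[int]:
    # The loop is evaluated at runtime to compile a static subgraph
    # with one `square` node per input value.
    return [square(x=i) for i in range(n)]

@workflow
def wf(n: int = 3) -> List[int]:
    return squares_up_to(n=n)
```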

that take care of executing the Flyte tasks.
Almost any action can be implemented and introduced into Flyte as a "Plugin".
Flyte exposes an extensible model to express tasks in an execution-independent language.
It contains first-class task plugins (For example: `Papermill <https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-papermill/flytekitplugins/papermill/task.py>`__,

Suggested change
It contains first-class task plugins (For example: `Papermill <https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-papermill/flytekitplugins/papermill/task.py>`__,
It contains first-class task plugins (for example: `Papermill <https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-papermill/flytekitplugins/papermill/task.py>`__,

It contains first-class task plugins (For example: `Papermill <https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-papermill/flytekitplugins/papermill/task.py>`__,
`Great Expectations <https://github.com/flyteorg/flytekit/blob/master/plugins/flytekit-greatexpectations/flytekitplugins/great_expectations/task.py>`__, and :ref:`more <integrations>`.)
that execute the Flyte tasks.
Almost any action can be implemented and introduced into Flyte as a "Plugin", that includes.

Suggested change
Almost any action can be implemented and introduced into Flyte as a "Plugin", that includes.
Almost any action can be implemented and introduced into Flyte as a "Plugin", which includes:
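As a concrete example of the plugin model, here is a hedged sketch using the Papermill plugin's `NotebookTask` (requires `flytekitplugins-papermill`; the notebook path and parameter names below are hypothetical placeholders):

```python
from flytekit import kwtypes
from flytekitplugins.papermill import NotebookTask

# "analysis.ipynb" and the input/output names are placeholders.
run_notebook = NotebookTask(
    name="example_notebook_task",
    notebook_path="./analysis.ipynb",
    inputs=kwtypes(n=int),
    outputs=kwtypes(result=float),
)
```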


**Timeouts**

To ensure that the system is always making progress, tasks must be guaranteed to end gracefully/successfully. The system defines a default timeout period for the tasks. It is possible for task authors to define a timeout period, after which the task is marked as ``failure``. Note that a timed-out task will be retried if it has a retry strategy defined. The timeout mechanism is handled `TaskMetadata <https://docs.flyte.org/projects/flytekit/en/latest/generated/flytekit.TaskMetadata.html?highlight=retries#flytekit.TaskMetadata>`__.

Suggested change
To ensure that the system is always making progress, tasks must be guaranteed to end gracefully/successfully. The system defines a default timeout period for the tasks. It is possible for task authors to define a timeout period, after which the task is marked as ``failure``. Note that a timed-out task will be retried if it has a retry strategy defined. The timeout mechanism is handled `TaskMetadata <https://docs.flyte.org/projects/flytekit/en/latest/generated/flytekit.TaskMetadata.html?highlight=retries#flytekit.TaskMetadata>`__.
To ensure that the system is always making progress, tasks must be guaranteed to end gracefully/successfully. The system defines a default timeout period for the tasks. It is possible for task authors to define a timeout period, after which the task is marked as ``failure``. Note that a timed-out task will be retried if it has a retry strategy defined. The timeout can be handled in the `TaskMetadata <https://docs.flyte.org/projects/flytekit/en/latest/generated/flytekit.TaskMetadata.html?highlight=retries#flytekit.TaskMetadata>`__.
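
For illustration, a minimal flytekit sketch of a task-level timeout paired with a retry strategy (both settings are surfaced through the task decorator and map onto the task's `TaskMetadata`):

```python
from datetime import timedelta

from flytekit import task

@task(retries=3, timeout=timedelta(minutes=10))
def flaky_step(x: int) -> int:
    # Marked as failed if it runs longer than 10 minutes; because a
    # retry strategy is defined, a timed-out attempt is retried.
    return x + 1
```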


Flyte supports memoization of task outputs to ensure that identical invocations of a task are not executed repeatedly, thereby saving compute resources and execution time. For example: If you are debugging your code and wish to run it multiple times, you can re-use the output instead of re-computing it.

Suggested change
Flyte supports memoization of task outputs to ensure that identical invocations of a task are not executed repeatedly, thereby saving compute resources and execution time. For example: If you are debugging your code and wish to run it multiple times, you can re-use the output instead of re-computing it.
Flyte supports memoization of task outputs to ensure that identical invocations of a task are not executed repeatedly, thereby saving compute resources and execution time. For example, if you wish to run the same piece of code multiple times, you can re-use the output instead of re-computing it.
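For illustration, a minimal flytekit sketch of enabling memoization on a task; identical inputs with the same `cache_version` reuse the stored output:

```python
from flytekit import task

@task(cache=True, cache_version="1.0")
def expensive_computation(x: int) -> int:
    # A second invocation with the same `x` (and cache_version) returns
    # the memoized result instead of re-running the body.
    return x ** 2
```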

@@ -3,47 +3,45 @@
Versions
========

One of the most important features and reasons for design decisions in Flyte is the need for machine learning and data practitioners to experiment.
When users experiment, they usually work in isolation and try multiple iterations.

Suggested change
One of the most important features and reasons for design decisions in Flyte is the need for machine learning and data practitioners to experiment.
One of the most important features and reasons for certain design decisions in Flyte is the need for machine learning and data practitioners to experiment.

The cost of creating an independent infrastructure for each version is enormous but undesirable.

Suggested change
The cost of creating an independent infrastructure for each version is enormous but undesirable.
The cost of creating an independent infrastructure for each version is enormous and undesirable.

It is beneficial to share the same centralized infrastructure, where the burden of maintaining the infrastructure is with a central infrastructure team,
whereas the users can use it independently. This improves the cost of operation, since the same infrastructure can be reused by multiple teams.

Suggested change
whereas the users can use it independently. This improves the cost of operation, since the same infrastructure can be reused by multiple teams.
while the users can use it independently. This improves the cost of operation since the same infrastructure can be reused by multiple teams.

- Work on the same project concurrently and identify the version/experiment that was successful.
- Capture the environment for a version and independently launch its environment.

Suggested change
- Capture the environment for a version and independently launch its environment.
- Capture the environment for a version and independently launch it.

Comment on lines 32 to 34
The entire workflow in Flyte is versioned and all tasks and entities are immutable which makes it possible to completely change
the structure of a workflow between versions, without worrying about the consequences for the pipelines in production. This hermetic property makes it effortless to manage and deploy new workflow versions. This is important for workflows that are long-running. Flyte guarantees that if a workflow execution is in progress
and another new workflow version has been activated, the execution of the old version continues unhindered.

Suggested change
The entire workflow in Flyte is versioned and all tasks and entities are immutable which makes it possible to completely change
the structure of a workflow between versions, without worrying about the consequences for the pipelines in production. This hermetic property makes it effortless to manage and deploy new workflow versions. This is important for workflows that are long-running. Flyte guarantees that if a workflow execution is in progress
and another new workflow version has been activated, the execution of the old version continues unhindered.
The entire workflow in Flyte is versioned and all tasks and entities are immutable which makes it possible to completely change the structure of a workflow between versions, without worrying about the consequences for the pipelines in production.
This hermetic property makes it effortless to manage and deploy new workflow versions and is important for workflows that are long-running.
If a workflow execution is in progress and another new workflow version has been activated, Flyte guarantees that the execution of the old version continues unhindered.

Another questions we address here is: What if there was a bug in the previous version that needs to be fixed, and run the previous executions?
@samhita-alla (Contributor) commented Apr 27, 2022:
Suggested change
Another questions we address here is: What if there was a bug in the previous version that needs to be fixed, and run the previous executions?
Now consider the scenario where there's a requirement to run all the previous executions if there's a bug that needs to be fixed.

Fixing bugs involves code changes and this may affect the workflow structure.

Suggested change
Fixing bugs involves code changes and this may affect the workflow structure.
Fixing bugs involves code changes, which may affect the workflow structure. Simply fixing the bug in the task may not solve the problem.


Flyte addresses this using 2 properties:

Suggested change
Flyte addresses this using 2 properties:
Flyte addresses this using two properties:

1. Since the entire workflow is versioned, changing the structure has no impact on the existing execution, and the workflow state won't be corrupted.
2. Flyte provides caching/memoization of outputs. As long as the tasks and their behavior have not changed, it is possible to move them around and still recover their previous outputs, without having to rerun these tasks. This strategy will work even ff the workflow changes were only in a task.

Suggested change
2. Flyte provides caching/memoization of outputs. As long as the tasks and their behavior have not changed, it is possible to move them around and still recover their previous outputs, without having to rerun these tasks. This strategy will work even ff the workflow changes were only in a task.
2. Flyte provides caching/memoization of outputs. As long as the tasks and their behavior have not changed, it is possible to move them around and still recover their previous outputs, without having to rerun the tasks. This strategy will work even if the workflow changes are in a task.


How Is Versioning Associated to Reproducibility?
------------------------------------------------
Contributor: I think "tied to" sounds better. WDYT?

Workflows can be reproduced without explicit versioning within the system.
To reproduce a past experiment, users need to identify the source code, and resurrect any dependencies that the code may have used (For example: TensorFlow 1.x instead of TensorFlow 2.x, or specific Python libraries).

Suggested change
To reproduce a past experiment, users need to identify the source code, and resurrect any dependencies that the code may have used (For example: TensorFlow 1.x instead of TensorFlow 2.x, or specific Python libraries).
To reproduce a past experiment, users need to identify the source code and resurrect any dependencies that the code may have used (for example, TensorFlow 1.x instead of TensorFlow 2.x, or specific Python libraries).

From the first principles, if reproducibility is considered to be one of the most important concerns, then one would capture all these variables and provide them in an easy-to-use method.
It is also required to instantiate the infrastructure that the previous version may have used. If not recorded, ensure that the previously used dataset (say) can be reconstructed.

Suggested change
It is also required to instantiate the infrastructure that the previous version may have used. If not recorded, ensure that the previously used dataset (say) can be reconstructed.
It is also required to instantiate the infrastructure that the previous version may have used. If not recorded, you'll have to ensure that the previously used dataset (say) can be reconstructed.


This is exactly how Flyte was conceived!

In Flyte, every task is versioned, and it precisely captures the dependency set. For external tasks, memoization is recommended so that the constructed dataset can BE cached on the Flyte side. This way, one can guarantee reproducible behaviour from the external systems.

Suggested change
In Flyte, every task is versioned, and it precisely captures the dependency set. For external tasks, memoization is recommended so that the constructed dataset can BE cached on the Flyte side. This way, one can guarantee reproducible behaviour from the external systems.
In Flyte, every task is versioned, and it precisely captures the dependency set. For external tasks, memoization is recommended so that the constructed dataset can be cached on the Flyte side. This way, one can guarantee reproducible behavior from the external systems.

Signed-off-by: SmritiSatyanV <[email protected]>
Comment on lines 36 to 37
Now consider the scenario where there's a requirement to run all the previous executions if there's a bug that needs to be fixed.
Fixing bugs involves code changes, which may affect the workflow structure. Simply fixing the bug in the task may not solve the problem.

Suggested change
Now consider the scenario where there's a requirement to run all the previous executions if there's a bug that needs to be fixed.
Fixing bugs involves code changes, which may affect the workflow structure. Simply fixing the bug in the task may not solve the problem.
Consider a scenario where there's a requirement to run all the previous executions if there's a bug that needs to be fixed.
Simply fixing the bug in the task may not solve the problem.
Moreover, fixing bugs involves code changes, which may affect the workflow structure.

Signed-off-by: SmritiSatyanV <[email protected]>
@samhita-alla merged commit ae0a26c into master May 4, 2022
@samhita-alla deleted the restructure-getting-started branch May 4, 2022 16:41
yindia pushed a commit that referenced this pull request May 4, 2022
* Updated index.rst

Signed-off-by: SmritiSatyanV <[email protected]>

* Cleanup

Signed-off-by: SmritiSatyanV <[email protected]>

* Changes based on review

Signed-off-by: SmritiSatyanV <[email protected]>

* Updated flytepropeller

Signed-off-by: SmritiSatyanV <[email protected]>

* removed redundant line

Signed-off-by: SmritiSatyanV <[email protected]>

* Updated versioning.rst

Signed-off-by: SmritiSatyanV <[email protected]>
Signed-off-by: Yuvraj <[email protected]>