From ca5256e9c2a7fc4501c201f4666b44304367972a Mon Sep 17 00:00:00 2001 From: nikki everett Date: Thu, 8 Feb 2024 16:24:29 -0600 Subject: [PATCH 1/5] move @pingsutw airflow migration doc from flytesnacks branch to flyte Signed-off-by: nikki everett --- .../index.md | 1 + docs/index.md | 1 + docs/migrating_to_flyte/index.md | 11 ++++ .../migrating_from_airflow_to_flyte.md | 66 +++++++++++++++++++ 4 files changed, 79 insertions(+) create mode 100644 docs/migrating_to_flyte/index.md create mode 100644 docs/migrating_to_flyte/migrating_from_airflow_to_flyte.md diff --git a/docs/getting_started_with_workflow_development/index.md b/docs/getting_started_with_workflow_development/index.md index d7b39d7cee..e27705a465 100644 --- a/docs/getting_started_with_workflow_development/index.md +++ b/docs/getting_started_with_workflow_development/index.md @@ -1,4 +1,5 @@ (getting_started_workflow_development)= + # Getting started with workflow development Machine learning engineers, data engineers, and data analysts often represent the processes that consume, transform, and output data with directed acyclic graphs (DAGs). In this section, you will learn how to create a Flyte project to contain the workflow code that implements your DAG, as well as the configuration files needed to package the code to run on a local or remote Flyte cluster. 
diff --git a/docs/index.md b/docs/index.md index 3a8d38e6ba..9fd57bbc76 100644 --- a/docs/index.md +++ b/docs/index.md @@ -138,6 +138,7 @@ Introduction Quickstart guide Getting started with workflow development Flyte fundamentals +Migrating to Flyte Core use cases ``` diff --git a/docs/migrating_to_flyte/index.md b/docs/migrating_to_flyte/index.md new file mode 100644 index 0000000000..a9ae38ba18 --- /dev/null +++ b/docs/migrating_to_flyte/index.md @@ -0,0 +1,11 @@ +(migrating_to_flyte)= +# Migrating to Flyte + +TK + +```{toctree} +:maxdepth: -1 +:hidden: + +migrating_from_airflow_to_flyte +``` \ No newline at end of file diff --git a/docs/migrating_to_flyte/migrating_from_airflow_to_flyte.md b/docs/migrating_to_flyte/migrating_from_airflow_to_flyte.md new file mode 100644 index 0000000000..7b741e25f7 --- /dev/null +++ b/docs/migrating_to_flyte/migrating_from_airflow_to_flyte.md @@ -0,0 +1,66 @@ +(migrating_from_airflow_to_flyte)= + +# Migrating from Airflow to Flyte + +Flyte can compile Airflow tasks into Flyte tasks without changing code, which allows you +to migrate your Airflow DAGs to Flyte with minimal effort. This guide will walk you through +the process of migrating Airflow to Flyte. + +## Prerequisites + +- Install `flytekitplugins-airflow` in your python environment. +- Deploy an Airflow agent to your flyte cluster. + +## Use Airflow tasks inside Flyte workflow +flytekit compiles Airflow tasks into Flyte tasks under the hood, so you can use +any Airflow sensor or operator inside a Flyte workflow. + + +```python +from flytekit import task, workflow +from airflow.operators.bash import BashOperator + +@task +def say_hello() -> str: + return "Hello, World!" 
+ +@workflow +def airflow_wf(): + flyte_task = say_hello() + airflow_task = BashOperator(task_id=f"airflow_bash_operator", bash_command="echo hello") + airflow_task >> flyte_task + +if __name__ == "__main__": + print(f"Running airflow_wf() {airflow_wf()}") +``` + +## Run your Airflow tasks locally +Although Airflow doesn't support local execution, you can run your Airflow tasks locally using Flyte. + +```bash +pyflyte run workflows.py airflow_wf +``` + +:::{warning} +Some Airflow operators may require certain permissions to execute. For instance, `DataprocCreateClusterOperator` requires the `dataproc.clusters.create` permission. +When running Airflow tasks locally, you may need to set up the necessary permissions locally for the task to execute successfully. +::: + +## Move to production +Airflow workflows can be executed on a Flyte cluster using the `--remote` flag. +In this case, Flyte creates a pod in the Kubernetes cluster to run `say_hello` task, and then run +your Airflow `BashOperator` on the Airflow agent. + +```bash +pyflyte run --remote workflows.py airflow_wf +``` + +## Configure Airflow connection +In the local execution, you can configure the [Airflow connection](https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html) by setting the `AIRFLOW_CONN_{CONN_ID}` environment variable. +For example, +```bash +export AIRFLOW_CONN_MY_PROD_DATABASE='my-conn-type://login:password@host:port/schema?param1=val1&param2=val2' +``` + +In production, we recommend storing connections in a [secret Backend](https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/secrets-backend/index.html). +Make sure agent pod has the right permission (IAM role) to access the secret from external secrets backends.
\ No newline at end of file From ff76a03f165d642e54260f8372f752fbab7089d9 Mon Sep 17 00:00:00 2001 From: nikki everett Date: Tue, 20 Feb 2024 20:26:29 -0600 Subject: [PATCH 2/5] copy edits Signed-off-by: nikki everett --- docs/migrating_to_flyte/index.md | 8 ++- .../migrating_from_airflow_to_flyte.md | 58 +++++++++++-------- 2 files changed, 41 insertions(+), 25 deletions(-) diff --git a/docs/migrating_to_flyte/index.md b/docs/migrating_to_flyte/index.md index a9ae38ba18..b7003c3ba9 100644 --- a/docs/migrating_to_flyte/index.md +++ b/docs/migrating_to_flyte/index.md @@ -1,7 +1,13 @@ (migrating_to_flyte)= # Migrating to Flyte -TK +```{list-table} +:header-rows: 0 +:widths: 20 30 + +* - {doc}`Migrating from Airflow to Flyte ` + - Migrate your Airflow DAGs to Flyte with minimal effort. +``` ```{toctree} :maxdepth: -1 diff --git a/docs/migrating_to_flyte/migrating_from_airflow_to_flyte.md b/docs/migrating_to_flyte/migrating_from_airflow_to_flyte.md index 7b741e25f7..54a68a844b 100644 --- a/docs/migrating_to_flyte/migrating_from_airflow_to_flyte.md +++ b/docs/migrating_to_flyte/migrating_from_airflow_to_flyte.md @@ -3,17 +3,22 @@ # Migrating from Airflow to Flyte Flyte can compile Airflow tasks into Flyte tasks without changing code, which allows you -to migrate your Airflow DAGs to Flyte with minimal effort. This guide will walk you through -the process of migrating Airflow to Flyte. +to migrate your Airflow DAGs to Flyte with minimal effort. To migrate to Flyte: + +1. [Complete the prerequisites](#prerequisites) +2. [Define your Airflow tasks in a Flyte workflow](#define-your-airflow-tasks-in-a-flyte-workflow) +3. [Test your workflow locally](#test-your-workflow-locally) +4. [Move your workflow to production](#move-your-workflow-to-production) ## Prerequisites -- Install `flytekitplugins-airflow` in your python environment. -- Deploy an Airflow agent to your flyte cluster. +- Install `flytekitplugins-airflow` in your Python environment. 
+- Enable an {ref}`Airflow agent` in your Flyte cluster. + +## Define your Airflow tasks in a Flyte workflow -## Use Airflow tasks inside Flyte workflow -flytekit compiles Airflow tasks into Flyte tasks under the hood, so you can use -any Airflow sensor or operator inside a Flyte workflow. +Flytekit compiles Airflow tasks into Flyte tasks, so you can use +any Airflow sensor or operator in a Flyte workflow. ```python @@ -34,8 +39,17 @@ if __name__ == "__main__": print(f"Running airflow_wf() {airflow_wf()}") ``` -## Run your Airflow tasks locally -Although Airflow doesn't support local execution, you can run your Airflow tasks locally using Flyte. +## Test your workflow locally + +:::{note} +Before running your workflow locally, you must configure the [Airflow connection](https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html) by setting the `AIRFLOW_CONN_{CONN_ID}` environment variable. +For example, +```bash +export AIRFLOW_CONN_MY_PROD_DATABASE='my-conn-type://login:password@host:port/schema?param1=val1&param2=val2' +``` +::: + +Although Airflow doesn't support local execution, you can run your workflow that contains Airflow tasks locally, which is helpful for testing and debugging your tasks before moving to production. ```bash pyflyte run workflows.py airflow_wf @@ -43,24 +57,20 @@ pyflyte run workflows.py airflow_wf :::{warning} Some Airflow operators may require certain permissions to execute. For instance, `DataprocCreateClusterOperator` requires the `dataproc.clusters.create` permission. -When running Airflow tasks locally, you may need to set up the necessary permissions locally for the task to execute successfully. +When running Airflow tasks locally, you may need to set the necessary permissions locally for the task to execute successfully. ::: -## Move to production -Airflow workflows can be executed on a Flyte cluster using the `--remote` flag.
-In this case, Flyte creates a pod in the Kubernetes cluster to run `say_hello` task, and then run -your Airflow `BashOperator` on the Airflow agent. +## Move your workflow to production -```bash -pyflyte run --remote workflows.py airflow_wf -``` +:::{note} +In production, we recommend storing connections in a [secrets backend](https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/secrets-backend/index.html). +Make sure the agent pod has the right permission (IAM role) to access the secret from the external secrets backend. +::: + +After you have tested your workflow locally, you can execute it on a Flyte cluster using the `--remote` flag. +In this case, Flyte creates a pod in the Kubernetes cluster to run the `say_hello` task, and then runs +your Airflow `BashOperator` task on the Airflow agent. -## Configure Airflow connection -In the local execution, you can configure the [Airflow connection](https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html) by setting the `AIRFLOW_CONN_{CONN_ID}` environment variable. -For example, ```bash -export AIRFLOW_CONN_MY_PROD_DATABASE='my-conn-type://login:password@host:port/schema?param1=val1&param2=val2' +pyflyte run --remote workflows.py airflow_wf ``` - -In production, we recommend storing connections in a [secret Backend](https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/secrets-backend/index.html). -Make sure agent pod has the right permission (IAM role) to access the secret from external secrets backends.
\ No newline at end of file From aacdee6a3fd98c0766d11ac5f2296aae2bc9028d Mon Sep 17 00:00:00 2001 From: nikki everett Date: Tue, 20 Feb 2024 20:42:00 -0600 Subject: [PATCH 3/5] more copy edits Signed-off-by: nikki everett --- .../migrating_from_airflow_to_flyte.md | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/docs/migrating_to_flyte/migrating_from_airflow_to_flyte.md b/docs/migrating_to_flyte/migrating_from_airflow_to_flyte.md index 54a68a844b..b3896aaaa4 100644 --- a/docs/migrating_to_flyte/migrating_from_airflow_to_flyte.md +++ b/docs/migrating_to_flyte/migrating_from_airflow_to_flyte.md @@ -3,19 +3,16 @@ # Migrating from Airflow to Flyte Flyte can compile Airflow tasks into Flyte tasks without changing code, which allows you -to migrate your Airflow DAGs to Flyte with minimal effort. To migrate to Flyte: - -1. [Complete the prerequisites](#prerequisites) -2. [Define your Airflow tasks in a Flyte workflow](#define-your-airflow-tasks-in-a-flyte-workflow) -3. [Test your workflow locally](#test-your-workflow-locally) -4. [Move your workflow to production](#move-your-workflow-to-production) +to migrate your Airflow DAGs to Flyte with minimal effort. ## Prerequisites - Install `flytekitplugins-airflow` in your Python environment. -- Enable an {ref}`Airflow agent` in your Flyte cluster. +- Enable an {ref}`Airflow agent` in your Flyte cluster. + +## Steps -## Define your Airflow tasks in a Flyte workflow +### 1. Define your Airflow tasks in a Flyte workflow Flytekit compiles Airflow tasks into Flyte tasks, so you can use any Airflow sensor or operator in a Flyte workflow. @@ -39,7 +36,7 @@ if __name__ == "__main__": print(f"Running airflow_wf() {airflow_wf()}") ``` -## Test your workflow locally +### 2. 
Test your workflow locally :::{note} Before running your workflow locally, you must configure the [Airflow connection](https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html) by setting the `AIRFLOW_CONN_{CONN_ID}` environment variable. @@ -60,7 +57,7 @@ Some Airflow operators may require certain permissions to execute. For instance, When running Airflow tasks locally, you may need to set the necessary permissions locally for the task to execute successfully. ::: -## Move your workflow to production +### 3. Move your workflow to production :::{note} In production, we recommend storing connections in a [secrets backend](https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/secrets-backend/index.html). From 734d82c03fb95b00fc90c25d3dae8363f6fd93fd Mon Sep 17 00:00:00 2001 From: nikki everett Date: Mon, 4 Mar 2024 13:22:40 -0600 Subject: [PATCH 4/5] move airflow migration guide to development lifecycle section Signed-off-by: nikki everett --- docs/index.md | 1 - docs/migrating_to_flyte/index.md | 17 ----------------- docs/user_guide/development_lifecycle/index.md | 1 + .../migrating_from_airflow_to_flyte.md | 0 4 files changed, 1 insertion(+), 18 deletions(-) delete mode 100644 docs/migrating_to_flyte/index.md rename docs/{migrating_to_flyte => user_guide/development_lifecycle}/migrating_from_airflow_to_flyte.md (100%) diff --git a/docs/index.md b/docs/index.md index be75706a96..4720be51f7 100644 --- a/docs/index.md +++ b/docs/index.md @@ -139,7 +139,6 @@ Quickstart guide Getting started with workflow development Flyte fundamentals Flyte agents -Migrating to Flyte Core use cases ``` diff --git a/docs/migrating_to_flyte/index.md b/docs/migrating_to_flyte/index.md deleted file mode 100644 index b7003c3ba9..0000000000 --- a/docs/migrating_to_flyte/index.md +++ /dev/null @@ -1,17 +0,0 @@ -(migrating_to_flyte)= -# Migrating to Flyte - -```{list-table} -:header-rows: 0 -:widths: 20 30 - -* - {doc}`Migrating from Airflow to Flyte ` - - 
Migrate your Airflow DAGs to Flyte with minimal effort. -``` - -```{toctree} -:maxdepth: -1 -:hidden: - -migrating_from_airflow_to_flyte -``` \ No newline at end of file diff --git a/docs/user_guide/development_lifecycle/index.md b/docs/user_guide/development_lifecycle/index.md index 8c21abc291..6da72983b9 100644 --- a/docs/user_guide/development_lifecycle/index.md +++ b/docs/user_guide/development_lifecycle/index.md @@ -20,4 +20,5 @@ running_workflows running_launch_plans inspecting_executions debugging_executions +migrating_from_airflow_to_flyte ``` diff --git a/docs/migrating_to_flyte/migrating_from_airflow_to_flyte.md b/docs/user_guide/development_lifecycle/migrating_from_airflow_to_flyte.md similarity index 100% rename from docs/migrating_to_flyte/migrating_from_airflow_to_flyte.md rename to docs/user_guide/development_lifecycle/migrating_from_airflow_to_flyte.md From 66af8ce2d58a8d72e3fa854f4b302f9e4776bca2 Mon Sep 17 00:00:00 2001 From: nikki everett Date: Tue, 5 Mar 2024 19:23:21 -0600 Subject: [PATCH 5/5] copy changes from docs/migration-guides-ping Signed-off-by: nikki everett --- .../migrating_from_airflow_to_flyte.md | 24 +++++++++++++++---- 1 file changed, 19 insertions(+), 5 deletions(-) diff --git a/docs/user_guide/development_lifecycle/migrating_from_airflow_to_flyte.md b/docs/user_guide/development_lifecycle/migrating_from_airflow_to_flyte.md index b3896aaaa4..42f2f93ccd 100644 --- a/docs/user_guide/development_lifecycle/migrating_from_airflow_to_flyte.md +++ b/docs/user_guide/development_lifecycle/migrating_from_airflow_to_flyte.md @@ -1,10 +1,24 @@ (migrating_from_airflow_to_flyte)= - # Migrating from Airflow to Flyte +:::{important} +Many Airflow operators and sensors have been tested on Flyte, but some may not work as expected. +If you encounter any issues, please file an [issue](https://github.com/flyteorg/flyte/issues) or reach out to the Flyte community on [Slack](https://slack.flyte.org/). 
+::: + Flyte can compile Airflow tasks into Flyte tasks without changing code, which allows you to migrate your Airflow DAGs to Flyte with minimal effort. +In addition to migration capabilities, Flyte users can seamlessly integrate Airflow tasks into their workflows, leveraging the ecosystem of Airflow operators and sensors. +By combining the robust Airflow ecosystem with Flyte's capabilities such as scalability, versioning, and reproducibility, users can run more complex data and machine learning workflows with ease. +For more information, see the [Airflow agent documentation](https://docs.flyte.org/en/latest/flytesnacks/examples/airflow_agent/index.html). + +## For current Flyte users + +Even if you're already using Flyte and have no intention of migrating from Airflow, +you can still incorporate Airflow tasks into your Flyte workflows. For instance, Airflow offers support +for Google Cloud [Dataproc Operators](https://airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/cloud/dataproc.html), facilitating the execution of Spark jobs on Google Cloud Dataproc clusters. Rather than developing a custom plugin in Flyte, you can seamlessly integrate Airflow's Dataproc Operators into your Flyte workflows to execute Spark jobs. + ## Prerequisites - Install `flytekitplugins-airflow` in your Python environment. @@ -15,12 +29,12 @@ to migrate your Airflow DAGs to Flyte with minimal effort. ### 1. Define your Airflow tasks in a Flyte workflow Flytekit compiles Airflow tasks into Flyte tasks, so you can use -any Airflow sensor or operator in a Flyte workflow.
+any Airflow sensor or operator in a Flyte workflow: ```python from flytekit import task, workflow -from airflow.operators.bash import BashOperator +from airflow.sensors.filesystem import FileSensor @task def say_hello() -> str: @@ -29,7 +43,7 @@ def say_hello() -> str: @workflow def airflow_wf(): flyte_task = say_hello() - airflow_task = BashOperator(task_id=f"airflow_bash_operator", bash_command="echo hello") + airflow_task = FileSensor(task_id="sensor", filepath="/") airflow_task >> flyte_task if __name__ == "__main__": @@ -49,7 +63,7 @@ export AIRFLOW_CONN_MY_PROD_DATABASE='my-conn-type://login:password@host:port/sc Although Airflow doesn't support local execution, you can run your workflow that contains Airflow tasks locally, which is helpful for testing and debugging your tasks before moving to production. ```bash -pyflyte run workflows.py airflow_wf +AIRFLOW_CONN_FS_DEFAULT="/" pyflyte run workflows.py airflow_wf ``` :::{warning}