From 9cbd3a2a0abc0a3978460bc0eb4eb1c3e01991e0 Mon Sep 17 00:00:00 2001 From: Samhita Alla Date: Fri, 4 Aug 2023 23:47:19 +0530 Subject: [PATCH] update the plugin setup guide (#3915) * update the plugin setup guide Signed-off-by: Samhita Alla * nit Signed-off-by: Samhita Alla * snowflake docs link Signed-off-by: Samhita Alla * content clean-up Signed-off-by: Samhita Alla * update secrets doc Signed-off-by: Samhita Alla --------- Signed-off-by: Samhita Alla --- doc-requirements.in | 2 - rsts/deployment/plugins/aws/athena.rst | 132 ++- rsts/deployment/plugins/aws/batch.rst | 218 ++-- rsts/deployment/plugins/aws/index.rst | 15 +- rsts/deployment/plugins/aws/sagemaker.rst | 151 ++- rsts/deployment/plugins/gcp/bigquery.rst | 147 ++- rsts/deployment/plugins/gcp/index.rst | 14 +- rsts/deployment/plugins/index.rst | 10 +- rsts/deployment/plugins/k8s/index.rst | 961 ++++++++++-------- rsts/deployment/plugins/webapi/databricks.rst | 460 +++++++-- rsts/deployment/plugins/webapi/index.rst | 14 +- rsts/deployment/plugins/webapi/snowflake.rst | 329 +++--- 12 files changed, 1416 insertions(+), 1037 deletions(-) diff --git a/doc-requirements.in b/doc-requirements.in index f0a345e5c6..58c9af9d6f 100644 --- a/doc-requirements.in +++ b/doc-requirements.in @@ -13,5 +13,3 @@ sphinxcontrib-video sphinxcontrib-youtube sphinx-tabs sphinx-tags -grpcio<1.49.0 -grpcio-status<1.49.0 diff --git a/rsts/deployment/plugins/aws/athena.rst b/rsts/deployment/plugins/aws/athena.rst index 9c88c0ef71..34edafc4bd 100644 --- a/rsts/deployment/plugins/aws/athena.rst +++ b/rsts/deployment/plugins/aws/athena.rst @@ -1,93 +1,87 @@ .. _deployment-plugin-setup-aws-athena: -Athena Plugin Setup -------------------- +Athena Plugin +============= -This guide gives an overview of how to set up Athena in your Flyte deployment. Athena plugin needs Flyte deployment in AWS cloud; sandbox/GCP/Azure wouldn't work. +This guide provides an overview of setting up Athena in your Flyte deployment. 
-Setup the AWS Flyte cluster -=========================== +.. note:: + Please note that the Athena plugin requires a Flyte deployment in the AWS cloud; it won't work with demo/GCP/Azure. + +Set up the AWS Flyte cluster +---------------------------- + +1. Ensure you have a functional Flyte cluster up and running in `AWS `__ +2. Verify that you possess the correct ``kubeconfig`` and have selected the appropriate Kubernetes context +3. Double-check that your ``~/.flyte/config.yaml`` file contains the correct Flytectl configuration + +Specify plugin configuration +---------------------------- .. tabs:: - .. tab:: AWS cluster setup + .. group-tab:: Flyte binary - * Make sure you have up and running flyte cluster in `AWS `__ - * Make sure you have correct kubeconfig and selected the correct kubernetes context - * make sure you have the correct FlyteCTL config at ~/.flyte/config.yaml + Edit the relevant YAML file to specify the plugin. + .. code-block:: yaml + :emphasize-lines: 7,11 -Specify Plugin Configuration -====================================== + tasks: + task-plugins: + enabled-plugins: + - container + - sidecar + - k8s-array + - athena + default-for-task-types: + - container: container + - container_array: k8s-array + - athena: athena -Create a file named ``values-override.yaml`` and add the following config to it. -Please make sure that the propeller has the correct service account for Athena. + .. group-tab:: Flyte core -.. 
code-block:: yaml

-    configmap:
-      enabled_plugins:
-        # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig)
-        tasks:
-          # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig)
-          task-plugins:
-            # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). Enable sagemaker*, athena if you install the backend
-            # plugins
-            enabled-plugins:
-              - container
-              - sidecar
-              - k8s-array
-              - athena
-            default-for-task-types:
-              container: container
-              sidecar: sidecar
-              container_array: k8s-array
-              athena: athena
+      .. code-block:: yaml

-Upgrade the Flyte Helm release
-==============================
+        configmap:
+          enabled_plugins:
+            tasks:
+              task-plugins:
+                enabled-plugins:
+                  - container
+                  - sidecar
+                  - k8s-array
+                  - athena
+                default-for-task-types:
+                  container: container
+                  sidecar: sidecar
+                  container_array: k8s-array
+                  athena: athena

-.. prompt:: bash $
+Ensure that the propeller has the correct service account for Athena.

-   helm upgrade -n flyte -f values-override.yaml flyteorg/flyte-core
+Upgrade the Flyte Helm release
+------------------------------

+.. tabs::

-Register the Athena plugin example
-==================================
+   .. group-tab:: Flyte binary

-.. code-block:: bash
+      .. code-block:: bash

-   flytectl register files https://github.com/flyteorg/flytesnacks/releases/download/v0.2.226/snacks-cookbook-integrations-aws-athena.tar.gz --archive -p flytesnacks -d development
+        helm upgrade <RELEASE_NAME> flyteorg/flyte-binary -n <YOUR_NAMESPACE> --values <YOUR_YAML_FILE>

+      Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte-backend``),
+      ``<YOUR_NAMESPACE>`` with the name of your namespace (e.g., ``flyte``),
+      and ``<YOUR_YAML_FILE>`` with the name of your YAML file.

-Launch an execution
-===================
+   .. group-tab:: Flyte core

-.. tabs::
+      .. code-block:: bash

-   .. tab:: Flyte Console
+        helm upgrade <RELEASE_NAME> flyte/flyte-core -n <YOUR_NAMESPACE> --values values-override.yaml

-      * Navigate to Flyte Console's UI (e.g. `sandbox `_) and find the workflow.
-      * Click on `Launch` to open up the launch form.
-      * Submit the form.

-   .. tab:: Flytectl

-      Retrieve an execution form in the form of a YAML file:

-      .. prompt:: bash $

-         flytectl get launchplan \
-            --config ~/.flyte/flytectl.yaml \
-            --project flytesnacks \
-            --domain development \
-            athena.athena.full_athena_wf
-            --latest \
-            --execFile exec_spec.yaml

-      Launch! 🚀

-      .. prompt:: bash $

-         flytectl --config ~/.flyte/flytectl.yaml create execution \
-            -p <project> -d <domain> --execFile ./exec_spec.yaml
+      Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte``)
+      and ``<YOUR_NAMESPACE>`` with the name of your namespace (e.g., ``flyte``).
diff --git a/rsts/deployment/plugins/aws/batch.rst b/rsts/deployment/plugins/aws/batch.rst
index 5179744d12..ba6d069b74 100644
--- a/rsts/deployment/plugins/aws/batch.rst
+++ b/rsts/deployment/plugins/aws/batch.rst
@@ -1,81 +1,85 @@
 .. _deployment-plugin-setup-aws-array:

-AWS Batch Setup
----------------
+AWS Batch
+=========

 This setup document applies to both :py:func:`map tasks ` and single tasks running on AWS Batch.

 .. note::

-   For single [non-map] task use, please take note of the additional code when
-   updating Propeller config.
+   For single [non-map] task use, please take note of
+   the additional code when updating the flytepropeller config.

-AWS Batch enables developers, scientists, and engineers to easily and efficiently
-run hundreds of thousands of batch computing jobs on AWS.
+AWS Batch simplifies the process for developers, scientists and engineers to run
+hundreds of thousands of batch computing jobs on AWS.

-Flyte abstracts away the complexity of integrating AWS Batch into users'
-workflows. 
It takes care of packaging inputs, reading outputs, scheduling map -tasks, leveraging AWS Batch Job Queues to distribute the load and coordinate -priorities. +Flyte abstracts away the complexity of integrating AWS Batch into users' workflows, +taking care of packaging inputs, reading outputs, scheduling map tasks and +optimizing AWS Batch job queues for load distribution and priority coordination. -Set-up AWS Batch -================ +Set up AWS Batch +---------------- -Follow the guide `Running batch jobs at scale for less `_. +Follow the guide `Running batch jobs +at scale for less `__. -At the end of this step, the AWS Account should have a configured compute -environment and one or more AWS Batch Job Queues. +By the end of this step, your AWS Account should have a configured compute environment +and one or more AWS Batch Job Queues. -Modify Users' AWS IAM Role Trust Policy Document -================================================ +Modify users' AWS IAM role trust policy document +------------------------------------------------ -Follow the guide `AWS Batch Execution IAM role `_. +Follow the guide `AWS Batch Execution +IAM role `__. -When running Workflows in Flyte, users have the option to specify a K8s Account -and/or an IAM Role to run as. For AWS Batch, an IAM Role must be specified. -For every one of these IAM Roles, modify the trust policy to allow ECS to assume -the role. +When running workflows in Flyte, users can specify a Kubernetes service account and/or an IAM Role to run as. +For AWS Batch, an IAM Role must be specified. For each of these IAM Roles, modify the trust policy +to allow elastic container service (ECS) to assume the role. -Modify System's AWS IAM Role policies -===================================== +Modify system's AWS IAM role policies +------------------------------------- -Follow the guide: `Granting a user permissions to pass a role to an AWS service `_. 
+Follow the guide `Granting a user permissions to pass a +role to an AWS service `__. -The recommended way of assigning permissions to flyte components is using -`OIDC `_. -This involves assigning an IAM Role for every service account used. You will -need to find the IAM Role assigned to the flytepropeller's kubernetes service account. -Then modify the policy document to allow the role to pass other roles to AWS Batch. +The best practice for granting permissions to Flyte components is by utilizing OIDC, +as described in the +`OIDC documentation `__. +This approach entails assigning an IAM Role to each service account being used. +To proceed, identify the IAM Role associated with the flytepropeller's Kubernetes service account, +and subsequently, modify the policy document to enable the role to pass other roles to AWS Batch. -Update FlyteAdmin Config -======================== +Update FlyteAdmin configuration +------------------------------- -FlyteAdmin needs to be made aware of all the AWS Batch Job Queues and how the system should distribute the load onto them. -The simplest setup looks something like this: +FlyteAdmin must be informed of all the AWS Batch job queues +and how the system should distribute the load among them. +The simplest setup is as follows: .. code-block:: yaml - flyteadmin: - roleNameKey: "eks.amazonaws.com/role-arn" - queues: - # A list of items, one per AWS Batch Job Queue. - executionQueues: - # The name of the job queue from AWS Batch - - dynamic: "tutorial" - # A list of tags/attributes that can be used to match workflows to this queue. - attributes: - - default - # A list of configs to match project and/or domain and/or workflows to job queues using tags. - workflowConfigs: - # An empty rule to match any workflow to the queue tagged as "default" - - tags: - - default - -If you are using Helm, this block can be added under `configMaps.adminServer` section `here `_. 
- -An example of a more complex matching config below defines 3 different queues with separate attributes and matching -logic based on project/domain/workflowName. + flyteadmin: + roleNameKey: "eks.amazonaws.com/role-arn" + queues: + # A list of items, one per AWS Batch Job Queue. + executionQueues: + # The name of the job queue from AWS Batch + - dynamic: "tutorial" + # A list of tags/attributes that can be used to match workflows to this queue. + attributes: + - default + # A list of configs to match project and/or domain and/or workflows to job queues using tags. + workflowConfigs: + # An empty rule to match any workflow to the queue tagged as "default" + - tags: + - default + +If you are using Helm, you can add this block under the ``configMaps.adminServer`` section, +as shown `here `__. + +For a more complex matching configuration, the example below defines three different queues +with distinct attributes and matching logic based on project/domain/workflowName. .. code-block:: yaml @@ -108,84 +112,54 @@ logic based on project/domain/workflowName. - tags: - default -These settings can also be dynamically altered through `flytectl` (or flyteAdmin API). -Read about the :ref:`core concept here `. Then visit :ref:`flytectl docs ` for a guide on how to dynamically -update these configs. +These settings can also be dynamically altered through ``flytectl`` (or FlyteAdmin API). +Learn about the :ref:`core concept here `. +For guidance on how to dynamically update these configurations, refer to the :ref:`Flytectl docs `. -Update Flyte Propeller's Config -=============================== +Update FlytePropeller's configuration +------------------------------------- -AWS Array Plugin requires some configurations to correctly communicate with AWS Batch Service. +The AWS Array Plugin requires specific configurations to ensure proper communication with the AWS Batch Service. -These configurations live within flytepropeller's configMap. 
The config should be modified to set the following keys: +These configurations reside within FlytePropeller's configMap. Modify the config in the relevant YAML file to set the following keys: .. code-block:: yaml - plugins: - aws: - batch: - # Must match that set in flyteAdmin's configMap flyteadmin.roleNameKey - roleAnnotationKey: eks.amazonaws.com/role-arn - # Must match the desired region to launch these tasks. - region: us-east-2 - tasks: - task-plugins: - enabled-plugins: - # Enable aws_array task plugin. - - aws_array - default-for-task-types: - # Set it as the default handler for array/map tasks. - container_array: aws_array - # Make sure to add this line to enable single (non-map) AWS Batch tasks - aws-batch: aws_array - -Launch an Execution on AWS Batch -================================ - -Follow `this guide `_ to -write a workflow with a Map Task. + plugins: + aws: + batch: + # Must match that set in flyteAdmin's configMap flyteadmin.roleNameKey + roleAnnotationKey: eks.amazonaws.com/role-arn + # Must match the desired region to launch these tasks. + region: us-east-2 + tasks: + task-plugins: + enabled-plugins: + # Enable aws_array task plugin. + - aws_array + default-for-task-types: + # Set it as the default handler for array/map tasks. + container_array: aws_array + # Make sure to add this line to enable single (non-map) AWS Batch tasks + aws-batch: aws_array -Serialize and Register the workflow/task to a Flyte backend, then launch an -execution either on Flyte Console or with ``flytectl``: - -.. tabs:: - - .. tab:: Flyte Console - - * Navigate to Flyte Console's UI (e.g. `sandbox `_) and find the workflow. - * Click on `Launch` to open up the launch form. - * Select `IAM Role` and enter the full `AWS Arn` of an IAM Role configured according to the above guide. - * Submit the form. - - .. tab:: Flytectl - - Retrieve an execution form in the form of a yaml file: - - .. 
code-block:: bash - - flytectl --config ~/.flyte/flytectl.yaml get launchplan \ - -p -d \ - --version --execFile ./map_wf.yaml - - Fill in `iamRole` field (and optionally `kubeServiceAcct` if required in - the deployment), then launch an execution: - - .. code-block:: bash +.. note:: - flytectl --config ~/.flyte/flytectl.yaml create execution \ - -p -d \ - --execFile ./map_wf.yaml + To register the `map task + `__ on Flyte, + use the command ``pyflyte register ``. + Launch the execution through the FlyteConsole by selecting the appropriate ``IAM Role`` and entering the full + ``AWS Arn`` of an IAM Role configured according to the above guide. -As soon as the task starts executing, a link for the AWS Array Job will appear in -the log links section in Flyte Console. As individual jobs start getting scheduled, -links to their individual cloudWatch log streams will also appear in the UI. + Once the task starts executing, you'll find a link for the AWS Array Job in the log links section of the Flyte Console. + As individual jobs start getting scheduled, links to their respective CloudWatch log streams will also appear in the UI. -.. image:: https://raw.githubusercontent.com/flyteorg/static-resources/main/flyte/deployment/aws_plugin_setup/map_task_success.png - :alt: A screenshot of Flyte Console displaying log links for a successful array job. + .. image:: https://raw.githubusercontent.com/flyteorg/static-resources/main/flyte/deployment/aws_plugin_setup/map_task_success.png + :alt: A screenshot of Flyte Console displaying log links for a successful array job. -A screenshot of Flyte Console displaying log links for a successful array job. + *A screenshot of Flyte Console displaying log links for a successful array job.* -.. image:: https://raw.githubusercontent.com/flyteorg/static-resources/main/flyte/deployment/aws_plugin_setup/map_task_failure.png - :alt: A screenshot of Flyte Console displaying log links for a failed array job. + .. 
image:: https://raw.githubusercontent.com/flyteorg/static-resources/main/flyte/deployment/aws_plugin_setup/map_task_failure.png + :alt: A screenshot of Flyte Console displaying log links for a failed array job. -A screenshot of Flyte Console displaying log links for a failed array job. + *A screenshot of Flyte Console displaying log links for a failed array job.* diff --git a/rsts/deployment/plugins/aws/index.rst b/rsts/deployment/plugins/aws/index.rst index 54d673dcdd..5a7206c27e 100644 --- a/rsts/deployment/plugins/aws/index.rst +++ b/rsts/deployment/plugins/aws/index.rst @@ -1,12 +1,11 @@ .. _deployment-plugin-setup-aws: -################# -AWS Plugins Setup -################# +Configure AWS Plugins +===================== .. tags:: AWS, Integration, MachineLearning, Data, Advanced -Learn how to set up AWS plugins for Flyte. +Discover the process of setting up AWS plugins for Flyte. .. panels:: :header: text-center @@ -18,7 +17,7 @@ Learn how to set up AWS plugins for Flyte. :text: AWS Batch :classes: btn-block stretched-link ^^^^^^^^^^^^ - Guide to setting up the AWS Batch Plugin. + Guide to setting up the AWS Batch plugin. --- @@ -27,7 +26,7 @@ Learn how to set up AWS plugins for Flyte. :text: AWS Athena :classes: btn-block stretched-link ^^^^^^^^^^^^ - Guide to setting up the AWS Athena Plugin. + Guide to setting up the AWS Athena plugin. --- @@ -36,11 +35,11 @@ Learn how to set up AWS plugins for Flyte. :text: AWS Sagemaker :classes: btn-block stretched-link ^^^^^^^^^^^^ - Guide to setting up the AWS Sagemaker Plugin. + Guide to setting up the AWS Sagemaker plugin. .. toctree:: :maxdepth: 1 - :name: AWS plugin Setup + :name: AWS plugin setup :hidden: batch diff --git a/rsts/deployment/plugins/aws/sagemaker.rst b/rsts/deployment/plugins/aws/sagemaker.rst index 6411ee6c2b..9401920901 100644 --- a/rsts/deployment/plugins/aws/sagemaker.rst +++ b/rsts/deployment/plugins/aws/sagemaker.rst @@ -1,98 +1,95 @@ .. 
_deployment-plugin-setup-aws-sagemaker: -Sagemaker Plugin Setup ----------------------- +Sagemaker Plugin +================ -This guide gives an overview of how to set up Sagemaker in your Flyte deployment. +This guide provides an overview of setting up Sagemaker in your Flyte deployment. .. note:: - The Sagemaker plugin needs Flyte deployment in AWS cloud; sandbox/GCP/Azure - won't work. + The Sagemaker plugin requires Flyte deployment in the AWS cloud; + it is not compatible with demo/GCP/Azure. -Setup the AWS Flyte cluster -=========================== +Set up AWS Flyte cluster +------------------------ -.. tabs:: - - .. tab:: AWS cluster setup - - * Make sure you have up and running flyte cluster in `AWS `__ - * You have your `AWS role set up correctly for SageMaker `_ - * `AWS SageMaker k8s operator `_ is installed in your k8s cluster - * Make sure you have correct kubeconfig and selected the correct kubernetes context - * make sure you have the correct FlyteCTL config at ~/.flyte/config.yaml - -Specify Plugin Configuration -====================================== - -Create a file named ``values-override.yaml`` and add the following config to it. -Please make sure that the propeller has the correct service account for Sagemaker. - -.. code-block:: yaml - - configmap: - enabled_plugins: - # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig) - tasks: - # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig) - task-plugins: - # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). 
- # plugins - enabled-plugins: - - container - - sidecar - - k8s-array - - sagemaker_training - - sagemaker_hyperparameter_tuning - default-for-task-types: - container: container - sidecar: sidecar - container_array: k8s-array +* Ensure you have a functional Flyte cluster running in `AWS `__. +* Verify that your AWS role is set up correctly for `SageMaker `__. +* Install the `AWS SageMaker k8s operator `__ in your Kubernetes cluster. +* Confirm that you have the correct kubeconfig and have selected the appropriate Kubernetes context. +* Verify the presence of the correct Flytectl configuration at ``~/.flyte/config.yaml``. -Upgrade the Flyte Helm release -============================== +Specify the plugin configuration +-------------------------------- -.. prompt:: bash $ +.. tabs:: - helm upgrade -n flyte -f values-override.yaml flyteorg/flyte-core + .. tab:: Flyte binary + + Edit the relevant YAML file to specify the plugin. + + .. code-block:: yaml + :emphasize-lines: 7,8 + + tasks: + task-plugins: + enabled-plugins: + - container + - sidecar + - k8s-array + - sagemaker_training + - sagemaker_hyperparameter_tuning + default-for-task-types: + - container: container + - container_array: k8s-array + + .. tab:: Flyte core + + Create a file named ``values-override.yaml`` and add the following configuration to it. + + .. code-block:: yaml + + configmap: + enabled_plugins: + # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig) + tasks: + # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig) + task-plugins: + # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). 
+            # plugins
+            enabled-plugins:
+              - container
+              - sidecar
+              - k8s-array
+              - sagemaker_training
+              - sagemaker_hyperparameter_tuning
+            default-for-task-types:
+              container: container
+              sidecar: sidecar
+              container_array: k8s-array
+
+Ensure that the propeller has the correct service account for Sagemaker.

 Upgrade the Flyte Helm release
-==============================
+------------------------------

-.. prompt:: bash $
+.. tabs::

-   helm upgrade -n flyte -f values-override.yaml flyteorg/flyte-core
+   .. group-tab:: Flyte binary

+      .. code-block:: bash

-Register the Sagemaker plugin example
-=====================================
+        helm upgrade <RELEASE_NAME> flyteorg/flyte-binary -n <YOUR_NAMESPACE> --values <YOUR_YAML_FILE>

-.. prompt:: bash $
+      Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte-backend``),
+      ``<YOUR_NAMESPACE>`` with the name of your namespace (e.g., ``flyte``),
+      and ``<YOUR_YAML_FILE>`` with the name of your YAML file.

-   flytectl register files https://github.com/flyteorg/flytesnacks/releases/download/v0.3.0/snacks-cookbook-integrations-aws-sagemaker_training.tar.gz --archive -p flytesnacks -d development
+   .. group-tab:: Flyte core

-Launch an execution
-===================
+      .. code-block:: bash

-.. tabs::
+        helm upgrade <RELEASE_NAME> flyte/flyte-core -n <YOUR_NAMESPACE> --values values-override.yaml

-   .. tab:: Flyte Console
+      Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte``)
+      and ``<YOUR_NAMESPACE>`` with the name of your namespace (e.g., ``flyte``).

-      * Navigate to Flyte Console's UI (e.g. `sandbox `_) and find the workflow.
-      * Click on `Launch` to open up the launch form.
-      * Submit the form.

-   .. tab:: Flytectl

-      Retrieve an execution form in the form of a YAML file:

-      .. code-block:: bash

-         flytectl get launchplan --config ~/.flyte/flytectl.yaml \
-            --project flytesnacks \
-            --domain development \
-            sagemaker_training.sagemaker_custom_training.mnist_trainer \
-            --latest \
-            --execFile exec_spec.yaml

-      Launch! 🚀

-      .. code-block:: bash

-         flytectl --config ~/.flyte/flytectl.yaml create execution \
-            -p <project> -d <domain> --execFile ~/exec_spec.yaml
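The enablement pattern this patch repeats for Athena, Sagemaker and BigQuery always has the same shape: append the backend plugin to ``enabled-plugins`` and map its task type under ``default-for-task-types``. As an illustrative sketch only (this helper is not part of Flyte or its Helm charts), the flyte-core values override can be generated programmatically; since JSON is valid YAML, the dumped output can be written to a file and passed to ``helm upgrade --values`` as-is:

```python
# Hypothetical helper: build the flyte-core ``values-override.yaml`` structure
# shown in this patch for one extra backend plugin.
import json


def values_override(plugin: str, default_for: dict) -> dict:
    """Return Helm values enabling `plugin` on top of the stock task plugins."""
    return {
        "configmap": {
            "enabled_plugins": {
                "tasks": {
                    "task-plugins": {
                        # Stock plugins plus the backend plugin being enabled.
                        "enabled-plugins": ["container", "sidecar", "k8s-array", plugin],
                        "default-for-task-types": {
                            "container": "container",
                            "sidecar": "sidecar",
                            "container_array": "k8s-array",
                            # e.g. {"athena": "athena"} routes athena tasks
                            # to the athena plugin.
                            **default_for,
                        },
                    }
                }
            }
        }
    }


# Example: the Athena override from earlier in this patch, emitted as
# JSON (a YAML subset that Helm accepts as a values file).
print(json.dumps(values_override("athena", {"athena": "athena"}), indent=2))
```

This is only a convenience sketch for generating the repetitive override files; the hand-written YAML shown in the diffs above is equally valid.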
diff --git a/rsts/deployment/plugins/gcp/bigquery.rst b/rsts/deployment/plugins/gcp/bigquery.rst index 75ada75b29..03b21e02e1 100644 --- a/rsts/deployment/plugins/gcp/bigquery.rst +++ b/rsts/deployment/plugins/gcp/bigquery.rst @@ -1,93 +1,90 @@ .. _deployment-plugin-setup-gcp-bigquery: -Google Bigquery Plugin Setup +Google BigQuery Plugin +====================== + +This guide provides an overview of setting up BigQuery in your Flyte deployment. +Please note that the BigQuery plugin requires Flyte deployment in the GCP cloud; +it is not compatible with demo/AWS/Azure. + +Set up the GCP Flyte cluster ---------------------------- -This guide gives an overview of how to set up BigQuery in your Flyte deployment. -BigQuery plugin needs Flyte deployment in GCP cloud; sandbox/AWS/Azure wouldn't -work. +* Ensure you have a functional Flyte cluster running in `GCP `__. +* Create a service account for BigQuery. For more details, refer to: https://cloud.google.com/bigquery/docs/quickstarts/quickstart-client-libraries. +* Verify that you have the correct kubeconfig and have selected the appropriate Kubernetes context. +* Confirm that you have the correct Flytectl configuration at ``~/.flyte/config.yaml``. -Setup the GCP Flyte cluster -=========================== +Specify plugin configuration +---------------------------- .. tabs:: - .. tab:: GCP cluster setup - - * Make sure you have up and running flyte cluster in `GCP `__ - * Create a service account for BigQuery. More detail: https://cloud.google.com/bigquery/docs/quickstarts/quickstart-client-libraries - * Make sure you have correct kubeconfig and selected the correct kubernetes context - * make sure you have the correct FlyteCTL config at ~/.flyte/config.yaml - -Specify Plugin Configuration -============================ - -Create a file named ``values-override.yaml`` and add the following config to it. -Please make sure that the propeller has the correct service account for BigQuery. - -.. 
code-block:: yaml - - configmap: - enabled_plugins: - # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig) - tasks: - # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig) - task-plugins: - # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). Enable sagemaker*, athena if you install the backend - # plugins - enabled-plugins: - - container - - sidecar - - k8s-array - - bigquery - default-for-task-types: - container: container - sidecar: sidecar - container_array: k8s-array - bigquery_query_job_task: bigquery + .. group-tab:: Flyte binary + + Edit the relevant YAML file to specify the plugin. + + .. code-block:: yaml + :emphasize-lines: 7,11 + + tasks: + task-plugins: + enabled-plugins: + - container + - sidecar + - k8s-array + - bigquery + default-for-task-types: + - container: container + - container_array: k8s-array + - bigquery_query_job_task: bigquery + + .. group-tab:: Flyte core + + Create a file named ``values-override.yaml`` and add the following configuration to it. + + .. code-block:: yaml + + configmap: + enabled_plugins: + # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig) + tasks: + # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig) + task-plugins: + # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). 
Enable sagemaker*, athena if you install the backend
+            enabled-plugins:
+              - container
+              - sidecar
+              - k8s-array
+              - bigquery
+            default-for-task-types:
+              container: container
+              sidecar: sidecar
+              container_array: k8s-array
+              bigquery_query_job_task: bigquery
+
+Ensure that the propeller has the correct service account for BigQuery.

 Upgrade the Flyte Helm release
-==============================
+------------------------------

-.. prompt:: bash $
+.. tabs::

-   helm upgrade -n flyte -f values-override.yaml flyteorg/flyte-core
+   .. group-tab:: Flyte binary

-Register the BigQuery plugin example
-====================================
+      .. code-block:: bash

-.. prompt:: bash $
+        helm upgrade <RELEASE_NAME> flyteorg/flyte-binary -n <YOUR_NAMESPACE> --values <YOUR_YAML_FILE>

-   flytectl register files https://github.com/flyteorg/flytesnacks/releases/download/v0.2.226/snacks-cookbook-integrations-gcp-bigquery.tar.gz --archive -p flytesnacks -d development
+      Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte-backend``),
+      ``<YOUR_NAMESPACE>`` with the name of your namespace (e.g., ``flyte``),
+      and ``<YOUR_YAML_FILE>`` with the name of your YAML file.

-Launch an execution
-===================
+   .. group-tab:: Flyte core

-.. tabs::
+      .. code-block:: bash

-   .. tab:: Flyte Console
+        helm upgrade <RELEASE_NAME> flyte/flyte-core -n <YOUR_NAMESPACE> --values values-override.yaml

-      * Navigate to the Flyte Console's UI (e.g. `sandbox `_) and find the relevant workflow
-      * Click on `Launch` to open up a launch form
-      * Submit the form to launch an execution

-   .. tab:: FlyteCTL

-      Retrieve an execution form in the form of a YAML file:

-      .. code-block:: bash

-         flytectl get launchplan --config ~/.flyte/flytectl.yaml \
-            --project flytesnacks \
-            --domain development \
-            bigquery.bigquery.full_bigquery_wf \
-            --latest --execFile exec_spec.yaml

-      Launch! 🚀

-      .. code-block:: bash

-         flytectl --config ~/.flyte/flytectl.yaml create execution \
-            -p flytesnacks \
-            -d development \
-            --execFile ./exec_spec.yaml
+      Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte``)
+      and ``<YOUR_NAMESPACE>`` with the name of your namespace (e.g., ``flyte``).
diff --git a/rsts/deployment/plugins/gcp/index.rst b/rsts/deployment/plugins/gcp/index.rst
index f4f00fa907..2fcd827ed6 100644
--- a/rsts/deployment/plugins/gcp/index.rst
+++ b/rsts/deployment/plugins/gcp/index.rst
@@ -1,26 +1,26 @@
 .. _deployment-plugin-setup-gcp:

-#################
-GCP Plugins Setup
-#################
+Configure GCP Plugins
+=====================

 .. tags:: GCP, Integration, Data, Advanced

+Discover the process of setting up GCP plugins for Flyte.
+
 .. panels::
     :header: text-center
     :column: col-lg-12 p-2

-
     .. link-button:: deployment-plugin-setup-gcp-bigquery
        :type: ref
-       :text: Google Bigquery
+       :text: Google BigQuery
        :classes: btn-block stretched-link
     ^^^^^^^^^^^^
-    Guide to setting up the Google BigQuery Plugin.
+    Guide to setting up the Google BigQuery plugin.

 .. toctree::
     :maxdepth: 1
-    :name: GCP plugin Setup
+    :name: GCP plugin setup
     :hidden:

     bigquery
diff --git a/rsts/deployment/plugins/index.rst b/rsts/deployment/plugins/index.rst
index e4157e4734..d83bd79c54 100644
--- a/rsts/deployment/plugins/index.rst
+++ b/rsts/deployment/plugins/index.rst
@@ -1,15 +1,13 @@
 .. _deployment-plugin-setup:

-############
 Plugin Setup
-############
+============

-Flyte integrates with a wide variety of [data, ML, and analytical tools](https://flyte.org/integrations).
-Some of these plugins, like the Databricks, Kubeflow, and Ray integrations, require
-the Flyte cluster administrator to enable them.
+Flyte integrates with a wide variety of `data, ML and analytical tools <https://flyte.org/integrations>`__.
+Some of these plugins, such as Databricks, Kubeflow, and Ray integrations, require the Flyte cluster administrator to enable them. 
This section of the *Deployment Guides* will cover how to configure your cluster -to use these plugins in your workflows written in `flytekit`. +to use these plugins in your workflows written in ``flytekit``. .. panels:: diff --git a/rsts/deployment/plugins/k8s/index.rst b/rsts/deployment/plugins/k8s/index.rst index f9622b7ea2..7a0bd22eab 100644 --- a/rsts/deployment/plugins/k8s/index.rst +++ b/rsts/deployment/plugins/k8s/index.rst @@ -1,73 +1,292 @@ .. _deployment-plugin-setup-k8s: -K8s Plugins ------------------------------------------ +Configure Kubernetes Plugins +============================ -.. tags:: Kubernetes, Integration, KubernetesOperator, Spark, AWS, GCP, MachineLearning, DistributedComputing, Advanced +.. tags:: Kubernetes, Integration, Spark, AWS, GCP, Advanced -This guide gives an overview of setting up the K8s Operator backend plugin in your Flyte deployment. +This guide provides an overview of setting up the Kubernetes Operator backend plugin in your Flyte deployment. -Add Flyte chart repo to Helm. +Spin up a cluster +----------------- -.. prompt:: bash $ +.. tabs:: - helm repo add flyteorg https://flyteorg.github.io/flyte + .. group-tab:: Flyte binary -Set up the cluster -================== + .. tabs:: -.. tabs:: + .. group-tab:: Demo cluster - .. tab:: Sandbox - - Start the sandbox cluster: - - .. prompt:: bash $ - - flytectl demo start - - Generate flytectl config: - - .. prompt:: bash $ - - flytectl config init - - .. tab:: AWS/GCP + .. tabs:: + + .. group-tab:: PyTorch + + Enable the PyTorch plugin on the demo cluster by adding the following block to ``~/.flyte/sandbox/config.yaml``: + + .. code-block:: yaml + + tasks: + task-plugins: + default-for-task-types: + container: container + container_array: k8s-array + sidecar: sidecar + pytorch: pytorch + enabled-plugins: + - container + - k8s-array + - sidecar + - pytorch + + .. 
group-tab:: TensorFlow + + Enable the TensorFlow plugin on the demo cluster by adding the following block to ``~/.flyte/sandbox/config.yaml``: + + .. code-block:: yaml + + tasks: + task-plugins: + default-for-task-types: + container: container + container_array: k8s-array + sidecar: sidecar + tensorflow: tensorflow + enabled-plugins: + - container + - k8s-array + - sidecar + - tensorflow + + .. group-tab:: MPI + + Enable the MPI plugin on the demo cluster by adding the following block to ``~/.flyte/sandbox/config.yaml``: + + .. code-block:: yaml + + tasks: + task-plugins: + default-for-task-types: + container: container + container_array: k8s-array + sidecar: sidecar + mpi: mpi + enabled-plugins: + - container + - k8s-array + - sidecar + - mpi + + .. group-tab:: Ray + + Enable the Ray plugin on the demo cluster by adding the following block to ``~/.flyte/sandbox/config.yaml``: + + .. code-block:: yaml + + tasks: + task-plugins: + default-for-task-types: + container: container + container_array: k8s-array + sidecar: sidecar + ray: ray + enabled-plugins: + - container + - k8s-array + - sidecar + - ray + + .. group-tab:: Spark + + Enable the Spark plugin on the demo cluster by adding the following config to ``~/.flyte/sandbox/config.yaml``: + + .. 
code-block:: yaml + + tasks: + task-plugins: + default-for-task-types: + container: container + container_array: k8s-array + sidecar: sidecar + spark: spark + enabled-plugins: + - container + - sidecar + - k8s-array + - spark + plugins: + spark: + spark-config-default: + - spark.driver.cores: "1" + - spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider" + - spark.hadoop.fs.s3a.endpoint: "http://minio.flyte:9000" + - spark.hadoop.fs.s3a.access.key: "minio" + - spark.hadoop.fs.s3a.secret.key: "miniostorage" + - spark.hadoop.fs.s3a.path.style.access: "true" + - spark.kubernetes.allocation.batch.size: "50" + - spark.hadoop.fs.s3a.acl.default: "BucketOwnerFullControl" + - spark.hadoop.fs.s3n.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem" + - spark.hadoop.fs.AbstractFileSystem.s3n.impl: "org.apache.hadoop.fs.s3a.S3A" + - spark.hadoop.fs.s3.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem" + - spark.hadoop.fs.AbstractFileSystem.s3.impl: "org.apache.hadoop.fs.s3a.S3A" + - spark.hadoop.fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem" + - spark.hadoop.fs.AbstractFileSystem.s3a.impl: "org.apache.hadoop.fs.s3a.S3A" + cluster_resources: + refreshInterval: 5m + customData: + - production: + - projectQuotaCpu: + value: "5" + - projectQuotaMemory: + value: "4000Mi" + - staging: + - projectQuotaCpu: + value: "2" + - projectQuotaMemory: + value: "3000Mi" + - development: + - projectQuotaCpu: + value: "4" + - projectQuotaMemory: + value: "5000Mi" + refresh: 5m + + Also add the following cluster resource templates to the ``~/.flyte/sandbox/cluster-resource-templates`` directory: + + 1. ``serviceaccount.yaml`` + + .. code-block:: yaml + + apiVersion: v1 + kind: ServiceAccount + metadata: + name: default + namespace: "{{ namespace }}" + annotations: + eks.amazonaws.com/role-arn: "{{ defaultIamRole }}" + + 2. ``spark_role.yaml`` + + .. 
code-block:: yaml
+
+                                apiVersion: rbac.authorization.k8s.io/v1
+                                kind: Role
+                                metadata:
+                                  name: spark-role
+                                  namespace: "{{ namespace }}"
+                                rules:
+                                - apiGroups:
+                                  - ""
+                                  resources:
+                                  - pods
+                                  - services
+                                  - configmaps
+                                  verbs:
+                                  - "*"
+
+                        3. ``spark_service_account.yaml``
+
+                           .. code-block:: yaml
+
+                                apiVersion: v1
+                                kind: ServiceAccount
+                                metadata:
+                                  name: spark
+                                  namespace: "{{ namespace }}"
+                                annotations:
+                                  eks.amazonaws.com/role-arn: "{{ defaultIamRole }}"
+
+                        4. ``spark_role_binding.yaml``
+
+                           .. code-block:: yaml
+
+                                apiVersion: rbac.authorization.k8s.io/v1
+                                kind: RoleBinding
+                                metadata:
+                                  name: spark-role-binding
+                                  namespace: "{{ namespace }}"
+                                roleRef:
+                                  apiGroup: rbac.authorization.k8s.io
+                                  kind: Role
+                                  name: spark-role
+                                subjects:
+                                - kind: ServiceAccount
+                                  name: spark
+                                  namespace: "{{ namespace }}"
+
+                    .. group-tab:: Dask
+
+                        Enable the Dask plugin on the demo cluster by adding the following block to ``~/.flyte/sandbox/config.yaml``:
+
+                        .. code-block:: yaml
+
+                            tasks:
+                              task-plugins:
+                                default-for-task-types:
+                                  container: container
+                                  container_array: k8s-array
+                                  sidecar: sidecar
+                                  dask: dask
+                                enabled-plugins:
+                                  - container
+                                  - k8s-array
+                                  - sidecar
+                                  - dask
+
+                Start the demo cluster by running the following command:
+
+                .. code-block:: bash
+
+                    flytectl demo start
+
+            .. group-tab:: Helm chart
+
+                Install Flyte using the :ref:`flyte-binary helm chart <deployment-deployment-cloud-simple>`.
 
-      Make sure you have:
+    .. group-tab:: Flyte core
 
-      * A flyte cluster up and running in `AWS `__ / `GCP `__.
-      * The right ``kubeconfig`` and Kubernetes context.
-      * The right ``flytectl`` config at ``~/.flyte/config.yaml``.
+        If you have installed Flyte using the `flyte-core helm chart
+        <https://github.com/flyteorg/flyte/tree/master/charts/flyte-core>`__, please ensure:
+
+        * You have the correct kubeconfig and have selected the correct Kubernetes context.
+        * You have configured the correct flytectl settings in ``~/.flyte/config.yaml``.
+
+.. note::
+
+    Add the Flyte chart repo to Helm if you're installing via the Helm charts.
+
+    .. 
code-block:: bash -Install the K8S Operator -======================== + helm repo add flyteorg https://flyteorg.github.io/flyte + +Install the Kubernetes operator +------------------------------- .. tabs:: .. group-tab:: PyTorch/TensorFlow/MPI - Build and apply the training-operator: - - .. code-block:: bash - - export KUBECONFIG=$KUBECONFIG:~/.kube/config:~/.flyte/k3s/k3s.yaml - kustomize build "https://github.com/kubeflow/training-operator.git/manifests/overlays/standalone?ref=v1.5.0" | kubectl apply -f - + First, `install kustomize `__. + Build and apply the training-operator. + + .. code-block:: bash + + export KUBECONFIG=$KUBECONFIG:~/.kube/config:~/.flyte/k3s/k3s.yaml + kustomize build "https://github.com/kubeflow/training-operator.git/manifests/overlays/standalone?ref=v1.5.0" | kubectl apply -f - - **Optional: Using a Gang Scheduler** + **Optional: Using a gang scheduler** - With the default Kubernetes scheduler, it can happen that some worker pods of distributed training jobs are scheduled - later than others due to resource constraints. This often causes the job to fail with a timeout error. To avoid - this you can use a gang scheduler, meaning that the worker pods are only scheduled once all of them can be scheduled at - the same time. + To address potential issues with worker pods of distributed training jobs being scheduled at different times + due to resource constraints, you can opt for a gang scheduler. This ensures that all worker pods are scheduled + simultaneously, reducing the likelihood of job failures caused by timeout errors. - To `enable gang scheduling for the Kubeflow training-operator `_, - you can install the `Kubernetes scheduler plugins `_ or `Apache YuniKorn scheduler `_. + To `enable gang scheduling for the Kubeflow training-operator `__, + you can install the `Kubernetes scheduler plugins `__ + or the `Apache YuniKorn scheduler `__. 1. 
Install the `scheduler plugin `_ or - `Apache YuniKorn `_ as a second scheduler + `Apache YuniKorn `_ as a second scheduler. 2. Configure the Kubeflow training-operator to use the new scheduler: Create a manifest called ``kustomization.yaml`` with the following content: @@ -83,7 +302,6 @@ Install the K8S Operator patchesStrategicMerge: - patch.yaml - Create a patch file called ``patch.yaml`` with the following content: .. code-block:: yaml @@ -101,19 +319,24 @@ Install the K8S Operator - /manager - --gang-scheduler-name= - - Install the patched kustomization with: + Install the patched kustomization with the following command: .. code-block:: bash kustomize build path/to/overlay/directory | kubectl apply -f - - 3. (Only for Apache YuniKorn) Configure ``template.metadata.annotations.yunikorn.apache.org/task-group-name`` , - ``template.metadata.annotations.yunikorn.apache.org/task-groups`` and - ``template.metadata.annotations.yunikorn.apache.org/schedulingPolicyParameters`` in Flyte pod templates. - See `Apache YuniKorn Gang-Scheduling `_ for more configuration detail. + (Only for Apache YuniKorn) To configure gang scheduling with Apache YuniKorn, + make sure to set the following annotations in Flyte pod templates: - 4. Use a Flyte pod template with ``template.spec.schedulerName: scheduler-plugins-scheduler`` + - ``template.metadata.annotations.yunikorn.apache.org/task-group-name`` + - ``template.metadata.annotations.yunikorn.apache.org/task-groups`` + - ``template.metadata.annotations.yunikorn.apache.org/schedulingPolicyParameters`` + + For more configuration details, + refer to the `Apache YuniKorn Gang-Scheduling documentation + `__. + + 3. Use a Flyte pod template with ``template.spec.schedulerName: scheduler-plugins-scheduler`` to use the new gang scheduler for your tasks. See the :ref:`using-k8s-podtemplates` section for more information on pod templates in Flyte. @@ -123,93 +346,133 @@ Install the K8S Operator gang scheduler as well. 
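+
+    As an illustrative sketch, a cluster-wide default ``PodTemplate`` that routes task pods
+    to the gang scheduler could look like the following. The template name and namespace
+    here are assumptions: the name must match the ``default-pod-template-name`` setting in
+    your FlytePropeller configuration, and the placeholder container image is never run.
+
+    .. code-block:: yaml
+
+        apiVersion: v1
+        kind: PodTemplate
+        metadata:
+          name: flyte-default-template
+          namespace: flyte
+        template:
+          spec:
+            schedulerName: scheduler-plugins-scheduler
+            containers:
+              - name: default
+                image: docker.io/rwgrim/docker-noop
+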
+ For more information on pod templates in Flyte, refer to the :ref:`using-k8s-podtemplates` section. + You can set the scheduler name in the pod template passed to the ``@task`` decorator. + However, to avoid resource competition between the two different schedulers, + it is recommended to set the scheduler name in the pod template in the ``flyte`` namespace, + which is applied to all tasks. This allows non-distributed training tasks to be + scheduled by the gang scheduler as well. .. group-tab:: Ray - - Install the Ray Operator: + + To install the Ray Operator, run the following commands: .. code-block:: bash - export KUBERAY_VERSION=v0.3.0 + export KUBERAY_VERSION=v0.5.2 kubectl create -k "github.com/ray-project/kuberay/manifests/cluster-scope-resources?ref=${KUBERAY_VERSION}&timeout=90s" kubectl apply -k "github.com/ray-project/kuberay/manifests/base?ref=${KUBERAY_VERSION}&timeout=90s" .. group-tab:: Spark - Add the Spark repository: + To add the Spark repository, run the following commands: .. code-block:: bash helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator - Install the Spark Operator: + To install the Spark operator, run the following command: .. code-block:: bash helm install spark-operator spark-operator/spark-operator --namespace spark-operator --create-namespace - .. group-tab:: Dask - Add Dask repository + To add the Dask repository, run the following command: .. code-block:: bash helm repo add dask https://helm.dask.org - Install Dask Operator + To install the Dask operator, run the following command: .. code-block:: bash helm install dask-operator dask/dask-kubernetes-operator --namespace dask-operator --create-namespace +Specify plugin configuration +---------------------------- -Specify Plugin Configuration -=============================== +.. tabs:: -Create a file named ``values-override.yaml`` and add the following config to it: + .. group-tab:: PyTorch -.. tabs:: + .. tabs:: - .. 
group-tab:: PyTorch - - Enable PyTorch backend plugin: - - .. code-block:: yaml - - configmap: - enabled_plugins: - # -- Task specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig) - tasks: - # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig) - task-plugins: - # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). Enable SageMaker*, Athena if you install the backend - # plugins - enabled-plugins: - - container - - sidecar - - k8s-array - - pytorch - default-for-task-types: - container: container - sidecar: sidecar - container_array: k8s-array - pytorch: pytorch - - .. group-tab:: TensorFlow - - Enable the TensorFlow backend plugin: + .. group-tab:: Flyte binary + + To specify the plugin when using the Helm chart, edit the relevant YAML file. + + .. code-block:: yaml + :emphasize-lines: 7,11 + + tasks: + task-plugins: + enabled-plugins: + - container + - sidecar + - k8s-array + - pytorch + default-for-task-types: + - container: container + - container_array: k8s-array + - pytorch: pytorch + + .. group-tab:: Flyte core + + Create a file named ``values-override.yaml`` and add the following config to it: + + .. code-block:: yaml + + configmap: + enabled_plugins: + tasks: + task-plugins: + enabled-plugins: + - container + - sidecar + - k8s-array + - pytorch + default-for-task-types: + container: container + sidecar: sidecar + container_array: k8s-array + pytorch: pytorch - .. code-block:: yaml + .. group-tab:: TensorFlow + .. tabs:: + + .. group-tab:: Flyte binary + + To specify the plugin when using the Helm chart, edit the relevant YAML file. + + .. 
code-block:: yaml + :emphasize-lines: 7,11 + + tasks: + task-plugins: + enabled-plugins: + - container + - sidecar + - k8s-array + - tensorflow + default-for-task-types: + - container: container + - container_array: k8s-array + - tensorflow: tensorflow + + .. group-tab:: Flyte core + + Create a file named ``values-override.yaml`` and add the following config to it: + + .. code-block:: yaml + configmap: enabled_plugins: - # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig) tasks: - # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig) task-plugins: - # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). Enable SageMaker*, Athena if you install the backend - # plugins enabled-plugins: - container - sidecar @@ -221,20 +484,39 @@ Create a file named ``values-override.yaml`` and add the following config to it: container_array: k8s-array tensorflow: tensorflow - .. group-tab:: MPI - - Enable the MPI backend plugin: - - .. code-block:: yaml + .. group-tab:: MPI + .. tabs:: + + .. group-tab:: Flyte binary + + To specify the plugin when using the Helm chart, edit the relevant YAML file. + + .. code-block:: yaml + :emphasize-lines: 7,11 + + tasks: + task-plugins: + enabled-plugins: + - container + - sidecar + - k8s-array + - mpi + default-for-task-types: + - container: container + - container_array: k8s-array + - mpi: mpi + + .. group-tab:: Flyte core + + Create a file named ``values-override.yaml`` and add the following config to it: + + .. 
code-block:: yaml + configmap: enabled_plugins: - # -- Task specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig) tasks: - # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig) task-plugins: - # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). Enable SageMaker*, Athena if you install the backend - # plugins enabled-plugins: - container - sidecar @@ -245,218 +527,68 @@ Create a file named ``values-override.yaml`` and add the following config to it: sidecar: sidecar container_array: k8s-array mpi: mpi - - .. group-tab:: Ray - - Enable the Ray backend plugin: - - .. code-block:: yaml - - configmap: - enabled_plugins: - # -- Task specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig) - tasks: - # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig) - task-plugins: - # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). Enable SageMaker*, Athena if you install the backend - # plugins - enabled-plugins: - - container - - sidecar - - k8s-array - - ray - default-for-task-types: - container: container - sidecar: sidecar - container_array: k8s-array - ray: ray - - .. group-tab:: Spark - - .. tabbed:: Sandbox - - Since sandbox uses minio, it needs additional configuration. - - .. code-block:: yaml - - cluster_resource_manager: - # -- Enables the Cluster resource manager component - enabled: true - # -- Configmap for ClusterResource parameters - config: - # -- ClusterResource parameters - # Refer to the [structure](https://pkg.go.dev/github.com/lyft/flyteadmin@v0.3.37/pkg/runtime/interfaces#ClusterResourceConfig) to customize. 
- cluster_resources: - refreshInterval: 5m - templatePath: "/etc/flyte/clusterresource/templates" - customData: - - production: - - projectQuotaCpu: - value: "5" - - projectQuotaMemory: - value: "4000Mi" - - staging: - - projectQuotaCpu: - value: "2" - - projectQuotaMemory: - value: "3000Mi" - - development: - - projectQuotaCpu: - value: "4" - - projectQuotaMemory: - value: "5000Mi" - refresh: 5m - - # -- Resource templates to be applied - templates: - # -- Template for namespaces resources - - key: aa_namespace - value: | - apiVersion: v1 - kind: Namespace - metadata: - name: {{ namespace }} - spec: - finalizers: - - kubernetes - - - key: ab_project_resource_quota - value: | - apiVersion: v1 - kind: ResourceQuota - metadata: - name: project-quota - namespace: {{ namespace }} - spec: - hard: - limits.cpu: {{ projectQuotaCpu }} - limits.memory: {{ projectQuotaMemory }} - - - key: ac_spark_role - value: | - apiVersion: rbac.authorization.k8s.io/v1beta1 - kind: Role - metadata: - name: spark-role - namespace: {{ namespace }} - rules: - - apiGroups: ["*"] - resources: ["pods"] - verbs: ["*"] - - apiGroups: ["*"] - resources: ["services"] - verbs: ["*"] - - apiGroups: ["*"] - resources: ["configmaps", "persistentvolumeclaims"] - verbs: ["*"] + + .. group-tab:: Ray + + .. tabs:: + + .. group-tab:: Flyte binary + + To specify the plugin when using the Helm chart, edit the relevant YAML file. + + .. code-block:: yaml + :emphasize-lines: 7,11 + + tasks: + task-plugins: + enabled-plugins: + - container + - sidecar + - k8s-array + - ray + default-for-task-types: + - container: container + - container_array: k8s-array + - ray: ray + + .. 
group-tab:: Flyte core - - key: ad_spark_service_account - value: | - apiVersion: v1 - kind: ServiceAccount - metadata: - name: spark - namespace: {{ namespace }} + Create a file named ``values-override.yaml`` and add the following config to it: - - key: ae_spark_role_binding - value: | - apiVersion: rbac.authorization.k8s.io/v1beta1 - kind: RoleBinding - metadata: - name: spark-role-binding - namespace: {{ namespace }} - roleRef: - apiGroup: rbac.authorization.k8s.io - kind: Role - name: spark-role - subjects: - - kind: ServiceAccount - name: spark - namespace: {{ namespace }} + .. code-block:: yaml - sparkoperator: - enabled: true - plugin_config: - plugins: - spark: - # -- Spark default configuration - spark-config-default: - # We override the default credentials chain provider for Hadoop so that - # it can use the serviceAccount based IAM role or ec2 metadata based. - # This is more in line with how AWS works - - spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider" - - spark.hadoop.fs.s3a.endpoint: "http://minio.flyte.svc.cluster.local:9000" - - spark.hadoop.fs.s3a.access.key: "minio" - - spark.hadoop.fs.s3a.secret.key: "miniostorage" - - spark.hadoop.fs.s3a.path.style.access: "true" - - spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version: "2" - - spark.kubernetes.allocation.batch.size: "50" - - spark.hadoop.fs.s3a.acl.default: "BucketOwnerFullControl" - - spark.hadoop.fs.s3n.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem" - - spark.hadoop.fs.AbstractFileSystem.s3n.impl: "org.apache.hadoop.fs.s3a.S3A" - - spark.hadoop.fs.s3.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem" - - spark.hadoop.fs.AbstractFileSystem.s3.impl: "org.apache.hadoop.fs.s3a.S3A" - - spark.hadoop.fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem" - - spark.hadoop.fs.AbstractFileSystem.s3a.impl: "org.apache.hadoop.fs.s3a.S3A" - - spark.hadoop.fs.s3a.multipart.threshold: "536870912" - - spark.excludeOnFailure.enabled: "true" - - 
spark.excludeOnFailure.timeout: "5m" - - spark.task.maxfailures: "8" - configmap: - enabled_plugins: - # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig) - tasks: - # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig) - task-plugins: - # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). Enable sagemaker*, athena if you install the backend - # plugins - enabled-plugins: - - container - - sidecar - - k8s-array - - spark - default-for-task-types: - container: container - sidecar: sidecar - container_array: k8s-array - spark: spark - - .. group-tab:: Dask + configmap: + enabled_plugins: + tasks: + task-plugins: + enabled-plugins: + - container + - sidecar + - k8s-array + - ray + default-for-task-types: + container: container + sidecar: sidecar + container_array: k8s-array + ray: ray - Enable dask backend plugin + .. group-tab:: Spark - .. code-block:: yaml - - configmap: - enabled_plugins: - # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig) - tasks: - # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig) - task-plugins: - # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). - # plugins - enabled-plugins: - - container - - sidecar - - k8s-array - - dask - default-for-task-types: - container: container - sidecar: sidecar - container_array: k8s-array - dask: dask + .. tabs:: - .. tabbed:: AWS + .. group-tab:: Flyte binary + + To specify the plugin when using the Helm chart, edit the relevant YAML file. + + .. 
group-tab:: Flyte core + + Create a file named ``values-override.yaml`` and add the following config to it: - .. code-block:: yaml + .. code-block:: yaml cluster_resource_manager: - # -- Enables the Cluster resource manager component enabled: true - # -- Configmap for ClusterResource parameters config: - # -- ClusterResource parameters - # Refer to the [structure](https://pkg.go.dev/github.com/lyft/flyteadmin@v0.3.37/pkg/runtime/interfaces#ClusterResourceConfig) to customize. cluster_resources: refreshInterval: 5m templatePath: "/etc/flyte/clusterresource/templates" @@ -556,13 +688,10 @@ Create a file named ``values-override.yaml`` and add the following config to it: plugin_config: plugins: spark: - # -- Spark default configuration + # Edit the Spark configuration as you see fit spark-config-default: - # We override the default credentials chain provider for Hadoop so that - # it can use the serviceAccount based IAM role or ec2 metadata based. - # This is more in line with how AWS works + - spark.driver.cores: "1" - spark.hadoop.fs.s3a.aws.credentials.provider: "com.amazonaws.auth.DefaultAWSCredentialsProviderChain" - - spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version: "2" - spark.kubernetes.allocation.batch.size: "50" - spark.hadoop.fs.s3a.acl.default: "BucketOwnerFullControl" - spark.hadoop.fs.s3n.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem" @@ -571,18 +700,13 @@ Create a file named ``values-override.yaml`` and add the following config to it: - spark.hadoop.fs.AbstractFileSystem.s3.impl: "org.apache.hadoop.fs.s3a.S3A" - spark.hadoop.fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem" - spark.hadoop.fs.AbstractFileSystem.s3a.impl: "org.apache.hadoop.fs.s3a.S3A" - - spark.hadoop.fs.s3a.multipart.threshold: "536870912" - - spark.excludeOnFailure.enabled: "true" - - spark.excludeOnFailure.timeout: "5m" - - spark.task.maxfailures: "8" + - spark.network.timeout: 600s + - spark.executorEnv.KUBERNETES_REQUEST_TIMEOUT: 100000 + - 
spark.executor.heartbeatInterval: 60s configmap: enabled_plugins: - # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig) tasks: - # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig) task-plugins: - # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). Enable sagemaker*, athena if you install the backend - # plugins enabled-plugins: - container - sidecar @@ -593,142 +717,83 @@ Create a file named ``values-override.yaml`` and add the following config to it: sidecar: sidecar container_array: k8s-array spark: spark + + .. group-tab:: Dask + + .. tabs:: -Upgrade the Flyte Helm release -============================== + .. group-tab:: Flyte binary -.. code-block:: bash + Edit the relevant YAML file to specify the plugin. - helm upgrade flyte-core flyteorg/flyte-core -f https://raw.githubusercontent.com/flyteorg/flyte/master/charts/flyte-core/values-sandbox.yaml -f values-override.yaml -n flyte + .. code-block:: yaml + :emphasize-lines: 7,11 + + tasks: + task-plugins: + enabled-plugins: + - container + - sidecar + - k8s-array + - dask + default-for-task-types: + - container: container + - container_array: k8s-array + - dask: dask + + .. group-tab:: Flyte core + + Create a file named ``values-override.yaml`` and add the following config to it: + + .. code-block:: yaml + + configmap: + enabled_plugins: + tasks: + task-plugins: + enabled-plugins: + - container + - sidecar + - k8s-array + - dask + default-for-task-types: + container: container + sidecar: sidecar + container_array: k8s-array + dask: dask -Register the plugin example -=========================== +Upgrade the deployment +---------------------- .. tabs:: - .. group-tab:: PyTorch - - .. 
code-block:: bash - - flytectl register files --config ~/.flyte/config.yaml https://github.com/flyteorg/flytesnacks/releases/download/v0.3.112/snacks-cookbook-integrations-kubernetes-kfpytorch.tar.gz --archive -p flytesnacks -d development --version latest - - .. group-tab:: TensorFlow - - .. code-block:: bash - - # TODO: https://github.com/flyteorg/flyte/issues/1757 - flytectl register files --config ~/.flyte/config.yaml https://github.com/flyteorg/flytesnacks/releases/download/v0.3.112/snacks-cookbook-integrations-kubernetes-kftensorflow.tar.gz --archive -p flytesnacks -d development --version latest - - .. group-tab:: MPI - - .. code-block:: bash - - flytectl register files --config ~/.flyte/config.yaml https://github.com/flyteorg/flytesnacks/releases/download/v0.3.112/snacks-cookbook-integrations-kubernetes-kfmpi.tar.gz --archive -p flytesnacks -d development --version latest - - .. group-tab:: Ray - - .. code-block:: bash - - flytectl register files --config ~/.flyte/config.yaml https://github.com/flyteorg/flytesnacks/releases/download/v0.3.112/snacks-cookbook-integrations-kubernetes-ray_example.tar.gz --archive -p flytesnacks -d development --version latest - - - .. group-tab:: Spark - - .. code-block:: bash - - flytectl register files --config ~/.flyte/config.yaml https://github.com/flyteorg/flytesnacks/releases/download/v0.3.112/snacks-cookbook-integrations-kubernetes-k8s_spark.tar.gz --archive -p flytesnacks -d development --version latest - - .. group-tab:: Dask - - .. code-block:: bash - - flytectl register files --config ~/.flyte/config.yaml https://github.com/flyteorg/flytesnacks/releases/download/v0.3.75/snacks-cookbook-integrations-kubernetes-k8s_dask.tar.gz --archive -p flytesnacks -d development --version latest + .. group-tab:: Flyte binary + If you are installing Flyte via the Helm chart, run the following command: -Launch an execution -=================== + .. note:: -.. tabs:: + There is no need to run ``helm upgrade`` for Spark. - .. 
tab:: Flyte Console - - * Navigate to the Flyte Console's UI (e.g. `sandbox `_) and find the relevant workflow. - * Click on `Launch` to open up a launch form. - * Specify **spark** as the service account if launching a Spark example. - * Submit the form to launch an execution. - - .. tab:: Flytectl + .. code-block:: bash - .. tabs:: - - .. group-tab:: PyTorch - - Retrieve an execution in the form of a YAML file: - - .. code-block:: bash - - flytectl get launchplan --config ~/.flyte/config.yaml --project flytesnacks --domain development kfpytorch.pytorch_mnist.pytorch_training_wf --latest --execFile exec_spec.yaml - - Launch! 🚀 - - .. code-block:: bash - - flytectl --config ~/.flyte/config.yaml create execution -p -d --execFile ~/exec_spec.yaml - - .. group-tab:: TensorFlow - - Retrieve an execution in the form of a YAML file: - - .. code-block:: bash - - flytectl get launchplan --config ~/.flyte/config.yaml --project flytesnacks --domain development --latest --execFile exec_spec.yaml - - Launch! 🚀 - - .. code-block:: bash - - flytectl --config ~/.flyte/config.yaml create execution -p -d --execFile ~/exec_spec.yaml - - .. group-tab:: MPI - - Retrieve an execution in the form of a YAML file: - - .. code-block:: bash - - flytectl get launchplan --config ~/.flyte/config.yaml --project flytesnacks --domain development kfmpi.mpi_mnist.horovod_training_wf --latest --execFile exec_spec.yaml - - Launch! 🚀 - - .. code-block:: bash - - flytectl --config ~/.flyte/config.yaml create execution -p -d --execFile ~/exec_spec.yaml - - .. group-tab:: Ray - - Retrieve an execution in the form of a YAML file: - - .. code-block:: bash - - flytectl get launchplan --config ~/.flyte/config.yaml --project flytesnacks --domain development ray_example.ray_example.ray_workflow --latest --execFile exec_spec.yaml - - Launch! 🚀 - - .. code-block:: bash - - flytectl --config ~/.flyte/config.yaml create execution -p -d --execFile ~/exec_spec.yaml - - .. 
group-tab:: Spark
-
-            Retrieve an execution in the form of a YAML file:
-
-            .. code-block:: bash
-
-               flytectl get launchplan --config ~/.flyte/config.yaml --project flytesnacks --domain development k8s_spark.pyspark_pi.my_spark --latest --execFile exec_spec.yaml
-
-            Fill in the ``kubeServiceAcct`` as **spark** in the ``exec_spec.yaml`` file.
-
-            Launch! 🚀
-
-            .. code-block:: bash
-
-               flytectl --config ~/.flyte/config.yaml create execution -p -d --execFile ~/exec_spec.yaml
+
+            helm upgrade <RELEASE_NAME> flyteorg/flyte-binary -n <NAMESPACE> --values <YAML_FILE_PATH>
+
+        Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte-backend``),
+        ``<NAMESPACE>`` with the name of your namespace (e.g., ``flyte``),
+        and ``<YAML_FILE_PATH>`` with the name of your YAML file.
+
+    .. group-tab:: Flyte core
+
+        .. code-block:: bash
+
+            helm upgrade <RELEASE_NAME> flyteorg/flyte-core -n <NAMESPACE> --values values-override.yaml
+
+        Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte``)
+        and ``<NAMESPACE>`` with the name of your namespace (e.g., ``flyte``).
+
+Wait for the upgrade to complete. You can check the status of the deployment pods by running the following command:
+
+.. code-block:: bash
+
+    kubectl get pods -n <NAMESPACE>
diff --git a/rsts/deployment/plugins/webapi/databricks.rst b/rsts/deployment/plugins/webapi/databricks.rst
index ee74721aed..b6c7c3e654 100644
--- a/rsts/deployment/plugins/webapi/databricks.rst
+++ b/rsts/deployment/plugins/webapi/databricks.rst
@@ -1,121 +1,383 @@
 .. _deployment-plugin-setup-webapi-databricks:
 
-Databricks Plugin Setup
------------------------
+Databricks Plugin
+=================
 
-This guide gives an overview of how to set up Databricks in your Flyte deployment.
+This guide provides an overview of how to set up Databricks in your Flyte deployment.
 
-Add Flyte chart repo to Helm
+Spin up a cluster
+-----------------
 
-.. prompt:: bash $
+.. tabs::
 
-   helm repo add flyteorg https://flyteorg.github.io/flyte
+    .. group-tab:: Flyte binary
+
+        You can spin up a demo cluster using the following command:
+
+        .. 
code-block:: bash + + flytectl demo start + Or install Flyte using the :ref:`flyte-binary helm chart `. -Setup the Cluster -================= + .. group-tab:: Flyte core -.. tabs:: + If you've installed Flyte using the + `flyte-core helm chart `__, please ensure: - .. tab:: Sandbox + * You have the correct kubeconfig and have selected the correct Kubernetes context. + * You have configured the correct flytectl settings in ``~/.flyte/config.yaml``. - Start the sandbox cluster - - .. prompt:: bash $ - - flytectl demo start - - Generate flytectl config - - .. prompt:: bash $ - - flytectl config init - - .. tab:: AWS/GCP - - Follow the :ref:`deployment-deployment-cloud-simple` or - :ref:`deployment-deployment-multicluster` guide to set up your cluster. - After following these guides, make sure you have: - - * The correct kubeconfig and selected the correct kubernetes context - * The correct flytectl config at ``~/.flyte/config.yaml`` - -.. TODO: move this entrypoint.py script to an official Flyte repo - -Upload an `entrypoint.py `__ -to dbfs or s3. Spark driver node run this file to override the default command -in the dbx job. - - -Specify Plugin Configuration -============================ - -Create a file named ``values-override.yaml`` and add the following config to it: - -.. code-block:: yaml - - configmap: - enabled_plugins: - # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig) - tasks: - # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig) - task-plugins: - # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). 
Enable sagemaker*, athena if you install the backend - # plugins - enabled-plugins: - - container - - sidecar - - k8s-array - - databricks - default-for-task-types: - container: container - sidecar: sidecar - container_array: k8s-array - spark: databricks - databricks: - enabled: True - plugin_config: - plugins: - databricks: - entrypointFile: dbfs:///FileStore/tables/entrypoint.py - databricksInstance: dbc-a53b7a3c-614c - -Get an API Token -================ - -Create a `Databricks account `__ and follow the -docs for creating an `access token `__. - -Then, create a `Instance Profile `_ -for the Spark cluster, it allows the spark job to access your data in the s3 -bucket. +.. note:: + + Add the Flyte chart repo to Helm if you're installing via the Helm charts. + + .. code-block:: bash + + helm repo add flyteorg https://flyteorg.github.io/flyte + +Databricks workspace +-------------------- + +To set up your Databricks account, follow these steps: + +1. Create a `Databricks account `__. +2. Ensure that you have a Databricks workspace up and running. +3. Generate a `personal access token + `__ to be used in the Flyte configuration. + You can find the personal access token in the user settings within the workspace. + +.. note:: + + When testing the Databricks plugin on the demo cluster, create an S3 bucket because the local demo + cluster utilizes MinIO. Follow the `AWS instructions + `__ + to generate access and secret keys, which can be used to access your preferred S3 bucket. + +Create an `instance profile +`__ +for the Spark cluster. This profile enables the Spark job to access your data in the S3 bucket. +Please follow all four steps specified in the documentation. + +Upload the following entrypoint.py file to either +`DBFS `__ +(the final path can be ``dbfs:///FileStore/tables/entrypoint.py``) or S3. +This file will be executed by the Spark driver node, overriding the default command in the +`dbx `__ job. + +.. 
TODO: A quick-and-dirty workaround to resolve https://github.com/flyteorg/flyte/issues/3853 issue is to import pandas. + +.. code-block:: python + + import os + import sys + from typing import List + + import click + import pandas + from flytekit.bin.entrypoint import fast_execute_task_cmd as _fast_execute_task_cmd + from flytekit.bin.entrypoint import execute_task_cmd as _execute_task_cmd + from flytekit.exceptions.user import FlyteUserException + from flytekit.tools.fast_registration import download_distribution + + + def fast_execute_task_cmd(additional_distribution: str, dest_dir: str, task_execute_cmd: List[str]): + if additional_distribution is not None: + if not dest_dir: + dest_dir = os.getcwd() + download_distribution(additional_distribution, dest_dir) + + # Insert the call to fast before the unbounded resolver args + cmd = [] + for arg in task_execute_cmd: + if arg == "--resolver": + cmd.extend(["--dynamic-addl-distro", additional_distribution, "--dynamic-dest-dir", dest_dir]) + cmd.append(arg) + + click_ctx = click.Context(click.Command("dummy")) + parser = _execute_task_cmd.make_parser(click_ctx) + args, _, _ = parser.parse_args(cmd[1:]) + _execute_task_cmd.callback(test=False, **args) + + + def main(): + + args = sys.argv + + click_ctx = click.Context(click.Command("dummy")) + if args[1] == "pyflyte-fast-execute": + parser = _fast_execute_task_cmd.make_parser(click_ctx) + args, _, _ = parser.parse_args(args[2:]) + fast_execute_task_cmd(**args) + elif args[1] == "pyflyte-execute": + parser = _execute_task_cmd.make_parser(click_ctx) + args, _, _ = parser.parse_args(args[2:]) + _execute_task_cmd.callback(test=False, dynamic_addl_distro=None, dynamic_dest_dir=None, **args) + else: + raise FlyteUserException(f"Unrecognized command: {args[1:]}") + + + if __name__ == '__main__': + main() + +Specify plugin configuration +---------------------------- + +.. tabs:: + + .. group-tab:: Flyte binary + + .. tabs:: + + .. 
group-tab:: Demo cluster + + Enable the Databricks plugin on the demo cluster by adding the following config to ``~/.flyte/sandbox/config.yaml``: + + .. code-block:: yaml + + tasks: + task-plugins: + default-for-task-types: + container: container + container_array: k8s-array + sidecar: sidecar + spark: databricks + enabled-plugins: + - container + - sidecar + - k8s-array + - databricks + plugins: + databricks: + entrypointFile: dbfs:///FileStore/tables/entrypoint.py + databricksInstance: .cloud.databricks.com + k8s: + default-env-vars: + - FLYTE_AWS_ACCESS_KEY_ID: + - FLYTE_AWS_SECRET_ACCESS_KEY: + - AWS_DEFAULT_REGION: + remoteData: + region: + scheme: aws + signedUrls: + durationMinutes: 3 + propeller: + rawoutput-prefix: s3:/// + storage: + container: "" + type: s3 + stow: + kind: s3 + config: + region: + disable_ssl: true + v2_signing: false + auth_type: accesskey + access_key_id: + secret_key: + signedURL: + stowConfigOverride: + endpoint: "" + + Substitute ```` with the name of your Databricks account, + ```` with the region where you created your AWS bucket, + ```` with your AWS access key ID, + ```` with your AWS secret access key, + and ```` with the name of your S3 bucket. + + .. group-tab:: Helm chart + + Edit the relevant YAML file to specify the plugin. + + .. code-block:: yaml + :emphasize-lines: 7,11 + + tasks: + task-plugins: + enabled-plugins: + - container + - sidecar + - k8s-array + - databricks + default-for-task-types: + - container: container + - container_array: k8s-array + - spark: databricks + + .. code-block:: yaml + :emphasize-lines: 3-5 + + inline: + plugins: + databricks: + entrypointFile: dbfs:///FileStore/tables/entrypoint.py + databricksInstance: .cloud.databricks.com + + Substitute ```` with the name of your Databricks account. + + .. group-tab:: Flyte core + + Create a file named ``values-override.yaml`` and add the following config to it: + + .. 
code-block:: yaml + :emphasize-lines: 9,14,15-21 + + configmap: + enabled_plugins: + tasks: + task-plugins: + enabled-plugins: + - container + - sidecar + - k8s-array + - databricks + default-for-task-types: + container: container + sidecar: sidecar + container_array: k8s-array + spark: databricks + databricks: + enabled: True + plugin_config: + plugins: + databricks: + entrypointFile: dbfs:///FileStore/tables/entrypoint.py + databricksInstance: .cloud.databricks.com + + Substitute ```` with the name of your Databricks account. + +Add the Databricks access token +------------------------------- Add the Databricks access token to FlytePropeller: -.. code-block:: bash +.. tabs:: + + .. group-tab:: Flyte binary + + .. tabs:: + + .. group-tab:: Demo cluster + + Add the access token as an environment variable to the ``flyte-sandbox`` deployment. + + .. code-block:: bash + + kubectl edit deploy flyte-sandbox -n flyte + + Update the ``env`` configuration: + + .. code-block:: yaml + :emphasize-lines: 12-13 + + env: + - name: POD_NAME + valueFrom: + fieldRef: + apiVersion: v1 + fieldPath: metadata.name + - name: POD_NAMESPACE + valueFrom: + fieldRef: + apiVersion: v1 + fieldPath: metadata.namespace + - name: FLYTE_SECRET_FLYTE_DATABRICKS_API_TOKEN + value: + image: flyte-binary:sandbox + ... + + .. group-tab:: Helm chart + + Create an external secret as follows: + + .. code-block:: bash + + cat < + EOF + + Reference the newly created secret in + ``.Values.configuration.auth.clientSecretsExternalSecretRef`` + in your YAML file as follows: + + .. code-block:: yaml + :emphasize-lines: 3 + + configuration: + auth: + clientSecretsExternalSecretRef: flyte-binary-client-secrets-external-secret + + Replace ```` with your access token. + + .. group-tab:: Flyte core + + Add the access token as a secret to ``flyte-secret-auth``. + + .. code-block:: bash + + kubectl edit secret -n flyte flyte-secret-auth + + .. 
code-block:: yaml + :emphasize-lines: 3 + + apiVersion: v1 + data: + FLYTE_DATABRICKS_API_TOKEN: + client_secret: Zm9vYmFy + kind: Secret + ... + + Replace ```` with your access token. + +Upgrade the deployment +---------------------- + +.. tabs:: + + .. group-tab:: Flyte binary + + .. tabs:: + + .. group-tab:: Demo cluster + + .. code-block:: bash + + kubectl rollout restart deployment flyte-sandbox -n flyte + + .. group-tab:: Helm chart + + .. code-block:: bash + + helm upgrade flyteorg/flyte-binary -n --values + + Replace ```` with the name of your release (e.g., ``flyte-backend``), + ```` with the name of your namespace (e.g., ``flyte``), + and ```` with the name of your YAML file. + + .. group-tab:: Flyte core - kubectl edit secret -n flyte flyte-secret-auth + .. code-block:: -The configuration should look as follows: + helm upgrade flyte/flyte-core -n --values values-override.yaml -.. code-block:: yaml + Replace ```` with the name of your release (e.g., ``flyte``) + and ```` with the name of your namespace (e.g., ``flyte``). - apiVersion: v1 - data: - FLYTE_DATABRICKS_API_TOKEN: - client_secret: Zm9vYmFy - kind: Secret - metadata: - annotations: - meta.helm.sh/release-name: flyte - meta.helm.sh/release-namespace: flyte - ... +Wait for the upgrade to complete. You can check the status of the deployment pods by running the following command: -Where you need to replace ```` with your access token. +.. code-block:: -Upgrade the Flyte Helm release -============================== + kubectl get pods -n flyte -.. code-block:: bash +.. note:: - helm upgrade -n flyte -f https://raw.githubusercontent.com/flyteorg/flyte/master/charts/flyte-core/values-sandbox.yaml -f values-override.yaml flyteorg/flyte-core + Make sure you enable `custom containers + `__ + on your Databricks cluster before you trigger the workflow. 
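A note on the secret values above: the ``data`` field of a Kubernetes Secret holds base64-encoded values — the ``Zm9vYmFy`` shown in the example is simply ``foobar`` encoded. A minimal sketch (the token value here is hypothetical) of producing the value to paste into the Secret:

```python
import base64

# Hypothetical Databricks personal access token -- substitute your own.
token = "foobar"

# Kubernetes Secret `data` values must be base64-encoded.
encoded = base64.b64encode(token.encode()).decode()
print(encoded)  # Zm9vYmFy

# Round-trip check: decoding recovers the original token.
assert base64.b64decode(encoded).decode() == token
```

If you use the ``stringData`` field instead (as in the external-secret example), Kubernetes performs this encoding for you.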
diff --git a/rsts/deployment/plugins/webapi/index.rst b/rsts/deployment/plugins/webapi/index.rst
index be7ef945c3..c54cb3cb87 100644
--- a/rsts/deployment/plugins/webapi/index.rst
+++ b/rsts/deployment/plugins/webapi/index.rst
@@ -1,22 +1,22 @@
 .. _deployment-plugin-setup-webapi:
 
-####################
-Web API Plugin Setup
-####################
+Configure Web APIs
+==================
 
 .. tags:: WebAPI, Integration, Data, Advanced
 
+Discover the process of setting up Web API plugins for Flyte.
+
 .. panels::
     :header: text-center
     :column: col-lg-12 p-2
 
-
     .. link-button:: deployment-plugin-setup-webapi-snowflake
        :type: ref
        :text: Snowflake Plugin
        :classes: btn-block stretched-link
     ^^^^^^^^^^^^
-    Guide to setting up the Snowflake Plugin.
+    Guide to setting up the Snowflake plugin.
 
     ---
 
@@ -25,12 +25,12 @@ Web API Plugin Setup
     .. link-button:: deployment-plugin-setup-webapi-databricks
        :type: ref
        :text: Databricks Plugin
        :classes: btn-block stretched-link
     ^^^^^^^^^^^^
-    Guide to setting up the Databricks Plugin.
+    Guide to setting up the Databricks plugin.
 
 .. toctree::
     :maxdepth: 1
-    :name: Web API plugin Setup
+    :name: Web API plugin setup
     :hidden:
 
     snowflake
diff --git a/rsts/deployment/plugins/webapi/snowflake.rst b/rsts/deployment/plugins/webapi/snowflake.rst
index 619da0f194..85f13fe115 100644
--- a/rsts/deployment/plugins/webapi/snowflake.rst
+++ b/rsts/deployment/plugins/webapi/snowflake.rst
@@ -1,148 +1,243 @@
 .. _deployment-plugin-setup-webapi-snowflake:
 
-Snowflake Plugin Setup
-----------------------
+Snowflake Plugin
+================
 
-This guide gives an overview of how to set up Snowflake in your Flyte deployment.
+This guide provides an overview of how to set up Snowflake in your Flyte deployment.
 
-Add Flyte Chart Repo to Helm
-============================
+Spin up a cluster
+-----------------
 
-.. code-block::
+.. tabs::
+
+    .. group-tab:: Flyte binary
+
+        You can spin up a demo cluster using the following command:
+
+        .. code-block:: bash
+
+            flytectl demo start
+
+        Or install Flyte using the :ref:`flyte-binary helm chart <deployment-deployment-cloud-simple>`.
-    helm repo add flyteorg https://flyteorg.github.io/flyte
+    .. group-tab:: Flyte core
+
+        If you've installed Flyte using the
+        `flyte-core helm chart `__,
+        please ensure:
 
-Setup the Cluster
-=================
+        * You have the correct kubeconfig and have selected the correct Kubernetes context.
+        * You have configured the correct flytectl settings in ``~/.flyte/config.yaml``.
+
+.. note::
+
+    Add the Flyte chart repo to Helm if you're installing via the Helm charts.
+
+    .. code-block:: bash
+
+        helm repo add flyteorg https://flyteorg.github.io/flyte
+
+Specify plugin configuration
+----------------------------
 
 .. tabs::
 
-    .. tab:: Sandbox
+    .. group-tab:: Flyte binary
+
+        .. tabs::
+
+            .. group-tab:: Demo cluster
+
+                Enable the Snowflake plugin on the demo cluster by adding the following block to ``~/.flyte/sandbox/config.yaml``:
+
+                .. code-block:: yaml
+
+                    tasks:
+                      task-plugins:
+                        default-for-task-types:
+                          container: container
+                          container_array: k8s-array
+                          sidecar: sidecar
+                          snowflake: snowflake
+                        enabled-plugins:
+                          - container
+                          - k8s-array
+                          - sidecar
+                          - snowflake
+
+            .. group-tab:: Helm chart
+
+                Edit the relevant YAML file to specify the plugin.
+
+                .. code-block:: yaml
+                    :emphasize-lines: 7,11
+
+                    tasks:
+                      task-plugins:
+                        enabled-plugins:
+                          - container
+                          - sidecar
+                          - k8s-array
+                          - snowflake
+                        default-for-task-types:
+                          - container: container
+                          - container_array: k8s-array
+                          - snowflake: snowflake
+
+    .. group-tab:: Flyte core
 
-        Start the sandbox cluster
-
-        .. prompt:: bash $
-
-            flytectl demo start
-
-        Generate flytectl config
-
-        .. prompt:: bash $
-
-            flytectl config init
-
-    .. tab:: AWS/GCP
-
-        Follow the :ref:`deployment-deployment-cloud-simple` or
-        :ref:`deployment-deployment-multicluster` guide to set up your cluster.
-        After following these guides, make sure you have:
-
-        * The correct kubeconfig and selected the correct kubernetes context
-        * The correct flytectl config at ``~/.flyte/config.yaml``
-
-Specify Plugin Configuration
-============================
-
-Create a file named ``values-override.yaml`` and add the following config to it:
-
-.. code-block:: yaml
-
-    configmap:
-      enabled_plugins:
-        # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig)
-        tasks:
-          # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig)
-          task-plugins:
-            # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). Enable sagemaker*, athena if you install the backend
-            # plugins
-            enabled-plugins:
-              - container
-              - sidecar
-              - k8s-array
-              - snowflake
-            default-for-task-types:
-              container: container
-              sidecar: sidecar
-              container_array: k8s-array
-              snowflake: snowflake
+        Create a file named ``values-override.yaml`` and add the following config to it:
+
+        .. code-block:: yaml
+
+            configmap:
+              enabled_plugins:
+                # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig)
+                tasks:
+                  # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig)
+                  task-plugins:
+                    # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config)
+                    enabled-plugins:
+                      - container
+                      - sidecar
+                      - k8s-array
+                      - snowflake
+                    default-for-task-types:
+                      container: container
+                      sidecar: sidecar
+                      container_array: k8s-array
+                      snowflake: snowflake
+
+Obtain and add the Snowflake JWT token
+--------------------------------------
+
+Create a Snowflake account, and follow the `Snowflake docs
+`__
+to generate a JWT token.
+Then, add the Snowflake JWT token to FlytePropeller.
 
-Get an API Token
-================
+.. tabs::
 
-Next, create a trial Snowflake account and follow the docs for creating an API
-key. Add the snowflake JWT token to FlytePropeller.
+    .. group-tab:: Flyte binary
 
-.. note::
-
-    Refer to the `Snowflake docs `__
-    to understand setting up the Snowflake JWT token.
+        .. tabs::
+
+            .. group-tab:: Demo cluster
+
+                Add the JWT token as an environment variable to the ``flyte-sandbox`` deployment.
+
+                .. code-block:: bash
+
+                    kubectl edit deploy flyte-sandbox -n flyte
+
+                Update the ``env`` configuration:
 
-.. prompt:: bash $
+                .. code-block:: yaml
+                    :emphasize-lines: 12-13
 
-    kubectl edit secret -n flyte flyte-secret-auth
+                    env:
+                    - name: POD_NAME
+                      valueFrom:
+                        fieldRef:
+                          apiVersion: v1
+                          fieldPath: metadata.name
+                    - name: POD_NAMESPACE
+                      valueFrom:
+                        fieldRef:
+                          apiVersion: v1
+                          fieldPath: metadata.namespace
+                    - name: FLYTE_SECRET_FLYTE_SNOWFLAKE_CLIENT_TOKEN
+                      value: <JWT_TOKEN>
+                    image: flyte-binary:sandbox
+                    ...
 
-The configuration will look as follows:
+            .. group-tab:: Helm chart
 
-.. code-block:: yaml
+                Create an external secret as follows:
 
-    apiVersion: v1
-    data:
-      FLYTE_SNOWFLAKE_CLIENT_TOKEN: <jwt_token>
-      client_secret: Zm9vYmFy
-    kind: Secret
-    metadata:
-      annotations:
-        meta.helm.sh/release-name: flyte
-        meta.helm.sh/release-namespace: flyte
-    ...
+                .. code-block:: bash
+
+                    cat <<EOF | kubectl apply -f -
+                    apiVersion: v1
+                    kind: Secret
+                    metadata:
+                      name: flyte-binary-client-secrets-external-secret
+                      namespace: flyte
+                    type: Opaque
+                    stringData:
+                      FLYTE_SNOWFLAKE_CLIENT_TOKEN: <JWT_TOKEN>
+                    EOF
 
-Replace ``<jwt_token>`` with your JWT token.
+                Reference the newly created secret in
+                ``.Values.configuration.auth.clientSecretsExternalSecretRef``
+                in your YAML file as follows:
+
+                .. code-block:: yaml
+                    :emphasize-lines: 3
+
+                    configuration:
+                      auth:
+                        clientSecretsExternalSecretRef: flyte-binary-client-secrets-external-secret
+
+                Replace ``<JWT_TOKEN>`` with your JWT token.
 
-Upgrade the Flyte Helm release
-==============================
+    .. group-tab:: Flyte core
 
-.. prompt:: bash $
+        Add the JWT token as a secret to ``flyte-secret-auth``.
 
-    helm upgrade -n flyte -f https://raw.githubusercontent.com/flyteorg/flyte/master/charts/flyte-core/values-sandbox.yaml -f values-override.yaml flyteorg/flyte-core
+        .. code-block:: bash
 
-Register the Snowflake plugin example
-=====================================
+            kubectl edit secret -n flyte flyte-secret-auth
 
-.. prompt:: bash $
+        .. code-block:: yaml
+            :emphasize-lines: 3
 
-    flytectl register files https://github.com/flyteorg/flytesnacks/releases/download/v0.2.226/snacks-cookbook-external_services-snowflake.tar.gz --archive -p flytesnacks -d development
+            apiVersion: v1
+            data:
+              FLYTE_SNOWFLAKE_CLIENT_TOKEN: <JWT_TOKEN>
+              client_secret: Zm9vYmFy
+            kind: Secret
+            ...
+
+        Replace ``<JWT_TOKEN>`` with your base64-encoded JWT token.
 
-Launch an execution
-===================
+Upgrade the deployment
+----------------------
 
 .. tabs::
 
-    .. tab:: Flyte Console
-
-        * Navigate to Flyte Console's UI (e.g. `sandbox `_) and find the workflow.
-        * Click on `Launch` to open up the launch form.
-        * Submit the form.
-
-    .. tab:: Flytectl
-
-        Retrieve an execution form in the form of a yaml file:
-
-        .. prompt:: bash $
-
-            flytectl get launchplan --config ~/.flyte/flytectl.yaml \
-                --project flytesnacks \
-                --domain development \
-                snowflake.workflows.example.snowflake_wf \
-                --latest \
-                --execFile exec_spec.yaml
-
-        Launch! 🚀
-
-        .. prompt:: bash $
-
-            flytectl --config ~/.flyte/flytectl.yaml create execution \
-                -p flytesnacks \
-                -d development \
-                --execFile ~/exec_spec.yaml
+    .. group-tab:: Flyte binary
+
+        .. tabs::
+
+            .. group-tab:: Demo cluster
+
+                .. code-block:: bash
+
+                    kubectl rollout restart deployment flyte-sandbox -n flyte
+
+            .. group-tab:: Helm chart
+
+                .. code-block:: bash
+
+                    helm upgrade <RELEASE_NAME> flyteorg/flyte-binary -n <YOUR_NAMESPACE> --values <YOUR_YAML_FILE>
+
+                Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte-backend``),
+                ``<YOUR_NAMESPACE>`` with the name of your namespace (e.g., ``flyte``),
+                and ``<YOUR_YAML_FILE>`` with the name of your YAML file.
+
+    .. group-tab:: Flyte core
+
+        .. code-block:: bash
+
+            helm upgrade <RELEASE_NAME> flyte/flyte-core -n <YOUR_NAMESPACE> --values values-override.yaml
+
+        Replace ``<RELEASE_NAME>`` with the name of your release (e.g., ``flyte``)
+        and ``<YOUR_NAMESPACE>`` with the name of your namespace (e.g., ``flyte``).
+
+Wait for the upgrade to complete. You can check the status of the deployment pods by running the following command:
+
+.. code-block:: bash
+
+    kubectl get pods -n flyte
\ No newline at end of file
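As with the Databricks secret, the values under ``data`` in the ``flyte-secret-auth`` Secret are base64-encoded (``Zm9vYmFy`` is just ``foobar`` encoded). A small sketch, using ``foobar`` as a stand-in for a real JWT token, of encoding a value for the Secret and decoding what is currently stored:

```shell
# Encode a (hypothetical) token for the Secret's `data` field.
# `-n` keeps a trailing newline out of the encoded value.
echo -n 'foobar' | base64   # prints: Zm9vYmFy

# Decode an existing value to verify what the Secret holds.
echo 'Zm9vYmFy' | base64 --decode   # prints: foobar
```

Alternatively, ``kubectl get secret flyte-secret-auth -n flyte -o yaml`` shows the stored (encoded) values without opening an editor.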