From 79ce518c3fd5c1425f00090be8cee8cf73a69516 Mon Sep 17 00:00:00 2001
From: wild-endeavor
Date: Wed, 5 May 2021 12:23:56 -0700
Subject: [PATCH 1/9] wip

Signed-off-by: wild-endeavor
---
 rsts/howto/template_only_tasks.rst | 39 ++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)
 create mode 100644 rsts/howto/template_only_tasks.rst

diff --git a/rsts/howto/template_only_tasks.rst b/rsts/howto/template_only_tasks.rst
new file mode 100644
index 0000000000..b73244304c
--- /dev/null
+++ b/rsts/howto/template_only_tasks.rst
@@ -0,0 +1,39 @@
+.. _howto-template-only-tasks:
+
+#####################################################################
+How do I write a custom task that doesn't depend on the user's image?
+#####################################################################
+
+As we'll see throughout this how-to, the answer to the title question also addresses:
+
+#. What is meant by a "flytekit-only plugin"?
+#. How do custom task authors optimize a task's Docker image?
+
+**********************
+How normal tasks work
+**********************
+
+Most tasks that are in the cookbook and other Flyte introductory material are basic Python function tasks. That is, they are created by decorating a Python function with the ``@task`` decorator. Please see the basic Task concept doc for more details but from here, the process is
+
+#. At serialization time, a Docker container image is required. The assumption is that this Docker image has the task code.
+#. The task is serialized into a ``TaskTemplate``. This template contains instructions to the container on how to reconstitute the task.
+#. When Flyte runs the task, the container from step 1. is launched, and the instructions from step 2. recreate a Python object representing the task, using the user code in the container.
+#. The task object is run.
+
+The key point here is that the task object that gets serialized at compile-time is recreated using the user's code at run time.
+
+**************************
+Task Template based tasks
+**************************
+
+
+Using one of these tasks
+
+
+Writing one of these tasks
+
+
+
+
+
+

From a64423896e002104c21141fdf18bfa6de85c7ae3 Mon Sep 17 00:00:00 2001
From: wild-endeavor
Date: Wed, 5 May 2021 15:33:07 -0700
Subject: [PATCH 2/9] bump milestone version in conf.py

Signed-off-by: wild-endeavor
---
 rsts/conf.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rsts/conf.py b/rsts/conf.py
index feaf81a969..07ecad4880 100644
--- a/rsts/conf.py
+++ b/rsts/conf.py
@@ -29,7 +29,7 @@
 # The short X.Y version
 version = u''
 # The full version, including alpha/beta/rc tags
-release = u'0.11.0'
+release = u'0.13.0'
 
 
 # -- General configuration ---------------------------------------------------

From 05f9f052efbdf9569a0daa3a7cc396ee6366b69c Mon Sep 17 00:00:00 2001
From: wild-endeavor
Date: Wed, 5 May 2021 15:35:13 -0700
Subject: [PATCH 3/9] add empty changelog file

Signed-off-by: wild-endeavor
---
 CHANGELOG/CHANGELOG-v0.13.0.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 CHANGELOG/CHANGELOG-v0.13.0.md

diff --git a/CHANGELOG/CHANGELOG-v0.13.0.md b/CHANGELOG/CHANGELOG-v0.13.0.md
new file mode 100644
index 0000000000..e69de29bb2

From 8abad86220379f0d3647f40f9fe53e295356bee3 Mon Sep 17 00:00:00 2001
From: wild-endeavor
Date: Wed, 5 May 2021 15:39:11 -0700
Subject: [PATCH 4/9] header

Signed-off-by: wild-endeavor
---
 CHANGELOG/CHANGELOG-v0.13.0.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/CHANGELOG/CHANGELOG-v0.13.0.md b/CHANGELOG/CHANGELOG-v0.13.0.md
index e69de29bb2..468c219297 100644
--- a/CHANGELOG/CHANGELOG-v0.13.0.md
+++ b/CHANGELOG/CHANGELOG-v0.13.0.md
@@ -0,0 +1,2 @@
+# Flyte v0.13.0
+

From c59b43e10cbad18e1c19dcb991dfc5546bacf350 Mon Sep 17 00:00:00 2001
From: wild-endeavor
Date: Wed, 5 May 2021 15:39:35 -0700
Subject: [PATCH 5/9] wip

Signed-off-by: wild-endeavor
---
 rsts/howto/template_only_tasks.rst | 67 ++++++++++++++++++++++++++----
 1 file changed, 59 insertions(+), 8 deletions(-)

diff --git a/rsts/howto/template_only_tasks.rst b/rsts/howto/template_only_tasks.rst
index b73244304c..3c213555ab 100644
--- a/rsts/howto/template_only_tasks.rst
+++ b/rsts/howto/template_only_tasks.rst
@@ -10,27 +10,78 @@ As we'll see throughout this how-to, the answer to the title question also addre
 #. How do custom task authors optimize a task's Docker image?
 
 **********************
-How normal tasks work
+Background
 **********************
 
-Most tasks that are in the cookbook and other Flyte introductory material are basic Python function tasks. That is, they are created by decorating a Python function with the ``@task`` decorator. Please see the basic Task concept doc for more details but from here, the process is
+Process Differences
+=====================
+
+Normal function tasks
+---------------------
+
+Most tasks that are in the cookbook and other Flyte introductory material are basic Python function tasks. That is, they are created by decorating a Python function with the ``@task`` decorator. Please see the basic Task concept doc for more details. With the decorator in place, the process is
 
 #. At serialization time, a Docker container image is required. The assumption is that this Docker image has the task code.
 #. The task is serialized into a ``TaskTemplate``. This template contains instructions to the container on how to reconstitute the task.
 #. When Flyte runs the task, the container from step 1. is launched, and the instructions from step 2. recreate a Python object representing the task, using the user code in the container.
 #. The task object is run.
 
-The key point here is that the task object that gets serialized at compile-time is recreated using the user's code at run time.
+The key points here are that
+* the task object that gets serialized at compile-time is recreated using the user's code at run time, and
+* at platform-run-time, the user-decorated function is executed.
+
+TaskTemplate based tasks
+------------------------
+
+The execution process for task template based tasks differ from the above in that
+#. At serialization time, the Docker container image is hardcoded into the task definition (by the author of that task type).
+#. When serialized into a ``TaskTemplate``, the template should contain all the information needed to run that instance of the task (but not necessarily to reconstitute it).
+#. When Flyte runs the task, the container from step 1. is launched. The container should have an executor built into it that knows how to execute the task, purely based on the ``TaskTemplate``.
+
+The two points above differ in that
+* the task object that gets serialized at compile-time does not exist at run time.
+* at platform-run-time, there is no user function, and the executor is responsible for producing outputs, given the inputs to the task.
+
+Why
+===
+These tasks are useful because
+* Shift the burden of writing the Dockerfile from the user using the task in workflows, to the author of the task type.
+* Allow the author to optimize the image that the task runs.
+* Make it possible to arbitrarily (mostly) extend Flyte task execution behavior without the need for a backend golang plugin. The caveat is that these tasks still cannot access the K8s cluster, so if you want a custom task type that creates some CRD, you'll still need a backend plugin.
+
+*************************
+Using a Task
+*************************
+Take a look at the example PR where we moved the built-in SQLite3 task from the old style of writing a task to the new one.
+From the user's perspective, not much changes. You still just
+
+#. Install whatever Python library contains the task type definition (for SQLite3, it's bundled into flytekit itself, but usually this will not be the case).
+#. Import and instantiate the task as you would any other type of non-function based task.
+
+***************************
+Writing a Task
+***************************
+There's three components to writing one of these new tasks.
+* A Dockerfile - this is what is run when any user runs your task. It'll likely contain flytekit, Python, and your task extension code.
+* Your task extension code, which consists of 1) a class for the Task and 2) a class for the Executor.
+
+Image
+=======
+
+
+Python Library
+================
+
+Task
+-------
+
 
-**************************
-Task Template based tasks
-**************************
+Executor
+--------
 
 
-Using one of these tasks
 
 
-Writing one of these tasks
 
 
 

From 1bed92a59a8ee917e4f415dfea02a0a6f7c93616 Mon Sep 17 00:00:00 2001
From: wild-endeavor
Date: Wed, 5 May 2021 18:06:32 -0700
Subject: [PATCH 6/9] more

Signed-off-by: wild-endeavor
---
 rsts/howto/template_only_tasks.rst | 31 +++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/rsts/howto/template_only_tasks.rst b/rsts/howto/template_only_tasks.rst
index 3c213555ab..f381a2f4e8 100644
--- a/rsts/howto/template_only_tasks.rst
+++ b/rsts/howto/template_only_tasks.rst
@@ -52,7 +52,7 @@ These tasks are useful because
 *************************
 Using a Task
 *************************
-Take a look at the example PR where we moved the built-in SQLite3 task from the old style of writing a task to the new one.
+Take a look at the `example PR `__ where we moved the built-in SQLite3 task from the old style of writing a task to the new one.
 From the user's perspective, not much changes. You still just
 
 #. Install whatever Python library contains the task type definition (for SQLite3, it's bundled into flytekit itself, but usually this will not be the case).
@@ -62,29 +62,46 @@ From the user's perspective, not much changes. You still just
 Writing a Task
 ***************************
 There's three components to writing one of these new tasks.
-* A Dockerfile - this is what is run when any user runs your task. It'll likely contain flytekit, Python, and your task extension code.
 * Your task extension code, which consists of 1) a class for the Task and 2) a class for the Executor.
+* A Dockerfile - this is what is run when any user runs your task. It'll likely contain flytekit, Python, and your task extension code.
 
-Image
-=======
-
+The `aforementioned PR `__ where we migrate the SQLite3 task should be used to follow along the below.
 
 Python Library
 ================
 
 Task
 -------
 
+Authors creating new tasks of this type will need to create a subclass of the ``PythonCustomizedContainerTask`` class.
+Specifically, you'll need to customize these three arguments to the parent class constructor:
 
-Executor
---------
+* ``container_image`` This is the container image that will run when the user's invocation of the task is run on a Flyte platform.
+* ``executor_type`` This should be the Python class that subclasses the ``ShimTaskExecutor``.
+* ``task_type`` All types have a task type. This is just a string which the Flyte engine uses to determine which plugin to use when running a task. Anything that doesn't have an explicit match for will default to the container plugin (which is correct in this case). So you can call this anything, just not anything that's already taken by something else (like "spark" or something).
 
+Referring to the SQLite3 example ::
 
+    container_image="ghcr.io/flyteorg/flytekit-py37:v0.18.1",
+    executor_type=SQLite3TaskExecutor,
+    task_type="sqlite3",
 
+Note that the container is special in this case - because the definition of the Python classes themselves is bundled in flytekit, we just use the flytekit image.
 
+Additionally, you will need to override the ``get_custom`` function. Keep in mind that the execution behavior of the task needs to be completely determined by the serialized form of the task (that is, the serialized ``TaskTemplate``). This function is how you can do that, as it's stored and inserted into the `custom field `__ of the template. Keep the total size of the task template reasonably small though.
 
+Executor
+--------
+The ``ShimTaskExecutor`` is an abstract class that you will need to subclass and override the ``execute_from_model`` function for. This function is where all the business logic for your task should go, and it will be called in both local workflow execution and at platform-run-time execution.
 
+The signature of this execute function is different from the ``execute`` functions of most other tasks since here, all the business logic, the entirety of how the task is run, is determined from the ``TaskTemplate``
 
+Image
+=======
+This is the custom image that you supplied in the ``PythonCustomizedContainerTask`` subclass. Out of the box, these tasks will run a command that looks like the following when the container is run by Flyte ::
 
+    pyflyte-execute --inputs s3://inputs.pb --output-prefix s3://outputs --raw-output-data-prefix s3://user-data --resolver flytekit.core.python_customized_container_task.default_task_template_resolver -- {{.taskTemplatePath}} path.to.your.executor.subclass
 
+This means that your Docker image will need to have Python and flytekit installed. The Python interpreter that is run by the container should be able to find your custom executor class at that ``path.to.your.executor.subclass`` import path.
 
+Feel free to take a look at the flytekit Dockerfile as well.

From 78323c8f2d88744ba49efa9fbd0f901bb803de3d Mon Sep 17 00:00:00 2001
From: SandraGH5 <80421934+SandraGH5@users.noreply.github.com>
Date: Thu, 6 May 2021 09:44:55 -0700
Subject: [PATCH 7/9] Update template_only_tasks.rst

---
 rsts/howto/template_only_tasks.rst | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/rsts/howto/template_only_tasks.rst b/rsts/howto/template_only_tasks.rst
index f381a2f4e8..5f09fd4b38 100644
--- a/rsts/howto/template_only_tasks.rst
+++ b/rsts/howto/template_only_tasks.rst
@@ -1,7 +1,7 @@
 .. _howto-template-only-tasks:
 
 #####################################################################
-How do I write a custom task that doesn't depend on the user's image?
+How to Write a Custom Task That Doesn't Depend on the User's Image?
 #####################################################################
 
 As we'll see throughout this how-to, the answer to the title question also addresses:
@@ -16,7 +16,7 @@ Background
 Process Differences
 =====================
 
-Normal function tasks
+Normal Function Tasks
 ---------------------
 
 Most tasks that are in the cookbook and other Flyte introductory material are basic Python function tasks. That is, they are created by decorating a Python function with the ``@task`` decorator. Please see the basic Task concept doc for more details. With the decorator in place, the process is
@@ -30,10 +30,10 @@ The key points here are that
 * the task object that gets serialized at compile-time is recreated using the user's code at run time, and
 * at platform-run-time, the user-decorated function is executed.
 
-TaskTemplate based tasks
+TaskTemplate Based Tasks
 ------------------------
 
-The execution process for task template based tasks differ from the above in that
+The execution process for task template based tasks differs from the above in that
 #. At serialization time, the Docker container image is hardcoded into the task definition (by the author of that task type).
 #. When serialized into a ``TaskTemplate``, the template should contain all the information needed to run that instance of the task (but not necessarily to reconstitute it).
 #. When Flyte runs the task, the container from step 1. is launched. The container should have an executor built into it that knows how to execute the task, purely based on the ``TaskTemplate``.
@@ -44,7 +44,7 @@ The two points above differ in that
 
 Why
 ===
-These tasks are useful because
+These tasks are useful because they
 * Shift the burden of writing the Dockerfile from the user using the task in workflows, to the author of the task type.
 * Allow the author to optimize the image that the task runs.
 * Make it possible to arbitrarily (mostly) extend Flyte task execution behavior without the need for a backend golang plugin. The caveat is that these tasks still cannot access the K8s cluster, so if you want a custom task type that creates some CRD, you'll still need a backend plugin.
@@ -61,7 +61,7 @@ From the user's perspective, not much changes. You still just
 ***************************
 Writing a Task
 ***************************
-There's three components to writing one of these new tasks.
+There are three components to writing one of these new tasks.
 * Your task extension code, which consists of 1) a class for the Task and 2) a class for the Executor.
 * A Dockerfile - this is what is run when any user runs your task. It'll likely contain flytekit, Python, and your task extension code.
 
@@ -78,7 +78,7 @@ Specifically, you'll need to customize these three arguments to the parent class
 
 * ``container_image`` This is the container image that will run when the user's invocation of the task is run on a Flyte platform.
 * ``executor_type`` This should be the Python class that subclasses the ``ShimTaskExecutor``.
-* ``task_type`` All types have a task type. This is just a string which the Flyte engine uses to determine which plugin to use when running a task. Anything that doesn't have an explicit match for will default to the container plugin (which is correct in this case). So you can call this anything, just not anything that's already taken by something else (like "spark" or something).
+* ``task_type`` All types have a task type. This is just a string which the Flyte engine uses to determine which plugin to use when running a task. Anything that doesn't have an explicit match will default to the container plugin (which is correct in this case). So you can call this anything, just not anything that's already taken by something else (like "spark" or something).
 
 Referring to the SQLite3 example ::
 
@@ -86,7 +86,7 @@ Referring to the SQLite3 example ::
     executor_type=SQLite3TaskExecutor,
     task_type="sqlite3",
 
-Note that the container is special in this case - because the definition of the Python classes themselves is bundled in flytekit, we just use the flytekit image.
+Note that the container is special in this case - because the definition of the Python classes themselves is bundled in Flytekit, we just use the Flytekit image.
 
 Additionally, you will need to override the ``get_custom`` function. Keep in mind that the execution behavior of the task needs to be completely determined by the serialized form of the task (that is, the serialized ``TaskTemplate``). This function is how you can do that, as it's stored and inserted into the `custom field `__ of the template. Keep the total size of the task template reasonably small though.
 
@@ -94,7 +94,7 @@ Executor
 --------
 The ``ShimTaskExecutor`` is an abstract class that you will need to subclass and override the ``execute_from_model`` function for. This function is where all the business logic for your task should go, and it will be called in both local workflow execution and at platform-run-time execution.
 
-The signature of this execute function is different from the ``execute`` functions of most other tasks since here, all the business logic, the entirety of how the task is run, is determined from the ``TaskTemplate``
+The signature of this execute function is different from the ``execute`` functions of most other tasks since here, all the business logic, the entirety of how the task is run, is determined from the ``TaskTemplate``.
 
 Image
 =======
@@ -102,6 +102,6 @@ This is the custom image that you supplied in the ``PythonCustomizedContainerTas
 
     pyflyte-execute --inputs s3://inputs.pb --output-prefix s3://outputs --raw-output-data-prefix s3://user-data --resolver flytekit.core.python_customized_container_task.default_task_template_resolver -- {{.taskTemplatePath}} path.to.your.executor.subclass
 
-This means that your Docker image will need to have Python and flytekit installed. The Python interpreter that is run by the container should be able to find your custom executor class at that ``path.to.your.executor.subclass`` import path.
+This means that your Docker image will need to have Python and Flytekit installed. The Python interpreter that is run by the container should be able to find your custom executor class at that ``path.to.your.executor.subclass`` import path.
 
-Feel free to take a look at the flytekit Dockerfile as well.
+Feel free to take a look at the Flytekit Dockerfile as well.

From a52858a95164de43ba5257ad753d774c191dbc1a Mon Sep 17 00:00:00 2001
From: SandraGH5 <80421934+SandraGH5@users.noreply.github.com>
Date: Fri, 7 May 2021 13:49:28 -0700
Subject: [PATCH 8/9] Update rsts/howto/template_only_tasks.rst

Co-authored-by: Ketan Umare <16888709+kumare3@users.noreply.github.com>
---
 rsts/howto/template_only_tasks.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rsts/howto/template_only_tasks.rst b/rsts/howto/template_only_tasks.rst
index 5f09fd4b38..de40ede8eb 100644
--- a/rsts/howto/template_only_tasks.rst
+++ b/rsts/howto/template_only_tasks.rst
@@ -22,7 +22,7 @@ Normal Function Tasks
 Most tasks that are in the cookbook and other Flyte introductory material are basic Python function tasks. That is, they are created by decorating a Python function with the ``@task`` decorator. Please see the basic Task concept doc for more details. With the decorator in place, the process is
 
 #. At serialization time, a Docker container image is required. The assumption is that this Docker image has the task code.
-#. The task is serialized into a ``TaskTemplate``. This template contains instructions to the container on how to reconstitute the task.
+#. The task is serialized into a :std:ref:`api_msg_flyteidl.core.tasktemplate`. This template contains instructions to the container on how to reconstitute the task.
 #. When Flyte runs the task, the container from step 1. is launched, and the instructions from step 2. recreate a Python object representing the task, using the user code in the container.
 #. The task object is run.
 

From 7fcbf596147c441b700ab2f88ef0b1e136ea9a65 Mon Sep 17 00:00:00 2001
From: SandraGH5 <80421934+SandraGH5@users.noreply.github.com>
Date: Fri, 7 May 2021 13:52:36 -0700
Subject: [PATCH 9/9] Update template_only_tasks.rst

---
 rsts/howto/template_only_tasks.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rsts/howto/template_only_tasks.rst b/rsts/howto/template_only_tasks.rst
index de40ede8eb..e8f5755fcd 100644
--- a/rsts/howto/template_only_tasks.rst
+++ b/rsts/howto/template_only_tasks.rst
@@ -19,7 +19,7 @@ Process Differences
 Normal Function Tasks
 ---------------------
 
-Most tasks that are in the cookbook and other Flyte introductory material are basic Python function tasks. That is, they are created by decorating a Python function with the ``@task`` decorator. Please see the basic Task concept doc for more details. With the decorator in place, the process is
+Most tasks that are in the cookbook and other Flyte introductory material are basic Python function tasks. That is, they are created by decorating a Python function with the :py:func:`flytekit.task` decorator. Please see the basic Task concept doc for more details. With the decorator in place, the process is
 
 #. At serialization time, a Docker container image is required. The assumption is that this Docker image has the task code.
 #. The task is serialized into a :std:ref:`api_msg_flyteidl.core.tasktemplate`. This template contains instructions to the container on how to reconstitute the task.
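
Reviewer note: the task/executor split that this how-to describes can be sketched in plain Python without flytekit. Everything below is a hypothetical stand-in that only mirrors the shape of the idea: ``TemplateSketch``, ``SQLiteTaskSketch``, ``SQLiteExecutorSketch``, the image name, and the executor import path are invented for illustration, and the real ``PythonCustomizedContainerTask`` and ``ShimTaskExecutor`` signatures may differ. The sketch shows the one property the doc insists on: the executor runs from the serialized template alone, after the task object is gone.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class TemplateSketch:
    """Hypothetical stand-in for a serialized TaskTemplate (NOT the flyteidl message)."""

    task_type: str
    container_image: str
    executor_path: str
    custom: Dict[str, Any] = field(default_factory=dict)


class SQLiteTaskSketch:
    """Hypothetical stand-in for a PythonCustomizedContainerTask subclass."""

    def __init__(self, setup_statements: List[str], query: str):
        self._setup_statements = setup_statements
        self._query = query

    def get_custom(self) -> Dict[str, Any]:
        # Everything the executor needs at run time must land in the template.
        return {"setup_statements": self._setup_statements, "query": self._query}

    def serialize(self) -> str:
        template = TemplateSketch(
            task_type="sqlite3-sketch",
            container_image="example.invalid/author-image:tag",  # made-up image name
            executor_path="my_plugin.SQLiteExecutorSketch",  # made-up import path
            custom=self.get_custom(),
        )
        return json.dumps(asdict(template))


class SQLiteExecutorSketch:
    """Hypothetical stand-in for a ShimTaskExecutor subclass."""

    def execute_from_model(self, template: TemplateSketch, **inputs: Any) -> List[Tuple[Any, ...]]:
        connection = sqlite3.connect(":memory:")
        try:
            for statement in template.custom["setup_statements"]:
                connection.execute(statement)
            # Named placeholders such as ":min_age" are filled from the task inputs.
            return connection.execute(template.custom["query"], inputs).fetchall()
        finally:
            connection.close()


def run_sketch() -> List[Tuple[Any, ...]]:
    task = SQLiteTaskSketch(
        setup_statements=[
            "CREATE TABLE users (name TEXT, age INTEGER)",
            "INSERT INTO users VALUES ('ada', 36), ('bob', 24)",
        ],
        query="SELECT name FROM users WHERE age > :min_age ORDER BY name",
    )
    serialized = task.serialize()
    del task  # at run time only the serialized template exists, not the task object
    template = TemplateSketch(**json.loads(serialized))
    return SQLiteExecutorSketch().execute_from_model(template, min_age=30)
```

Calling ``run_sketch()`` returns ``[('ada',)]``. The design point is that ``get_custom`` is the only channel from the task definition to the executor, which is also why the doc asks authors to keep the template small.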