Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Customized container tasks and Shim tasks/executors #449

Merged
merged 47 commits into from
May 4, 2021

Conversation

wild-endeavor
Copy link
Contributor

@wild-endeavor wild-endeavor commented Apr 21, 2021

TL;DR

This PR introduces the notion of task container plugins (aka flytekit only plugins). That is, you can now write and publish a Flyte Tasks for others to use, that when run, run a Docker container that you (the author of the plugin) specifies, rather than the user container. That is, typically, when a user writes an @task task, at serialization time, the container image that gets associated with that task is the main workflow/task image, the image made by the person writing the @task decorator. This new construct allows the task author to tailor and control the execution container. The plugin author only has access to the TaskTemplate of course, since the user code won't exist in the author's image.

This relies on previous work done to expose the TaskTemplate to the execution end.

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

Usage

As a plugin-task author

Follow the sqlite3 external container example as a template. Basically you need to

  • Create a task type that subclasses PythonCustomizedContainerTask
    • Override the get_custom function - this object will be serialized into the TaskTemplate that the executor ultimately receives so put all the information you need to run that specific instance of the task in here. You can also override any other functions in the base class you need to.
  • Create an executor for that task type by subclassing ShimTaskExecutor and implementing the execute_from_model function.
  • In the constructor for your task subclass, point the executor_type argument to your executor subclass. An instance of this class will be created at execution time.

As a user

Simply import the task from the Python library that provides it, and use as normal. The task will be registered and run as normal, just picking up the task author's container instead of yours at platform run-time. It is possible that the plugin author did not provide execution behavior in the importable Python task and it is only available in the container. In that case you'll need to mock it out withe flytekit.patch in order to run it locally.

High Level Changes

  • Add the concept of a ExecutableTemplateShimTask.
  • Add the concept of a ShimTaskExecutor.
  • Add a PythonThirdPartyContainerTask which first uses a TaskTemplateResolver implementing the TaskResolverMixin interface to load the task, and then the ShimTaskExecutor object to run it.
  • The SQLite3Task has been copied into a new folder under extras. I was going to add it somewhere else, but I actually think having both versions around so that people can compare between them is better.
  • TypeEngine extended to add a guessing function to go backwards, from Flyte Literal types to Python types.

Additional Changes

  • TaskResolverMixin has been moved from the autocontainer file to base_task.py file and function signatures updated to reflect lower level classes.
  • The get_command function in PythonFunctionTask and PythonInstanceTask were the same so this has been abstracted into the base class.
  • entrypoint.py cleaned up a bit, setup step was extracted out.

Tracking Issue

NA

Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
@wild-endeavor wild-endeavor marked this pull request as ready for review April 29, 2021 01:17
Signed-off-by: wild-endeavor <[email protected]>

rename sqlalchemy

entrypoint flyte task

Signed-off-by: wild-endeavor <[email protected]>

init

Signed-off-by: wild-endeavor <[email protected]>

bump

Signed-off-by: wild-endeavor <[email protected]>
@wild-endeavor wild-endeavor force-pushed the third-party-container branch from ee4b12c to 760da2b Compare April 29, 2021 17:05
return python_types

@classmethod
def guess_python_type(cls, flyte_type: LiteralType) -> type:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have to add it to every transformer now - or is there a default behavior

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default behavior is to raise a ValueError which gets caught and ignored.

Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
@codecov
Copy link

codecov bot commented Apr 30, 2021

Codecov Report

Merging #449 (5d9cda3) into master (2340bfa) will increase coverage by 0.05%.
The diff coverage is 94.19%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #449      +/-   ##
==========================================
+ Coverage   84.50%   84.56%   +0.05%     
==========================================
  Files         360      362       +2     
  Lines       26896    27009     +113     
  Branches     2214     2227      +13     
==========================================
+ Hits        22729    22839     +110     
- Misses       3567     3568       +1     
- Partials      600      602       +2     
Impacted Files Coverage Δ
flytekit/models/task.py 92.85% <ø> (ø)
....sqlalchemy/flytekitplugins/sqlalchemy/__init__.py 100.00% <ø> (ø)
...ekit.sqlalchemy/flytekitplugins/sqlalchemy/task.py 89.18% <ø> (ø)
flytekit/types/schema/types.py 77.53% <78.94%> (+0.12%) ⬆️
flytekit/core/type_engine.py 83.97% <87.09%> (+1.29%) ⬆️
flytekit/core/base_task.py 87.67% <95.00%> (+0.67%) ⬆️
flytekit/core/class_based_resolver.py 79.16% <100.00%> (+0.90%) ⬆️
flytekit/core/python_auto_container.py 81.52% <100.00%> (-0.63%) ⬇️
flytekit/core/python_function_task.py 87.61% <100.00%> (-0.57%) ⬇️
flytekit/core/tracked_abc.py 100.00% <100.00%> (ø)
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2340bfa...5d9cda3. Read the comment docs.

Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
return obj


class TaskTemplateResolver(TrackedInstance, TaskResolverMixin):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this rant

    a task resolver was supposed to
    * Be able to restore the same task when ``load_task`` is called as the object that ``loader_args`` was called on.
      That is, even though at run time it's in a container on a cluster and is obviously a different Python process,
      the Python object in memory should look the same.
    * Offer a one-to-one mapping between the list of strings returned by the ``loader_args`` function, an the task,
      at least within the container.```

Do we have this contract explicitly spelled out somewhere - if not this seems like a not so useful comment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about - special resolver that resolves the ThirdPartyTask at runtime, using only the tasktemplate. Thus for any task, derived from x, this resolver is used and such tasks should be fully rehydrated from the task template alone

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i didn't think it was a rant. i'll add this comment and change the wording a bit.



@dataclass
class SQLite3Config(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about use asciiflow to explain whats happening in a SQLite3Task or just in general for a ThirdPartContainer task?

Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
Signed-off-by: wild-endeavor <[email protected]>
@wild-endeavor wild-endeavor changed the title Third party container tasks Customized container tasks May 4, 2021
@wild-endeavor wild-endeavor changed the title Customized container tasks Customized container tasks and Shim tasks/executors May 4, 2021
@kumare3
Copy link
Contributor

kumare3 commented May 4, 2021

Wonderful. Thank you for all the work @wild-endeavor. quick question - would it be better to use ExecutableShimTask even in the local flow?

@kumare3 kumare3 self-requested a review May 4, 2021 17:28
kumare3
kumare3 previously approved these changes May 4, 2021
@wild-endeavor wild-endeavor merged commit ad6c322 into master May 4, 2021
max-hoffman pushed a commit to dolthub/flytekit that referenced this pull request May 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants