Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for python pickle type in flytekit/flyte #667

Merged
merged 32 commits into from
Oct 27, 2021
Merged

Conversation

pingsutw
Copy link
Member

Signed-off-by: Kevin Su [email protected]

TL;DR

Add a new type PythonPickle = FlyteFile[typing.TypeVar(PYTHON_PICKLE_FORMAT)].
Any type that flyte can't recognize will become PythonPickle. e.g. typing.Any, typing.List[int] -> PythonPickle
Function output (native python type) will save in PythonPickle

Let's assume that we have two tasks "A" and "B", We have several cases that need to handle.

Case 1: A.output (Pickle) -> B.input (Pickle)

  1. Run the Task A
  2. Serialize task output into the pickle
  3. Task B find the pickle path
  4. Deserialize the pickle and feed it into Task B input

Case 2: A.output (str) -> B.input (Pickle)

Seems like it only happens in local execution, we will fail to register this kind of workflow

  1. Run the Task A
  2. Native python type transform to literal and pass to Task B
  3. Extract python value from literals.scalar.primitive.value.
  4. feed python value into Task B input

We can't use a transformer to convert Literal to python value since we can't know literal's python type
Task A output type is str, but Task B expect input type is Any which will fall back to PythonPickle
In this scenario, We directly extract value from scalar, collection, or map.

Case 3: A.output (Pickle) -> B.input (str)

Seems like it only happens in local execution, we will fail to register this kind of workflow

  1. Task A serialize the output into a pickle
  2. Task A pass the FlyteFile which is just a pickle path in IDL to Task B
  3. We can just deserialize the data in the pickle file and use it. We don't need to use the transformer to translate Literal to python value here.

Case 4: Should not write the data to the same Pickle file

Each task output should write to a unique file path.
To solve this problem, the file name will be combined with the module name, function name, and output index.
e.g. __main__.add_question.o0

Case 5: Flytekit without type annotations

def _contains_explicit_return(f):
            return any(isinstance(node, ast.Return) for node in ast.walk(ast.parse(inspect.getsource(f))))

We can use the ast package to find whether a task or workflow has a return value.
if task output doesn't have type annotations and has a return value, then we will use PythonPickle by default.
It can make users more quickly run a flyte workflow on Kubernetes without handling the type that flyte doesn't support.

Here is an example without type annotations, and feel free to try it.
Workflow input needs type annotations to make Flyte console work. Otherwise, Flyte Console will expect the user to have a FlyteFile input.

from flytekit import task, workflow

@task
def greet(name):
    return f"Hi {name}"

@task
def add_question(greeting):
    return f"{greeting} How are you?"

@workflow
def welcome(name: str):
    greeting = greet(name=name)
    return add_question(greeting=greeting)


if __name__ == "__main__":
    print(welcome(name="Kevin"))

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

Still work in progress, will add test after I finish

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

How did you fix the bug, make the feature etc. Link to any design docs etc

Tracking Issue

flyteorg/flyte#1362

Follow-up issue

NA

@kumare3
Copy link
Contributor

kumare3 commented Sep 23, 2021

Each task output should write to a unique file path.
To solve this problem, the file name will be combined with the module name, function name, and output index.
e.g. main.add_question.o0

You do not need this. You are using the Flyte raw output path, so just simply use the output name like o0

@wild-endeavor
Copy link
Contributor

Can we talk about this? I think I may be missing something. the original issue was to support arbitrary types, not a new type right? so that if a user does

from lib.types import CustomClass

@task
def tn(in: CustomClass):
  ...

it will still work. I haven't looked at the code change but this description seems to imply that the user would have to change it to

@task
def tn(in: PythonPickle):
  ...

Am I understanding it correctly? we don't want to make the user do this, the code shouldn't have to change.

@wild-endeavor
Copy link
Contributor

Also let's leave the ast package out of it for now. I think we want the user to make things explicit.

@wild-endeavor
Copy link
Contributor

Regarding "Seems like it only happens in local execution, we will fail to register this kind of workflow"... if we're going to fail at registration time, we should fail earlier during local execution.

@kumare3
Copy link
Contributor

kumare3 commented Sep 28, 2021

@wild-endeavor we already talked about this, @pingsutw is making either a new PR or updating it

@pingsutw pingsutw marked this pull request as draft September 28, 2021 15:16
@pingsutw
Copy link
Member Author

Am I understanding it correctly? we don't want to make the user do this, the code shouldn't have to change.

We don't need to change anything now, and user-defined class will become pickle automatically

Also let's leave the ast package out of it for now. I think we want the user to make things explicit.

yeah, we've planned to remove it after meeting last week.

Regarding "Seems like it only happens in local execution, we will fail to register this kind of workflow"... if we're going to fail at registration time, we should fail earlier during local execution.

okay, I will make a new PR to address this issue.

@codecov
Copy link

codecov bot commented Sep 29, 2021

Codecov Report

Merging #667 (70158c7) into master (f876a33) will increase coverage by 0.02%.
The diff coverage is 91.30%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #667      +/-   ##
==========================================
+ Coverage   85.80%   85.83%   +0.02%     
==========================================
  Files         358      361       +3     
  Lines       29793    29947     +154     
  Branches     2428     2438      +10     
==========================================
+ Hits        25564    25705     +141     
- Misses       3590     3600      +10     
- Partials      639      642       +3     
Impacted Files Coverage Δ
flytekit/common/translator.py 90.38% <ø> (ø)
flytekit/core/promise.py 77.95% <ø> (ø)
tests/flytekit/unit/core/test_interface.py 85.29% <85.71%> (+0.03%) ⬆️
flytekit/types/pickle/pickle.py 86.53% <86.53%> (ø)
tests/flytekit/unit/core/test_flyte_pickle.py 90.69% <90.69%> (ø)
flytekit/core/interface.py 81.65% <96.42%> (+2.07%) ⬆️
flytekit/models/core/types.py 98.70% <100.00%> (+0.02%) ⬆️
flytekit/types/pickle/__init__.py 100.00% <100.00%> (ø)
...ts/flytekit/unit/core/functools/test_decorators.py 100.00% <100.00%> (ø)
tests/flytekit/unit/core/test_type_engine.py 99.71% <100.00%> (+<0.01%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f876a33...70158c7. Read the comment docs.

@pingsutw pingsutw changed the title [WIP] Add support for python pickle type in flytekit/flyte Add support for python pickle type in flytekit/flyte Sep 29, 2021
@pingsutw pingsutw marked this pull request as ready for review September 30, 2021 07:42
flytekit/core/interface.py Outdated Show resolved Hide resolved
flytekit/core/interface.py Outdated Show resolved Hide resolved
@kumare3
Copy link
Contributor

kumare3 commented Oct 7, 2021

minor nit, but this is great, +1 after you change the error message

kumare3
kumare3 previously approved these changes Oct 12, 2021
flytekit/types/pickle/pickle.py Outdated Show resolved Hide resolved
flytekit/types/pickle/pickle.py Outdated Show resolved Hide resolved
flytekit/types/pickle/pickle.py Outdated Show resolved Hide resolved

output = tf.to_python_value(ctx, lv, str)
assert output == python_val

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add some more unit tests:

  1. a task that uses a list of Foo
  2. a task that uses a dict of str -> Foo
  3. a workflow that uses it, and testing that local workflow execution still works?

tests/flytekit/unit/core/test_type_hints.py Outdated Show resolved Hide resolved
@@ -78,6 +85,11 @@ def extract_value(
literal_list = [extract_value(ctx, v, sub_type, flyte_literal_type.collection_type) for v in input_val]
return _literal_models.Literal(collection=_literal_models.LiteralCollection(literals=literal_list))
elif isinstance(input_val, dict):
if (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these should have to be here. can you explain why? why doesn't the recursive extract_value call work? and same for the list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the input value is a list of pickles, we will serialize the whole list into one pickle file.
Otherwise, we will get tons of files if using a very long list.
Does it make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kumare3 can you chime in on this? I don't think this is the behavior we want. I think we should get a very long list of files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wild-endeavor I actually like getting one file with the list in there, as it all pickled. But, not at the cost of some weird implementation. Performance wise and usage wise this is much better

@pingsutw pingsutw requested a review from eapolinario as a code owner October 21, 2021 06:11
@pingsutw pingsutw force-pushed the pickle2 branch 3 times, most recently from 32e5eed to c6d219b Compare October 25, 2021 07:17
pingsutw and others added 20 commits October 27, 2021 03:37
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
"""
return self._metadata

@metadata.setter
Copy link
Contributor

@wild-endeavor wild-endeavor Oct 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting a unit test failure on trying to set metadata. The reason is because we had some help from OSS cleaning up the model file structure. See this PR.

Can you add the setters you need to the new location and then delete this file? It shouldn't be here anymore.

Signed-off-by: Kevin Su <[email protected]>
Signed-off-by: Kevin Su <[email protected]>
@pingsutw
Copy link
Member Author

Thanks all.

@pingsutw pingsutw merged commit 88b590c into master Oct 27, 2021
reverson pushed a commit to reverson/flytekit that referenced this pull request May 27, 2022
* Init python pickle

Signed-off-by: Kevin Su <[email protected]>

* Refactor pickle support

Signed-off-by: Kevin Su <[email protected]>

* Fixed lint

Signed-off-by: Kevin Su <[email protected]>

* Fixed lint

Signed-off-by: Kevin Su <[email protected]>

* Fixed register error

Signed-off-by: Kevin Su <[email protected]>

* Added tests

Signed-off-by: Kevin Su <[email protected]>

* Fixed lint

Signed-off-by: Kevin Su <[email protected]>

* Handle list of pickle

Signed-off-by: Kevin Su <[email protected]>

* Fixed lint

Signed-off-by: Kevin Su <[email protected]>

* Added assert_type

Signed-off-by: Kevin Su <[email protected]>

* Updated comment

Signed-off-by: Kevin Su <[email protected]>

* Remove unnecessary cast

Signed-off-by: Kevin Su <[email protected]>

* Added more tests

Signed-off-by: Kevin Su <[email protected]>

* Fixed test

Signed-off-by: Kevin Su <[email protected]>

* Address comment

Signed-off-by: Kevin Su <[email protected]>

* Update list of pickle

Signed-off-by: Kevin Su <[email protected]>

* Update list of pickle

Signed-off-by: Kevin Su <[email protected]>

* Update list of pickle

Signed-off-by: Kevin Su <[email protected]>

* Fix tests

Signed-off-by: Kevin Su <[email protected]>

* Fix tests

Signed-off-by: Kevin Su <[email protected]>

* Fix tests

Signed-off-by: Kevin Su <[email protected]>

* Fix tests

Signed-off-by: Kevin Su <[email protected]>

* Fixed test

Signed-off-by: Kevin Su <[email protected]>

* Fixed test

Signed-off-by: Kevin Su <[email protected]>

* Fixed test

Signed-off-by: Kevin Su <[email protected]>

* add test

Signed-off-by: Yee Hing Tong <[email protected]>

* add test

Signed-off-by: Yee Hing Tong <[email protected]>

* Fixed lint

Signed-off-by: Kevin Su <[email protected]>

* Fixed import

Signed-off-by: Kevin Su <[email protected]>

* Handle nested list and dict

Signed-off-by: Kevin Su <[email protected]>

* Added tests

Signed-off-by: Kevin Su <[email protected]>

* Fixed lint

Signed-off-by: Kevin Su <[email protected]>

Co-authored-by: Yee Hing Tong <[email protected]>
Signed-off-by: Robert Everson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants