-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add caching to local execution #592
Conversation
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Thank you for opening this pull request! 🙌 |
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #592 +/- ##
==========================================
+ Coverage 85.51% 85.57% +0.06%
==========================================
Files 376 379 +3
Lines 29311 29682 +371
Branches 2357 2376 +19
==========================================
+ Hits 25064 25400 +336
- Misses 3611 3640 +29
- Partials 636 642 +6
Continue to review full report at Codecov.
|
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great, can we add some more tests!
flytekit/core/base_task.py
Outdated
outputs_literal_map = self.dispatch_execute(ctx, input_literal_map) | ||
# if metadata.cache is set, check memoized version | ||
if self._metadata.cache: | ||
# The cache key is composed only of '(input_literal_map, cache_version)', i.e. all other parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also include the task name as part of the key?
similar to what we do with the identifier on hosted flyte https://github.com/flyteorg/flyteplugins/blob/master/go/tasks/pluginmachinery/catalog/client.go#L26
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key is already implicitly part of the cache key (in other words, joblib.Memory
only caches the result of each function call). I'm going to update the comment to reflect that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is a really good call. I'm going to add a test case to demonstrate the problem and subsequent fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by 130b432.
@@ -0,0 +1,78 @@ | |||
from pytest import fixture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we test some more complex input and output types, like dataclasses, schemas, files?
flytekit/core/local_cache.py
Outdated
|
||
@staticmethod | ||
def initialize(): | ||
LocalCache._memory = Memory(CACHE_LOCATION, verbose=5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we make the verbosity configurable too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we gain a lot from making this configurable, especially given that configuration files are not used at all in local executions. I'm going to leave a TODO as well.
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
Signed-off-by: Eduardo Apolinario <[email protected]>
|
||
def test_wf_custom_types(): | ||
@dataclass_json | ||
@dataclass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome, this works? ran into issues with dataclass serialization last time i tried this but maybe joblib has since worked around this. woohoo!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question around this, are we storing LiteralMap
or are we storing the native python values
. Also what happens to file refrences, do we store a shallow copy or a deep copy?
TL;DR
Enable the ability to cache the output of tasks in local executions
Type
Are all requirements met?
Complete description
joblib.Memory
provides a persistent store to cache the result of function invocations. Similarly to how we do in the hosted case, we use the pair(task inputs, cache version)
to memoize task outputs across different executions.We're also adding another command to
pyflyte
to help users clear the cache. In the future we might extend this to allow for the inspection of the values in the local cache.Tracking Issue
flyteorg/flyte#761
Follow-up issue
NA