[Core feature] Caching for non-flyte specific offloaded objects #1581
Comments
cc @eapolinario. While writing the issue I realized it may not be that big a problem?
I'm facing a similar dataframe caching issue in the weather forecasting project. I'm using a dynamic workflow to manage the training of a model, and the tasks within it rely on training data in the form of a dataframe. Should I use …
Solution proposal: caching for complex data types, e.g. dataframes. For blob and schema types, allow the task decorator to accept per-type functions that produce a content hash for offloaded outputs:
import pandas as pd
from flytekit import task

CACHE_VERSION = "1.0"

@task(
    cache=True,
    # Proposed (not currently existing) parameter: per-type functions
    # that compute a stable content hash for offloaded outputs.
    cache_output_fn={
        pd.DataFrame: lambda df: hash(pd.util.hash_pandas_object(df).sum()),
    },
    cache_version=CACHE_VERSION,
)
def func(x: int) -> pd.DataFrame:
    df = ...  # get a dataframe
    return df
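The intent is that flytekit would apply the registered function to the returned dataframe to derive a content hash, and use that hash together with cache_version when computing cache keys for downstream consumers, rather than keying on the storage location of the re-uploaded data.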
Update:
Motivation: Why do you think this is important?
Currently, if users use a pandas.DataFrame, a pyspark.DataFrame, or a pandera DataFrameSchema, Flytekit simply extracts the data from the transport LiteralType.Schema. So consider the following function:
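The original code snippet is not preserved in this capture; below is a minimal sketch of the kind of function being described, assuming a task that consumes a FlyteSchema (all names are illustrative):

from flytekit import task
from flytekit.types.schema import FlyteSchema

@task(cache=True, cache_version="1.0")
def summarize(s: FlyteSchema) -> int:
    # The FlyteSchema literal carries only a reference (URI) to the
    # offloaded data, so the cache key is stable across runs.
    df = s.open().all()
    return len(df)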
The above function will be cached, because the input literal carries only the file path, that path is what the cache key is computed from, and hence the function is not re-run.
But now consider the following task (again sketched below, as the original snippet is not preserved):
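A sketch under the same assumptions, with a task that consumes a pandas dataframe directly:

import pandas as pd
from flytekit import task

@task(cache=True, cache_version="1.0")
def train(df: pd.DataFrame) -> int:
    # The dataframe literal points at a freshly uploaded location on every
    # execution, so the cache key never matches and the task always re-runs.
    return len(df)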
This task will not be cached. The dataframe is downloaded and then re-uploaded, because the underlying type transformers are not aware that the passed dataframe was not mutated, so the input literal (and hence the cache key) changes on every execution. If FlyteSchema is used instead, this works fine.
Goal: What should the final outcome look like, ideally?
Either this should work as the user expects (i.e., a cache hit), or this inconsistency should be documented.
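Not part of the original issue text, but one possible shape of the desired behavior is a user-supplied content hash attached to the output, assuming a flytekit version that exposes a HashMethod annotation (names and versions here are illustrative):

from typing import Annotated

import pandas as pd
from flytekit import HashMethod, task

def hash_pandas_dataframe(df: pd.DataFrame) -> str:
    # Content-based hash that stays stable across re-uploads of identical data.
    return str(pd.util.hash_pandas_object(df).sum())

@task
def produce() -> Annotated[pd.DataFrame, HashMethod(hash_pandas_dataframe)]:
    return pd.DataFrame({"a": [1, 2, 3]})

@task(cache=True, cache_version="1.0")
def consume(df: pd.DataFrame) -> int:
    # With a content hash attached upstream, identical dataframes can yield
    # a cache hit here even though the data itself is offloaded.
    return len(df)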
Describe alternatives you've considered
NA
Propose: Link/Inline OR Additional context
No response