-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Copy user files that were imported by workflow in pyflyte run #2663
Copy user files that were imported by workflow in pyflyte run #2663
Conversation
Signed-off-by: Thomas J. Fan <[email protected]>
Signed-off-by: Thomas J. Fan <[email protected]>
Signed-off-by: Thomas J. Fan <[email protected]>
Signed-off-by: Thomas J. Fan <[email protected]>
flytekit/tools/script_mode.py
Outdated
@@ -46,6 +50,8 @@ def compress_scripts(source_path: str, destination: str, module_name: str): | |||
|
|||
visited: typing.List[str] = [] | |||
copy_module_to_destination(source_path, destination_path, module_name, visited) | |||
add_imported_modules_from_source(source_path, destination_path, list(sys.modules.values())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we remove copy_module_to_destination
, does it work as well? If so, I think we can remove that.
For more context:
copy_module_to_destination
will only copy the module to remote if and only if that module has a flyte task/workflow.
in this case, pyflyte run --remote uploads foo_wf
and main
└── workflows
├── __init__.py
├── another_module
│ └── foo_wf.py
├── main.py # it import foo_wf
I did this because I didn't know how to do add_imported_modules_from_source
at that time.
Now you have added it, I think we should be able to remove copy_module_to_destination
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, remove this one line and merge. tested this against dev but feel free to again. the only difference i saw is that with the copy_module... call, the main script itself is already in the destination folder, and because it exists it's skipped when add_imported... sees it. Without it, the main file itself is copied by add_imported...
@@ -127,6 +133,65 @@ def tar_strip_file_attributes(tar_info: tarfile.TarInfo) -> tarfile.TarInfo: | |||
return tar_info | |||
|
|||
|
|||
def add_imported_modules_from_source(source_path: str, destination: str, modules: List[ModuleType]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If utils
imports another local module, will we copy that to the remote storage as well?
.
└── workflows
├── __init__.py
├── another_module
│ └── wow.py
├── main.py # only import util.py
└── util.py # import util2.py
└── util2.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any files that is a part of the import chain from the original workflow will be copied over.
Details: If main.py
imports from util.py
and util.py
imports from util2.py
then util2.py
is part of sys.modules
when main.py
is loaded. With this PR, if the module is in sys.modules
and in the same working directory as main.py
, then it will be uploaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, perfect
Signed-off-by: Thomas J. Fan <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2663 +/- ##
===========================================
- Coverage 92.53% 46.23% -46.31%
===========================================
Files 27 187 +160
Lines 1206 19162 +17956
Branches 0 2766 +2766
===========================================
+ Hits 1116 8859 +7743
- Misses 90 9867 +9777
- Partials 0 436 +436 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Thomas J. Fan <[email protected]>
Signed-off-by: Thomas J. Fan <[email protected]>
Signed-off-by: Thomas J. Fan <[email protected]>
Signed-off-by: Thomas J. Fan <[email protected]>
@@ -493,7 +493,7 @@ def _update_flyte_context(params: RunLevelParams) -> FlyteContext.Builder: | |||
if output_prefix and ctx.file_access.is_remote(output_prefix): | |||
with tempfile.TemporaryDirectory() as tmp_dir: | |||
archive_fname = pathlib.Path(os.path.join(tmp_dir, "script_mode.tar.gz")) | |||
compress_scripts(params.computed_params.project_root, str(archive_fname), params.computed_params.module) | |||
compress_scripts(params.computed_params.project_root, str(archive_fname), list(sys.modules.values())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pingsutw I'm concerned with this change. At this point, is the main workflow imported by flytekit
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so.
entity = load_naive_entity(module, exe_entity, project_root) |
If not, could we params.computed_params.module
to the list as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a new get_all_modules
function that adds params.computed_params.module
if it is not already imported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm, thank you!
Signed-off-by: Thomas J. Fan <[email protected]>
Signed-off-by: Thomas J. Fan <[email protected]>
Signed-off-by: Thomas J. Fan <[email protected]>
…rg#2663) Signed-off-by: Thomas J. Fan <[email protected]>
…rg#2663) Signed-off-by: Thomas J. Fan <[email protected]> Signed-off-by: Mecoli1219 <[email protected]>
Tracking issue
Related to flyteorg/flyte#5343
Why are the changes needed?
Currently,
pyflyte run
requires--copy-all
when the workflow imports files from other directories. This is a common gotcha becausepyflyte run workflow.py
works, butpyflyte run --remote workflow.py
does not.What changes were proposed in this pull request?
With this PR:
sys.modules
and copy over all files that are necessary for the workflow to run.pyflyte run --remote workflow.py
will work out of the box most of the time.--copy-all
because it only copies over the required files. If you have a project with tests or docs, it will not be copied.I think the new coping mechanism does what
copy_module_to_destination
does, but I did not want to adjust it just in case I break anything. I added a newadd_imported_modules_from_source
to additionally copy over files.How was this patch tested?
I added unit tests to make sure the copying works. I also had a project with this structure:
and ran
pyflyte run --remote workflows/main.py
and everything copies correctly. Simple workflow in the root directory also works: