Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streamline fast register options with new flag #2690

Merged
merged 18 commits into from
Aug 30, 2024
Merged

Conversation

wild-endeavor
Copy link
Contributor

@wild-endeavor wild-endeavor commented Aug 17, 2024

Background

Flytekit has a notion of a "fast register" (additional docs) which copies a set of files from the local host, compresses and uploads them, and instructs tasks registered to download and extract those files before execution.

Current UX

The behavior today and how the user invokes the behavior is as follows

pyflyte run

pyflyte run was initially written as a way for users to quickly run one script. The fast registration aspect of this command copied this one script into the zip file and into the final container. Over time, support for additional Python files was added via the --copy-all flag. Setting this flag copies all unignored files from the detected root folder into the archive.

More recently, default behavior for run was updated to automatically detect and include all local Python files that were loaded in the course of loading the target script.

pyflyte register

pyflyte register by default runs in 'fast' mode as well. By default, it behaves like the --copy-all flag in run, copying all unignored files from the detected root folder.

To turn off this behavior, the --non-fast flag was added.

pyflyte package

pyflyte package by default does not run in 'fast' mode. That is by default, there are no files copied/extracted.

To enable fast packaging, --fast must be added.

What changes were proposed in this pull request?

The differences in behavior, the differences in switches, and the fact that a new way to more intelligently detect which files should be copied means we have the opportunity to streamline the ux here.

Aligned UX

This PR introduces a --copy flag (listed currently as a beta) that will deprecate the current set of flags. For now, both sets of flags can be used.

In short, use

  • --copy all for the current fast register behavior (copying all non-ignored files)
  • --copy auto for the new behavior of copying only loaded Python modules.
  • --copy none to not copy any files. This is basically turning off fast registration entirely.

Examples

Example use of old and new flag. Again, all old flags will continue to work. In the examples, the --copy switch is explicitly written to a) just highlight what the default is and b) if specified, the logic is slightly different than if not specifying it (see the discussion on changes below).

Run

To smart-copy just loaded python modules (the default)
Old: pyflyte run a/c/d.py
New: pyflyte run [--copy auto] a/c/d.py

To copy all files
Old: pyflyte run --copy-all a/c/d.py
New: pyflyte run --copy all a/c/d.py

Register

To copy all unignored files (the default)
Old: pyflyte register a/b,a/c/d.py
New: pyflyte register [--copy all] a/b,a/c/d.py

To smart copy only relevant Python modules
pyflyte register --copy auto a/b,a/c/d.py

To turn off fast register completely:
Old: pyflyte register --non-fast a/b,a/c/d.py
New: pyflyte register --copy none a/b,a/c/d.py

Package

To package normally with no files copied (the default)
Old: pyflyte --pkgs a.b,c.d package
New: `pyflyte --pkgs a.b,c.d package [--copy none]

To package with all files:
Old: pyflyte --pkgs a.b,c.d package --fast
New: pyflyte --pkgs a.b,c.d package --copy all

To package with only loaded Python modules:
pyflyte --pkgs a.b,c.d package --copy auto

Debugging

To help users understand what files are slated to be copied, each command also takes a --ls-files flag.

Files to be copied for fast registration...
📂 /Users/you/dev/project1/src (detected source root)
┣━━ my.img
┣━━ resources
┃   ┗━━ file.txt
┗━━ work
    ┣━━ __init__.py
    ┣━━ tt.py
    ┗━━ wfs.py

Serialize

pyflyte serialize is missing from the above discussions because it is for the most part a duplicate of the package command. We are planning to deprecate serialize actually.

Changes

To make the new flag possible, especially in the context of the auto behavior of detecting which modules are loaded, this pr had to

  • Load the user requested modules before uploading. This changes the behavior for register which previously had uploaded files first.
  • The manner in which files are added to the tar file are different. Previously, folders were added into the tar file wholesale, with the tar library responsible for adding files recursively. With the new logic, files are listed and added one at a time. For folders with a large number of files, this may take longer.

Note

If you use the default/use the old flags, the old tar logic will be triggered. Only if you use the new --copy flag will the new logic for tar be used.

Note

This means that your code can no longer depend on the result of the upload. We think this is an extreme edge case, but let us know if not.

Code updates

  • The serialize function was split up into two functions to allow pyflyte to first load python modules, then upload files, then actually compile the Flyte entities.
  • load_packages_and_modules renamed to list_packages_and_modules because it no longer loads anything.
  • Two fields were added to the FastPackageOptions class to capture the new behavior.
  • Two helper functions were moved from fast registration to script mode.
  • Added a simple error handler to the walk_packages call just to not swallow errors.

How was this patch tested?

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Copy link

codecov bot commented Aug 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.67%. Comparing base (1cd8160) to head (438d67f).
Report is 7 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2690      +/-   ##
==========================================
+ Coverage   91.99%   94.67%   +2.68%     
==========================================
  Files          18       87      +69     
  Lines         874     4113    +3239     
==========================================
+ Hits          804     3894    +3090     
- Misses         70      219     +149     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
@wild-endeavor wild-endeavor marked this pull request as ready for review August 20, 2024 20:03
@wild-endeavor wild-endeavor changed the title Cli/copy options Streamline fast register options with new flag Aug 20, 2024
Signed-off-by: Yee Hing Tong <[email protected]>
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a first past review.

flytekit/clis/sdk_in_container/package.py Outdated Show resolved Hide resolved
flytekit/clis/sdk_in_container/package.py Outdated Show resolved Hide resolved
flytekit/tools/fast_registration.py Outdated Show resolved Hide resolved
flytekit/tools/fast_registration.py Outdated Show resolved Hide resolved
flytekit/clis/sdk_in_container/package.py Outdated Show resolved Hide resolved
flytekit/clis/sdk_in_container/package.py Show resolved Hide resolved
flytekit/clis/sdk_in_container/register.py Show resolved Hide resolved
flytekit/tools/script_mode.py Show resolved Hide resolved
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Copy link
Member

@pingsutw pingsutw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so cool!
Screenshot 2024-08-22 at 1 28 43 PM

flytekit/clis/sdk_in_container/register.py Outdated Show resolved Hide resolved
pingsutw
pingsutw previously approved these changes Aug 30, 2024
flytekit/tools/fast_registration.py Outdated Show resolved Hide resolved
flytekit/clis/sdk_in_container/register.py Outdated Show resolved Hide resolved
@@ -58,28 +100,74 @@ def fast_package(
ignores = default_ignores
ignore = IgnoreGroup(source, ignores)

# Remove this after original tar command is removed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we keeping the original tar command? Is it for backward compatibility?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup.

flytekit/clis/sdk_in_container/register.py Outdated Show resolved Hide resolved
@wild-endeavor wild-endeavor merged commit 8bad8e6 into master Aug 30, 2024
101 checks passed
@wild-endeavor wild-endeavor mentioned this pull request Sep 5, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants