Speed up tar packing by lower compresslevel and create symbolic links for same files #887

Open: gdh1995 wants to merge 1 commit into main

gdh1995
Contributor

@gdh1995 gdh1995 commented Aug 26, 2024

What's the problem

The current tar packaging process is inefficient in several ways:

  • The default gzip compression level "6" can take 200% more time than compression level "1", while reducing the size by only 10%-20%.
  • For tar packages that reach gigabyte sizes and are only transferred and used within an organization’s intranet, packing speed may matter more than size.
  • If a package depends on other tar packages, each of those packages must first be created and then decompressed, so a single file may be compressed multiple times.
  • If the runfiles directories of multiple srcs and deps targets reference the same file (the diamond-dependency case), add_file creates N identical copies, which significantly increases both package size and packing time.
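
The speed-versus-size trade-off in the first bullet can be checked locally with a small script. This is only a rough sketch using Python's standard gzip module on synthetic data; `compress_stats` and `make_sample` are hypothetical helper names, and the numbers it prints will differ from the figures quoted above depending on the input.

```python
import gzip
import time


def make_sample(n_lines: int = 20000) -> bytes:
    """Build synthetic, moderately compressible text (an assumption, not real package data)."""
    lines = [f"entry {i % 997}: value={i * 31 % 1009}\n" for i in range(n_lines)]
    return "".join(lines).encode()


def compress_stats(data: bytes, level: int) -> tuple[int, float]:
    """Return (compressed size in bytes, elapsed seconds) for one gzip level."""
    start = time.perf_counter()
    compressed = gzip.compress(data, compresslevel=level)
    elapsed = time.perf_counter() - start
    # Round-trip check so that only valid output is measured.
    assert gzip.decompress(compressed) == data
    return len(compressed), elapsed


if __name__ == "__main__":
    sample = make_sample()
    for level in (1, 6, 9):
        size, seconds = compress_stats(sample, level)
        print(f"level={level} size={size} time={seconds:.4f}s")
```

Running it on a representative payload (rather than the synthetic sample) gives a better idea of whether a lower level is worthwhile for a given package.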

How to solve

  1. Add compresslevel: str, which can be "" (auto, 6) | "0" | "1" | ... | "9".
  2. Provide MappingManifestInfo alongside DefaultInfo to expose manifest_file and package_dir to downstream targets.
     • Add merge_mappings: bool to enable this behavior manually.
  3. Add auto_deduplicate: bool to identify files already added across all manifest files by content paths and realpaths, and then automatically create symbolic links.
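
The combination of a lower compression level and symlink-based deduplication can be illustrated with Python's standard tarfile module. This is a minimal sketch and not the code in this PR: `pack_with_dedup` and its `entries` argument are hypothetical, and using a SHA-256 content hash as a fallback after the realpath check is an assumption made for the example.

```python
import hashlib
import os
import posixpath
import tarfile
import tempfile


def pack_with_dedup(entries, out_path, compresslevel=1):
    """Write a .tar.gz in which repeated files become symlinks.

    entries: iterable of (archive_name, source_path) pairs (hypothetical API).
    Returns the number of symlinks written in place of full copies.
    """
    by_realpath = {}  # resolved source path -> archive name of first copy
    by_digest = {}    # SHA-256 of contents  -> archive name of first copy
    links = 0
    with tarfile.open(out_path, "w:gz", compresslevel=compresslevel) as tar:
        for arcname, src in entries:
            real = os.path.realpath(src)
            target = by_realpath.get(real)
            if target is None:
                with open(real, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
                target = by_digest.get(digest)
                if target is None:
                    # First time this content is seen: store the real bytes.
                    tar.add(real, arcname=arcname, recursive=False)
                    by_realpath[real] = arcname
                    by_digest[digest] = arcname
                    continue
                by_realpath[real] = target
            # Duplicate: emit a relative symlink to the first copy.
            info = tarfile.TarInfo(name=arcname)
            info.type = tarfile.SYMTYPE
            info.linkname = posixpath.relpath(
                target, posixpath.dirname(arcname) or ".")
            tar.addfile(info)
            links += 1
    return links


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lib = os.path.join(tmp, "lib.so")
        with open(lib, "wb") as f:
            f.write(b"shared payload" * 1000)
        out = os.path.join(tmp, "out.tar.gz")
        # The same source file is reachable from two runfiles trees.
        count = pack_with_dedup(
            [("app1/lib.so", lib), ("app2/lib.so", lib)], out)
        print("symlinks written:", count)
```

The sketch stores the first occurrence as a regular file and points later occurrences at it with a relative link, so the archive stays relocatable when extracted under a different root.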

@gdh1995 gdh1995 requested review from aiuto and cgrindel as code owners August 26, 2024 09:25

google-cla bot commented Aug 26, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gdh1995 gdh1995 force-pushed the pass_through_and_symlinks branch from b9ed911 to 4a90992 Compare August 26, 2024 09:31
@gdh1995 gdh1995 force-pushed the pass_through_and_symlinks branch from 4a90992 to ee4999c Compare August 26, 2024 11:04
@cgrindel
Collaborator

These changes seem reasonable to me. However, I wonder if we should break it out into two or three PRs:

  • Compression level
  • MappingManifestInfo
  • Deduplicate

What do you think?

@gdh1995
Contributor Author

gdh1995 commented Aug 27, 2024

Ah, in fact my original work in my private workspace has exactly 3 commits, one for each of these 3 features.

I'll split it tomorrow.

Collaborator

@aiuto aiuto left a comment

Can you make this two distinct PRs?
The compression level should be easy, but the other thing looks like it has more general comments.


3 participants