New "deterministic_output" argument to produce consistent zip files across all host platforms

Summary:
### Motivation

My team needs Buck to generate byte-for-byte identical zip files from the same set of inputs on all host platforms (macOS, Linux, Windows). Current limitations:
1. File order can differ between file systems with different case sensitivity.
2. Windows can't write correct POSIX modes (i.e. permissions) for any entry.

Even when the entries themselves match exactly, these discrepancies produce different metadata, and therefore a different zip file.
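To make the two limitations concrete, here is a minimal Python sketch of the normalization a deterministic mode needs. It uses only the standard library `zipfile` module and is not Buck's zip tool: entries are written in a fixed order (sorted by archive name rather than by file-system listing order), and every entry gets the same timestamp and POSIX mode regardless of host. The function name `write_deterministic_zip`, the fixed 1980-01-01 timestamp, and the `0o644` mode are illustrative assumptions.

```python
import io
import zipfile

# Hypothetical normalization constants, chosen for illustration only.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the zip format can store
FIXED_MODE = 0o100644  # regular file, rw-r--r--


def write_deterministic_zip(entries: dict[str, bytes]) -> bytes:
    """Build a zip from {archive_name: contents} with normalized order and metadata."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Sort by archive name (code point order) so the host file system's
        # listing order, which can depend on case sensitivity, never matters.
        for name in sorted(entries):
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            # Store the same POSIX mode in the high 16 bits on every host, including Windows.
            info.external_attr = FIXED_MODE << 16
            # Record "Unix" as the creating system so readers interpret the mode bits.
            info.create_system = 3
            archive.writestr(info, entries[name])
    return buffer.getvalue()
```

Even with normalized order and metadata, the compressed bytes can still differ if hosts use different zlib implementations or versions, which is the same compression problem described under Tentative solution #2.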

See D67149264 for an in-depth explanation of the use case that requires this level of determinism.

### Tentative solution #1

In D66386385, I restricted the asset generation rule so that it could only run on Linux. Paired with Buck cross builds, this made the macOS and Linux outputs match, but it did not work on Windows [due to some lower level buck problem](https://fb.workplace.com/groups/930797200910874/posts/1548299102494011) (still unresolved).

### Tentative solution #2

In D66404381, I wrote my own Python script to create zip files. I got all the files and metadata to match everywhere, but I could not work around differences in the compressed output. I decided not to pursue this approach because compression is important for file size.

### Tentative solution #3

In D67149264, I duplicated and tweaked Buck's zip binary. It worked, but IanChilds rightly pointed out that this would make those libraries harder to maintain, and that the team is planning to delete them at some point.

### Tentative solution #4 (this diff!)

IanChilds advised me to try fixing Buck itself so that it produces consistent results, and this diff is that attempt.

Because the root problem could not be fixed in a backwards-compatible way (specifically the file permissions; see the inline comment), I added an argument that controls whether the zip tool should strive to produce deterministic output, at the cost of losing some metadata. The changes are simple and backwards compatible, but any feedback on the root problem, the idea, and the execution is welcome.

Reviewed By: christolliday

Differential Revision: D67301945

fbshipit-source-id: c42ef7a52efd235b43509337913d905bcbaf3782
Thiago Goulart authored and facebook-github-bot committed Dec 19, 2024
1 parent be81f0f commit 736bff0
Showing 2 changed files with 9 additions and 0 deletions.
5 changes: 5 additions & 0 deletions decls/core_rules.bzl
```diff
@@ -1485,6 +1485,11 @@ zip_file = prelude_rule(
             The regexes must be defined using `java.util.regex.Pattern` syntax.
         """),
+        "deterministic_output": attrs.option(attrs.bool(), default = None, doc = """
+            If set to true, Buck ensures that all files in the generated zip and their associated metadata are
+            consistent across all platforms, resulting in an identical zip file everywhere. Note that this might
+            come at the expense of losing some otherwise relevant metadata, like file permissions and timestamps.
+        """),
         "on_duplicate_entry": attrs.enum(OnDuplicateEntry, default = "overwrite", doc = """
             Action performed when Buck detects that zip\\_file input contains multiple entries with the same
             name.
```
4 changes: 4 additions & 0 deletions zip_file/zip_file.bzl
```diff
@@ -26,6 +26,7 @@ def _zip_file_impl(ctx: AnalysisContext) -> list[Provider]:
 
     on_duplicate_entry = ctx.attrs.on_duplicate_entry
     entries_to_exclude = ctx.attrs.entries_to_exclude
+    deterministic_output = ctx.attrs.deterministic_output
     zip_srcs = ctx.attrs.zip_srcs
     srcs = ctx.attrs.srcs
 
@@ -59,6 +60,9 @@ def _zip_file_impl(ctx: AnalysisContext) -> list[Provider]:
         create_zip_cmd.append("--entries_to_exclude")
         create_zip_cmd.append(entries_to_exclude)
 
+    if deterministic_output:
+        create_zip_cmd.append("--deterministic_output")
+
     ctx.actions.run(cmd_args(create_zip_cmd), category = "zip")
 
     return [DefaultInfo(default_output = output)]
```
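The rule-side change only forwards a flag. The Python sketch below mirrors that Starlark logic to show how the three states of `attrs.option(attrs.bool(), default = None)` map onto the command line: only an explicit `True` adds `--deterministic_output`, so existing targets that leave the attribute unset (`None`) or set it to `False` keep the old behavior. The helper name `append_deterministic_flag` and the base command are hypothetical; the real rule appends to `create_zip_cmd` in place and passes it to `ctx.actions.run`.

```python
from typing import Optional


def append_deterministic_flag(create_zip_cmd: list[str], deterministic_output: Optional[bool]) -> list[str]:
    """Return a copy of the command with --deterministic_output added only when requested."""
    cmd = list(create_zip_cmd)
    # Mirrors `if deterministic_output:` in zip_file.bzl: None and False are both
    # falsy, so targets that do not opt in get an unchanged command.
    if deterministic_output:
        cmd.append("--deterministic_output")
    return cmd


base_cmd = ["create_zip_tool"]  # hypothetical stand-in for the real zip tool invocation
for value in (None, False, True):
    print(value, append_deterministic_flag(base_cmd, value))
```

In a BUCK file, a target would opt in by setting `deterministic_output = True` on its `zip_file` rule; leaving it unset preserves the previous output.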
