Add support for normalizing the archived files metadata #47

Conversation

@64kramsystem commented Jul 5, 2019

UPDATE: a few important notes if anybody happens to read this PR:

  1. This is a universal solution, which also covers a concept that is sometimes forgotten: different O/Ss can produce different compressed streams.
  2. This solution causes an incompatibility in some cases, which is solvable but needs extra logic, and certainly good testing (see Zip files are incompatible with Java's java.util.zip 64kramsystem/terraform-provider-archive-dev#1 (comment)).
  3. There are other, more ad hoc solutions applicable (e.g. granular options); they are easy to implement - as a matter of fact, this PR is a sort of blend of them all.
  4. I'm not interested in participating in this feature or in the related discussion, so please don't (manually) CC me; the maintainers have shown through facts (as opposed to words) a lack of interest in proceeding with any solution at all; it'd be polite (and good practice) if they just declared so, rather than rushing to say that this will be taken care of. For me, this is poor maintenance, and I don't want to be involved with it.

When archiving files from different machines, even if the source content is the same, different archive files may be generated due to a variety of factors, such as compression libraries or file metadata differences (permissions, etc.).

This creates problems in situations where the exact same archive is expected to always be generated from a given source content (typically, a Lambda function resource).

An example context that causes differences is Git on different O/Ss - filesystems may have different mount permission masks and still look identical to Git (which doesn't distinguish 0644 from 0664), but end up generating different archives.

The `normalize_files_metadata` option makes the archiver produce a "normalized" archive, which solves the problem. In the Zip case, this is obtained by setting:

  • the compression method to Store
  • the modification date to a fixed one
  • the permissions to 0644

Disabling compression is a tradeoff; however, in the contexts where this functionality is used (e.g. Lambda function resources), the difference is negligible.

Closes #34.


The functionality is simple and fits nicely with the code. The code itself doesn't have a very strict case convention for variable names (some are entirely lowercase without underscores, some are in camel case, etc.), so that may be subject to change.

The name of the option is entirely open to change. "Normalization" is the concept that I think fits, but of course, there are different ways of viewing it.

func normalizeCompressingFile(fh *zip.FileHeader) {
	fh.Method = zip.Store
	fh.SetModTime(time.Date(1981, 4, 10, 0, 0, 0, 0, time.UTC))
	fh.SetMode(0644)
}
@bboe commented Jul 5, 2019

@64kramsystem (Author) commented Jul 12, 2019

I'm not familiar with executing binaries inside a Lambda function; based on brief research, Lambda may reset the permissions of the archived files - see, for example:

It's not 100% clear from those resources where their problem is - it could be the zip library not storing the file permissions (not all do), or Lambda resetting the permissions when unarchiving; if it's the latter, managing the executable bit would have no effect.

Do you have direct experience with this use case, so that I know with certainty what the exact behavior is?

@64kramsystem force-pushed the sav-normalize_files_metadata_option branch from 0bfacd0 to 347a1af on July 12, 2019 09:46
@64kramsystem (Author) commented Jul 12, 2019

Removed the Travis go_import_path setting commit. It was necessary to make the build run on the branch, but it's no longer needed now that the PR has been opened.

@sveniu commented Oct 10, 2019

Did this fizzle out? I'm eagerly awaiting an update on this or #41.

I'm using a pretty ugly workaround for now, to unify the umask across platforms:

git () {
  origin="$(command git remote get-url --push origin 2>/dev/null)"
  [ "$origin" = "[email protected]:foo/terraform.git" ] &&
    (umask 0022; command git "$@") ||
    command git "$@"
}

@64kramsystem (Author)

> Did this fizzle out? I'm eagerly awaiting an update on this or #41.
>
> I'm using a pretty ugly workaround for now, to unify the umask across platforms:

Hello! Sadly, the upstream maintainer(s) have entirely ignored both the issue and the PR - the PR is long-standing, and hasn't been commented on, closed, or accepted, while there's activity on master.

We've been using this archive fork at my company successfully for a long time now, so the best strategy is likely to officially release the fork.

I'll close this PR, then give updates on the fork.

@64kramsystem (Author)

Closing for inactivity.

@callaa3 commented Jan 9, 2020

It's been an issue for a while for us using 3 different OSs in the office. Any chance we can get some visibility back on the issue?

@64kramsystem (Author)

> It's been an issue for a while for us using 3 different OSs in the office. Any chance we can get some visibility back on the issue?

I think the project is essentially unmaintained (and when it wasn't, this PR was entirely ignored, which is "not cool" 🙄).
I'd like to produce and publish my fork, but I don't have the resources right now.

Regardless, something important to keep in mind for people needing this functionality is that the code doesn't need to be upstream. Just clone my fork and compile the plugin, then place it in the appropriate location - either the per-user plugin directory (see the manual) or the per-project TF configuration directory.

The fork is stable, at least for our use case - at my company, we've been using it for a long time, and we just keep the compiled binary in our project's TF configuration directory.

@WtfJoke commented Feb 17, 2020

@saveriomiroddi can't you leave the pull request open so it eventually gets merged or receives some comments from the maintainers?

Which fork do you mean? Yours, https://github.com/saveriomiroddi/terraform-provider-archive-dev, seems to be "outdated" as well?

EDIT: You probably refer to https://github.com/saveriomiroddi/terraform-provider-archive-dev/tree/sav-normalize_files_metadata_option

@64kramsystem (Author) commented Feb 21, 2020

> @saveriomiroddi can't you leave the pull request open so it eventually gets merged or receives some comments from the maintainers?

Hello!

So! I have a few considerations in relation to the PR closure.

First, and most important from the practical perspective: the changes applied by the PR are free 😄 This means everybody is 100% free to, say, fork the original repository, copy the commit, and open a new PR.

In this case the procedure is:

# fork the original repository from the Web UI (or the GitHub command-line tool)

git clone <forked_repository_address>
cd <local_repository_dir>

git remote add saverio https://github.com/saveriomiroddi/terraform-provider-archive-dev.git
git fetch --all
git checkout -b <feature_branch_name> saverio/sav-normalize_files_metadata_option
git push --set-upstream origin HEAD

Now you can create the PR from the GitHub Web UI (or command-line tool) 😄

Second: repository maintainers can re-open PRs; this happened with a contribution of mine to Bundler. It's crucial to understand that the important thing is that the issue remains open - the typical maintenance workflow uses issues for tracking purposes, and in this case, an issue exists and is open.

Now, regarding the ideological aspect.

I close PRs after a set time because I think it's disrespectful of maintainers to wholesale ignore conversations. I'm perfectly fine if an issue/PR is closed by the maintainers because:

  • it doesn't fit the overall design,
  • or there aren't enough resources to take full care of it,
  • or it's not deemed as needed,
  • or it's deemed as poorly engineered,
  • or the wind blows from north rather than south.

But I find ignoring contributions (conversations and/or PRs) a very poor maintenance practice, unless the project is expressly unmaintained.

Most important of all, this maintenance attitude implies that contributors must be thankful to the maintainers if their contributions are accepted.
I find this an extremely important topic. There should be a maintainers' code of conduct, not only the contributors' 😉.

Having said that, back to practice: Terraform plugins are trivial to build and use. In this case, it's just a matter of cloning the fork, building the plugin, and placing it in the appropriate directory. At my company, we keep those binaries in the SCM, so using them is friction-free. No need to chase careless maintainers 😄.

@64kramsystem (Author)

> Which fork do you mean? Yours, https://github.com/saveriomiroddi/terraform-provider-archive-dev, seems to be "outdated" as well?
>
> EDIT: You probably refer to https://github.com/saveriomiroddi/terraform-provider-archive-dev/tree/sav-normalize_files_metadata_option

I'm not sure what you mean by this; we've been using a compiled version of that branch/PR since it was published, until now - and we're currently on Terraform v0.12.19 without any problems, so I think this should work for everybody.

@WtfJoke commented Mar 6, 2020

> First, and most important from the practical perspective: the changes applied by the PR are free 😄 This means everybody is 100% free to, say, fork the original repository, copy the commit, and open a new PR.
>
> In this case the procedure is:

Sure, but I think it's not a nice gesture to copy the commit and open a new PR.
And I have contributed in the past too, so I'm capable of creating a PR using git, but thank you for the details 😉

> Second: repository maintainers can re-open PRs; this happened with a contribution of mine to Bundler. It's crucial to understand that the important thing is that the issue remains open - the typical maintenance workflow uses issues for tracking purposes, and in this case, an issue exists and is open.

I see your point, but in my opinion a PR can stay open as long as it's relevant and up to date enough to get merged. But I know now why you closed it; that's OK for me as an explanation :)

> Now, regarding the ideological aspect.

> I close PRs after a set time because I think it's disrespectful of maintainers to wholesale ignore conversations. I'm perfectly fine if an issue/PR is closed by the maintainers because:

You're a bit harsh with that opinion; a maintainer's life is hard, I think.
I agree that contributions in any form shouldn't be ignored completely; at the very least, leaving a comment should be the minimal effort from a maintainer.

> Having said that, back to practice: Terraform plugins are trivial to build and use. In this case, it's just a matter of cloning the fork, building the plugin, and placing it in the appropriate directory. At my company, we keep those binaries in the SCM, so using them is friction-free. No need to chase careless maintainers 😄.

But it's better to keep changes upstream, and forks tend to become outdated very fast. I'm aware, though, that it's trivial to build and use.

> I find this an extremely important topic. There should be a maintainers' code of conduct, not only the contributors' 😉.

In my opinion maintainers are also contributors and should follow these principles. But maybe you're right and there needs to be an extended version aimed at maintainers 😄

> I'm not sure what you mean by this; we've been using a compiled version of that branch/PR since it was published, until now - and we're currently on Terraform v0.12.19 without any problems, so I think this should work for everybody.

I just wasn't sure which fork/branch you were referring to, but I figured it out. So everything is fine :) 👍

Thanks for your detailed answer. I got your overall opinion and understand it.
I want to avoid getting too far off topic here, in case your PR gets reopened (which would be a great thing).

@64kramsystem (Author)

> Sure, but I think it's not a nice gesture to copy the commit and open a new PR.
> And I have contributed in the past too, so I'm capable of creating a PR using git, but thank you for the details

Sorry if that sounded patronizing - part of my job is to write scripts, so it's kind of a reflex 😬.

Regarding the discussion as a whole: somebody, on the referenced issue, pinged the maintainers as you were suggesting, which I think is the best idea - hopefully they'll at least participate in the discussion 🙂

To clarify, I'm a maintainer as well (of other projects) 😬, so I understand the problems 🙂

@appilon (Contributor) commented May 12, 2020

Hello @saveriomiroddi. Our sincerest apologies for not engaging sooner on the longstanding #34 and for not addressing the proposed fix here; we sincerely appreciate your contributions to the Terraform ecosystem.

With regard to this PR: after reviewing it, I do think it would have been hard to approve a generic "normalization" formula. What this PR does reveal, though, are the areas of an archive that can create differences in the computed hash output. So I think we will look to open up things like file mode, compression algorithm, and modification date as configurable options to help address problems such as #34.

@64kramsystem (Author)

> Hello @saveriomiroddi. Our sincerest apologies for not engaging sooner on the longstanding #34 and for not addressing the proposed fix here; we sincerely appreciate your contributions to the Terraform ecosystem.
>
> With regard to this PR: after reviewing it, I do think it would have been hard to approve a generic "normalization" formula. What this PR does reveal, though, are the areas of an archive that can create differences in the computed hash output. So I think we will look to open up things like file mode, compression algorithm, and modification date as configurable options to help address problems such as #34.

Hello! Thanks for the attention. It makes sense to make the parameters granular and available.

Ultimately, achieving the objective can be "not complex" (of course, with the assumption that our tested use cases cover a large portion of, or hopefully all, the potential differences between archives), since the parameters involved are only three (compression, modification time, and permissions).
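
For illustration, a granular variant along those lines could look roughly like the sketch below. The NormalizeOptions type, its fields, and applyNormalization are hypothetical names, not the provider's actual API, and the Terraform schema wiring is omitted; the sketch only shows how the three parameters could be applied independently to a zip.FileHeader:

package archiver // hypothetical package, for illustration only

import (
	"archive/zip"
	"os"
	"time"
)

// NormalizeOptions is a hypothetical granular form of the single
// normalize_files_metadata flag: each of the three sources of archive
// differences can be pinned independently.
type NormalizeOptions struct {
	ForceStore   bool         // disable compression (use zip.Store)
	FixedModTime *time.Time   // pin the modification time when non-nil
	FixedMode    *os.FileMode // pin the permissions (e.g. 0644) when non-nil
}

// applyNormalization mutates the entry header according to whichever
// options are set, leaving the other metadata untouched.
func applyNormalization(fh *zip.FileHeader, opts NormalizeOptions) {
	if opts.ForceStore {
		fh.Method = zip.Store
	}
	if opts.FixedModTime != nil {
		fh.SetModTime(*opts.FixedModTime)
	}
	if opts.FixedMode != nil {
		fh.SetMode(*opts.FixedMode)
	}
}

Setting all three options reproduces the behavior of this PR's normalizeCompressingFile, while setting only some of them would address narrower cases (e.g. pinning permissions alone for the Git/umask issue) without giving up compression.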

// normalizeCompressingFile normalizes the entry metadata by setting:
//
// - the Store compression method;
// - a fixed modification date;
// - fixed file permissions.
//
func normalizeCompressingFile(fh *zip.FileHeader) {
	fh.Method = zip.Store
	fh.SetModTime(time.Date(1981, 4, 10, 0, 0, 0, 0, time.UTC))
	fh.SetMode(0644)
}

This causes a very sneaky compatibility issue with Java's standard java.util.zip, resulting in:

java.util.zip.ZipException: only DEFLATED entries can have EXT descriptor
        at java.base/java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:311)
        at java.base/java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:123)
        at com.meh.test.UnzipFiles.unzip(UnzipFiles.java:30)
        at com.meh.test.UnzipFiles.main(UnzipFiles.java:17)

The quick summary seems to be: Go's zip writer streams, meaning it first writes the file entry body, and then writes a so-called "ext" data descriptor with the size information. java.util.zip can extract such entries if they use DEFLATE, since DEFLATE encodes the data in a way that lets the reader know when it has received all of it. STORE, on the other hand, encodes no such information, and Java fails to handle it.
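
For what it's worth, one possible way around this specific interaction - untested against Java here, and relying on zip.Writer.CreateRaw, which was added in Go 1.17, far newer than what this PR targets - is to precompute the CRC-32 and sizes so the writer can record them in the local file header instead of emitting the "ext" data descriptor. A minimal sketch:

package main

import (
	"archive/zip"
	"hash/crc32"
	"os"
	"time"
)

// addStoredEntry writes data uncompressed, with the CRC-32 and sizes
// computed up front; CreateRaw can then record them in the local file
// header, so no "ext" data descriptor should be needed for the entry.
func addStoredEntry(zw *zip.Writer, name string, data []byte) error {
	fh := &zip.FileHeader{
		Name:               name,
		Method:             zip.Store,
		CRC32:              crc32.ChecksumIEEE(data),
		CompressedSize64:   uint64(len(data)), // with Store, compressed == raw
		UncompressedSize64: uint64(len(data)),
		Modified:           time.Date(1981, 4, 10, 0, 0, 0, 0, time.UTC),
	}
	fh.SetMode(0644)

	w, err := zw.CreateRaw(fh) // raw writer: we supply the stored bytes as-is
	if err != nil {
		return err
	}
	_, err = w.Write(data)
	return err
}

func main() {
	f, err := os.Create("normalized.zip")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	zw := zip.NewWriter(f)
	// "taskdef.json" is just an illustrative entry name.
	if err := addStoredEntry(zw, "taskdef.json", []byte("{}\n")); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
}

Whether this actually satisfies java.util.zip would need verification, but it targets exactly the failure mode described above: a STORE entry whose sizes are only available from the trailing descriptor.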

Details:

Impact case: The Amazon ECS (Blue/Green) deploy provider of AWS CodePipeline raises an exception if a normalized zip file is fed to it:

Exception while trying to read the task definition artifact file from: <source stage>

We use archive_file to produce the zip, containing appspec.yaml and taskdef.json files for CodeDeploy and ECS, respectively.

AWS services seem to be heavily Java-based, so while we don't see the full Java exception output, it seems likely that this is the problem.


I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 27, 2024
Successfully merging this pull request may close these issues.

archive_file produces different results on different OSs