Assets: file order in zip archive is inconsistent #2759

Closed
spg opened this issue Jun 5, 2019 · 5 comments · Fixed by #2931
Labels: bug, needs-triage

Comments

@spg
Contributor

spg commented Jun 5, 2019

Describe the bug
When the assets module builds zip archives, file order in the zip file is inconsistent, resulting in variations in calculated hashes of the zip files. This causes unexpected uploads of assets to S3, which, in turn, might update CDK resources (Lambda functions, Lambda layers) that use those assets.

To Reproduce
Run the following:

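import fs = require('fs-extra');
import {contentHash, zipDirectory} from "aws-cdk/lib/archive";

async function main() {
  // Zip the same directory twice; the two archives should be byte-identical.
  await zipDirectory('myDir', 'a.zip');
  await zipDirectory('myDir', 'b.zip');

  const aData = await fs.readFile('a.zip');
  const bData = await fs.readFile('b.zip');
  // Buffer.compare returns 0 when the buffers are equal, non-zero otherwise.
  console.log(aData.compare(bData));
  console.log(contentHash(aData));
  console.log(contentHash(bData));
}

// Report failures instead of leaving the rejection unhandled.
main().catch(err => {
  console.error(err);
  process.exit(1);
});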

Assuming that myDir is a directory containing more than 1 file, running the above script will print differing hashes. Here's an example:

> npx ts-node testzip.ts
1
2442619fe6e58701327941fcc574e2ab409340f97e8e3f7d5ad29798d7247544
b35cf71950f741c9fa4ddf8c9ad7fe2f5bf36164813b38d3f6b55e00b2c479b0
> unzip -l a.zip
Archive:  a.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  06-05-2019 15:26   1.txt
        0  06-05-2019 15:56   10.txt
        0  06-05-2019 15:26   2.txt
        0  06-05-2019 15:56   3.txt
        0  06-05-2019 15:56   4.txt
        0  06-05-2019 15:56   5.txt
        0  06-05-2019 15:56   6.txt
        0  06-05-2019 15:56   8.txt
        0  06-05-2019 15:56   9.txt
---------                     -------
        0                     9 files
> unzip -l b.zip
Archive:  b.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  06-05-2019 15:26   1.txt
        0  06-05-2019 15:56   10.txt
        0  06-05-2019 15:56   3.txt
        0  06-05-2019 15:26   2.txt
        0  06-05-2019 15:56   4.txt
        0  06-05-2019 15:56   5.txt
        0  06-05-2019 15:56   6.txt
        0  06-05-2019 15:56   8.txt
        0  06-05-2019 15:56   9.txt
---------                     -------
        0                     9 files
>

Note how the order of the files in a.zip differs from b.zip.

Expected behavior
Creating many zip archives of the same directory should yield consistent hashes.

Version:

  • OS: OS X 10.14.15
  • Programming Language: TypeScript
  • CDK Version: 0.33.0
@jogold
Contributor

jogold commented Jun 5, 2019

Interesting.

Another problem with zip archives is when you have some kind of build step generating files in the directory to be zipped: even if the generated content is exactly the same, the hash of the zip file will change because the creation date/times are updated.

@spg
Contributor Author

spg commented Jun 6, 2019

You are right, this is another problem.

One alternate approach would be to calculate the hash of an asset based on the md5 of every file in the asset's directory. I'm not sure what would be the performance impact (for directory hierarchies containing a large number of files, or large files) for CDK users though.
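
A minimal sketch of that idea, using only Node's built-in modules. The names `listFiles` and `directoryFingerprint` are hypothetical and not part of the CDK; the sketch assumes regular files only (symlinks and other entry types are skipped) and reads each file fully into memory, which is where the performance concern for large files would show up.

```typescript
import * as crypto from 'crypto';
import * as fs from 'fs';
import * as path from 'path';

// Hypothetical helper: recursively collect relative file paths under `root`,
// using '/' separators, and return them in sorted order.
function listFiles(root: string, rel: string = ''): string[] {
  const out: string[] = [];
  for (const entry of fs.readdirSync(path.join(root, rel), { withFileTypes: true })) {
    const child = path.posix.join(rel, entry.name);
    if (entry.isDirectory()) {
      out.push(...listFiles(root, child));
    } else if (entry.isFile()) {
      out.push(child);
    }
    // Symlinks and other entry types are ignored in this sketch.
  }
  return out.sort();
}

// Hypothetical helper: fingerprint a directory from the md5 of every file.
// Only relative paths and file contents are hashed, so neither the order in
// which the file system lists entries nor file timestamps affect the result.
function directoryFingerprint(root: string): string {
  const hash = crypto.createHash('sha256');
  for (const file of listFiles(root)) {
    const fileDigest = crypto
      .createHash('md5')
      .update(fs.readFileSync(path.join(root, file)))
      .digest('hex');
    hash.update(`${file}\0${fileDigest}\n`);
  }
  return hash.digest('hex');
}
```

Calling `directoryFingerprint('myDir')` twice on an unchanged directory should return the same string, even if a build step has rewritten the files with new timestamps.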

@jogold
Contributor

jogold commented Jun 6, 2019

Or find a way to zip a directory and have all file timestamps in it set to Epoch.

@RomainMuller
Contributor

I believe our archiver already neuters the timestamps (setting them to some fixed date far in the past, although not Epoch because this causes problems on certain versions of certain platforms -- looking at you, Windows).

It would be worth writing a test that confirms this is the case, however, and that also ensures files are listed in alphabetical order, so the archives are consistent.
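
As a rough illustration of what such a test could pin down, here is a sketch that turns a list of file names into a deterministic list of archive entries. `ZipEntryPlan`, `FIXED_DATE` and `planEntries` are hypothetical names, and the date is an arbitrary placeholder, not the one the CDK archiver actually uses.

```typescript
// Hypothetical description of one archive entry: its name and the timestamp
// that would be written for it.
interface ZipEntryPlan {
  name: string;
  date: Date;
}

// Assumption: any fixed date works for reproducibility. This value is an
// arbitrary placeholder chosen for the sketch.
const FIXED_DATE = new Date('2000-01-01T00:00:00Z');

// Sort names (plain string comparison) and give every entry the same date,
// so the plan depends only on which files exist.
function planEntries(fileNames: string[]): ZipEntryPlan[] {
  return [...fileNames].sort().map(name => ({ name, date: FIXED_DATE }));
}

const plan = planEntries(['3.txt', '2.txt', '10.txt', '1.txt']);
console.log(plan.map(entry => entry.name).join(' '));
// prints "1.txt 10.txt 2.txt 3.txt"
```

A test along these lines would zip the same directory twice and assert that both archives list the same names in the same order with the same dates.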

@jogold
Contributor

jogold commented Jun 7, 2019

From the output above (unzip -l), it seems that datetimes are preserved (haven't tried it myself).

NGL321 added the needs-triage label Jun 17, 2019
jogold added a commit to jogold/aws-cdk that referenced this issue Jun 19, 2019
Zip files were not consistent across deploys resulting in unnecessary S3 uploads and stack updates.

Ensure consistency by appending files in series (guarantees file ordering in the zip) and resetting
dates (guarantees same hash for same content).

Closes aws#1997, Closes aws#2759
jogold added a commit to jogold/aws-cdk that referenced this issue Jun 19, 2019
Zip files were not consistent across deploys resulting in unnecessary S3 uploads and stack updates.

Ensure consistency by appending files in series (guarantees file ordering in the zip) and resetting
dates (guarantees same hash for same content).

Closes aws#1997, Closes aws#2759
rix0rrr pushed a commit that referenced this issue Jun 21, 2019
Zip files were not consistent across deploys resulting in unnecessary S3 uploads and stack updates.

Ensure consistency by appending files in series (guarantees file ordering in the zip) and resetting
dates (guarantees same hash for same content).

Closes #1997, Closes #2759
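
To illustrate why appending in series helps, here is a small simulation. It is not the code from the fix: `fakeRead`, `appendInParallel` and `appendInSeries` are hypothetical names, and the random delay only stands in for file reads that finish at unpredictable times.

```typescript
type Reader = (name: string) => Promise<string>;

// Simulated async read: resolves after a random delay, standing in for
// file reads that take varying amounts of time.
function fakeRead(name: string): Promise<string> {
  return new Promise<string>(resolve =>
    setTimeout(() => resolve(name), Math.random() * 20));
}

// Parallel: every read starts at once and each entry is appended when its
// read finishes, so the resulting order depends on timing.
async function appendInParallel(names: string[], read: Reader = fakeRead): Promise<string[]> {
  const appended: string[] = [];
  await Promise.all(names.map(async name => {
    appended.push(await read(name));
  }));
  return appended;
}

// Series: the next read starts only after the previous entry was appended,
// so the resulting order always matches the input order.
async function appendInSeries(names: string[], read: Reader = fakeRead): Promise<string[]> {
  const appended: string[] = [];
  for (const name of names) {
    appended.push(await read(name));
  }
  return appended;
}
```

With a sorted input list, `appendInSeries` returns the names in that same order on every run, while `appendInParallel` may return a different order each time.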