nix-prefetch-git: fix determinism with leaveDotGit #4767

Closed

Conversation

bjornfor
Contributor

@bjornfor bjornfor commented Nov 1, 2014

Remove all remote branches, remove tags not reachable from the given
'rev', do a full repack and then garbage collect unreferenced objects.

According to my testing, the result is fully deterministic. As in "any
change done to the upstream repo, ahead of 'rev', will not affect the
output hash". But only time will tell.

A new version of git can of course change store format, but that's
unavoidable.

For big repositories, the repack operation may be a bit heavy. But as
far as I can see there is no cheaper way to determinism.
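
The steps described above could be sketched roughly as follows. This is an illustration, not the code from this pull request: the function name `prune_to_rev` is hypothetical, `git tag --no-merged` (which needs a reasonably recent git) is just one possible way to find tags not reachable from the revision, and `pack.threads=1` anticipates the single-threaded repack mentioned later in this thread.

```shell
# Hypothetical helper, not the code from this pull request.
# Usage: prune_to_rev <repo-dir> <rev>
prune_to_rev() {
    repo="$1"
    rev="$2"
    (
        # Work in a sub-shell so the caller's working directory is untouched.
        cd "$repo" || exit 1

        # Remove every remote-tracking ref, including the origin/HEAD symref.
        refs=$(git for-each-ref --format='%(refname)' refs/remotes)
        for ref in $refs; do
            git update-ref --no-deref -d "$ref"
        done

        # Remove tags whose commit is not reachable from the requested rev.
        tags=$(git tag --no-merged "$rev")
        for tag in $tags; do
            git tag -d "$tag" >/dev/null
        done

        # Full repack: -f recomputes all deltas, -A -d replaces the old packs.
        # One thread only (assumption based on the follow-up fix in this thread).
        git -c pack.threads=1 repack -A -d -f

        # Drop objects that nothing references any more.
        git -c pack.threads=1 gc --prune=all
    )
}
```

A call such as `prune_to_rev ./checkout "$rev"` would be made after the clone and checkout have finished. Reflogs also keep objects alive, so this sketch assumes `.git/logs` has already been removed.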

@peti
Member

peti commented Nov 1, 2014

👍

@bjornfor bjornfor force-pushed the fix-fetchgit-with-leavedotgit branch from 58f294d to 26905cb Compare November 1, 2014 17:18
@bjornfor
Contributor Author

bjornfor commented Nov 1, 2014

Oh, one more possible source of non-determinism: origin/HEAD. Will do some testing...

@bjornfor
Contributor Author

bjornfor commented Nov 1, 2014

Don't merge yet....

@bjornfor bjornfor force-pushed the fix-fetchgit-with-leavedotgit branch 6 times, most recently from a9557f5 to 81c9513 Compare November 1, 2014 23:12
@bjornfor
Contributor Author

bjornfor commented Nov 1, 2014

Seems fine now (fully deterministic). Unfortunately, removing the index (already done in nixpkgs master) makes the tree "dirty" (as reported by "git describe --dirty").

@bjornfor bjornfor force-pushed the fix-fetchgit-with-leavedotgit branch from 81c9513 to 30f938b Compare November 1, 2014 23:24
@madjar
Member

madjar commented Nov 2, 2014

That's great! Happy to see I was wrong in #4752.

As for the index, the non-determinism comes from timestamps. I started studying the binary format of the index (a fixed sequence that is repeated, and a checksum at the end), but it didn't seem worth it to go in that direction.

Another way might be to find a way to run git with a fake time, but I don't know if it's doable.
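
To illustrate the "checksum at the end" of the index: in a SHA-1 repository the last 20 bytes of `.git/index` are the SHA-1 of everything before them, so any hand-edited field would also require recomputing that trailer. A minimal sketch of checking it, assuming `sha1sum`, `head -c` and `tail -c` are available; the function name `index_checksum_ok` is made up for this example.

```shell
# Hypothetical helper: succeeds if the trailing 20 bytes of an index file
# equal the SHA-1 of the preceding content (SHA-1 repositories only).
index_checksum_ok() {
    index_file="$1"
    # Arithmetic expansion strips the padding some wc implementations add.
    size=$(( $(wc -c < "$index_file") ))
    [ "$size" -gt 20 ] || return 1
    # Stored checksum: last 20 bytes, rendered as 40 hex digits.
    stored=$(tail -c 20 "$index_file" | od -An -v -tx1 | tr -d ' \n')
    # Computed checksum: SHA-1 over everything except those 20 bytes.
    computed=$(head -c "$((size - 20))" "$index_file" | sha1sum | cut -d' ' -f1)
    [ "$stored" = "$computed" ]
}
```

For example, `index_checksum_ok .git/index && echo "trailer matches"` could be run inside a checkout.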

Add more files to the delete list:

 * .git/FETCH_HEAD
 * .git/ORIG_HEAD
 * .git/refs/remotes/origin/HEAD
 * .git/config

Further, remove all remote branches, remove tags not reachable from the
given 'rev', do a full repack and then garbage collect unreferenced
objects.

According to my testing, the result is fully deterministic. As in "any
change done to the upstream repo, ahead of 'rev', will not affect the
hash of the resulting 'clone'". Even changing the clone URL will not
change the output hash, because .git/config is removed.

A new version of git can of course change store format, but that's
unavoidable.

For big repositories, the repack operation may be a bit heavy. But as
far as I can see there is no cheaper way to determinism.
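
The additions to the delete list in the commit message above amount to something like the following. This is a sketch only: the helper name `remove_volatile_git_files` is hypothetical, and files that were already on the delete list before this change are not shown.

```shell
# Hypothetical helper; the four paths are the ones named in the commit message.
# Usage: remove_volatile_git_files <repo-dir>
remove_volatile_git_files() {
    repo="$1"
    # -f keeps this quiet when a file does not exist in a given clone.
    rm -f \
        "$repo/.git/FETCH_HEAD" \
        "$repo/.git/ORIG_HEAD" \
        "$repo/.git/refs/remotes/origin/HEAD" \
        "$repo/.git/config"
}
```

Since the clone URL is recorded in `.git/config`, dropping that file is what makes the result independent of where the repository was fetched from.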
@bjornfor bjornfor force-pushed the fix-fetchgit-with-leavedotgit branch from 30f938b to d94ca33 Compare November 2, 2014 10:33
@bjornfor
Contributor Author

bjornfor commented Nov 2, 2014

@madjar: Yes, I was thinking of libfaketime too. I think I'll merge this and then play with fake time.

@bjornfor
Contributor Author

bjornfor commented Nov 2, 2014

Pushed to master (53614cf).

@bjornfor bjornfor closed this Nov 2, 2014
@bjornfor bjornfor deleted the fix-fetchgit-with-leavedotgit branch November 2, 2014 12:25
@bjornfor
Contributor Author

bjornfor commented Nov 2, 2014

@madjar: Using libfaketime fixes the timestamps in the index. But there are more "bad things" (for determinism) stored in the index:

$ git ls-files --debug
README.txt
  ctime: -3600:0
  mtime: -3600:0
  dev: 2049 ino: 765485
  uid: 1000 gid: 100
  size: 6   flags: 0

dev, ino (inode number), uid and gid are not deterministic.

For reference, here is the index format description: https://github.com/git/git/blob/master/Documentation/technical/index-format.txt.
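
To see at a glance which index entries carry host-specific values, the `git ls-files --debug` output can be reduced to just those fields. A small sketch using awk; the function name `host_specific_index_fields` is invented here, and the parsing assumes the layout shown in the output above (an unindented path line followed by indented detail lines).

```shell
# Hypothetical helper: reads `git ls-files --debug` output on stdin and
# prints one line per file with the fields that differ between machines.
host_specific_index_fields() {
    awk '
        # An unindented, non-empty line is a file path.
        /^[^ \t]/    { path = $0; next }
        # "  dev: <n> ino: <n>"
        $1 == "dev:" { dev = $2; ino = $4; next }
        # "  uid: <n> gid: <n>" completes the record for this path.
        $1 == "uid:" { print path " dev=" dev " ino=" ino " uid=" $2 " gid=" $4 }
    '
}
```

Run as `git ls-files --debug | host_specific_index_fields`; for the entry shown above it prints `README.txt dev=2049 ino=765485 uid=1000 gid=100`.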

@madjar
Member

madjar commented Nov 2, 2014

Okay, then death to the index. There's no use trying to tweak the values, because it will probably result in a dirty index as before (or worse).

@bjornfor
Contributor Author

bjornfor commented Nov 3, 2014

Last fix I hope: 96cacf0 ("nix-prefetch-git: run single-threaded 'git repack'").

mweinelt added a commit to mweinelt/nixpkgs that referenced this pull request Mar 8, 2020
Version 1.1.11 (2020-03-08)

Compatibility notes:

    When upgrading from borg 1.0.x to 1.1.x, please note:
        read all the compatibility notes for 1.1.0*, starting from 1.1.0b1.
        borg upgrade: you do not need to and you also should not run it.
        borg might ask some security-related questions once after upgrading. You can answer them either manually or via environment variable. One known case is if you use unencrypted repositories, then it will ask about an unknown unencrypted repository one time.
        your first backup with 1.1.x might be significantly slower (it might completely read, chunk, hash a lot of files) - this is due to the --files-cache mode change (and happens every time you change mode). You can avoid the one-time slowdown by using the pre-1.1.0rc4-compatible mode (but that is less safe for detecting changed files than the default). See the --files-cache docs for details.
    1.1.11 removes WSL autodetection (Windows 10 Subsystem for Linux). If WSL still has a problem with sync_file_range, you need to set BORG_WORKAROUNDS=basesyncfile in the borg process environment to work around the WSL issue.

Fixes:

    fixed potential index corruption / data loss issue due to bug in hashindex_set, NixOS#4829 Please read and follow the more detailed notes close to the top of this document.
    upgrade bundled xxhash to 0.7.3, NixOS#4891 0.7.2 is the minimum requirement for correct operations on ARMv6 in non-fixup mode, where unaligned memory accesses cause bus errors. 0.7.3 adds some speedups and libxxhash 0.7.3 even has a pkg-config file now.
    upgrade bundled lz4 to 1.9.2
    upgrade bundled zstd to 1.4.4
    fix crash when upgrading erroneous hints file, NixOS#4922
    extract:
        fix KeyError for "partial" extraction, NixOS#4607
        fix "partial" extract for hardlinked contentless file types, NixOS#4725
        fix preloading for old (0.xx) remote servers, NixOS#4652
        fix confusing output of borg extract --list --strip-components, NixOS#4934
    delete: after double-force delete, warn about necessary repair, NixOS#4704
    create: give invalid repo error msg if repo config not found, NixOS#4411
    mount: fix FUSE mount missing st_birthtime, NixOS#4763 NixOS#4767
    check: do not stumble over invalid item key, NixOS#4845
    info: if the archive doesn't exist, print a pretty message, NixOS#4793
    SecurityManager.known(): check all files, NixOS#4614
    Repository.open: use stat() to check for repo dir, NixOS#4695
    Repository.check_can_create_repository: use stat() to check, NixOS#4695
    fix invalid archive error message
    fix optional/non-optional location arg, NixOS#4541
    commit-time free space calc: ignore bad compact map entries, NixOS#4796
    ignore EACCES (errno 13) when hardlinking the old config, NixOS#4730
    --prefix / -P: fix processing, avoid argparse issue, NixOS#4769

New features:

    enable placeholder usage in all extra archive arguments
    new BORG_WORKAROUNDS mechanism, basesyncfile, NixOS#4710
    recreate: support --timestamp option, NixOS#4745
    support platforms without os.link (e.g. Android with Termux), NixOS#4901 if we don't have os.link, we just extract another copy instead of making a hardlink.
    support linux platforms without sync_file_range (e.g. Android 7 with Termux), NixOS#4905

Other:

    ignore --stats when given with --dry-run, but continue, NixOS#4373
    add some ProgressIndicator msgids to code / fix docs, NixOS#4935
    elaborate on "Calculating size" message
    argparser: always use REPOSITORY in metavar, also use more consistent help phrasing.
    check: improve error output for matching index size, see NixOS#4829
    docs:
        changelog: add advisory about hashindex_set bug NixOS#4829
        better describe BORG_SECURITY_DIR, BORG_CACHE_DIR, NixOS#4919
        infos about cache security assumptions, NixOS#4900
        add FAQ describing difference between a local repo vs. repo on a server.
        document how to test exclusion patterns without performing an actual backup
        timestamps in the files cache are now usually ctime, NixOS#4583
        fix bad reference to borg compact (does not exist in 1.1), NixOS#4660
        create: borg 1.1 is not future any more
        extract: document limitation "needs empty destination", NixOS#4598
        how to supply a passphrase, use crypto devices, NixOS#4549
        fix osxfuse github link in installation docs
        add example of exclude-norecurse rule in help patterns
        update macOS Brew link
        add note about software for automating backups, NixOS#4581
        AUTHORS: mention copyright+license for bundled msgpack
        fix various code blocks in the docs, NixOS#4708
        updated docs to cover use of temp directory on remote, NixOS#4545
        add restore docs, NixOS#4670
        add a pull backup / push restore how-to, NixOS#1552
        add FAQ how to retain original paths, NixOS#4532
        explain difference between --exclude and --pattern, NixOS#4118
        add FAQs for SSH connection issues, NixOS#3866
        improve password FAQ, NixOS#4591
        reiterate that 'file cache names are absolute' in FAQ
    tests:
        cope with ANY error when importing pytest into borg.testsuite, NixOS#4652
        fix broken test that relied on improper zlib assumptions
        test_fuse: filter out selinux xattrs, NixOS#4574
    travis / vagrant:
        misc python versions removed / changed (due to openssl 1.1 compatibility) or added (3.7 and 3.8, for better borg compatibility testing)
        binary building is on python 3.5.9 now
    vagrant:
        add new boxes: ubuntu 18.04 and 20.04, debian 10
        update boxes: openindiana, darwin, netbsd
        remove old boxes: centos 6
        darwin: updated osxfuse to 3.10.4
        use debian/ubuntu pip/virtualenv packages
        rather use python 3.6.2 than 3.6.0, fixes coverage/sqlite3 issue
        use requirements.d/development.lock.txt to avoid compat issues
    travis:
        darwin: backport some install code / order from master
        remove deprecated keyword "sudo" from travis config
        allow osx builds to fail, NixOS#4955 this is due to travis-ci frequently being so slow that the OS X builds just fail because they exceed 50 minutes and get killed by travis.
mweinelt added a commit to mweinelt/nixpkgs that referenced this pull request Mar 8, 2020
(cherry picked from commit dbff9b5)
mweinelt added a commit to mweinelt/nixpkgs that referenced this pull request Mar 8, 2020
(cherry picked from commit dbff9b5)
3 participants