Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Commit Hash Algorithm #120

Merged
merged 8 commits into from
Sep 10, 2019
Merged

New Commit Hash Algorithm #120

merged 8 commits into from
Sep 10, 2019

Conversation

rlizzo
Copy link
Member

@rlizzo rlizzo commented Sep 6, 2019

Motivation and Context

Why is this change required? What problem does it solve?:

This method fixes some legacy behavior which was used in order to calculate commit hashs. The current implementation calculates the hash of commit record contents based on the compressed and binary packed (via msgpack) lmdb ref value. This is undesireable as we are not calculating the digest based on some intrinsic (and inspectable) hangar record format, but the output of two dependent libraries.

In addition, verification of record integrity is rather lacking in the current implemenation, this is addressed as well.

Description

Describe your changes in detail:

Digests are now calculated based on each lmdb key/value pair (for every record in the commit), The final digest is calculated from hashing each of these digests.

Upon checkout, we verify that the stored record data matches the expected commit hash, and will raise an exception warning of disk issues if it does not match.

This is a WIP at the time of submission, and there are still a few issues to iron out. This is a breaking change to repositories written in any previous version of hangar.

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Documentation update
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Is this PR ready for review, or a work in progress?

  • Ready for review
  • Work in progress

How Has This Been Tested?

Put an x in the boxes that apply:

  • Current tests cover modifications made
  • New tests have been added to the test suite
  • Modifications were made to existing tests to support these changes
  • Tests may be needed, but they are not included when the PR was proposed
  • I don't know. Help!

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have signed (or will sign when prompted) the tensorwork CLA.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

…nable degree at the low end. Just need to make sure that all the higher level calling functions are updated to refspec the new returns. Rather than just returning the raw bytes, we return the digest of the uncompressed bytes alongside them
… by adding special case to verification method
@codecov
Copy link

codecov bot commented Sep 6, 2019

Codecov Report

Merging #120 into master will increase coverage by 0.03%.
The diff coverage is 95.92%.

@@            Coverage Diff             @@
##           master     #120      +/-   ##
==========================================
+ Coverage   92.09%   92.12%   +0.03%     
==========================================
  Files          59       60       +1     
  Lines        9859     9973     +114     
  Branches      990      993       +3     
==========================================
+ Hits         9079     9187     +108     
- Misses        568      574       +6     
  Partials      212      212
Impacted Files Coverage Δ
tests/conftest.py 97.55% <ø> (ø) ⬆️
src/hangar/backends/hdf5_00.py 87.01% <100%> (+0.85%) ⬆️
src/hangar/records/parsing.py 98.51% <100%> (+0.22%) ⬆️
src/hangar/constants.py 100% <100%> (ø) ⬆️
tests/test_arrayset.py 100% <100%> (ø) ⬆️
tests/test_commit_ref_verification.py 100% <100%> (ø)
tests/test_initiate.py 100% <100%> (ø) ⬆️
src/hangar/records/vcompat.py 51.79% <12.5%> (-6.55%) ⬇️
src/hangar/records/commiting.py 92.12% <96.88%> (-0.62%) ⬇️

@rlizzo rlizzo requested a review from hhsecond September 6, 2019 14:59
@rlizzo rlizzo changed the title WIP: New Commit Hash Algorithm New Commit Hash Algorithm Sep 6, 2019
@rlizzo rlizzo added the Awaiting Review Author has determined PR changes area nearly complete and ready for formal review. label Sep 6, 2019
@rlizzo
Copy link
Member Author

rlizzo commented Sep 6, 2019

Hey @hhsecond I fixed up the bugs. this is ready to be merged. Can you give it a once over as a sanity check?

src/hangar/backends/hdf5_00.py Show resolved Hide resolved
@rlizzo rlizzo merged commit c64f32f into tensorwerk:master Sep 10, 2019
@rlizzo rlizzo deleted the cmt-hashing branch October 9, 2019 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Awaiting Review Author has determined PR changes area nearly complete and ready for formal review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants