Support using checksums that aren't SHA-256 #901
// This lookup should never fail -- we should already have validated in
// the BagReader class and the bag verifier that if an algorithm is used,
// it is used to describe every file in the bag.
val checksum = multiChecksum.getValue(algorithm).get
Perhaps we should still throw a sensible error though?
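One way to do that, as a minimal sketch: replace the bare `.get` with a lookup that fails with a message naming the algorithm and the file. The types below are hypothetical stand-ins for the ones in the diff (this assumes `getValue` returns an `Option[String]`, and the `path` parameter and the choice of `IllegalStateException` are illustrative only).

```scala
object ChecksumLookup {
  // Hypothetical stand-ins for the real types in the patch; the actual
  // definitions may differ.
  sealed trait HashingAlgorithm
  case object MD5 extends HashingAlgorithm
  case object SHA1 extends HashingAlgorithm
  case object SHA256 extends HashingAlgorithm
  case object SHA512 extends HashingAlgorithm

  case class MultiChecksum(
    md5: Option[String] = None,
    sha1: Option[String] = None,
    sha256: Option[String] = None,
    sha512: Option[String] = None
  ) {
    // Assumed to return an Option, as the `.get` in the diff suggests.
    def getValue(algorithm: HashingAlgorithm): Option[String] =
      algorithm match {
        case MD5    => md5
        case SHA1   => sha1
        case SHA256 => sha256
        case SHA512 => sha512
      }
  }

  // Same lookup as in the diff, but a missing checksum produces an error
  // that says which algorithm and which file were involved, rather than
  // a bare NoSuchElementException from Option.get.
  def checksumFor(
    path: String,
    multiChecksum: MultiChecksum,
    algorithm: HashingAlgorithm
  ): String =
    multiChecksum.getValue(algorithm).getOrElse {
      throw new IllegalStateException(
        s"No $algorithm checksum for $path; " +
          "earlier validation should have rejected this bag"
      )
    }
}
```

The lookup should still never fail in practice, but if the earlier validation ever regresses, the failure points at the offending file instead of at `Option.get`.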
*
* Implementation notes:
*
* - We know the tag manifest is empty because it has to describe
* - We know the tag manifest is empty because it has to describe
* - We know the tag manifest is not empty because it has to describe
throw MultiChecksumException.NoChecksum
}

// We support MD5 and SHA1 for backwards compatibility (see RFC 8493 § 2.4)
I wonder if linking directly to the IETF version of the manifest would be useful (here and elsewhere).
I wonder if we should check that the
This design seems sensible to me. Interesting to compare this implementation to https://github.com/LibraryOfCongress/bagit-java. I wonder if we should be pushing our verification code in a direction that makes it less dependent on the underlying storage provider (in our case we require S3; the LOC version requires a filesystem).
Sorry, a bit late to this, but it seems like a really good idea to me. This bit of work is turning into a good opportunity for us to make sure we are compliant with the 1.0 spec. Can't speak to the code, but the logic seems sensible.
Closes #900
This is taking us towards the BagIt 1.0 definition of a "complete" bag:
Currently we only support SHA-256 manifests: we'll ignore checksums in other manifest files, and if a bag doesn't include SHA-256 checksums we're unable to verify/register it.
This patch implements full support for SHA-256 and SHA-512, and partial support for MD5 and SHA-1. In particular:
In a bit more detail, we add the following new checks to follow the BagIt spec:
manifest-md5.txt and manifest-sha256.txt would fail.
The bag verifier is an incredibly important service: a mistake in this code would be catastrophic. The patch is too big to safely review as-is, so I'm opening this PR to get high-level feedback. Is this design sensible? Have I missed anything? Is there anything you'd like to double-check?
I'll gradually break this up into smaller PRs for proper review.
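To make the high-level idea concrete, here is a rough sketch of checking one file against several manifests at once. It is not the code in this patch: the object name, the `mismatches` helper, and the policy that every supplied checksum must match are assumptions for illustration. Only `java.security.MessageDigest` and its standard algorithm names are real.

```scala
import java.security.MessageDigest

object MultiChecksumSketch {
  // Manifest suffix (as in manifest-<suffix>.txt) -> JDK algorithm name.
  val jdkNames: Map[String, String] = Map(
    "md5"    -> "MD5",
    "sha1"   -> "SHA-1",
    "sha256" -> "SHA-256",
    "sha512" -> "SHA-512"
  )

  // Lowercase hex digest of the given bytes.
  def hexDigest(jdkName: String, bytes: Array[Byte]): String =
    MessageDigest
      .getInstance(jdkName)
      .digest(bytes)
      .map(b => f"${b & 0xff}%02x")
      .mkString

  // Compare the file contents against every checksum the manifests supply.
  // Returns one message per problem; an empty result means all of them matched.
  // Assumption: a mismatch under any algorithm is treated as a failure.
  def mismatches(bytes: Array[Byte], expected: Map[String, String]): Seq[String] =
    expected.toSeq.sortBy(_._1).flatMap { case (name, expectedHex) =>
      jdkNames.get(name) match {
        case None =>
          Some(s"$name: unsupported algorithm")
        case Some(jdkName) =>
          val actual = hexDigest(jdkName, bytes)
          if (actual.equalsIgnoreCase(expectedHex)) None
          else Some(s"$name: expected $expectedHex, got $actual")
      }
    }
}
```

In the real verifier the bytes would be streamed from storage rather than held in memory, and all digests would ideally be computed in a single pass over the stream.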