File hashes (MD5) #398
Are you particularly wedded to MD5, or would something more modern/secure be okay? SHA-2?
I wonder if we can align with the W3C's Subresource Integrity specification. Perhaps we add an "integrity" member to the manifest? And then use algo+hash as in the spec?
{
  "type": "LinkedResource",
  "url": "fonts/STIXGeneral.otf",
  "encodingFormat": "application/vnd.ms-opentype",
  "integrity": "sha384-dOTZf16X8p34q2/kYyEFm0jh89uTjikhnzjeLeF0FHsEaYKb1A1cv+Lyv4Hk8vHd"
}
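To make the proposal concrete, a minimal Python sketch of how a value in this format could be produced. It assumes one hash expression per resource; `sri_integrity` and `linked_resource` are hypothetical helper names, not part of any spec. Note that SRI encodes the raw digest in base64, not hex.

```python
import base64
import hashlib
import json

# Hash functions that SRI requires conforming user agents to support.
SRI_ALGORITHMS = ("sha256", "sha384", "sha512")

def sri_integrity(data: bytes, algorithm: str = "sha384") -> str:
    """Return one SRI hash expression: '<algorithm>-<base64 of raw digest>'."""
    if algorithm not in SRI_ALGORITHMS:
        raise ValueError(f"not an SRI hash algorithm: {algorithm}")
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"

def linked_resource(url: str, encoding_format: str, data: bytes) -> dict:
    """Hypothetical helper: build a manifest entry with an 'integrity' member."""
    return {
        "type": "LinkedResource",
        "url": url,
        "encodingFormat": encoding_format,
        "integrity": sri_integrity(data),
    }

if __name__ == "__main__":
    font_bytes = b"placeholder font bytes"  # stand-in for the real file contents
    entry = linked_resource("fonts/STIXGeneral.otf",
                            "application/vnd.ms-opentype", font_bytes)
    print(json.dumps(entry, indent=2))
```

A sha384 digest is 48 bytes, so its base64 form is always 64 characters with no padding, matching the length of the example value.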
Can you give some details? Do you store a hash for each resource separately, or one global hash (some sort of Merkle tree)?
By the way, this should probably be a 'core' manifest feature and not audio-specific.
See https://github.com/blackstoneaudio/audiobook-spec/blob/master/draft.yaml
@GarthConboy We use MD5s because they are cheap to compute, and they are used for download validation, not security: the same purpose as ETag keys in S3 or equivalent. We use the word 'md5' as the key, but anything equivalent (hash, checksum, etc.) would be fine, as long as we know what it is. For us the string value would be an MD5; for others it could be anything.
@dauwhe That seems very reasonable to me.
@iherman We store a hash for each resource separately; see https://github.com/blackstoneaudio/audiobook-spec/blob/cfd468bb27b890b0e4a59a3345e806221a702fce/draft.json#L59. We also store a 'hash of hashes', which we use as a sort of 'version'; see https://github.com/blackstoneaudio/audiobook-spec/blob/cfd468bb27b890b0e4a59a3345e806221a702fce/draft.json#L11
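For comparison, a sketch of per-resource MD5s plus a 'hash of hashes' version string. The combining rule in the linked draft is not reproduced in this thread, so concatenating the hex digests in manifest order and hashing the result is an assumption; `md5_hex` and `hash_of_hashes` are hypothetical names.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Per-resource MD5 (hex), used to validate downloads, not for security."""
    return hashlib.md5(data).hexdigest()

def hash_of_hashes(resource_md5s: list[str]) -> str:
    """Hypothetical 'version' hash: MD5 of the per-resource MD5 hex strings,
    concatenated in manifest order. The linked draft may combine them differently."""
    return hashlib.md5("".join(resource_md5s).encode("ascii")).hexdigest()

if __name__ == "__main__":
    files = [b"chapter 1 audio", b"chapter 2 audio", b"cover art"]
    # Changing any one resource changes the resulting version string.
    print(hash_of_hashes([md5_hex(f) for f in files]))
```

Because the per-resource hashes are combined in order, reordering resources also changes the version, which may or may not be desirable for a given distributor.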
Is there anything in schema.org that we could use for that? I don't think that we should be tied to any specific algorithm, which potentially means:
Please use the Subresource Integrity syntax. The last thing we need to add to the web platform is yet another way to compute, store, and parse hashes. Use the platform: use existing mechanisms rather than inventing new ones. Also, just because your current use isn't concerned with security doesn't mean future uses won't be. Adding weak hashes is doing a disservice to future users.
I haven't found any... :-(
I am fully on board with using the
One question looking at how to integrate this: given that we aren't restricted to an HTML attribute, how do we handle the ability to define multiple hash expressions for each resource? Do we:
I think [1] is too limited. [2] has the advantage of being consistent with SRI:
"integrity": "sha384-dOTZf16X8p34q2/kYyEFm0jh89uTjikhnzjeLeF0FHsEaYKb1A1cv+Lyv4Hk8vHd sha512-Q2bFTOhEALkN8hOms2FKTDLy7eugP2zFZ1T8LCvX42Fp3WoNr3bjZSAHeOsHrbV1Fu9/A0EzCinRE7Af1ofPrw=="
Not sure about [2] vs [3].
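With option [2] the value stays a single string, so a consumer only needs to split it on whitespace. A minimal sketch of that parsing step, assuming SRI's token shape `alg-base64digest` with an optional `?options` suffix. Tokens naming an algorithm the consumer doesn't recognize (e.g. `md5`) are skipped, which is how SRI treats unrecognized hash functions. `parse_integrity` is a hypothetical name.

```python
SRI_ALGORITHMS = ("sha256", "sha384", "sha512")

def parse_integrity(value: str) -> list[tuple[str, str]]:
    """Split a whitespace-separated integrity value into (algorithm, digest) pairs."""
    pairs = []
    for token in value.split():  # str.split() with no arguments splits on any whitespace
        algorithm, sep, rest = token.partition("-")
        if not sep or algorithm not in SRI_ALGORITHMS:
            continue  # unrecognized algorithm: skip it rather than fail
        digest = rest.split("?", 1)[0]  # drop an optional '?options' suffix
        pairs.append((algorithm, digest))
    return pairs

if __name__ == "__main__":
    print(parse_integrity("sha384-abc sha512-def== md5-ghi"))
```

A single-hash value and a multi-hash value go through the same code path, so authors never have to switch between a string and an array.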
Ya, this is the particularly tricky thing to answer. We don't have to use whitespace to delimit, but SRI is defined with that expectation. It feels like we should seek input from that spec's authors.
I guess I lean towards [2], both because of consistency and because it's way easier to type a space than to create an array in JSON. Consider users over authors over implementors over specifiers over theoretical purity ;)
I agree with @dauwhe's reasoning. In addition, if you use a JSON array, then you either have to always use an array (even for one value, which is likely to be the most common case, putting an additional burden on authors), or give users the burden of testing for string vs. array values. Keeping it entirely consistent with SRI also makes it easier to copy values between the manifest and an attribute should the need ever arise. It also allows the wpub manifest spec to simply refer to the SRI spec and avoid re-specifying something, which could introduce inconsistencies as each spec evolves.
This issue was discussed in a meeting.
file hashes
Wendy Reid: #398
Laurent Le Meur: we just need a name for the resource-level property
Wendy Reid: the issue is around file hashes, so content creators can provide identifiable hashes for individual resources
… the proposal is to use SRI
Ivan Herman: what term should we use?
… this is not in schema.org, so we need to pick a term
Dave Cramer: Garth brought up the question of requirements on reading systems; it's a problem in RSs: EPUB has signatures, but RSs don't always understand them
… if an integrity hash is present, the UA must check it and terminate processing if it does not pass
Brady Duga: hashes are great. If you want to pretend that these have anything to do with security or integrity, I object.
… they do not provide this at all.
… they do not provide security.
Laurent Le Meur: I agree with the objection about security. I think it says something about integrity.
… I'm worried that some user agents might not be able to deal with every algorithm that could be expressed
… is there a closed list of algorithms?
Dave Cramer: Can someone educate me as to why the SRI spec exists?
Ivan Herman: the big difference with SRI in HTML is that there it is mainly used for the JS you bring in when you use external JS
… I can't really answer Brady's concerns
… if what I get from a URL as JS has the same hash that I expected, then I can believe it's the correct JS
… but it may be different for audio files
Garth Conboy: I was going to disagree with Dave. I have no objection to this, but I don't want user agents to have to deal with this.
Geoff Jukes: it doesn't provide security or integrity. We use it to communicate to our apps that a file was downloaded completely.
… we just use it to detect bad downloads.
Wendy Reid: do we want to include this?
Ivan Herman: how important is this?
Geoff Jukes: our apps rely on this utterly. We deliver to cellphones. Not everyone has 5G. We have to deal with unreliable delivery. We're OK with this in the spec as optional.
… we will use this
Wendy Reid: this sounds like something that a distributor/reading system can handle on its own
… perhaps we ask other distributors/UAs?
Ivan Herman: isn't that the definition of an optional thing?
… we know someone uses it.
… is it important to have a standard format?
Proposed resolution: add the optional integrity property for linked resources, using the subresource integrity format (Ivan Herman)
Wendy Reid: let's add it as optional
Wendy Reid: +1
Garth Conboy: +1
Brady Duga: +1
Geoff Jukes: +1
Laurent Le Meur: +1
Ivan Herman: +1
Bill Kasdorf: +1
Tzviya Siegman: +1 (I think)
Joshua Pyle: +1
Tim Cole: +1
Resolution #5: add the optional integrity property for linked resources, using the subresource integrity format
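The check discussed in the meeting (if a hash is present, verify it and stop if it does not match) is also what catches incomplete downloads. A sketch following SRI's matching rule: only the strongest algorithm present is compared, and a value with no recognized algorithm is treated as carrying no integrity metadata, so it passes. Whether a reading system should accept such a resource is a policy choice; `verify_download` is a hypothetical name.

```python
import base64
import hashlib

# Weakest to strongest; SRI compares only against the strongest algorithm present.
PRIORITY = {"sha256": 0, "sha384": 1, "sha512": 2}

def verify_download(data: bytes, integrity: str) -> bool:
    """Return True if the downloaded bytes match the integrity value."""
    tokens = []
    for token in integrity.split():
        algorithm, sep, rest = token.partition("-")
        if sep and algorithm in PRIORITY:
            tokens.append((algorithm, rest.split("?", 1)[0]))
    if not tokens:
        # No recognized hash: SRI treats this as "no integrity metadata" and does not block.
        return True
    strongest = max(PRIORITY[algorithm] for algorithm, _ in tokens)
    for algorithm, expected in tokens:
        if PRIORITY[algorithm] != strongest:
            continue  # weaker hashes are ignored when a stronger one is listed
        actual = base64.b64encode(hashlib.new(algorithm, data).digest()).decode("ascii")
        if actual == expected:
            return True
    return False
```

A plain `==` comparison is used here because the purpose is detecting truncated or corrupted transfers, not resisting an attacker.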
@llemeurfr asked, during the meeting (quoting the minutes):
The SRI Recommendation says in 3.2:
Though we refer to SRI normatively, i.e., we inherit this list, it is probably worth calling this out in our document as well. Cc @mattgarrish
I'd prefer to avoid duplicating the requirement, if that's what you mean. We can refer across to the list, of course, but once we replicate the statement we put ourselves in the position of falling out of sync.
@mattgarrish I agree, and I did not mean to repeat the list. Just put a note in the text that there is such a list, with a reference to the Rec.
Hi,
We use a standalone manifest file to describe our audiobooks, which includes MD5s for all file assets (audio files, artwork, and any supplemental material). The audiobook player uses the MD5s to check downloaded assets, and for asset update/invalidation.
Any spec that we adopt must at least allow these hashes somewhere in the dataset.
Geoff
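The update/invalidation use described in this comment can be sketched as a comparison between the hashes recorded for cached assets and those in a newly fetched manifest. The sketch assumes each manifest has already been reduced to a `{url: md5_hex}` mapping; all function names are hypothetical.

```python
import hashlib

def is_complete(data: bytes, expected_md5: str) -> bool:
    """True if the downloaded bytes match the MD5 recorded in the manifest."""
    return hashlib.md5(data).hexdigest() == expected_md5.lower()

def assets_to_refresh(cached: dict[str, str], manifest: dict[str, str]) -> list[str]:
    """URLs that are new, or whose MD5 changed since they were cached."""
    return sorted(url for url, md5 in manifest.items() if cached.get(url) != md5)

def assets_to_delete(cached: dict[str, str], manifest: dict[str, str]) -> list[str]:
    """Cached URLs that no longer appear in the manifest."""
    return sorted(url for url in cached if url not in manifest)
```

Only changed or new assets are re-downloaded, and each download is then checked with `is_complete` before it replaces the cached copy.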