-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfc10: use hash digests in raw message payloads not blobrefs #330
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for conscientiously keeping these RFCs up to date with the implementation!
Problem: RFC 10 refers to the well understood properties of content addressable storage but does not list them. List some properties that are interesting in this application.
Problem: in our implementation we have decided to restrict direct access to the content service to instance owner only but this is not reflected in the RFC. Mention guest restriction.
Problem: the rfc10 implementation section says nothing about hash algorithm selection. List some hash algorithm requirements, note that the hash algorithm is fixed for the lifetime of the instance, and note that the hash selection is available via broker attribute.
Problem: The content RPCs are defined as carrying space-inefficient blobrefs as raw payload rather than message digests, but the need to support multiple hash algorithms within an instance is gone. Use the space-efficient message digest in the load and store RPCs.
Problem: rfc10 references the camlistore project, but it was renamed to perkeep. Update references.
Ooops just noticed a punctuation problem and fixed it, forcing a push. |
Oops accidentally "resolved" the conversation to which I was replying. I meant to say that yes, I did see that this PR proposes a change in the protocol, but it is nice to see the RFC updated before or along with the implementation! |
I'll set MWP - thanks. |
This changes the content load and store message definitions to carry raw hash digests rather than blobrefs, which reduces message size and processing overhead slightly. This simplification is possible because we no longer expect to need to support multiple hash algorithms within a given instance, nor sharing of raw content between instances. Also, we have restricted access by guests and recast the service as a KVS layer rather than general purpose.
It also feels like protocol cleanup, given that the blobref strings are being sent as raw payload. If we're not sending them as strings within a JSON object, why not just send the more space efficient raw representation?
I also added some info that fills out this rather sparse RFC a bit.
I do have a prototype branch which implements these changes and uses the hash digest to as a zhashx key in the broker content cache, rather than running blobrefs through the default Bernstein hash that is the zhashx default which is...redundant. I didn't notice any difference in the job throughput numbers in this branch though FWIW. This still seems like good cleanup however.