Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfc10: use hash digests in raw message payloads not blobrefs #330

Merged
merged 5 commits into from
Apr 23, 2022

Conversation

garlick
Copy link
Member

@garlick garlick commented Apr 23, 2022

This changes the content load and store message definitions to carry raw hash digests rather than blobrefs, which reduces message size and processing overhead slightly. This simplification is possible because we no longer expect to need to support multiple hash algorithms within a given instance, nor sharing of raw content between instances. Also, we have restricted access by guests and recast the service as a KVS layer rather than general purpose.

It also feels like protocol cleanup, given that the blobref strings are being sent as raw payload. If we're not sending them as strings within a JSON object, why not just send the more space efficient raw representation?

I also added some info that fills out this rather sparse RFC a bit.

I do have a prototype branch which implements these changes and uses the hash digest to as a zhashx key in the broker content cache, rather than running blobrefs through the default Bernstein hash that is the zhashx default which is...redundant. I didn't notice any difference in the job throughput numbers in this branch though FWIW. This still seems like good cleanup however.

Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for conscientiously keeping these RFCs up to date with the implementation!

spec_10.rst Outdated Show resolved Hide resolved
garlick added 5 commits April 23, 2022 08:48
Problem: RFC 10 refers to the well understood properties of content
addressable storage but does not list them.

List some properties that are interesting in this application.
Problem: in our implementation we have decided to restrict direct
access to the content service to instance owner only but this is
not reflected in the RFC.

Mention guest restriction.
Problem: the rfc10 implementation section says nothing about hash
algorithm selection.

List some hash algorithm requirements, note that the hash algorithm
is fixed for the lifetime of the instance, and note that the hash
selection is available via broker attribute.
Problem: The content RPCs are defined as carrying space-inefficient
blobrefs as raw payload rather than message digests, but the need to
support multiple hash algorithms within an instance is gone.

Use the space-efficient message digest in the load and store RPCs.
Problem: rfc10 references the camlistore project, but it was renamed
to perkeep.

Update references.
@garlick
Copy link
Member Author

garlick commented Apr 23, 2022

Ooops just noticed a punctuation problem and fixed it, forcing a push.

@grondo
Copy link
Contributor

grondo commented Apr 23, 2022

Oops accidentally "resolved" the conversation to which I was replying. I meant to say that yes, I did see that this PR proposes a change in the protocol, but it is nice to see the RFC updated before or along with the implementation!

@garlick
Copy link
Member Author

garlick commented Apr 23, 2022

I'll set MWP - thanks.

@mergify mergify bot merged commit 277d7d0 into flux-framework:master Apr 23, 2022
@garlick garlick deleted the rfc10_raw branch April 23, 2022 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants