kvs: support splitting directories over multiple blobs #1206

Open
garlick opened this issue Sep 26, 2017 · 0 comments
garlick commented Sep 26, 2017

Similar to issue #1202 on splitting valref objects over multiple blobs: RFC 11 defines a dirref object in terms of an array of blobrefs, but the implementation currently supports only one. As with valref, large directories should be split when transferred between the API and the KVS service to avoid head-of-line blocking.
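To illustrate the idea, here is a minimal Python sketch of splitting an encoded directory into blobs and building a dirref that references all of them. The split threshold, helper names, and exact treeobj layout are illustrative assumptions, not the flux-core implementation:

```python
import hashlib

BLOB_MAX = 1048576  # illustrative 1 MiB split threshold, not flux-core's actual limit

def blobref(blob: bytes) -> str:
    # blobref string in the "sha1-" + hex-digest style used for content storage
    return "sha1-" + hashlib.sha1(blob).hexdigest()

def split_dir(encoded_dir: bytes, blob_max: int = BLOB_MAX):
    """Split an encoded directory into chunks of at most blob_max bytes
    and build a dirref treeobj whose "data" array references every chunk."""
    blobs = [encoded_dir[i:i + blob_max]
             for i in range(0, max(len(encoded_dir), 1), blob_max)]
    dirref = {"ver": 1, "type": "dirref", "data": [blobref(b) for b in blobs]}
    return blobs, dirref
```

A single-blob directory is just the degenerate case where the "data" array has one element.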

looking up a directory
When the lookup API functions request a directory with FLUX_KVS_READDIR, and the KVS service looks it up to find a multi-blob dirref, the KVS service should return the dirref rather than the assembled dir object. The client API should then load the pieces from the content store before assembling and decoding them, and fulfilling the lookup's future.
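The client-side assembly step might look like the following sketch, where a `content_load` callback stands in for a content-store fetch; the toy store contents and blobref strings are made up for illustration:

```python
import json

def load_dir(dirref: dict, content_load) -> dict:
    """Client-side assembly: fetch each blob named in the dirref's "data"
    array, concatenate in order, then decode the directory object."""
    encoded = b"".join(content_load(ref) for ref in dirref["data"])
    return json.loads(encoded)

# toy stand-in for the content store; blobrefs and contents are made up
store = {
    "sha1-aaaa": b'{"a": ',
    "sha1-bbbb": b'1, "b": 2}',
}
dirref = {"ver": 1, "type": "dirref", "data": ["sha1-aaaa", "sha1-bbbb"]}
assert load_dir(dirref, store.__getitem__) == {"a": 1, "b": 2}
```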

writing a directory
It is common to commit an empty dir (the result of a mkdir), and currently I believe committing a non-empty dir object would also be allowed, at least as far as the KVS service is concerned. However, since I'm not sure there is a use case for that, an API that automatically splits large directories on the client end is probably not required.

manipulating directories during commits and lookups
More commonly, KVS service internals must assemble and decode directories in order to follow a path lookup or process a commit. Here, code is needed to split a directory across multiple blobs when it reaches a threshold size. This complicates the KVS's internal cache, which maps hash references to decoded JSON objects (such as directories). Possibly the encoded dirref object, rather than a single SHA-1 hash, becomes the key to the cache entry? (Some design work needs to be done here.)
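One option, sketched here in Python with hypothetical names, is to key the cache by the full ordered tuple of blobrefs from the dirref rather than by a single hash:

```python
class TreeobjCache:
    """Sketch of a cache for decoded directories keyed by the full tuple
    of blobrefs in the dirref, rather than a single SHA-1 hash, so that a
    multi-blob directory maps to one cache entry."""

    def __init__(self):
        self._entries = {}

    def _key(self, dirref: dict) -> tuple:
        # the ordered blobref list uniquely identifies the assembled object
        return tuple(dirref["data"])

    def get(self, dirref: dict):
        return self._entries.get(self._key(dirref))

    def put(self, dirref: dict, decoded: dict):
        self._entries[self._key(dirref)] = decoded
```

This keeps lookups O(1) but means a cache entry is invalidated as a unit whenever any constituent blob changes.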

really large directories
Splitting directories over multiple blobs in some respects creates additional overhead for large directories. For example, adding an entry to a multi-blob directory requires the entire object to be assembled and decoded, modified as JSON, re-encoded, and disassembled. The RFC 11 hdir object is the proposed solution to this problem, covered in #1207
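That whole-object read-modify-write cycle can be sketched as follows (the tiny `blob_max` and helper name are illustrative only):

```python
import json

BLOB_MAX = 8  # tiny illustrative threshold; a real limit would be much larger

def add_entry(blobs, key, value, blob_max=BLOB_MAX):
    """Add one entry to a directory stored across multiple blobs.
    Every step below touches the whole object, which is the overhead
    motivating the RFC 11 hdir design."""
    d = json.loads(b"".join(blobs))               # assemble and decode
    d[key] = value                                # modify as JSON
    enc = json.dumps(d, sort_keys=True).encode()  # re-encode
    # disassemble back into blob-sized pieces
    return [enc[i:i + blob_max] for i in range(0, len(enc), blob_max)]
```

Even a one-entry change rewrites every blob, whereas an hdir would localize the change to one hash bucket.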

Internal design discussion points

  • how to key the cache for a multi-blob dir object
  • could we possibly dispense with this issue and jump directly to the hdir object when blob size threshold is reached? (would require hdir resize upon reaching blob size threshold in any given hash bucket)