
Draft: HyperDB #3

Merged (27 commits) on May 7, 2018

Conversation

@bnewbold (Contributor) commented Feb 5, 2018:

This DEP is submitted for review for Draft status.

Rendered pre-merge

A diagram showing the trie structure might be nice to help with learning, but maybe the specification isn't the right place for that. The examples are verbose, but I think they are important to clarify exactly how things work. Exact binary protobuf output (hexdump style?) could also be included for some example messages, to make encoding and endianness completely unambiguous.

# Summary
[summary]: #summary

HyperDB is a new abstraction layer providing a general purpose distributed
Comment (Contributor):

Is it "new"? Aint this all new?

Comment (Contributor):

(Not just nitpicking on prose; I'm concerned "new" will be confusing, both now and later when nothing is comparatively newer.)

bnewbold (author):

Good point; I'm still finding the right "voice" to use: tense, "we/I", "should/could/must", "one/you".

I was thinking of adding a CI pass with something like http://proselint.com/; we should probably have at least a pass that validates the markdown, and maybe checks out-links for valid HTTP responses.

Hyperdrive (used by the Dat application) is expected to be re-implemented on
top of HyperDB for improved performance with many files (e.g., millions). The
hyperdrive API should be largely unchanged, but the `metadata` format will be
backwards-incompatible.
Comment (Contributor):

Yeah I suggest we move this paragraph about hyperdrive into the motivation, where the plans and reasoning at the time of writing can all be contained.

A secondary benefit is to refactor the [trie][trie]-structured key/value API
out of hyperdrive, allowing third party code to build applications directly on
this abstraction layer.

Comment (Contributor):

Third motivation: add multi-writer support

Comment (Contributor):

Oh scratch that, multi-writer isn't discussed in this doc

`db.get(key)`: Reading a non-existent `key` is an error. Read-only.

`db.delete(key)`: Removes the key from the database. Deleting a non-existent
key is an error. Requires read-write access.
Comment (Contributor):

Does deleting a path segment cause all keys that have that segment as a prefix to be deleted as well?

bnewbold (author):

Good question! I think "No"; a separate delete_recursive() (or a flag) would be needed.
I've tried to minimize the API surface here in the spec; e.g., I haven't included the diff or history streaming stuff. I think in this case it's pretty straightforward to combine list(prefix) and delete(key).

Reply:

Supporting recursive deletes through a single append is actually not hard and could be added in the future.
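
In the meantime, a client-side sketch along the lines bnewbold suggests, combining `list(prefix)` and `delete(key)` from the DEP's API (the node-style callback signature and the `deleteRecursive` helper name are assumptions for illustration, not part of the DEP):

```js
// Hypothetical helper: delete every key under a prefix by combining the
// DEP's list(prefix) and delete(key) operations. Sketch only.
function deleteRecursive (db, prefix, cb) {
  db.list(prefix, function (err, keys) {
    if (err) return cb(err)
    let pending = keys.length
    let failed = false
    if (pending === 0) return cb(null)
    for (const key of keys) {
      db.delete(key, function (err) {
        if (failed) return
        if (err) { failed = true; return cb(err) }
        if (--pending === 0) cb(null)
      })
    }
  })
}
```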

- `path`: a 2-bit hash sequence of `key`'s components.
- `seq`: the zero-indexed sequence number of this Node (hyperdb) entry in
the feed. Note that this may be offset from the hypercore entry index if
there are any leading non-hyperdb entries.
Comment (Contributor):

Does this mean that seq only increments when a hyperdb Node is written to the feed?

HyperDB is not backwards compatible with the existing hyperdrive metadata,
meaning dat clients may need to support both versions during a transition
period. This applies both to archives saved to disk (e.g., in SLEEP) and to
archives received and published to peers over the network.
Comment (Contributor):

How can a client detect which metadata protocol to use?

bnewbold (author):

Good question. This is related to the issue of hashbase (and other tools) choking when they try to download a non-hyperdrive hypercore.
Two long-term solutions I could imagine are:

- Add a Feed-level "type" field to hypercore itself. The obvious place for this would be the Feed message, except this would then leak type information in clear text on the wire, which is bad. A new message type? I could imagine refactoring hypercore messages to have a new "connection init" message for crypto setup, then re-sending Feed encrypted. This would be a good place to toss in a protocol version header as well. hypercore has been pretty stable though, and this would be disruptive. Also (thinking out loud), how would the software figure out the type of feeds on disk (aka, mystery meat SLEEP files)?
- Extend the convention of always writing a "special" protobuf message as the first entry in every hypercore feed, and encode type info there. hyperdrive currently does this with a pointer to the data feed, but no other hypercore feed "types" do this (AFAIK).

@mafintosh, thoughts?

@pfrazee (Contributor) commented Mar 5, 2018:

Looking really good!

An example pseudo-code session working with a database might be:

db.put('/life/animal/mammal/kitten', '{"cuteness": 500.3}')
db.put('/life/plant/bush/banana', '{"delicious": 103.4}')
Comment:

don't know if you care but the banana has an extra n

bnewbold (author):

Thanks for the catch!


1. Calculate the `path` array for the key you are looking for.
2. Select the most-recent ("latest") Node for the feed.
3. Compare path arrays. If the paths match exactly, you have found the Node you
Comment:

SipHash is not collision resistant, so two distinct paths could have the same path array, right?

Reply:

Yes, in case of a collision the final node bucket will just have more than one node in it. The full final node bucket is read on every read (usually it contains only a single node) to do a full key comparison and check for collisions.
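
To make steps 1-3 concrete, here is a minimal in-memory sketch of the traversal, including the full-key comparison described above (the node shapes are assumed for illustration; real path arrays come from SipHash24, and multi-node collision buckets are elided):

```js
// In-memory sketch of the get() traversal. Assumed node shape:
//   { key, value, path: [2-bit values], trie: { index: { value: seq } } }
// `nodes` is the feed indexed by seq; `latest` is the most recent node.
function lookup (nodes, latest, targetKey, targetPath) {
  let node = latest
  while (node) {
    // Step 3: find the first index where the path arrays differ.
    let i = 0
    while (i < targetPath.length && node.path[i] === targetPath[i]) i++
    if (i === targetPath.length) {
      // Full path match: compare keys to rule out a hash collision.
      return node.key === targetKey ? node.value : undefined
    }
    // Otherwise follow the trie pointer for our 2-bit value at index i.
    const bucket = node.trie[i]
    const seq = bucket && bucket[targetPath[i]]
    node = seq === undefined ? null : nodes[seq]
  }
  return undefined // no pointer to follow: key not present
}
```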

implementation. The current implementation requires the most recent node in a
directory to point to all other nodes in the directory. Even with pointer
compression, this requires on the order of `O(N^2)` bytes; the HyperDB
implementation scales with `O(N log(N))`.
Comment:

👍 👍

this is what prevented us from doing datpedia the straightforward way

specified limit of 2 GByte (due to 32-bit signed arithmetic), and most
implementations impose a (configurable) 64 MByte limit. Should this DEP impose
specific limits on key and value sizes? Would be good to decide before Draft
status.
Comment:

since dat does rabin chunking, the 2GB limit seems fine ~ large files will always be divided into chunks smaller than that.

will the chunks always be below 64MB though? @mafintosh

Reply:

@dcposch note this is actually not for the files but just for the file metadata. In our case that is just a fs.Stat object with an additional numeric pointer to another feed where the actual file content is stored. This basically allows files as large as you'd want.

bnewbold (author):

Also, FWIW, dat (via hyperdrive) does not actually use Rabin chunking at the moment, just fixed-size chunks. The protocol is agnostic to chunking, so this can evolve in the future with little disruption if/when content-aware chunking is more performant.

bnewbold (author):

As a reminder, dat (via hyperdrive) is only one possible application on top of hyperdb; while hyperdrive should be pretty comfortable even with a couple-KByte metadata size limit, other applications might want to store arbitrary data in the hyperdb values; setting expectations around this could save frustration or growing pains later.

Reply:

@bnewbold agreed. IMO you should always only store "small" values directly in the hyperdb, say < 1 MB. Anything larger you should put in a "content feed" and then link it from the value.

bnewbold (author):

I think for Draft we can mention the inherent limits and then mention:

"It is STRONGLY recommended to keep both keys and values to less than 2 megabytes (individually and combined). If larger value sizes are desired, a separate content can be pointed to (via contentFeed), and the value in this feed can be metadata pointing to data (eg, contiguous blocks of entries) in that feed."


Are the "deletion" semantics here correct, in that deletion Nodes persist as
"tombstones", or should deleted keys be omitted from future `trie` fields to
remove their presence?
Comment:

it seems important to leave deletion nodes in place; that way, list() can iterate over all current and deleted nodes under a directory and then omit the deleted ones

otherwise, i think list() might return partial results

Reply:

@dcposch that's how it works atm, actually

array. How unlikely is this situation? Eg, how many keys in a database before
the birthday paradox results in a realistic chance of a collision? How should
implementations behave if they detect a collision (matching path but not
matching key)?
@dcposch commented Mar 15, 2018:

i found a collision so we can test this

after an hour or so with a little Go script on a 122GB r4.4xlarge instance...

Checked 7059000000 hashes
Checked 7060000000 hashes
Checked 7061000000 hashes
Checked 7062000000 hashes
Checked 7063000000 hashes
Checked 7064000000 hashes
COLLISION: SipHash24(mpomeiehc) = SipHash24(idgcmnmna) = 11615559218517537840

that's SipHash24 with an all-zero key, as described above. so if you make a dat and add the following files:

/mpomeiehc
/idgcmnmna

...you should be able to test a path collision

@mafintosh @bnewbold

Reply:

mafintosh/hyperdb@fd60344 <-- I've added @dcposch's hash collision to the tests. We support hash collisions; the two nodes just end up in the same bucket, similar to how a multiwriter fork would.
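
A sketch of checking the collision from Node.js, assuming the `siphash24` npm package with a `siphash24(data, key)` call that returns the 8-byte hash (the package and its exact signature are an assumption here, not something this DEP specifies):

```js
// Verify that the two path components above collide under SipHash24 with
// an all-zero 16-byte key, as the DEP specifies.
const siphash24 = require('siphash24') // assumed API: siphash24(data, key)

const key = Buffer.alloc(16) // all zeros
const a = Buffer.from(siphash24(Buffer.from('mpomeiehc'), key))
const b = Buffer.from(siphash24(Buffer.from('idgcmnmna'), key))

console.log(a.equals(b)) // expected: true
```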

When converting the 8-byte hash to an array of 2-bit values, the ordering is to
proceed byte-by-byte, and for each byte take the two lowest-value bits (aka,
`hash & 0x3`) as index `0`, the next two bits (aka, `hash & 0xC`) as index
`1`, etc. When concatenating path hashes into a longer array, the first

Reply:

Typo: "concatanating" → "concatenating".
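
A short sketch of the bit ordering just described (the function name is illustrative):

```js
// Convert an 8-byte hash into 32 2-bit values, taking the lowest-order
// bit pair of each byte first, exactly as described above.
function hashToPath (hash) { // hash: Buffer or Uint8Array of 8 bytes
  const path = []
  for (const byte of hash) {
    path.push(byte & 0x3)        // bits 0-1 (hash & 0x3) -> index 0
    path.push((byte >> 2) & 0x3) // bits 2-3 (hash & 0xC) -> index 1
    path.push((byte >> 4) & 0x3) // bits 4-5 -> index 2
    path.push((byte >> 6) & 0x3) // bits 6-7 -> index 3
  }
  return path // 32 values, each in 0..3
}
```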

For a `trie` with `N` buckets, each may have zero or more pointers. Typically
there are a maximum of 3 pointers per bucket, corresponding to the 4 possible
values minus the current Entry's value, but in the event of hash collisions (in
the path array space), there may be multiple pointers in the same bucket

Reply:

Typo: stray space before the comma in "(in the path array space) ,"; should read "(in the path array space),".

a pointer to the Entry with the same hash at the final array index.
5. If the paths don't entirely match, find the first index at which the two
arrays differ. Copy all `trie` elements (empty or not) into the new `trie`
for indices between the "current index" and the "differing index".

Reply:

Typo: "indicies" → "indices".

field of an Entry protobuf message.

Consider a trie array with `N` buckets and `M` non-empty buckets (`0 <= M <=
N`). In the encoded field, there will be `M` concatenated bytestrings of the

Reply:

Typo: "concatanated" → "concatenated".

the second bytestring chunk would be:

- index varint is `64` (65th element in the trie array; the terminating value)
- bitfield is `0b10000` (varint 1); there is a single pointer set... but it

Reply:

Typo: "btifield" → "bitfield".
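
A simplified decoding sketch for this layout, using the `varint` npm package; it assumes each pointer is a single varint entry-seq, which elides per-pointer details the real encoding may carry:

```js
// Simplified sketch: decode trie bytestrings of the form
//   <index varint> <bitfield varint> <one pointer varint per set bit>
const varint = require('varint')

function decodeTrie (buf) {
  const trie = {}
  let offset = 0
  while (offset < buf.length) {
    const index = varint.decode(buf, offset) // position in the trie array
    offset += varint.decode.bytes
    let bitfield = varint.decode(buf, offset) // which slots hold pointers
    offset += varint.decode.bytes
    const bucket = {}
    for (let slot = 0; bitfield !== 0; slot++, bitfield >>= 1) {
      if (bitfield & 1) {
        bucket[slot] = varint.decode(buf, offset) // pointer: entry seq
        offset += varint.decode.bytes
      }
    }
    trie[index] = bucket
  }
  return trie
}
```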

hash arrays, we now get all the way to index 34 before there is a difference.
We again look in the `trie`, find a pointer to entry index 0, and fetch the
first Entry and recurse. Now the path elements match exactly; we have found the
Entry we are looking for, and it has an existent `value`, so we return the

Reply:

Typo: "existant" → "existent".


The basic key/value semantics of hyperdb (as discussed in this DEP, not
considering multi-writer changes) are not known to introduce new privacy issues
when compared with, e.g., storing binary values at key-like paths using the

Reply:

Style: "eg," → "e.g.,".


Hyperdb is not backwards compatible with the existing hyperdrive metadata,
meaning dat clients may need to support both versions during a transition
period. This applies both to archives saved to disk (e.g., in SLEEP) and to

Reply:

Style: "eg," → "e.g.,".

a feed? See also <https://github.com/datprotocol/DEPs/issues/13>

Need to think through deletion process with respect to listing a path prefix;
will previously deleted nodes be occluded, or potentially show up in list

Reply:

Typo: "occulded" → "occluded".

db.get('/life/animal/mammal/kitten')
=> {"cuteness": 500.3}
db.list('/life/')
=> ['/life/animal/mammal/kitten', '/life/plant/tree/banana']
Comment (Contributor):

Is this example listing correct? It's showing keys multiple path segments below the prefix.

bnewbold (author):

I intended for it to be recursive; looks like there is a recursive flag in maf's hyperdb library. I've added a clarification.

care to the following scenarios: A malicious writer may be able to produce very
frequent key path hash collisions, which could degrade to linear performance. A
malicious writer could send broken trie structures that contain pointer loops,
duplicate pointers, or other invalid contents. A malicious writer could write
Comment (Contributor):

Pointer loops sound like they could be a pretty serious DoS attack vector. Do we have any way to mitigate that?

There's a calculable upper limit on lookups based on the number of prefixes... Perhaps the recursive algorithm needs to track how many lookups it does and then abort if it hits the limit?
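
A sketch of that mitigation: bound the number of pointer hops by what a well-formed trie allows, and treat anything beyond it as corrupt (the bound and error handling are illustrative, reusing the in-memory node shapes from the earlier lookup sketch):

```js
// In a well-formed trie each hop strictly increases the matched prefix, so
// hops can never exceed the path length; anything more implies a loop.
function guardedLookup (nodes, latest, targetKey, targetPath) {
  const limit = targetPath.length + 1
  let node = latest
  for (let hops = 0; node; hops++) {
    if (hops > limit) throw new Error('trie pointer loop: feed is corrupt')
    let i = 0
    while (i < targetPath.length && node.path[i] === targetPath[i]) i++
    if (i === targetPath.length) {
      return node.key === targetKey ? node.value : undefined
    }
    const bucket = node.trie[i]
    const seq = bucket && bucket[targetPath[i]]
    node = seq === undefined ? null : nodes[seq]
  }
  return undefined
}
```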

@bnewbold (author):

@noffle I've still got you in here as a co-author in attribution of your ARCHITECTURE.md document that was very helpful when starting this DEP. However, while this is derivative, almost none of that document remains in here, and I don't want to "co-author" you without consent. Any thoughts? I'll move you down to an Acknowledgements section if I don't hear back in a day or two.

@hackergrrl:

@bnewbold whatever you think is appropriate is fine by me!

@bnewbold (author):

Any further comments here? Poke @mafintosh. If we could approve/merge this week that would be great (e.g., at the protocol WG meeting next Wednesday at the latest).

@mafintosh commented Apr 26, 2018 via email.


For a database with most keys having N path segments, the cost of a `get()`
scales with the number of entries M as `O(log(M))` with best case 1 lookup and
worst case `4 * 32 * N = 128 * N` lookups (for a saturated `trie`).

Reply:

This worst case will only happen if all the hashes in the db are colliding, which is extremely unlikely. The worst case for a "perfect" hash is still log4(n) afaik. The best case is O(1), and the normal case is log4(n) / k, where k depends on how many shared hash prefixes there are.
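
To make the normal case concrete (illustrative arithmetic, not from the thread): a database of one million entries needs roughly log4(10^6) ≈ 10 pointer hops per get(), far below the `128 * N` worst case quoted above.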

# Unresolved questions
[unresolved]: #unresolved-questions

Should we declare that `contentFeed` pointers must *not* change over the history of

Reply:

yes

mafintosh mentioned this might be in the works. Does this DEP need to "leave
room" for those changes, or should we call out the potential for future change?
Probably not; the DEP should only describe existing solutions. This can be
resolved after Draft.

Reply:

It can and should be improved in the future (backwards compatibly). Step one is shipping what we have.


Review (or prove?) the `O(log(M))` intuition for `get()` operations. This could
happen after Draft status.

Reply:

It's log(M), yes.

@bnewbold merged commit 23fa355 into dat-ecosystem-archive:master on May 7, 2018
@bnewbold deleted the dep-hyperdb branch on May 7, 2018 03:07
@bnewbold (author) commented May 7, 2018:

To be explicit: left @noffle as a co-author. Thanks for kicking this whole thing off!
