
Draft: Multi-Writer DEP #10

Merged 11 commits on Jul 6, 2018

Conversation

bnewbold
Contributor

@bnewbold bnewbold commented Mar 5, 2018

Current status: needs proof-reading, and there are some unresolved issues, but ready for review

Rendered pre-merge

Big TODOs:

  • write "Semantics and Usage" section
  • re-structure and re-write implementation details
  • examples (with valid messages)
  • security and privacy concerns

hypercore feed, and it is broadly considered best practice not to distribute
secret keys between multiple users or multiple devices. In fact, the current
hypercore implementation has no mechanism to resolve disputes or recover if
multiple agents used the same secret key to append to the same feed.


Perhaps note that this is by design (for simplicity, scalability etc)

- secure key distribution and authentication (eg, if a friend should be given
write access to a hyperdb database, how is that friend's feed key found and
verified?)
- merge conflict resolution, potentially using application-layer semantics


the last part could be misunderstood. we detect conflicts and provide apis for the user to deal with those multiple values (if they want)

replicated data type" ([CRDT][crdt]).

[vc]: https://en.wikipedia.org/wiki/Vector_clock
[crdt]: https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type


great section

[scaling]: #scaling

TODO: brief note on scaling properties (eg, reasonable numbers of writers per
database)


up to 1000 in the first impl will prob scale fine. noting that we have ways of scaling to millions of writers but thats for later versions :)

is written to `./local/`. All other feeds are written under directories
prefixed `./peers/<feed-discovery-key>/`.

TODO: the above disk format is incorrect?


yep!

Contributor Author


What is the actual on-disk format? I don't have any actual hyperdbs with more than source and local on my laptop.

- when this node was written, the largest seq # in the third feed I have is 5

For example, Bob's vector clock for Alice's seq=3 entry above would be `[0, 3]`
since he knows of her latest entry (seq=3) and his own (seq=0).


worth noting here that the order of the clock is not consistent across writers. it's only consistent within the inflated feed message of the same feed
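As an illustration of the clock semantics discussed above, here is a hypothetical sketch of vector-clock comparison (not hyperdb's actual API or wire encoding; the function names are made up for this example):

```javascript
// Illustrative sketch of vector-clock causality, not hyperdb's API.
// A clock is an array of sequence numbers, one per feed, e.g. Bob's [0, 3].

// Returns true if clock `a` is causally at-or-after clock `b`
function dominates (a, b) {
  return b.every((seq, i) => (a[i] || 0) >= seq)
}

// Two clocks conflict when neither dominates the other (concurrent writes)
function conflicts (a, b) {
  return !dominates(a, b) && !dominates(b, a)
}

const bob = [0, 3]   // Bob's entry: his own seq=0, Alice's latest seq=3
const alice = [1, 3] // a hypothetical later entry that has seen Bob's

console.log(dominates(alice, bob))     // true: causally follows
console.log(conflicts([2, 0], [0, 3])) // true: neither saw the other
```

Note that, as the comment points out, the index-to-feed mapping in these arrays is per-feed, not global.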


TODO:

- Why is this design the best in the space of possible designs?


maps almost perfectly to append-only logs, proven tech (a HAMT is nothing new)

TODO:

- Why is this design the best in the space of possible designs?
- What other designs have been considered and what is the rationale for not choosing them?


we looked at a bunch of different data structures. it basically boils down to this: we need a kv store that is sparse friendly (ie no external indexing needed), doesn't have too much space overhead, is roundtrip friendly, and can be mapped onto an append-only log.

not many approaches exist that fit this other than a trie. a normal prefix trie might end up being better (lex ordered instead of hash ordered), but it's close to trivial to support both.

you mention complexity here. it's actually the same complexity to implement this data structure in single-writer mode as it is to impl the current hyperdrive one (same scheme, just uses hashes). the multiwriter one has more potential race conditions but is a continuation of the same idea.

Contributor Author


Thanks for these notes! TBC, these questions came straight from the DEP template.


- Why is this design the best in the space of possible designs?
- What other designs have been considered and what is the rationale for not choosing them?
- What is the impact of not doing this?


leaving it up to the ecosystem to figure out multiwriter?

i think the only other valid approach would have been to try and solve multiwriter at the hypercore layer, but it becomes just as tricky there as it has been getting this right

[unresolved]: #unresolved-questions

What is the technical term for the specific CRDT we are using?
"Operation-based"?


unsure

"Operation-based"?

If there are conflicts resulting in ambiguity of whether a key has been deleted
or has a new value, does `db.get(key)` return an array of `[new_value, None]`?


db.get always returns an array of nodes, not values. this is to provide more context. in the scenario you describe, one of the nodes would have value === null (we are considering adding a .deleted flag instead)
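A hedged sketch of interpreting such a node array, based only on the comment above (the node shape `{ feed, seq, value }` is an assumption for illustration, not taken from hyperdb's documentation):

```javascript
// Sketch: interpret the node array a get would return, per the comment
// above. Node shape ({ feed, seq, value }) is assumed for illustration.
function interpret (nodes) {
  if (!nodes) return { status: 'absent' }  // never written
  const conflict = nodes.length > 1        // concurrent heads detected
  const values = nodes.map(n => n.value)   // null marks a deletion
  return { status: conflict ? 'conflict' : 'ok', values }
}

// A conflicting put-vs-delete from two writers:
console.log(interpret([
  { feed: 0, seq: 4, value: 'new' },
  { feed: 1, seq: 2, value: null }
]))
// A normal single-head result:
console.log(interpret([{ feed: 0, seq: 5, value: 'hello' }]))
```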



# Changelog
[changelog]: #changelog


@jimpick has been instrumental in getting this production ready as well

@aral

aral commented Jun 6, 2018

I’m sure you guys have already looked into the current academic research around CRDTs but I just came out of a deep dive where I reached the same conclusions (DAG with publickey auth weaved into the graph). On the conflict resolution side, have you evaluated the causal tree/Logoot/LSEQTree approaches at all?

Links to all three from the quick summary of my own research:
https://indienet.info/other/spikes/crdt/

PS. Given that you’ve basically built what I was planning to, I believe we are going to be using DAT for Indie Site going forward which makes me extremely happy :)

@bnewbold
Contributor Author

@mafintosh needs proof-reading, and there are some unresolved issues, but I think this is mostly ready for review. I left some questions in "unresolved".

I had been procrastinating this for a long time because I thought it was going to be really gnarly, but it's actually super simple compared to all the hyperdb trie nitty gritties.

Looks like the hyperdb DEP will need a small update with the deleted and Header changes.

@bnewbold changed the title from "WIP: Multi-Writer DEP" to "Draft: Multi-Writer DEP" on Jun 17, 2018
multiple nodes will be returned. If a key has been deleted from the database, a
node with the `deleted` flag will be returned; note that such "tombstone" nodes
can still have a `value` field, which may contain application-specific metadata
(such as a timestamp).


if all nodes to be returned are deleted, null will be returned instead (similar to looking up a non-written key)
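The tombstone behavior in the draft text can be sketched as follows (a hypothetical helper; the node shape `{ deleted, value }` follows the DEP wording above, not a verified hyperdb API):

```javascript
// Sketch: split live values from tombstones, following the draft text.
// Node shape ({ deleted, value }) is assumed from the DEP wording.
function liveValues (nodes) {
  if (!nodes) return []  // null result: key never written, or all deleted
  return nodes.filter(n => !n.deleted).map(n => n.value)
}

console.log(liveValues([
  { deleted: false, value: 'v2' },
  { deleted: true, value: { deletedAt: 1528329600 } } // tombstone metadata
]))
// → [ 'v2' ]
```

As the comment notes, the tombstone's `value` can still carry application metadata (a deletion timestamp here), even though the key reads as deleted.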


Couldn't the local feed's sequence number be skipped from vector clocks,
because it's implied by the sequence number of the hypercore entry itself? Same
with the key in the feed list (for inflated entries).


yep, just added for now for simplicity
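The confirmed optimization could look roughly like this (illustrative encoding only, not the actual hyperdb wire format; function names are made up):

```javascript
// Illustrative only: omit the writer's own sequence number from the
// stored clock, since it equals the entry's position in its own feed.
function compressClock (clock, localIndex) {
  return clock.filter((_, i) => i !== localIndex)
}

function expandClock (stored, localIndex, entrySeq) {
  const clock = stored.slice()
  clock.splice(localIndex, 0, entrySeq) // reinsert the implied seq
  return clock
}

// Bob (feed index 0) writes entry seq=0 with full clock [0, 3]:
const stored = compressClock([0, 3], 0) // only [3] needs storing
console.log(expandClock(stored, 0, 0)) // [ 0, 3 ] recovered on read
```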

"Operation-based" or "State-based"?

What is the actual on-disk layout (folder structure), if not what is documented
here?


```
content/<disc-key>   <-- any content feed
peers/<disc-key>     <-- writer feeds
source               <-- original writer
local                <-- local writer, if not the original
```

@pfrazee
Contributor

pfrazee commented Jun 21, 2018

Great work, @bnewbold.

Do we need to take some time to discuss the permissions schemes? I thought we were going to have an owner / writer distinction, where owners can authorize and writers can only write.

[unresolved]: #unresolved-questions

What is the technical term for the specific CRDT we are using?
"Operation-based" or "State-based"?
Contributor


This is tough to answer because there's only one op currently (put) but I'm fairly sure it's ops-based.

Op-based vs State-based

@bnewbold
Contributor Author

bnewbold commented Jul 6, 2018

I integrated most comments.

@pfrazee I think this is the first I've heard of an owner/writer permissions distinction; is that in current hyperdb? If it wasn't in the 3.0 release, I think at this point I'd say we should ship this as draft and update as things change.

@pfrazee
Contributor

pfrazee commented Jul 6, 2018

@bnewbold sounds good

@bnewbold bnewbold merged commit 3ad7ed8 into dat-ecosystem-archive:master Jul 6, 2018
@bnewbold bnewbold deleted the dep-multiwriter branch July 6, 2018 17:40