
Commit

address early maf comments on PR
bnewbold committed Jun 10, 2018
1 parent 27d1cdc commit 33541cb
Showing 1 changed file with 23 additions and 5 deletions.
proposals/0000-multiwriter.md: 28 changes (23 additions, 5 deletions)
@@ -69,7 +69,8 @@ design and implement:
- secure key distribution and authentication (eg, if a friend should be given
write access to a hyperdb database, how is that friend's feed key found and
verified?)
- merge conflict resolution (using the provided API), potentially using
application-layer semantics

Before we go any further, a few definitions:

@@ -120,8 +121,14 @@ feed (corresponding to `key`) to be included in the database.
## Scaling
[scaling]: #scaling

There is some overhead associated with each "writer" added to the database,
impacting the number of files on disk, memory use, and the computational cost
of some lookup operations. The design should easily accommodate dozens of
writers, and should scale to 1,000 writers without too much additional
overhead. Note that a large number of writers also implies a larger number and
higher rate of append operations, as well as additional network connections,
each of which may cause scaling issues on its own. More real-world experience
and benchmarking are needed in this area.
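One concrete, per-node piece of that overhead is the vector clock, which carries one entry per known feed. The sketch below estimates its size, assuming each clock entry is encoded as an unsigned varint (as protobuf does for integers) and ignoring field tags and other framing; the function names are hypothetical and this is not hyperdb's actual encoder.

```python
def varint_len(n: int) -> int:
    """Bytes needed to encode a non-negative integer as an unsigned varint."""
    if n < 0:
        raise ValueError("seq numbers are non-negative")
    length = 1
    while n >= 0x80:  # each byte carries 7 bits of payload
        n >>= 7
        length += 1
    return length


def clock_overhead(clock: list[int]) -> int:
    """Approximate bytes spent on one node's positional vector clock."""
    return sum(varint_len(seq) for seq in clock)


print(clock_overhead([0, 2, 5]))      # 3
print(clock_overhead([5000] * 1000))  # 2000
```

Under these assumptions the clock cost per node grows linearly with the number of writers, so 1,000 writers add on the order of kilobytes to every node, on disk and on the wire.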


# Implementation
@@ -298,7 +305,8 @@ A vector clock on a node of, say, `[0, 2, 5]` means:
- when this node was written, the largest seq # in the third feed I have is 5

For example, Bob's vector clock for Alice's seq=3 entry above would be `[0, 3]`
since he knows of her latest entry (seq=3) and his own (seq=0). Note that the
order of feeds within a vector clock is not consistent across writers; it is
consistent only within the same feed.
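Because positions are local to each writer, a positional clock can only be interpreted together with that writer's own ordering of feeds. The sketch below makes this explicit with a hypothetical `clock_to_map` helper; the per-writer `feed_order` lists are assumptions for illustration (Bob's ordering follows from the example, Alice's is invented).

```python
def clock_to_map(clock: list[int], feed_order: list[str]) -> dict[str, int]:
    """Translate a positional vector clock into a {feed: seq} mapping.

    `feed_order` is the (hypothetical) list of feeds in the order that the
    writer who produced `clock` uses for clock positions.
    """
    if len(clock) != len(feed_order):
        raise ValueError("clock and feed order must have the same length")
    return dict(zip(feed_order, clock))


# Bob's clock from the example: his own feed first (seq=0), then Alice's (seq=3).
bob_order = ["bob", "alice"]
print(clock_to_map([0, 3], bob_order))    # {'bob': 0, 'alice': 3}

# The same positions read with a different writer's ordering mean something else.
alice_order = ["alice", "bob"]
print(clock_to_map([0, 3], alice_order))  # {'alice': 0, 'bob': 3}
```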

The vector clock is used for correctly traversing history. This is necessary for
the `db#heads` API as well as `db#createHistoryStream`.
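As a rough illustration of how clocks support `db#heads`, the sketch below treats the heads as the nodes whose clocks are not dominated by any other node's clock. It assumes clocks have already been translated into feed-keyed maps, that a node's clock includes its own feed's seq, and that a missing feed counts as -1; all of these, and the node shape, are assumptions of the sketch rather than hyperdb's actual algorithm.

```python
def happened_before(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if clock `a` is strictly dominated by clock `b`."""
    feeds = set(a) | set(b)
    all_le = all(a.get(f, -1) <= b.get(f, -1) for f in feeds)
    any_lt = any(a.get(f, -1) < b.get(f, -1) for f in feeds)
    return all_le and any_lt


def heads(nodes: list[dict]) -> list[dict]:
    """Return nodes whose clocks are not dominated by any other node's clock."""
    return [
        n for n in nodes
        if not any(happened_before(n["clock"], m["clock"]) for m in nodes if m is not n)
    ]


alice3 = {"id": "alice@3", "clock": {"alice": 3}}
bob1 = {"id": "bob@1", "clock": {"alice": 3, "bob": 1}}  # written after seeing alice@3
alice4 = {"id": "alice@4", "clock": {"alice": 4}}        # concurrent with bob@1

print([n["id"] for n in heads([alice3, bob1])])          # ['bob@1']
print([n["id"] for n in heads([alice3, bob1, alice4])])  # ['bob@1', 'alice@4']
```

When more than one head survives, the writes are concurrent and the caller sees a conflict.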
@@ -329,6 +337,13 @@ developers.
# Rationale and alternatives
[alternatives]: #alternatives

Design goals for hyperdb (including the multi-writer feature) included:

- ability to execute operations (get, put) with a sparse (partial) replication
of the database, using as few additional network requests as possible
- minimal on-disk and on-wire overhead
- implemented on top of an append-only log (to build on top of hypercore)

TODO:

- Why is this design the best in the space of possible designs?
@@ -344,6 +359,7 @@ What is the technical term for the specific CRDT we are using?

If there are conflicts resulting in ambiguity of whether a key has been deleted
or has a new value, does `db.get(key)` return an array of `[new_value, None]`?
Answer: `get` always returns nodes (not just values), so context is included. In the case of a deletion, the value within the node will be `null`.
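Since conflicting results come back as nodes, an application can distinguish a deletion from a concurrent write and apply its own semantics. The sketch below shows one possible policy, assuming a simplified node with hypothetical `feed` and `value` fields, with Python's `None` standing in for `null`; hyperdb does not prescribe this policy.

```python
from typing import Optional, TypedDict


class Node(TypedDict):
    feed: str             # key of the writer's feed (hypothetical field)
    value: Optional[str]  # None stands in for the `null` of a deletion


def resolve(nodes: list[Node]) -> Optional[str]:
    """One possible application-layer policy for conflicting `get` results.

    - no nodes, or only deletions: treat the key as absent
    - otherwise a concurrent write beats a deletion, and ties between live
      values are broken by feed key so every peer picks the same value
    """
    live = [n for n in nodes if n["value"] is not None]
    if not live:
        return None
    return min(live, key=lambda n: n["feed"])["value"]


print(resolve([{"feed": "bob", "value": None}, {"feed": "alice", "value": "v2"}]))  # v2
```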

What is a reasonably large number of writers to have in a single database?
Write "Scaling" section.
@@ -356,8 +372,10 @@ As of March 2018, Mathias Buus (@mafintosh) is leading development of a hyperdb
nodejs module on [github](https://github.com/mafintosh/hyperdb), which includes
multi-writer features and is the basis for this DEP.

Jim Pick (@jimpick) has been an active contributor working out multi-writer details.

- 2017-12-06: @noffle publishes `ARCHITECTURE.md` overview in the
[hyperdb github repo][arch_md]
- 2018-06-XX: First partial draft submitted for review

[arch_md]: https://github.com/mafintosh/hyperdb/blob/master/ARCHITECTURE.md
