-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Annotate design document in preparation for update #9634
Conversation
I'm happy to take on some of the work, though 1:30pm on Friday's is rough for me due to various shuttling of kids to activities. |
Why don't you comment in this PR for sections you'd like to take on. |
In order to support diverse client usage, Cockroach clients connect to | ||
any node via HTTPS using protocol buffers or JSON. The connected node | ||
proxies involved client work including key lookups and write buffering. | ||
|
||
# Keys | ||
|
||
[Update referencing concepts in keys/constants.go required here] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can tackle this section. Will probably rename it to Internal Keys
to delineate it from SQL
.
@@ -633,6 +663,7 @@ A really good reference on tuning Linux installations with RocksDB is | |||
|
|||
# Range Metadata | |||
|
|||
[Peter: did you actually have numbers for the size of meta2 records?] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Easy enough to recompute, though it was below 256 bytes and that upper bound still seems reasonable. I can take this section as well.
+cc @knz |
|
||
|
||
|
||
![Distributed SQL] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can take this
such as schemas, tables, columns, and indexes. The structured data API | ||
in turn depends on the [distributed key value store](#key-value-api), | ||
which handles the details of range addressing to provide the abstraction | ||
of a single, monolithic key value store. The distributed KV store | ||
communicates with any number of physical cockroach nodes. Each node | ||
contains one or more stores, one per physical device. | ||
|
||
![SQL] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@knz was volunteered
I'm happy to do whatever I'm being volunteered for. |
@@ -64,15 +74,25 @@ grained data on the level of entity groups. | |||
|
|||
Cockroach implements a layered architecture. The highest level of | |||
abstraction is the SQL layer (currently unspecified in this document). | |||
[FIX] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I will
@@ -83,6 +103,7 @@ raft. The color coding shows associated range replicas. | |||
|
|||
![Ranges](media/ranges.png) | |||
|
|||
[Update RoachNode concept] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix references to RoachNode
.
@@ -101,12 +122,14 @@ Up to `F` failures can be tolerated, where the total number of replicas `N = 2F | |||
|
|||
# Cockroach Client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll delete this whole section.
I'll get to the Linearizability section. |
I'll take Raft (+ Quiescing + Coalescing). |
@@ -377,6 +401,7 @@ An exploration of retries with contention and abort times with abandoned | |||
transaction is | |||
[here](https://docs.google.com/document/d/1kBCu4sdGAnvLqpT-_2vaTbomNmX3_saayWEGYu1j7mQ/edit?usp=sharing). | |||
|
|||
[Update (conceptually) s/table/records/g] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tamird could you address this comment?
@@ -4,6 +4,9 @@ by Spencer Kimball from early 2014. | |||
|
|||
# Overview | |||
|
|||
[Spencer to edit overview. This stuff is so ripe it's starting to | |||
stink: "structured data? WTF is that?] | |||
|
|||
Cockroach is a distributed key:value datastore (SQL and structured |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm rewriting the SQL reference here.
@@ -14,6 +17,10 @@ disruption and **no manual intervention**. Cockroach nodes are | |||
symmetric; a design goal is **homogeneous deployment** (one binary) with | |||
minimal configuration. | |||
|
|||
[Start at top of the new stack here: SQL, mention of distributed SQL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking this over
@@ -64,15 +74,25 @@ grained data on the level of entity groups. | |||
|
|||
Cockroach implements a layered architecture. The highest level of | |||
abstraction is the SQL layer (currently unspecified in this document). | |||
[FIX] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I will
@@ -1305,6 +1356,7 @@ write-optimized (HBase, Cassandra, SQLite3/LSM, CockroachDB). | |||
|
|||
## Architecture | |||
|
|||
[Keep this diagram; update if necessary] | |||
CockroachDB implements a layered architecture, with various | |||
subdirectories implementing layers as appropriate. The highest level of | |||
abstraction is the [SQL layer][5], which depends |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewriting this to clean up the SQL references
@@ -1331,6 +1383,7 @@ replicas. | |||
|
|||
## Client Architecture | |||
|
|||
[Update this diagram] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking this over to explain SQL connections
Review status: 0 of 1 files reviewed at latest revision, 14 unresolved discussions, all commit checks successful. docs/design.md, line 812 at r1 (raw file):
Taking this. docs/design.md, line 878 at r1 (raw file):
Taking this. Comments from Reviewable |
@@ -609,6 +638,7 @@ for further details. | |||
|
|||
# Node Storage | |||
|
|||
[Updates necessary below] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll clean this section up a bit. Remove the range tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #9652
@@ -660,6 +691,7 @@ is sparse, the successor key is defined as the next key which is present. The | |||
found using the same process. The *meta2* record identifies the range | |||
containing `key1`, which is again found the same way (see examples below). | |||
|
|||
[Updates required below] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tamird could you update here as well?
# Range-Spanning Binary Tree | ||
|
||
[Remove this section I think.] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done #9643
@@ -903,6 +941,7 @@ expired, the command will be rejected by the replica. | |||
|
|||
# Splitting / Merging Ranges | |||
|
|||
[Accurate still? Needs review.] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we still discuss merges if we don't do them yet?
@@ -805,6 +839,7 @@ Future optimizations may include two-phase elections and quiescent ranges | |||
|
|||
# Range Leases | |||
|
|||
[Cleanup, add section on epoch-based range leases and motivation & link to range lease RFC] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@spencerkimball to do range leases
@@ -840,6 +875,7 @@ offset. | |||
|
|||
## Relationship to Raft leadership | |||
|
|||
[Is this all still accurate?] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arjunravinarayan to do this raft leadership section
@@ -1091,6 +1136,8 @@ The gossip protocol itself contains two primary components: | |||
|
|||
# Node Accounting | |||
|
|||
[Replace with description of MVCC, store-level aggregation, recorder, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrtracy would be great if you could replace Node Accounting
section with an Accounting
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like @tschottdorf is handling this bit.
@@ -1225,6 +1275,7 @@ it discovers differences, it reconfigures ranges in the same way | |||
that it rebalances away from busy nodes, via special-case 1:1 | |||
split to a duplicate range comprising the new configuration. | |||
|
|||
[Rest of this document is tossable] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@spencerkimball to delete tail of document.
If everyone is making edits to this doc, the merge might be painful. What is the plan? |
@petermattis just stay in your assigned section. |
This adds a little bit on repair and rebalancing as well. Part of cockroachdb#9634.
This adds a little bit on repair and rebalancing as well. Part of cockroachdb#9634.
@@ -609,6 +638,7 @@ for further details. | |||
|
|||
# Node Storage | |||
|
|||
[Updates necessary below] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #9652
@@ -609,6 +638,7 @@ for further details. | |||
|
|||
# Node Storage | |||
|
|||
[Updates necessary below] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@mjibson @petermattis @mrtracy @tschottdorf @arjunravinarayan @andreimatei @jordanlewis @jseldess @sploiselle
All, you've been called to a higher purpose. Our design doc is in parlous condition. It's time to give it a miracle makeover. For those of you who feel the call of duty, we're going to organize a designdoc hackathon tomorrow at 1:30p to split design.md into areas of concern, parcel them out amongst ourselves, knock out the asbestos and water-stained drywall, and erect a modern, glass and steel structure in its place.
I've yet to delineate the missing sections (too late already tonight); we can spend some time on that tomorrow. But if we just get as far as dumping no-longer-applicable sections and cleaning up what remains, we'll be in significantly better shape.
This change is