mini-rfc: Non-primary column families #42038

bdarnell · 2019-10-30T17:38:41Z

This idea was inspired by brainstorming around partitioning and the effort to close the gap between primary and secondary indexes (#41989). I don't have a concrete use case for this yet so I'm just writing it up briefly to see if there's any interest in pursuing this idea further.

Column families allow the primary index (and currently only the primary index) to be divided into multiple KV pairs. This has two main benefits: fine-grained latching for reduced contention (useful at least for YCSB), and reduction in write amplification (especially if there are infrequently-updated blob columns). However, it's kind of a complex and subtle special case for something so rarely used. I propose replacing this with a generalization of storing indexes.

In the new model, a table with two column families would have two indexes instead of a single primary key. Each of these indexes would have the same key columns, but "store" a different subset of the table's columns. This would change the constructed key from /$TABLE/1/$PK/$FAMILY to /$TABLE/$INDEX/$PK/0, and place columns from different families far apart from each other. This means that single-row operations are no longer guaranteed to be single-range, which is a downside if you often operate on the entire row, but could be a benefit if you usually operate on parts of the row at a time (which is exactly the time when column families make sense). The benefit would be especially useful in the "blob" use case, since the non-blob column family would be denser with real data. A "free" side effect is that column families would become targets for zone configs, so you could store your blobs on cheaper storage (and maybe this could be a step towards column-level security that goes all the way through the KV layer).
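To make the change in key layout concrete, here is a minimal sketch that models keys as tuples and sorts them the way an ordered KV store would. It is not CockroachDB's real key encoding: the table ID 53, the index IDs 1 and 2, and the helper names are all made up for illustration.

```python
# Toy model of the two key layouts. Keys are tuples so that Python's
# tuple ordering stands in for the KV store's byte ordering.
# All IDs and function names here are hypothetical.

def family_key(table, pk, family):
    """Current layout: /$TABLE/1/$PK/$FAMILY (index 1 is the primary index)."""
    return (table, 1, pk, family)

def index_key(table, index, pk):
    """Proposed layout: /$TABLE/$INDEX/$PK/0 (one index per former family)."""
    return (table, index, pk, 0)

def pretty(key):
    """Render a key tuple in the slash-separated form used in the text."""
    return "/" + "/".join(str(part) for part in key)

TABLE = 53          # hypothetical table ID
PKS = [10, 11, 12]  # three rows

# Current model: two families (0 and 1) inside the primary index.
current = sorted(family_key(TABLE, pk, fam) for pk in PKS for fam in (0, 1))

# Proposed model: two indexes (1 and 2), each storing one former family.
proposed = sorted(index_key(TABLE, idx, pk) for pk in PKS for idx in (1, 2))

def row_distance(keys, pk):
    """Number of key positions between the first and last KV pair of a row."""
    positions = [i for i, key in enumerate(keys) if key[2] == pk]
    return positions[-1] - positions[0]

if __name__ == "__main__":
    print("current: ", [pretty(k) for k in current])
    print("proposed:", [pretty(k) for k in proposed])
```

In the sorted `current` list the two KV pairs of a row are adjacent, while in `proposed` they are separated by every other row of the first index, which is why a single-row operation may span ranges in the new model.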

This model gets more interesting if we generalize it from "two half-primary keys" to "every column must be stored in at least one index" (and more subtly, there must be paths to look up every column given a PK). This allows for columns in different families to even be partitioned differently (for example to make some columns available for follower reads in other regions while other columns are replica-partitioned to have faster writes in the home region).
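The two invariants in this generalized model can be sketched as a pair of checks. This is only an illustration under simplifying assumptions: the `Index` type and the function names are hypothetical, and "a path from the PK" is approximated as a fixed-point closure in which an index can be probed once all of its key columns are known.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Index:
    # Hypothetical description of an index: the columns it is keyed on
    # and the extra columns it stores.
    name: str
    key_cols: frozenset
    stored_cols: frozenset

def unstored_columns(table_cols, indexes):
    """Columns that appear in no index at all (violates invariant 1)."""
    covered = set()
    for idx in indexes:
        covered |= idx.key_cols | idx.stored_cols
    return set(table_cols) - covered

def unreachable_columns(table_cols, pk_cols, indexes):
    """Columns with no lookup path starting from the PK (violates invariant 2).

    Assumption: an index can be probed as soon as all of its key columns
    are known, and probing it yields its stored columns.
    """
    known = set(pk_cols)
    changed = True
    while changed:
        changed = False
        for idx in indexes:
            if idx.key_cols <= known:
                new = (idx.key_cols | idx.stored_cols) - known
                if new:
                    known |= new
                    changed = True
    return set(table_cols) - known

COLS = {"id", "name", "region", "blob"}
PK = {"id"}

# Two "half-primary keys": both keyed on the PK, storing disjoint subsets.
good = [
    Index("meta", frozenset({"id"}), frozenset({"name", "region"})),
    Index("blobs", frozenset({"id"}), frozenset({"blob"})),
]

# Every column is stored somewhere, but region and blob cannot be
# reached from the PK because no PK-keyed index stores region.
bad = [
    Index("meta", frozenset({"id"}), frozenset({"name"})),
    Index("by_region", frozenset({"region"}), frozenset({"blob"})),
]
```

The `bad` layout passes the first check and fails the second, which is the "more subtle" half of the invariant: storing every column somewhere is not enough if some index can never be probed starting from the primary key.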

This model appeals to me on a theoretical level because it replaces the "special case" of column families with a generalization of the relationship between tables and indexes. However, it also introduces a lot of new complexity in the form of complex relationships between indexes and invariants that need to be preserved. I think I've mostly talked myself out of this idea since I haven't been able to come up with use cases that it would help, but I wanted to write it down for posterity and see if anyone else was inspired by it.

Jira issue: CRDB-5398

github-actions · 2021-06-04T19:49:40Z

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

nvanbenschoten mentioned this issue Jun 1, 2021

kv,*: non-contiguous ranges #65726

Open

github-actions bot added the no-issue-activity label Jun 4, 2021

jordanlewis added C-investigation Further steps needed to qualify. C-label will change. and removed no-issue-activity labels Jun 8, 2021

jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021

exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
