kv: consistent follower reads with leaseholder coordination #72593
Do sql pods know their AZ and the AZ of sql nodes? I assume they know the latter thing, or, at least, have node descriptors which give them some info. Do we need to plumb more info into the sql pods to help facilitate this?
This commit adds support for the `--locality` and `--max-offset` flags to the `cockroach mt start-sql` command. The first of these is important because tenant SQL pods should know where they reside. This will be important in the future for multi-region serverless and also for projects like cockroachdb#72593.

The second of these is important because the SQL pod's max-offset setting needs to be the same as the host cluster's. If we want to be able to configure the host cluster's maximum clock offset to some non-default value, we'll need SQL pods to be configured identically.

Validation of plumbing:

```sh
./cockroach start-single-node --insecure --max-offset=250ms
./cockroach sql --insecure -e 'select crdb_internal.create_tenant(2)'

# verify --max-offset
./cockroach mt start-sql --insecure --tenant-id=2 --sql-addr=:26258 --http-addr=:0
# CRDB crashes with error "locally configured maximum clock offset (250ms) does not match that of node [::]:62744 (500ms)"

./cockroach mt start-sql --insecure --tenant-id=2 --sql-addr=:26258 --http-addr=:0 --max-offset=250ms
# successful

# verify --locality
./cockroach sql --insecure --port=26258 -e 'select gateway_region()'
ERROR: gateway_region(): no region set on the locality flag on this node

./cockroach mt start-sql --insecure --tenant-id=2 --sql-addr=:26258 --http-addr=:0 --max-offset=250ms --locality=region=us-east1
./cockroach sql --insecure --port=26258 -e 'select gateway_region()'
  gateway_region
------------------
  us-east1
```
To date, follower reads (rfc/follower_reads.md, rfc/follower_reads_implementation.md, rfc/non_blocking_txns.md) have always been viewed first and foremost as a tool to minimize latency for read-only operations. By avoiding all communication with a range's leaseholder, follower reads can help a transaction avoid cross-region communication, dramatically reducing latency. However, in order to avoid any coordination with the leaseholder, follower reads trade off some utility — they either require reads to be stale or writes to be pushed into the future. This limits the places where they can be used.
This issue explores an extended form of "consistent" follower read that can be used in more situations than "stale" follower reads. It still requires synchronous communication with the range's leaseholder, although the size of that communication is fixed with respect to the amount of data accessed; this negates what we have traditionally viewed as the primary benefit of follower reads. The issue also explores the secondary benefits of follower reads that remain even if the leaseholder helps coordinate a read served from a follower.
Motivations
Network costs in public clouds are expensive. They are also asymmetric, with pricing dependent on the source and destination of data transfer. For example, EC2's data transfer pricing page shows that cross-region transfer costs between $0.01 and $0.02 per GB, cross-zone transfer costs $0.01 per GB, and intra-zone transfer is free. This asymmetric pricing provides a strong incentive to minimize the amount of data shipped across regions/zones, even if some communication across regions/zones is unavoidable. Many clients have a follower for a given range that is closer (in data transfer cost terms) than the leaseholder, and serving reads from that follower presents an opportunity for cost savings.
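To make the asymmetry concrete, here is a back-of-the-envelope calculation in Go using the cross-zone price quoted above ($0.01 per GB, intra-zone free, treating a GB as 2^30 bytes). The workload (one million reads returning 1 MiB each) and the 1 KiB size of the leaseholder coordination round trip are illustrative assumptions, not measurements, and the function names are hypothetical.

```go
package main

import "fmt"

// Cross-zone price quoted in the text; intra-zone transfer is free.
// Real bills have more nuance (e.g. charging each direction separately).
const crossZoneUSDPerGB = 0.01

const bytesPerGB = 1 << 30

// crossZoneCost returns the dollar cost of moving n bytes between zones.
func crossZoneCost(n float64) float64 {
	return n / bytesPerGB * crossZoneUSDPerGB
}

// leaseholderReadCost: the leaseholder sits in another zone, so every
// result byte crosses a zone boundary (request size ignored).
func leaseholderReadCost(reads int, resultBytes float64) float64 {
	return crossZoneCost(float64(reads) * resultBytes)
}

// followerReadCost: a follower in the client's zone returns the results
// for free; only the fixed-size coordination round trip with the
// leaseholder crosses zones, regardless of result size.
func followerReadCost(reads int, coordBytes float64) float64 {
	return crossZoneCost(float64(reads) * coordBytes)
}

func main() {
	const reads = 1_000_000
	const resultBytes = 1 << 20 // 1 MiB per read (assumed)
	const coordBytes = 1 << 10  // 1 KiB coordination round trip (assumed)
	fmt.Printf("leaseholder reads:          $%.2f\n", leaseholderReadCost(reads, resultBytes))
	fmt.Printf("coordinated follower reads: $%.2f\n", followerReadCost(reads, coordBytes))
}
```

Under these assumptions, leaseholder reads move roughly 977 GB across zones (about $9.77), while coordinated follower reads move under 1 GB (about $0.01), because only the fixed-size coordination messages leave the client's zone.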
Load-based splitting and rebalancing can help spread well-distributed load across a cluster of nodes. However, they cannot spread out hotspots that cannot be split into different ranges. For read-heavy hotspots, serving reads from follower replicas can provide a form of load balancing. This is true even if the leaseholder is contacted at some point to facilitate the read from the follower, as long as the follower performs the expensive portion of the read (e.g. reading from disk, sending the result set to the client over the network, etc.).
(Stretch motivation) In the future, followers may store data in a different layout than leaseholders (e.g. column-oriented instead of row-oriented), which may be better suited for large analytical-style reads. Such a layout would trade write performance for read performance, which makes it a better fit for followers, because followers can apply log entries at a slower cadence than leaseholders (e.g. batching hundreds of entries to apply at a time). Serving reads from follower replicas would allow these read-optimized followers to be used even for consistent reads.
High-Level Overview
The key idea here is that even if the leaseholder is contacted during a read, it doesn't need to be the one to serve the read's results. Instead, it can be contacted to do some light bookkeeping and then offload the heavy lifting to a follower replica that may be a better candidate to serve the data back to the client.
For the sake of this issue, let's pretend we introduced a new request type called `EstablishResolvedTimestamp` (a sibling to the `QueryResolvedTimestamp` request).

In response to an `EstablishResolvedTimestamp` request, the leaseholder would concern itself with concurrency control and with determining how far the follower needs to catch up on its Raft log before its state machine contains a fully resolved view of the specified `span x timestamp` segment of "keyspacetime". Morally, the leaseholder would be in charge of creating a resolved timestamp over the given key span at the given timestamp. So the API would look something like this:

With this new API, followers can now be used to serve consistent follower reads. Either of the following approaches would work here, and each has its own benefits:
Follower-coordinated
Benefits:
Client-coordinated
Extended client-coordinated
`establish_resolved_timestamp_on_large_result` flag
`lease_applied_index`
Benefits:
Additional unstructured notes:
Jira issue: CRDB-11223
Epic CRDB-14991