-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: add time-bound iterator handling for range keys #82596
Comments
cc @cockroachdb/replication |
@jbowens Re: our previous discussion about this. Seems like we should move to block property collectors for TBIs. These appear to already support range keys. Given that enabling range keys requires ratcheting the Pebble database version and CRDB cluster version, we can assume that once we start writing these block properties then readers will respect them. We can also stop writing table properties. I don't think we need to do anything more here? Older SSTables that contain existing table props would be filtered like today based on point keys, and these cannot contain range keys so we're not incorrectly filtering anything here. Newer SSTables would not have table props, and thus would only be filtered on block properties, which are written accurately both for point and range keys. Am I missing anything? |
I just created #82678 to work out if / how we can go about removing table properties entirely. We've been talking about this on Storage for a while. |
Way ahead of you: #82658 We'll just need to keep respecting them for as long as nodes can still have SSTs written by 21.2 or older versions. Not sure how best to ensure that though -- might need an actual migration in 23.1 or something. |
Yeah, that issue is more for the migration. I'll update it to make it more pointed. We have some precedent for doing these two step migrations now (see #77634 and cockroachdb/pebble#1537), so hopefully we can reuse some of that functionality. |
81071: sql: Allow DEFAULT/ON UPDATE types that can be assignment-cast r=e-mbrown a=e-mbrown Resolves #74854 postgres: ``` postgres=# create table casts (b BOOL DEFAULT 'foo'::VARCHAR); ERROR: column "b" is of type boolean but default expression is of type character varying> HINT: You will need to rewrite or cast the expression. ``` crdb: ``` [email protected]:26257/movr> CREATE TABLE t (b BOOL DEFAULT 'foo'::VARCHAR); CREATE TABLE Time: 8ms total (execution 8ms / network 0ms) ``` Release note (sql change): A column's DEFAULT/ON UPDATE clause can now have a type that differs from the column type, as long as that type can be assignment-cast into the column's type. This increases our PostgreSQL compatibility. 81160: opt: simplify lookup join constraint building r=mgartner a=mgartner #### opt: simplify lookup join constraint building The lookup join constraint builder has been simplified so that it builds constraints in a single pass over an index's columns. This is in contrast to the previous method which would require a second pass over the index's columns after it was discovered that a multi-span lookup join with a lookup expression would be required. This change has a number of consequences: 1. Lookup joins can be planned in more cases. As an example, a lookup join can now be planned when there is an equality filter on a indexed computed column and a range filter on another indexed column. 2. Constant value filters, like `x = 1`, are no longer included in lookup expressions. Instead, the constant value will be projected and the lookup expression will contain an equality and the projected column, like `x = projected_col`. This should have little, if any, effect on the performance of lookup joins. This change required a minor change to the `GenerateLocalityOptimizedAntiJoin` rule to ensure that projected constant values are produced by the the inner anti-join. This is necessary because the outer anti-join's lookup expression references the projected column. This wasn't previously necessary because constant values weren't projected and referenced in lookup expressions - they were inlined directly in lookup expressions. Release note: None #### opt: remove "cross-join" from some tests As of #79586, we no longer plan cross-joins as inputs of lookup joins. This commit updates some test comments to reflect this. Release note: None 81713: kvserver: benchmark empty replica addition/removal r=aayushshah15 a=aayushshah15 This commit adds a small benchmark to track low level regressions in the time it takes to add/remove an empty range's replica. Relates to #72083 Release note: None 82317: ccl/sqlproxyccl: enable a more graceful shutdown r=pjtatlow a=pjtatlow When a drainSignal is received, the sql proxy now waits for all connections to close within a certain time limit (59 minutes) before shutting down. The next drainSignal will be ignored, but the third will forcefully shut down the server by panicking. This is to resolve an issue with Kubernetes where traffic could be lost during upgrades. See CC-5298 for more details. Release notes: None 82426: ui: new charts to statement details page r=maryliag a=maryliag The overview for a statement details page now shows charts instead of just the mean value for: - Execution and Planning Time - Rows Processed - Execution Retries - Execution Count - Contention https://www.loom.com/share/2b6d0311e61a433d81ad37b8b62fba7d <img width="1566" alt="Screen Shot 2022-06-06 at 6 14 41 PM" src="https://user-images.githubusercontent.com/1017486/172258188-d0367377-140e-4863-8ecf-f06f14a0e277.png"> <img width="1576" alt="Screen Shot 2022-06-03 at 4 53 49 PM" src="https://user-images.githubusercontent.com/1017486/171951030-0535484d-f4f2-4af4-aa50-f53247951f8c.png"> <img width="1575" alt="Screen Shot 2022-06-03 at 4 54 02 PM" src="https://user-images.githubusercontent.com/1017486/171951037-af6bf17d-0128-49dc-861a-250adaa0f383.png"> Fixes #74517 Release note (ui change): Statement Details page now shows charts for: Execution and Planning Time, Rows Processed, Execution Retries, Execution Count and Contention. 82561: sql: put connExecutor's AutoRetry fields into txnState's mutex r=ajwerner a=RichardJCai Auto retry variables could be used outside the connExecutor's goroutine in calls to serialize. If this is the case, the field should be in txnState's mutex struct. Release note: None Fixes #82506 Fixes #78178 82658: storage: stop writing SST table properties r=jbowens a=erikgrinaker **util/hlc: add `Timestamp.WallNext()`** Release note: None **storage: stop writing SST table properties** Previously, SSTables were written with the table properties `crdb.ts.min` and `crdb.ts.max` tracking the minimum and maximum timestamps in the SSTable. This was used to filter SSTables via the `MVCCIterator` options `MinTimestampHint` and `MaxTimestampHint`, aka time-bound iterators. In 22.1, we additionally began writing the block property `MVCCTimeInterval` containing the minimum and maximum timestamps for each SSTable block, aggregated up to the SSTable. Time-bound iterators filter on this block property in addition to the table properties. In 22.2, we will be introducing Pebble range keys. Table properties do not support range keys, so we must migrate entirely to block properties. This patch therefore stops writing table properties entirely. This ensures that SSTables containing range keys (introduced in 22.2) will not have table properties -- conversely, SSTables that do have table properties can only contain point keys. By doing so, we avoid erroneously filtering out SSTables with range keys that satisfy the time predicate but were not taken into account by table properties. A later patch will add block property collectors for range keys. We still respect table properties while filtering, to maintain compatibility with SSTables written by 21.2 nodes and earlier. We must retain this backwards compatibility as long as it is possible for a cluster to have SSTables written by older nodes, which can happen e.g. if a cluster is upgraded from 21.2 to 22.2 in rapid succession. The change from table to block properties slightly changes the filtering semantics, as block properties discard the logical timestamp component. This should not be an issue, since timestamp hints are inclusive, and TBIs are allowed to emit false positives. This patch also removes the `IteratorStats.TimeBoundNumSSTs` field, as block property filters do not update it. Touches #82596. Release note: None **storage: remove `pebbleDeleteRangeCollector`** This was a noop property collector no longer in use. Release note: None 82664: roachtest: skip flaky costfuzz r=meridional a=meridional Github issue: #81717 Release note: None Co-authored-by: e-mbrown <[email protected]> Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Aayush Shah <[email protected]> Co-authored-by: PJ Tatlow <[email protected]> Co-authored-by: Marylia Gutierrez <[email protected]> Co-authored-by: richardjcai <[email protected]> Co-authored-by: Erik Grinaker <[email protected]> Co-authored-by: Ye Ji <[email protected]>
We currently use table/block filters for time-bound iteration of point keys. We need this to handle range keys too. At the very least, it cannot have false negatives (skip tables/blocks with range keys that are within the range key interval). It would be great if we could rely on
pebble.IterOptions.RangeKeyFilters
for this, but all Pebble databases may not support it so it's possible this would require a costly data migration. In the worst case, just make sure we don't skip over range key blocks at all.cockroach/pkg/storage/pebble_iterator.go
Lines 195 to 227 in 28e600d
Jira issue: CRDB-16526
Epic CRDB-2624
The text was updated successfully, but these errors were encountered: