storage: add new CheckSSTConflicts randomized test #98408

Open
wants to merge 3 commits into
base: master
from

Conversation

jbowens
Collaborator

@jbowens jbowens commented Mar 10, 2023

enginepb: add MVCCStats.Formatted

Move the storage test formatStats function onto the MVCCStats type itself in
preparation for using it in additional places.

storage: change CheckSSTConflicts start and end types

Callers of CheckSSTConflicts pass in start and end boundaries for the sstable.
Previously, these were passed as MVCCKeys, although the end key's timestamp was
ignored. This resulted in confusing semantics whereby the end boundary was
interpreted as an exclusive end bound at end.Key.

storage: add new CheckSSTConflicts randomized test

Add a new randomized test exercising CheckSSTConflicts. Additionally, sketch
out a meta package to aid in writing randomized tests like this one. In the
future, this package may be extracted to the cockroachdb/metamorphic
repository.

Epic: None
Release note: None
Informs #94141.
Informs cockroachdb/pebble#2086.

@jbowens jbowens requested a review from a team as a code owner March 10, 2023 20:59
@jbowens jbowens requested a review from sumeerbhola March 10, 2023 20:59
@cockroach-teamcity
Member

This change is Reviewable

Collaborator Author

@jbowens jbowens left a comment

This test is currently failing. I'm hoping it's failing due to #94141 and the test can be used to produce a simpler reproduction of that issue.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)

@jbowens jbowens force-pushed the check-sst-conflicts branch from 51a3474 to 2b1fec5 Compare March 11, 2023 00:07
itsbilal added a commit to itsbilal/cockroach that referenced this pull request Mar 12, 2023
Previously we were missing some cases of range key
overlaps with other range keys, or were too eagerly
stepping over engine keys that did not conflict with
sstable keys, resulting in incorrect stats. This change
adds more targeted test cases to TestEvalAddSSTable to
test for those previously-unaccounted cases, and fixes them
in CheckSSTConflicts.

Informs cockroachdb#94140 and cockroachdb#98408.
Fixes cockroachdb#94141.

Epic: none

Release note: None
Member

@RaduBerinde RaduBerinde left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jbowens and @sumeerbhola)


pkg/storage/meta/run.go line 58 at r2 (raw file):

// each item. It's intended to be used with a function returned by Weighted,
// where the item itself is func(rng *rand.Rand).
func Generate[I any](rng *rand.Rand, n int, fn func(*rand.Rand) func(*rand.Rand) I) []I {

I don't get the generic-ness of ItemWeight if we're only going to use them with this function which assumes that the items are a func(*rand.Rand) I.

Can't we have a GenWithWeight[I] struct { Gen func(*rand.Rand) I; Weight int } and pass a ...GenWithWeight[I] to Generate ? Are we envisioning any other kind of usage?

craig bot pushed a commit that referenced this pull request Mar 12, 2023
98426: storage: Fix CheckSSTConflicts stats calculations r=erikgrinaker a=itsbilal

Previously we were missing some cases of range key overlaps with other range keys, or were too eagerly stepping over engine keys that did not conflict with sstable keys, resulting in incorrect stats. This change adds more targeted test cases to TestEvalAddSSTable to test for those previously-unaccounted cases, and fixes them in CheckSSTConflicts.

Informs #94140 and #98408.
Fixes #94141.

Epic: none

Release note: None

Co-authored-by: Bilal Akhtar <[email protected]>
@jbowens jbowens force-pushed the check-sst-conflicts branch from 2b1fec5 to d15ea19 Compare March 13, 2023 15:04
Collaborator Author

@jbowens jbowens left a comment

@itsbilal — looks like this is still failing after rebasing over master, including #98426.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @sumeerbhola)


pkg/storage/meta/run.go line 58 at r2 (raw file):

Are we envisioning any other kind of usage?

Yes. Picking among weighted items is expected to be useful during generation of the individual options themselves as well. For example, the Pebble metamorphic tests pick among IterOptions.KeyTypes enum values.

@jbowens jbowens force-pushed the check-sst-conflicts branch from d15ea19 to 4c0343b Compare March 14, 2023 17:34
Collaborator Author

@jbowens jbowens left a comment

Looks like this one is still failing, even if kvnemesis isn't 🤔

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @sumeerbhola)

@itsbilal
Contributor

@jbowens I realized that the randomized test produces sst range keys that are of different timestamps. While the function supports that, it's unlikely (impossible I believe?) for that to happen with practical uses of AddSSTable; the entirety of the SST should be of the same timestamp. Maybe someone in KV can confirm, but at least this sort of a failure doesn't point to something imminently serious.

@jbowens
Collaborator Author

jbowens commented Mar 14, 2023

I'm forgetting, do we support RESTORE-ing with history, or only backing up with history and restoring the most recent values? If we RESTORE with history, that must use AddSSTable without setting the sstTimestamp parameter.

I suppose the disaggregated storage RESTORE will need to restore with history or perform read-time timestamp rewriting, but that'll be for 23.2 in any case.

@jbowens
Collaborator Author

jbowens commented Mar 15, 2023

@itsbilal I asked in #disaster-recovery and it sounds like cluster-to-cluster streaming and tenant restores both use AddSSTable with varied timestamps.

itsbilal added a commit to itsbilal/cockroach that referenced this pull request Mar 21, 2023
Previously, the nexting logic around both iterators
being at a range key and not a point key was flawed
in that we'd miss ext points that were in between
the current and next sst keys, when we'd next both
of them. This change addresses that.

It also addresses other miscellaneous corner cases around
stats calculations with overlapping sst/engine range keys
and point keys. All these bugs were found with the upcoming
CheckSSTConflicts randomized test in cockroachdb#98408.

Epic: none

Release note: None
itsbilal added a commit to itsbilal/cockroach that referenced this pull request Mar 22, 2023
itsbilal added a commit to itsbilal/cockroach that referenced this pull request Mar 22, 2023
craig bot pushed a commit that referenced this pull request Mar 22, 2023
…99239 #99263 #99278

98980: kvcoord: Add metric to keep track of restarted ranges in rangefeed r=miretskiy a=miretskiy

Add a `distsender.rangefeed.restart_ranges` metric to keep track of the number of ranges restarted due to transient error.

Epic: CRDB-25044
Release note: None

99069: storage/cloud: correct the flag name in implicit credentials error message r=rhu713 a=taroface

When `--external-io-disable-implicit-credentials` is set and the user issues a command with `AUTH=implicit`, the resulting error message has the wrong flag name (`disable` is left out). Searching for that flag name in the docs doesn't return any results. The flag name is corrected in this PR.

Release note: none
Release justification: CLI bug

99077: changefeedccl: Allow timeout override r=miretskiy a=miretskiy

Add a timeout URL parameter for schema registry URIs. Prior to this change, all schema registry calls used a default timeout of 3 seconds. This PR increases the default to 30 seconds and allows the timeout to be specified via the `timeout=T` URL parameter.

Informs https://github.com/cockroachlabs/support/issues/2173

Release note (enterprise change): AVRO schema registry URIs allow an additional `timeout=T` query parameter to change the default timeout for contacting the schema registry.

99141: storage: CheckSSTConflict fix for nexting over overlapping points r=jbowens a=itsbilal

Previously, the nexting logic around both iterators being at a range key and not a point key was flawed in that we'd miss ext points that were in between
the current and next sst keys, when we'd next both of them. This change addresses that.

It also addresses other miscellaneous corner cases around stats calculations with overlapping sst/engine range keys and point keys. All these bugs were found with the upcoming CheckSSTConflicts randomized test in #98408.

Epic: none

Release note: None

99146: opt: speed up lookup constraint builder r=mgartner a=mgartner

#### opt: add benchmark with many lookup joins

This commit adds an optimizer benchmark that explores many lookup joins.
It explores many potential lookup joins that do not ultimately get added
to the memo, as well as many lookup joins that do get added to the memo.

Release note: None

#### opt: split HasSingleColumnConstValues into two functions

This commit splits HasSingleColumnConstValues into two functions - one
that returns a boolean if a constraint set constrains a single column to
a set of constant, non-null values, and another function that returns
the constant values. The former is more efficient when only the
boolean is needed.

Release note: None

#### opt: simplify lookup join constraint builder

This commit reduces computation and allocations when attempting to
build lookup join constraints by performing a simple column ID equality
before more complex computations and allocations.

Release note: None

#### opt: reduce allocations when building lookup join constraints

During the construction of lookup join constraints, two allocations of a
`opt.ColList` have been combined into a single allocation, and
allocation of a `memo.FiltersExpr` to store remaining filters is now
only performed if necessary.

Release note: None

These changes offer a nice speedup for the newly added benchmark:

```
name                         old time/op    new time/op    delta
SlowQueries/slow-query-1-10    15.8ms ± 1%    15.7ms ± 1%     ~     (p=0.690 n=5+5)
SlowQueries/slow-query-2-10     220ms ± 0%     219ms ± 0%     ~     (p=0.095 n=5+5)
SlowQueries/slow-query-3-10    63.0ms ± 1%    62.4ms ± 0%   -0.98%  (p=0.008 n=5+5)
SlowQueries/slow-query-4-10     1.70s ± 1%     1.38s ± 0%  -19.22%  (p=0.008 n=5+5)

name                         old alloc/op   new alloc/op   delta
SlowQueries/slow-query-1-10    7.04MB ± 0%    6.98MB ± 0%   -0.79%  (p=0.008 n=5+5)
SlowQueries/slow-query-2-10    48.7MB ± 0%    48.7MB ± 0%   -0.11%  (p=0.008 n=5+5)
SlowQueries/slow-query-3-10    45.1MB ± 0%    44.9MB ± 0%   -0.55%  (p=0.008 n=5+5)
SlowQueries/slow-query-4-10     878MB ± 0%     737MB ± 0%  -16.03%  (p=0.008 n=5+5)

name                         old allocs/op  new allocs/op  delta
SlowQueries/slow-query-1-10     76.1k ± 0%     75.8k ± 0%   -0.38%  (p=0.008 n=5+5)
SlowQueries/slow-query-2-10      401k ± 0%      400k ± 0%   -0.25%  (p=0.008 n=5+5)
SlowQueries/slow-query-3-10      390k ± 0%      389k ± 0%   -0.21%  (p=0.008 n=5+5)
SlowQueries/slow-query-4-10     18.2M ± 0%     17.4M ± 0%   -4.44%  (p=0.008 n=5+5)
```

Epic: None


99154: ui: stop polling in stmt fingerprint details page, change default sort on stmts r=maryliag a=xinhaoz

See individual commits.

https://www.loom.com/share/17569db4a0c04a968dabbc4421d429bf

99169: kv: unflake TestDelegateSnapshot r=kvoli a=andrewbaptist

Fixes: #96841
Fixes: #96525

Previously this test would assume that all snapshots came from the sending of snapshots through the AdminChangeReplicasRequest, which end up as type OTHER. However, occasionally we get a spurious raft snapshot, which makes this test flaky. This change ignores any raft snapshots that are sent.

Epic: none
Release note: None

99172: upgrades: hardcode descriptors in system_rbr_indexes r=JeffSwenson a=JeffSwenson

Previously, if a change was made to the system.sql_instances, system.lease, or system.sqlliveness bootstrap schema, it would change the behavior of the upgrade attached to the V23_1_SystemRbrReadNew version gate.

Now, the content of the descriptors is hard coded in the upgrade so that the behavior is not accidentally changed in the future.

Fixes: #99074

Release note: None

99180: builtins: add builtin functions which cast to OID to the distSQL block list r=michae2,cucaroach a=msirek

Distributed SQL which executes functions or casts to OID rely on `planner` receiver functions to execute internal SQL to get information about the OID from system tables. If these casts occur on a remote processor, the `planner` is not accessible and a dummy planner is used, which does not implement these receiver functions. To prevent internal errors, these casts or problem functions are added to a distSQL block list by `distSQLExprCheckVisitor`. A cast to an OID can also be done via a builtin function of the same name as the target type, e.g. `regproc`. These builtins do not currently have `DistsqlBlocklist` set, allowing distributed execution.

The solution is to mark `DistsqlBlocklist` as true for any builtin function which casts to an OID type.

Fixes #98373

Release note: None

99239: appstats: fix percentile greater than max latency r=maryliag a=maryliag

Part Of #99070
When an execution happens, its latency is added to a stream and then ordered so percentiles can be queried.
When getting the percentile values, we don't have the timestamp of when each value was added, meaning when we query the stream we could be getting values from a previous aggregation timestamp if the current window has very few executions (the stream has a limit, so we only have the most recent executions, but if the statement is not run frequently this stream can have old data).

The way this information is stored will need to be changed, but for now a patchy solution was added so we don't have the case where we show percentiles greater than the actual max.

Release note (bug fix): Add a check so percentiles are never greater than the max latency value.

99263: roachtest: copyfrom fix cluster package install r=aliher1911 a=aliher1911

When installing packages, use the cluster's remoteness as a proxy for arch instead of the roachtest runtime, which runs on a different arch.

Epic: none

Release note: None

99278: sql: fix update helper optional from clause r=rytaft a=lyang24

fixes #98662

sanity testing:
output
<img width="265" alt="Screen Shot 2023-03-22 at 1 01 10 PM" src="https://user-images.githubusercontent.com/20375035/227027849-34f34bb4-d52b-4de4-8a5b-456ee8b27f1b.png">
sample sql
<img width="874" alt="Screen Shot 2023-03-22 at 1 10 15 PM" src="https://user-images.githubusercontent.com/20375035/227027874-3f6515f4-fdbe-4c64-9d6f-03eb9b5c67f3.png">


Release note (sql change): fix helper message on update sql to correctly position the optional from clause.

Co-authored-by: Yevgeniy Miretskiy <[email protected]>
Co-authored-by: Ryan Kuo <[email protected]>
Co-authored-by: Bilal Akhtar <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: Jeff <[email protected]>
Co-authored-by: Mark Sirek <[email protected]>
Co-authored-by: maryliag <[email protected]>
Co-authored-by: Oleg Afanasyev <[email protected]>
Co-authored-by: Eric.Yang <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request Mar 22, 2023
@jbowens jbowens added the backport-23.1.x Flags PRs that need to be backported to 23.1 label Mar 27, 2023
@jbowens jbowens force-pushed the check-sst-conflicts branch from 4c0343b to 524b9e5 Compare March 27, 2023 14:36
blathers-crl bot pushed a commit that referenced this pull request Apr 1, 2023
Fixes some additional cases of stats divergence
in CheckSSTConflicts' handling of inbound sst range key
fragments that shadow points in engine and fragment
existing engine range keys.

Found by randomized test in #98408.

Epic: none

Release note: None
@jbowens jbowens force-pushed the check-sst-conflicts branch 3 times, most recently from 8d3cbdb to 443368c Compare April 4, 2023 15:13
Collaborator Author

@jbowens jbowens left a comment

I'd sometimes see this error in this test, which would cause a panic and fail the test:

I fixed this: there was a sequence of steps during generation that could leave rangeStart unset, generating a key with a NUL start bound.

With

./dev test -f TestCheckSSTConflictsRandomized pkg/storage --stress

it looks like we still have at least one more stats calculation issue :(

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @itsbilal, @RaduBerinde, and @sumeerbhola)

@itsbilal
Contributor

itsbilal commented Apr 4, 2023

This came up in the storage weekly, but the stats failures I'm seeing involve the same key being present in the sst at different timestamps, one of which has a conflict that we don't error on (and I believe this function has never errored on this type of conflict). Here's one example:

          SST:
            "ni"/0.000001035,0→/BYTES/jjub
            "ni"/0.000000970,0→/<empty>
          Before:
            "nd"/0.000000992,0→/BYTES/ljxrdz
            "ni"/0.000000970,0→/BYTES/xscxnv
          After:
            "nd"/0.000000992,0→/BYTES/ljxrdz
            "ni"/0.000001035,0→/BYTES/jjub
            "ni"/0.000000970,0→/<empty>

We should have errored on the ni@970 conflict but we didn't; instead, our stats are incorrect because the function assumed ni@1035 is the only inbound key with the prefix "ni".

Asking disaster recovery whether this is possible in the first place.

@itsbilal
Contributor

itsbilal commented Apr 4, 2023

In addition, I noticed that there's a subtle bug in how we set start and end when calling CheckSSTConflicts in the randomized test. All callers of CheckSSTConflicts pass meta keys i.e. timestamp 0 as start/end bounds of the sstable, except for the randomized test. This change fixes this:

diff --git a/pkg/storage/sst_test.go b/pkg/storage/sst_test.go
index 5400e12f114..95f4fc8e546 100644
--- a/pkg/storage/sst_test.go
+++ b/pkg/storage/sst_test.go
@@ -476,8 +476,8 @@ func (o *checkSSTConflictsOp) Run(l *meta.Logger, s *sstTestState) {
                ctx,
                f.Bytes(),
                s.engine,
-               s.bufferedOps[0].sortKey(),
-               s.bufferedOps[len(s.bufferedOps)-1].sortKey(),
+               MVCCKey{Key: s.bufferedOps[0].sortKey().Key},
+               MVCCKey{Key: s.bufferedOps[len(s.bufferedOps)-1].sortKey().Key},
                o.start,
                o.end,
                o.disallowShadowing != nil,
@@ -585,7 +585,7 @@ func (o *addSSTableOp) Run(l *meta.Logger, s *sstTestState) {
                ctx,
                f.Bytes(),
                s.engine,
-               s.bufferedOps[0].sortKey(),
+               MVCCKey{Key: s.bufferedOps[0].sortKey().Key},
                MVCCKey{Key: o.end},
                o.start,
                o.end,

@itsbilal
Contributor

itsbilal commented Apr 5, 2023

Took a look at all uses of AddSSTable that I could find, and I couldn't find a single one where we add multiple MVCC keys per roachpb.Key, and also do conflict checking. The c2c case, where we could add multiple keys, sets none of DisallowConflicts, DisallowShadowing or DisallowShadowingBelow so CheckSSTConflicts won't run there to begin with. It's why the assumption that this function has made for years (incl. pre-range keys) is safe to make.

Knowing this, I think we should update the randomized test to only output one TS per user key for point keys.
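
One possible way to enforce this in a generator is sketched below. It is not the change made in the PR: `pointKey` is a simplified stand-in for an MVCC point key, timestamps are plain integers, and `onePerUserKey` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sort"
)

// pointKey is a simplified stand-in for an MVCC point key: a user key plus
// an integer timestamp. Hypothetical; the real type is storage.MVCCKey.
type pointKey struct {
	Key string
	TS  int64
}

// onePerUserKey keeps the first generated timestamp for each user key, drops
// any later versions of the same key, and returns the result sorted by key.
func onePerUserKey(keys []pointKey) []pointKey {
	seen := make(map[string]struct{}, len(keys))
	out := make([]pointKey, 0, len(keys))
	for _, k := range keys {
		if _, ok := seen[k.Key]; ok {
			continue // A version of this user key was already emitted.
		}
		seen[k.Key] = struct{}{}
		out = append(out, k)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

func main() {
	// Mirrors the earlier example, where "ni" appeared at two timestamps.
	keys := []pointKey{{"ni", 1035}, {"ni", 970}, {"nd", 992}}
	fmt.Println(onePerUserKey(keys))
}
```

Filtering after generation keeps the generator itself simple; an alternative would be to track used keys while generating so that no work is discarded.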

@jbowens
Collaborator Author

jbowens commented Apr 5, 2023

In addition, I noticed that there's a subtle bug in how we set start and end when calling CheckSSTConflicts in the randomized test. All callers of CheckSSTConflicts pass meta keys i.e. timestamp 0 as start/end bounds of the sstable, except for the randomized test. This change fixes this:

There's a lot of subtlety there in the CheckSSTConflicts API. The end boundary doesn't make sense as a MVCCKey if we require end.Key to be an exclusive bound (not even end itself).

@jbowens jbowens force-pushed the check-sst-conflicts branch from 443368c to 93983fd Compare April 5, 2023 19:01
@jbowens jbowens requested a review from a team as a code owner April 5, 2023 19:01
@jbowens jbowens force-pushed the check-sst-conflicts branch 2 times, most recently from 8994dd3 to 76fdfa2 Compare April 6, 2023 14:57
Collaborator Author

@jbowens jbowens left a comment

4417 runs so far, 0 failures, over 21m55s

😎

Thanks for your help here @itsbilal!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @itsbilal, @RaduBerinde, and @sumeerbhola)

@jbowens jbowens force-pushed the check-sst-conflicts branch from 76fdfa2 to caba2d1 Compare April 7, 2023 20:24
@jbowens
Collaborator Author

jbowens commented Apr 10, 2023

TFTRs!

bors r+

@craig
Contributor

craig bot commented Apr 10, 2023

Build failed (retrying...):

@yuzefovich
Member

On CI run:

Failed
=== RUN   TestCheckSSTConflictsRandomized
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/c5eb8fc8b8e683c19f3c3e4238f64094/logTestCheckSSTConflictsRandomized3898596176
    test_log_scope.go:79: use -show-logs to present logs inline
    sst_test.go:183: Using seed -4844134980158906090; to re-run set COCKROACH_RANDOM_SEED=-4844134980158906090
    run.go:148: PutMVCC timestamp is empty
    run.go:155: History:
        
        op      0: MVCCDelete("diez"@0) = ok
        op      1: CheckSSTConflicts(disallowShadowing=<nil>, usePrefix=true) = error: PutMVCC timestamp is empty
    panic.go:522: -- test log scope end --

ERROR: a panic has occurred!
Details cannot be printed yet because we are still unwinding.
Hopefully the test harness prints the panic below, otherwise check the test logs.

test logs left over in: /artifacts/tmp/_tmp/c5eb8fc8b8e683c19f3c3e4238f64094/logTestCheckSSTConflictsRandomized3898596176
--- FAIL: TestCheckSSTConflictsRandomized (0.02s)

bors r-

@craig
Contributor

craig bot commented Apr 10, 2023

Canceled.

Contributor

@itsbilal itsbilal left a comment

LGTM

Does the failing seed pass with this change?

@jbowens
Collaborator Author

jbowens commented Apr 10, 2023

Yes, although stressing locally again uncovered another failure:

sst_test.go:183: Using seed -4293931100192978035; to re-run set COCKROACH_RANDOM_SEED=-4293931100192978035
    run.go:148: range "u": stats calculated from CheckSSTConflicts differ:
        SST:
          key_count=+1 key_bytes=+17 val_count=+1 val_bytes=+12 range_key_count=+1 range_key_bytes=+17 range_val_count=+1 range_val_bytes=+7
        Conflict:
          range_key_count=+1 range_key_bytes=+20 range_val_count=+1 range_val_bytes=+7
        Before:
          range_key_count=1 range_key_bytes=17 range_val_count=1 range_val_bytes=7
        Before+SST+Conflict:
          key_count=1 key_bytes=17 val_count=1 val_bytes=12 range_key_count=3 range_key_bytes=54 range_val_count=3 range_val_bytes=21
        ComputeStats:
          key_count=1 key_bytes=17 val_count=1 val_bytes=12 range_key_count=1 range_key_bytes=14 range_val_count=1 range_val_bytes=7
        Diff:
          range_key_count=-2 range_key_bytes=-40 range_val_count=-2 range_val_bytes=-14

This might be a consequence of the SST defining utsp both within the bounds of a MVCC Delete Range and as a point?

          SST:
            u{fvx-u}/0.000000156,0→/<empty>
            "utsp"/0.000000104,0→/BYTES/
          Before:
            u{-jbst}/0.000000156,0→/<empty>
          After:
            u{-u}/0.000000156,0→/<empty>
            "utsp"/0.000000104,0→/BYTES/
          error: range "u": stats calculated from CheckSSTConflicts differ:
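
The arithmetic behind the stats comparison in the failure output can be reproduced with a small sketch. `rangeStats` is a hypothetical stand-in that carries only the four range-key fields from the log, not the real `enginepb.MVCCStats`; the numbers are copied from the output for seed -4293931100192978035.

```go
package main

import "fmt"

// rangeStats holds only the range-key fields shown in the failure output.
// It is a simplified stand-in for enginepb.MVCCStats.
type rangeStats struct {
	RangeKeyCount, RangeKeyBytes, RangeValCount, RangeValBytes int64
}

// add returns the field-wise sum of s and o.
func (s rangeStats) add(o rangeStats) rangeStats {
	return rangeStats{
		s.RangeKeyCount + o.RangeKeyCount,
		s.RangeKeyBytes + o.RangeKeyBytes,
		s.RangeValCount + o.RangeValCount,
		s.RangeValBytes + o.RangeValBytes,
	}
}

// sub returns the field-wise difference s - o.
func (s rangeStats) sub(o rangeStats) rangeStats {
	return rangeStats{
		s.RangeKeyCount - o.RangeKeyCount,
		s.RangeKeyBytes - o.RangeKeyBytes,
		s.RangeValCount - o.RangeValCount,
		s.RangeValBytes - o.RangeValBytes,
	}
}

func main() {
	before := rangeStats{1, 17, 1, 7}   // "Before" in the log
	sst := rangeStats{1, 17, 1, 7}      // range-key part of "SST"
	conflict := rangeStats{1, 20, 1, 7} // "Conflict" delta from CheckSSTConflicts
	computed := rangeStats{1, 14, 1, 7} // "ComputeStats" after ingestion

	// The test expects Before+SST+Conflict to equal the recomputed stats.
	expected := before.add(sst).add(conflict)
	diff := computed.sub(expected)
	fmt.Printf("expected=%+v\ndiff=%+v\n", expected, diff)
}
```

The sum matches the log's Before+SST+Conflict line (3, 54, 3, 21), and subtracting it from ComputeStats gives the reported Diff (-2, -40, -2, -14). Any nonzero diff means the delta returned by CheckSSTConflicts does not describe what ingestion actually did.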

@itsbilal
Contributor

@jbowens At first glance that looks like a genuine failure, i.e. we should be returning a WriteTooOld error (I think) if a point is sliding under an existing range key. But maybe we aren't because it's also under an existing sst range key, which is an idempotent write with the existing ext range key (i.e. same timestamp). Idempotent writes cause all sorts of special cases and are a headache to handle in CheckSSTConflicts, and this is a partial idempotent write, which makes it especially annoying.

I imagine the stats failure would occur even if the point didn't exist, as it looks like it's about the handling of partially-idempotent range keys merging into existing range keys. The point seems to be correctly accounted for (even if maybe it should be a conflict).

jbowens added 3 commits July 27, 2023 15:50
Move the storage test formatStats function onto the MVCCStats type itself in
preparation for using it in additional places.

Epic: None
Release note: None
Callers of CheckSSTConflicts pass in start and end boundaries for the sstable.
Previously, these were passed as MVCCKeys, although the end key's timestamp was
ignored. This resulted in confusing semantics whereby the end boundary was
interpreted as an exclusive end bound at `end.Key`.

Epic: none
Release note: none
Add a new randomized test exercising CheckSSTConflicts. Additionally, sketch
out a `meta` package to aid in writing randomized tests like this one. In the
future, this package may be extracted to the cockroachdb/metamorphic
repository.

Epic: None
Release note: None
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
5 participants