github.com/cockroachdb/pebble/internal/metamorphic: TestMeta failed #3226

Closed
cockroach-teamcity opened this issue Jan 17, 2024 · 9 comments · Fixed by #3228

Comments

@cockroach-teamcity
Member

github.com/cockroachdb/pebble/internal/metamorphic.TestMeta failed with artifacts on refs/heads/master @ aba7c0ca6483:

=== CONT  TestMeta/execution/standard-008
=== RUN   TestMeta/execution/standard-011
=== PAUSE TestMeta/execution/standard-011
=== CONT  TestMeta/execution/standard-011
=== RUN   TestMeta/compare/standard-016
=== RUN   TestMeta/execution
=== RUN   TestMeta/execution/random-020
=== PAUSE TestMeta/execution/random-020
=== CONT  TestMeta/execution/random-020
=== RUN   TestMeta/execution/standard-005
=== PAUSE TestMeta/execution/standard-005
=== CONT  TestMeta/execution/standard-005
=== RUN   TestMeta/execution/standard-009
=== PAUSE TestMeta/execution/standard-009
=== CONT  TestMeta/execution/standard-009
=== RUN   TestMeta/execution/standard-018
=== PAUSE TestMeta/execution/standard-018
=== CONT  TestMeta/execution/standard-018
=== RUN   TestMeta/execution/standard-021
=== PAUSE TestMeta/execution/standard-021
=== CONT  TestMeta/execution/standard-021
=== RUN   TestMeta/compare/standard-001
=== RUN   TestMeta/compare/standard-015
=== RUN   TestMeta/compare/standard-018
=== RUN   TestMeta/execution/random-010
=== PAUSE TestMeta/execution/random-010
=== CONT  TestMeta/execution/random-010
=== RUN   TestMeta/execution/random-026
=== PAUSE TestMeta/execution/random-026
=== CONT  TestMeta/execution/random-026
=== RUN   TestMeta/execution/standard-006
=== PAUSE TestMeta/execution/standard-006
=== CONT  TestMeta/execution/standard-006
=== RUN   TestMeta/compare
=== RUN   TestMeta/compare/standard-003
=== RUN   TestMeta/execution/random-004
=== PAUSE TestMeta/execution/random-004
=== CONT  TestMeta/execution/random-004
=== RUN   TestMeta/execution/random-016
=== PAUSE TestMeta/execution/random-016
=== CONT  TestMeta/execution/random-016
=== RUN   TestMeta/execution/standard-001
=== PAUSE TestMeta/execution/standard-001
=== CONT  TestMeta/execution/standard-001
=== RUN   TestMeta/execution/standard-022
=== PAUSE TestMeta/execution/standard-022
=== CONT  TestMeta/execution/standard-022
=== RUN   TestMeta/execution/standard-027
=== PAUSE TestMeta/execution/standard-027
=== CONT  TestMeta/execution/standard-027
Help

To reproduce, try:

go test -tags 'invariants' -exec 'stress -p 1' -timeout 0 -test.v -run 'TestMeta$' ./internal/metamorphic -seed 1705485691727718694 -ops "uniform:5000-10000"


@jbowens
Collaborator

jbowens commented Jan 17, 2024

===== SEED =====
1705485691727718694
===== DIFF =====
/artifacts/meta/240117-100131.7271869018396/{standard-000,standard-019}
@@ -4946,11 +4946,11 @@
 iter52.Prev("tzawvrzbj@64") // [invalid] <nil> #4945
 db1.Set("uncnnvlfrkf@124", "mtekomkcsrnx") // <nil> #4946
 iter58.First() // [true,"emdadjky@18","qopgmheo",<no range>] <nil> #4947
 iter57.Last() // [false] <nil> #4948
 snap40 = db1.NewSnapshot("emdadjky", "eobfcejboo", "kjov", "lxqchebfmkp", "tzawvrzbj", "wocufbcpj") #4949
-snap40.Get("emdadjky@111") // [""] pebble: not found #4950
+snap40.Get("emdadjky@111") // ["iontrohmirexteoo"] <nil> #4950
 iter58.SeekLT("uncnnvlfrkf@108", "dntwzduwz@32") // [valid,"tvhuxww@102","zwwl",<no range>] <nil> #4951
 iter58.SeekLT("dsuwcxwrqe@90", "dntwzduwz@88") // [invalid] <nil> #4952
 iter59.SeekLT("kjov@38", "") // [false] <nil> #4953
 iter60.Next("wskxwczl@34") // [invalid] <nil> #4954
 iter55.InternalNext() // <nil> #4955

@itsbilal
Member

Reproduces every time with that seed.

@itsbilal
Member

Still digging, but the relevant writes on this key are:

db1.Set("emdadjky@111", "iontrohmirexteoo")
db1.Set("emdadjky@111", "pqxwrol")
batch41 = db1.NewBatch()
batch41.DeleteRange("dntwzduwz@59", "uncnnvlfrkf@65")
db1.Ingest(batch41, batch40) # ingest_using_apply = true
db1.SingleDelete("emdadjky@111", false /* maybeReplaceDelete */)
snap40 = db1.NewSnapshot("emdadjky", "eobfcejboo", "kjov", "lxqchebfmkp", "tzawvrzbj", "wocufbcpj")
snap40.Get("emdadjky@111") // ["iontrohmirexteoo"] <nil> #4950

@itsbilal
Member

Never mind, the ingest didn't apply on either instance, so this is the relevant sequence:

db1.Set("emdadjky@111", "iontrohmirexteoo")
db1.Set("emdadjky@111", "pqxwrol")
db1.SingleDelete("emdadjky@111", false /* maybeReplaceDelete */)
snap40 = db1.NewSnapshot("emdadjky", "eobfcejboo", "kjov", "lxqchebfmkp", "tzawvrzbj", "wocufbcpj")
snap40.Get("emdadjky@111") // ["iontrohmirexteoo"] <nil> #4950

@itsbilal
Member

Given that, I think this is just a bug in our singledel tracking: that sequence of operations can't be deterministic. Maybe our singledel eligibility tracking assumed the rangedel was successfully applied, when in reality the ingest carrying it failed. cc @jbowens
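
For background, Pebble documents SingleDelete as deterministic only when the key has been set at most once since it was last deleted; with two SETs underneath, the outcome depends on compaction timing. Below is a minimal sketch of the hazard against Pebble's public API (in-memory FS, error handling elided; illustrative code, not from this thread):

package main

import (
	"fmt"

	"github.com/cockroachdb/pebble"
	"github.com/cockroachdb/pebble/vfs"
)

func main() {
	// In-memory store; errors ignored for brevity.
	db, _ := pebble.Open("demo", &pebble.Options{FS: vfs.NewMem()})
	defer db.Close()

	_ = db.Set([]byte("k"), []byte("old"), pebble.Sync)
	_ = db.Set([]byte("k"), []byte("new"), pebble.Sync)
	// A SINGLEDEL cancels only the most recent SET beneath it. If a
	// compaction has already collapsed the two SETs into one, "k" reads
	// as not-found; otherwise the SINGLEDEL consumes "new" and a later
	// read can surface "old".
	_ = db.SingleDelete([]byte("k"), pebble.Sync)

	if v, closer, err := db.Get([]byte("k")); err == nil {
		fmt.Printf("k = %s\n", v)
		_ = closer.Close()
	} else {
		fmt.Println("k not found")
	}
}

Whether the final Get reports not-found or the old value depends on whether the two SETs were collapsed before the SINGLEDEL elided one of them, which is exactly the nondeterminism the generator's eligibility tracking exists to avoid.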

@jbowens
Collaborator

jbowens commented Jan 17, 2024

Yeah, running with this diff to peer into ops generation:

diff --git a/metamorphic/generator.go b/metamorphic/generator.go
index f00626d5..921cd4b2 100644
--- a/metamorphic/generator.go
+++ b/metamorphic/generator.go
@@ -1446,6 +1446,7 @@ func (g *generator) writerIngest() {
        // Either it succeeds and its keys are committed to the DB, or it fails and
        // the keys are not committed.
        if !g.keyManager.doObjectBoundsOverlap(batchIDs) {
+               fmt.Printf("Ingest of batches %s will succeed.\n", batchIDs)
                // This ingestion will succeed.
                //
                // The batches we're ingesting may contain single delete tombstones that
@@ -1465,6 +1466,8 @@ func (g *generator) writerIngest() {
                                })
                        }
                }
+       } else {
+               fmt.Printf("Ingest of batches %s will fail.\n", batchIDs)
        }
        g.add(&ingestOp{
                dbID:         dbID,
diff --git a/metamorphic/meta.go b/metamorphic/meta.go
index 34b32806..c7005e49 100644
--- a/metamorphic/meta.go
+++ b/metamorphic/meta.go
@@ -197,6 +197,7 @@ func RunAndCompare(t *testing.T, rootDir string, rOpts ...RunOption) {
                cfg.numInstances = runOpts.numInstances
        }
        ops := generate(rng, opCount, cfg, km)
+       return
        opsPath := filepath.Join(metaDir, "ops")
        formattedOps := formatOps(ops)
        require.NoError(t, os.WriteFile(opsPath, []byte(formattedOps), 0644))

The key manager thinks the ingestion should succeed:

Ingest of batches [batch41 batch40] will succeed.

Is some op not accounted for in the calculation of the batch's bounds?

@jbowens
Collaborator

jbowens commented Jan 17, 2024

It looks like the key manager thinks batch40 is empty, but in actuality it contains a single RANGEKEYSET:

batch40.RangeKeySet("gkbj", "kjov", "@122", "iicpprtmvcy") // <nil> #4585

We don't track range key state within the key manager because we don't need to, but we do need to track the bounds, since ingestion expects the files to be non-overlapping in the merged range key + point key bounds. I think we'll just need to edit keyManager.update to explicitly expand the writer's bounds on range key ops. Thanks for finding that. Do you want to make the change, or should I?
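
A rough sketch of the shape that change could take; the op types, field names, and the expandBounds helper below are illustrative stand-ins, not the metamorphic package's actual internals:

package metamorphic

// Illustrative stand-ins for the real op types and keyManager fields.
type op interface{}

type rangeKeySetOp struct{ writerID, start, end string }
type rangeKeyUnsetOp struct{ writerID, start, end string }
type rangeKeyDeleteOp struct{ writerID, start, end string }

type bounds struct{ smallest, largest string }

type keyManager struct{ objBounds map[string]*bounds }

// expandBounds widens a writer's tracked key range to cover [start, end).
func (k *keyManager) expandBounds(id, start, end string) {
	b, ok := k.objBounds[id]
	if !ok {
		k.objBounds[id] = &bounds{smallest: start, largest: end}
		return
	}
	if start < b.smallest {
		b.smallest = start
	}
	if end > b.largest {
		b.largest = end
	}
}

// update: range key ops previously fell through without touching the
// writer's bounds, so a batch holding only a RANGEKEYSET looked empty
// to the ingestion overlap check (doObjectBoundsOverlap).
func (k *keyManager) update(o op) {
	switch s := o.(type) {
	case *rangeKeySetOp:
		k.expandBounds(s.writerID, s.start, s.end)
	case *rangeKeyUnsetOp:
		k.expandBounds(s.writerID, s.start, s.end)
	case *rangeKeyDeleteOp:
		k.expandBounds(s.writerID, s.start, s.end)
	}
	// ... point-key cases elided ...
}

The range key's suffix and value stay untracked; the only new bookkeeping is that the op's [start, end) span widens the bounds consulted by the ingestion overlap check.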

@itsbilal
Member

batch40:

batch40.RangeKeySet("gkbj", "kjov", "@122", "iicpprtmvcy")

batch41:

batch41.DeleteRange("dntwzduwz@59", "uncnnvlfrkf@65")
batch41.Set("wskxwczl@120", "pkaa")
batch41.RangeKeyUnset("dsuwcxwrqe", "tzawvrzbj", "@80")

It looks like keyManager.update doesn't update anything for range key operations, which is why the key manager thinks this will succeed.

@itsbilal
Member

@jbowens I can make the change, thanks for helping uncover it too!

itsbilal added a commit to itsbilal/pebble that referenced this issue Jan 17, 2024
This change updates the key manager to track changes to
writer bounds caused by range key operations. These could affect
the success/failure of ingestions, which is something the key
manager determines.

Fixes cockroachdb#3226.
itsbilal added a commit that referenced this issue Jan 17, 2024 (merged via #3228)
jbowens moved this to Done in [Deprecated] Storage on Jun 4, 2024