Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metamorphic: add IngestAndExcise, make EFOS determinitic #3044

Merged
merged 1 commit into from
Nov 29, 2023

Conversation

itsbilal
Copy link
Contributor

@itsbilal itsbilal commented Nov 8, 2023

This change updates the metamorphic tests to support a new useExcise testOption and a new IngestAndExcise operation, which calls db.IngestAndExcise() if useExcise is true, or calls it with an sstable containing DeleteRange/RangeKeyDelete for the same span if useExcise is false.

Furthermore, a new private option is added, efosAlwaysCreatesIterators. If the test option for useExcise (or the previously added useSharedReplicate) is enabled, one of these two things happen to guarantee EFOS determinism around excises:

  1. If efosAlwaysCreatesIterators is true, the behaviour of EFOS is changed to always hold onto a readState and never return errors from NewIter.
  2. If efosAlwaysCreatesIterators is false, regular EFOS behaviour is retained, but a flush is completed after every EFOS instantiation to guarantee that it becomes excise-immune right at instantiation.

Note that regular snapshots are never created when useExcise or useSharedReplicate is true.

Fixes #2885.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@itsbilal
Copy link
Contributor Author

itsbilal commented Nov 8, 2023

This isn't quite ready to merge yet (the body of ingestAndExciseOp is a little rough right now), but I sped it up to help investigate cockroachdb/cockroach#114056 and cockroachdb/cockroach#113973 , and sure enough it fails pretty readily when run locally.

itsbilal added a commit to itsbilal/pebble that referenced this pull request Nov 8, 2023
Previously if filteringIter's FilterFunc mutated the passed-in span
to no longer be a valid return value for a SeekLT or SeekGE call,
we would still return that span even though it could be >=  the seek
key (for SeekLT), or less than it (for SeekGE).

This change updates filteringIter to guard for this case before
returning from a seek call.

Found with cockroachdb#3044.

Informs cockroachdb/cockroach#113973, cockroachdb/cockroach#114056.
itsbilal added a commit to itsbilal/pebble that referenced this pull request Nov 8, 2023
Previously if filteringIter's FilterFunc mutated the passed-in span
to no longer be a valid return value for a SeekLT or SeekGE call,
we would still return that span even though it could be >=  the seek
key (for SeekLT), or less than it (for SeekGE).

This change updates filteringIter to guard for this case before
returning from a seek call.

Found with cockroachdb#3044.

Informs cockroachdb/cockroach#113973, cockroachdb/cockroach#114056.
itsbilal added a commit that referenced this pull request Nov 9, 2023
Previously if filteringIter's FilterFunc mutated the passed-in span
to no longer be a valid return value for a SeekLT or SeekGE call,
we would still return that span even though it could be >=  the seek
key (for SeekLT), or less than it (for SeekGE).

This change updates filteringIter to guard for this case before
returning from a seek call.

Found with #3044.

Informs cockroachdb/cockroach#113973, cockroachdb/cockroach#114056.
itsbilal added a commit to itsbilal/pebble that referenced this pull request Nov 9, 2023
Previously if filteringIter's FilterFunc mutated the passed-in span
to no longer be a valid return value for a SeekLT or SeekGE call,
we would still return that span even though it could be >=  the seek
key (for SeekLT), or less than it (for SeekGE).

This change updates filteringIter to guard for this case before
returning from a seek call.

Found with cockroachdb#3044.

Informs cockroachdb/cockroach#113973, cockroachdb/cockroach#114056.
itsbilal added a commit to itsbilal/pebble that referenced this pull request Nov 9, 2023
Previously if filteringIter's FilterFunc mutated the passed-in span
to no longer be a valid return value for a SeekLT or SeekGE call,
we would still return that span even though it could be >=  the seek
key (for SeekLT), or less than it (for SeekGE).

This change updates filteringIter to guard for this case before
returning from a seek call.

Found with cockroachdb#3044.

Informs cockroachdb/cockroach#113973, cockroachdb/cockroach#114056.
itsbilal added a commit that referenced this pull request Nov 9, 2023
Previously if filteringIter's FilterFunc mutated the passed-in span
to no longer be a valid return value for a SeekLT or SeekGE call,
we would still return that span even though it could be >=  the seek
key (for SeekLT), or less than it (for SeekGE).

This change updates filteringIter to guard for this case before
returning from a seek call.

Found with #3044.

Informs cockroachdb/cockroach#113973, cockroachdb/cockroach#114056.
@itsbilal itsbilal force-pushed the meta-excise branch 3 times, most recently from 94f1575 to 96e315e Compare November 10, 2023 02:21
@itsbilal
Copy link
Contributor Author

There's one occasional bug/failure around virtual sstable fileMetadata refcounts that this test is still tripping up. I'll look into that momentarily, but that bug aside (which is likely not in the test and in pebble code itself), this is ready for a review.

@itsbilal itsbilal marked this pull request as ready for review November 10, 2023 18:00
@itsbilal itsbilal requested review from a team and jbowens November 10, 2023 18:01
@itsbilal itsbilal force-pushed the meta-excise branch 2 times, most recently from 27e2055 to a765170 Compare November 15, 2023 16:11
itsbilal added a commit to itsbilal/pebble that referenced this pull request Nov 15, 2023
Previously, we weren't setting hideObsoletePoints to true for
compaction iters created out of shared foreign sstables. This
change addresses that by passing that state through virtualState
in VirtualReader through to the old Reader.

Unblocks cockroachdb#3044.
itsbilal added a commit to itsbilal/pebble that referenced this pull request Nov 15, 2023
Previously, we were calling checkVirtualBounds on FileMetadatas
before they were installed. This would lead to panics in invariant
builds as we would try to load an uninstalled FileMetadata
from the table cache.

Only an issue with invariants builds. Unblocks cockroachdb#3044.
itsbilal added a commit to itsbilal/pebble that referenced this pull request Nov 15, 2023
Previously, we were passing range keys that were sorted by trailer
descending right from ScanInternal. However, the caller isn't
interested in seqnum-based sorting, only kind-based trailer sorting.
This change zeroes seqnums before sorting the keys.

Unblocks cockroachdb#3044.
itsbilal added a commit to itsbilal/pebble that referenced this pull request Nov 15, 2023
Previously, we weren't setting hideObsoletePoints to true for
compaction iters created out of shared foreign sstables. This
change addresses that by passing that state through virtualState
in VirtualReader through to the old Reader.

Unblocks cockroachdb#3044.
itsbilal added a commit to itsbilal/pebble that referenced this pull request Nov 15, 2023
Previously, we were calling checkVirtualBounds on FileMetadatas
before they were installed. This would lead to panics in invariant
builds as we would try to load an uninstalled FileMetadata
from the table cache.

Only an issue with invariants builds. Unblocks cockroachdb#3044.
itsbilal added a commit to itsbilal/pebble that referenced this pull request Nov 15, 2023
Previously, we were passing range keys that were sorted by trailer
descending right from ScanInternal. However, the caller isn't
interested in seqnum-based sorting, only kind-based trailer sorting.
This change zeroes seqnums before sorting the keys.

Unblocks cockroachdb#3044.
itsbilal added a commit to itsbilal/pebble that referenced this pull request Nov 15, 2023
Previously, we were passing range keys that were sorted by trailer
descending right from ScanInternal. However, the caller isn't
interested in seqnum-based sorting, only kind-based trailer sorting.
This change zeroes seqnums before sorting the keys.

Unblocks cockroachdb#3044.
@itsbilal
Copy link
Contributor Author

Note that only the last commit needs a review. The first couple commits are #3074.

itsbilal added a commit that referenced this pull request Nov 15, 2023
Previously, we weren't setting hideObsoletePoints to true for
compaction iters created out of shared foreign sstables. This
change addresses that by passing that state through virtualState
in VirtualReader through to the old Reader.

Unblocks #3044.
itsbilal added a commit that referenced this pull request Nov 15, 2023
Previously, we were calling checkVirtualBounds on FileMetadatas
before they were installed. This would lead to panics in invariant
builds as we would try to load an uninstalled FileMetadata
from the table cache.

This change removes the checkVirtualBounds call, making it a test-only
thing. Adding a check after version installation also doesn't guarantee
that refs will be nonzero, as a compaction immediately after ingestApply
is possible.

Only an issue with invariants builds. Unblocks #3044.
itsbilal added a commit that referenced this pull request Nov 15, 2023
Previously, we were passing range keys that were sorted by trailer
descending right from ScanInternal. However, the caller isn't
interested in seqnum-based sorting, only kind-based trailer sorting.
This change zeroes seqnums before sorting the keys.

Unblocks #3044.
@itsbilal itsbilal force-pushed the meta-excise branch 2 times, most recently from 8a0ae6f to 84140c8 Compare November 16, 2023 15:40
@itsbilal
Copy link
Contributor Author

Rebased on master so it should just have the one commit for the metamorphic test changes now.

@itsbilal itsbilal force-pushed the meta-excise branch 3 times, most recently from 68dfdcc to 681a4e1 Compare November 22, 2023 19:11
Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there's still a metamorphic test failure in CI?

===== SEED =====

[41](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:42) 1700494649026393988

[42](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:43) ===== DIFF =====

[43](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:44) \_meta/231120-153729.0263362307724/{standard-000,standard-004}

[44](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:45) @@ -1595,12 +1595,12 @@

[45](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:46) db1.Get("dojyrmo") // \[""\] pebble: not found #1594

[46](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:47) iter17.First() // \[true,"anawodd",<no point>,\["anawodd","anawodd@28")=>{""="pvpltzv","@15"="iefqtmkpw"}\*\] <nil> #1595

[47](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:48) db1.Get("anawodd@28") // \[""\] pebble: not found #1596

[48](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:49) iter15.Last() // \[true,"dojyrmo@50","ymtjebhutgk",<no range>\] <nil> #1597

[49](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:50) iter15.Prev("") // \[false\] <nil> #1598

[50](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:51)\-iter17.NextPrefix() // \[true,"dojyrmo@50","ymtjebhutgk",<no range>\*\] <nil> #1599

[51](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:52)\-iter17.Last() // \[true,"zyolnpsqklu@2","dwl",<no range>\] <nil> #1600

[52](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:53)+iter17.NextPrefix() // \[true,"anawodd@28",<no point>,\["anawodd@28","bifqyolqz@45")=>{""="pvpltzv"}\*\] <nil> #1599

[53](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:54)+iter17.Last() // \[true,"zyolnpsqklu@2","dwl",<no range>\*\] <nil> #1600

[54](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:55) iter17.Last() // \[true,"zyolnpsqklu@2","dwl",<no range>\] <nil> #1601

[55](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:56) iter17.SeekLT("dojyrmo@24", "") // \[true,"dojyrmo@34","jieppgnxeabys",<no range>\] <nil> #1602

[56](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:57) iter17.Next("") // \[true,"dojyrmo@15","qjsanhnsxg",<no range>\] <nil> #1603

[57](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:58) iter16.Next("") // \[false\] <nil> #1604

[58](https://github.com/cockroachdb/pebble/actions/runs/6932349203/job/18856077586?pr=3044#step:4:59) iter15.Next("") // \[true,"dojyrmo@50","ymtjebhutgk",<no range>\] <nil> #1605

Reviewable status: 0 of 18 files reviewed, all discussions resolved

@itsbilal
Copy link
Contributor Author

@jbowens yep, latest push fixes it. basically in the non-shared replicate case we always did a DeleteRange+RangeKeyDelete but in the shared replace case we would do an excise only if there's at least one shared or one unshared file. I've addressed it now.

The boomerang-file issue was actually ^ so all known issues should be fixed now.

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reviewing, but I tried stressing the two-instance metamorphic test locally (go test -v -run 'TestMetaTwoInstance' -tags invariants -exec 'stress -p 1') and encountered a failure:

=== RUN   TestMetaTwoInstance/compare/random-000

		===== SEED =====
		1701102066551880000
		===== DIFF =====
		_meta/231127-112106.5512653029539/{standard-000,random-000}
		@@ -1749,11 +1749,11 @@
 iter17.SeekGE("cxcbdcwufu@8", "") // [false] <nil> #1748
 iter16.Prev("") // [false] <nil> #1749
 iter16.SeekGE("nrcmih@3", "") // [false] <nil> #1750
 iter16.Next("feuglg@3") // [invalid] <nil> #1751
 iter19.SetBounds("bhjqwzfgkr@8", "xqdaz@8") // <nil> #1752
-iter19.SeekLT("xqdaz@8", "") // [false] <nil> #1753
+iter19.SeekLT("xqdaz@8", "") // [true,"qbqiswvvkp@8","wwgjvgvupfcsq",<no range>] <nil> #1753
 db1.Delete("cxcbdcwufu@8") // <nil> #1754
 db2.DeleteRange("nhdybnj", "quaizejvik@7") // <nil> #1755
 iter19.SetBounds("bhjqwzfgkr@8", "bhjqwzfgkr@8") // <nil> #1756
 iter19.SeekGE("bhjqwzfgkr@8", "") // [false] <nil> #1757
 iter19.Next("") // [false] <nil> #1758
@@ -2046,11 +2046,11 @@

Reviewed all commit messages.
Reviewable status: 0 of 18 files reviewed, all discussions resolved

@itsbilal
Copy link
Contributor Author

I'll take a look at that failure. We should still review and aim to merge this soon though, as this test has already found multiple genuine bugs (including one that KV had to triage) and new failures are far more likely to be bugs in Pebble itself than in the test at this point.

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 8 files at r5, 1 of 6 files at r8.
Reviewable status: 3 of 18 files reviewed, 14 unresolved discussions (waiting on @itsbilal)


db.go line 1058 at r8 (raw file):

		if internalOpts.snapshot.vers == nil {
			if internalOpts.snapshot.readState != nil {
				readState = internalOpts.snapshot.readState

Is it necessary to ref the readState because it'll be unrefed when the iterator is closed?


db.go line 1291 at r8 (raw file):

	if sOpts.vers == nil {
		if sOpts.readState != nil {
			readState = sOpts.readState

Ditto


snapshot.go line 438 at r8 (raw file):

		if err := es.mu.snap.closeLocked(); err != nil {
			return err
		}

nit: unrelated to this change, but closeLocked appears to be infallible. We should remove the error return value as it would make this easier to see that we're not leaking snapshot refs in an error case.


snapshot.go line 508 at r8 (raw file):

		}
		sOpts.readState = es.mu.rs
	}

Is there a race here when es.alwaysCreateIters == true and es.excised.Load() == false? The span may be excised after we load the atomic but before we acquire the current read state.


snapshot.go line 559 at r8 (raw file):

			}
		}
	}

Same race concern here too


metamorphic/generator.go line 1238 at r8 (raw file):

	// If we're in multi-instance mode, we must always create bounds, as we will
	// always create EventuallyFileOnlySnapshots to allow commands that use excises
	// (eg. replicateOp) to work.

I'm wondering if we should make this conditional on the op distribution (writerIngestAndExcise > 0), and then add a separate op distribution for writerIngestAndExcise > 0. It'd be nice to maintain test coverage ensuring equivalence between EFOS and ordinary snapshots in some test runs.


metamorphic/generator.go line 1451 at r8 (raw file):

	dbID := g.dbs.rand(g.rng)
	// Ingest between 1 and 3 batches.

This comment is incorrect, right? We're only ingesting one batch?


metamorphic/ops.go line 792 at r8 (raw file):

}

type ingestAndExciseOp struct {

I think we need to add this op to keyManager.update (see its treatment of ingestOp).

We should also update opWrittenKeys to return the two excise keys for ingestAndExciseOps.


metamorphic/ops.go line 803 at r8 (raw file):

	b := t.getBatch(o.batchID)
	t.clearObj(o.batchID)
	if b.Empty() || t.testOpts.Opts.Comparer.Compare(o.exciseEnd, o.exciseStart) <= 0 {

If the excise span is invalid but the batch is non-empty, the batch is discarded? Should we always generate well-formed excise spans to avoid wasted test ops?


metamorphic/ops.go line 817 at r8 (raw file):

	path, err2 := o.build(t, h, b, 0 /* i */)
	if err2 != nil {
		h.Recordf("Build(%s) // %v", o.batchID, err2)

Missing return?


metamorphic/ops.go line 1445 at r8 (raw file):

			if err != nil {
				h.Recordf("%s // %v", o, err)
				h.history.err.Store(errors.Wrap(err, "newSnapshotOp"))

Should we panic here? The test isn't going to be able to make progress without the snapshot, and I think it'll nil pointer on the first operation that uses the snapshot.


metamorphic/ops.go line 1462 at r8 (raw file):

			}
			return rangekey.Encode(&s, func(k base.InternalKey, v []byte) error {
				return w.AddRangeKey(base.MakeInternalKey(k.UserKey, 0, k.Kind()), v)

Did we not need the sequence number zeroing?


metamorphic/ops.go line 1535 at r8 (raw file):

	var sharedSSTs []pebble.SharedSSTMeta
	var err error
	sstEmpty := true

I think we could use (*sstable.Writer).Metadata instead to observe how many keys were written, if any.


metamorphic/options.go line 605 at r8 (raw file):

	testOpts.useExcise = rng.Intn(2) == 0
	if testOpts.useExcise || testOpts.useSharedReplicate {
		//testOpts.efosAlwaysCreatesIters = rng.Intn(2) == 0

debugging detritus?

Copy link
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! Also fixed the failure identified above. It was because IngestAndExciseOp had slightly divergent behaviour with useExcise and the only content of the batch being a rangedel that starts/ends at the same key. Moved to a better NumEntries based approach for identifying the no-op case in both replicateOp and ingestAndExciseOp..

Reviewable status: 3 of 18 files reviewed, 14 unresolved discussions (waiting on @jbowens)


db.go line 1058 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Is it necessary to ref the readState because it'll be unrefed when the iterator is closed?

Done.


db.go line 1291 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Ditto

Done.


snapshot.go line 438 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: unrelated to this change, but closeLocked appears to be infallible. We should remove the error return value as it would make this easier to see that we're not leaking snapshot refs in an error case.

That's a good point. And it looks like snap.Close() is similar because it returns closeLocked. I can do this in a separate PR.


snapshot.go line 508 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Is there a race here when es.alwaysCreateIters == true and es.excised.Load() == false? The span may be excised after we load the atomic but before we acquire the current read state.

Good catch. Wasn't tripping up in the metamorphic testing because of the way it syncs operations, but fixed (by grabbing the db mutex conditionally in this method if alwaysCreateIters is true).


snapshot.go line 559 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Same race concern here too

Done.


metamorphic/generator.go line 1238 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I'm wondering if we should make this conditional on the op distribution (writerIngestAndExcise > 0), and then add a separate op distribution for writerIngestAndExcise > 0. It'd be nice to maintain test coverage ensuring equivalence between EFOS and ordinary snapshots in some test runs.

We still get that equivalence, right? We don't create an EFOS just because bounds are present. We just generate bounds in all cases so depending on whether the test options necessitate an EFOS (and currently none of the standard options do) or the outcome of the fibonacci hash, we could still create a regular snapshot.


metamorphic/generator.go line 1451 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

This comment is incorrect, right? We're only ingesting one batch?

Copy-paste artifact. Removed.


metamorphic/ops.go line 792 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I think we need to add this op to keyManager.update (see its treatment of ingestOp).

We should also update opWrittenKeys to return the two excise keys for ingestAndExciseOps.

Done.


metamorphic/ops.go line 803 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

If the excise span is invalid but the batch is non-empty, the batch is discarded? Should we always generate well-formed excise spans to avoid wasted test ops?

Good catch. Done.


metamorphic/ops.go line 817 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Missing return?

Done.


metamorphic/ops.go line 1445 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Should we panic here? The test isn't going to be able to make progress without the snapshot, and I think it'll nil pointer on the first operation that uses the snapshot.

Done.


metamorphic/ops.go line 1462 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

Did we not need the sequence number zeroing?

ScanInternal does it for us (and also does the appropriate sort).


metamorphic/ops.go line 1535 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I think we could use (*sstable.Writer).Metadata instead to observe how many keys were written, if any.

Done.


metamorphic/options.go line 605 at r8 (raw file):

Previously, jbowens (Jackson Owens) wrote…

debugging detritus?

Done.

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 6 files at r8, 5 of 6 files at r9.
Reviewable status: 10 of 19 files reviewed, 6 unresolved discussions (waiting on @itsbilal)


snapshot.go line 590 at r9 (raw file):

	if es.alwaysCreateIters {
		// See the similar conditional above where we grab this mutex.
		es.db.mu.Unlock()

do you have any ideas on how we could clean this up? The conditional mutex acquisition is hard to follow. At the very least we could move the opts := statement up above the lock acquisition to reduce the volume of code that sits within the critical section, making it a little easier to parse.

maybe we should split the entire remaining critical section into separate branches based on alwaysCreateIters versus !alwaysCrateIters? This applies to NewIterWithContext too I think.


metamorphic/generator.go line 1238 at r8 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

We still get that equivalence, right? We don't create an EFOS just because bounds are present. We just generate bounds in all cases so depending on whether the test options necessitate an EFOS (and currently none of the standard options do) or the outcome of the fibonacci hash, we could still create a regular snapshot.

Ah, yeah, I wasn't reading carefully.


metamorphic/generator.go line 1456 at r9 (raw file):

	start := g.randKeyToWrite(0.001)
	end := g.randKeyToWrite(0.001)
	for g.cmp(start, end) == 0 {

nit: g.equal(start, end)

This change updates the metamorphic tests to support a new
useExcise testOption and a new IngestAndExcise operation, which
calls db.IngestAndExcise() if useExcise is true, or calls it
with an sstable containing DeleteRange/RangeKeyDelete for the
same span if useExcise is false.

Furthermore, a new private option is added, efosAlwaysCreatesIterators.
If the test option for useExcise (or the previously added useSharedReplicate)
is enabled, one of these two things happen to guarantee EFOS determinism
around excises:

1) If efosAlwaysCreatesIterators is true, the behaviour of EFOS
is changed to always hold onto a readState and never return
errors from NewIter.
2) If efosAlwaysCreatesIterators is false, regular EFOS behaviour
is retained, but a flush is completed after every EFOS instantiation
to guarantee that it becomes excise-immune right at instantiation.

Note that regular snapshots are never created when useExcise
or useSharedReplicate is true.

Fixes cockroachdb#2885.
Copy link
Contributor Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Dismissed @jbowens from a discussion.
Reviewable status: 6 of 19 files reviewed, 4 unresolved discussions (waiting on @jbowens)


snapshot.go line 590 at r9 (raw file):

Previously, jbowens (Jackson Owens) wrote…

do you have any ideas on how we could clean this up? The conditional mutex acquisition is hard to follow. At the very least we could move the opts := statement up above the lock acquisition to reduce the volume of code that sits within the critical section, making it a little easier to parse.

maybe we should split the entire remaining critical section into separate branches based on alwaysCreateIters versus !alwaysCrateIters? This applies to NewIterWithContext too I think.

Done. I did a separate function to handle the alwaysCreateIters case of NewIterWithContext as that function was filled with lots of conditionals on that boolean. For ScanInternal I just moved the opts up as it's (imo) clearer what's going on in the critical section there.

@itsbilal itsbilal merged commit 4fc6ccd into cockroachdb:master Nov 29, 2023
11 checks passed
RaduBerinde pushed a commit to RaduBerinde/pebble that referenced this pull request Nov 30, 2023
Previously, we weren't setting hideObsoletePoints to true for
compaction iters created out of shared foreign sstables. This
change addresses that by passing that state through virtualState
in VirtualReader through to the old Reader.

Unblocks cockroachdb#3044.
RaduBerinde pushed a commit to RaduBerinde/pebble that referenced this pull request Nov 30, 2023
Previously, we were passing range keys that were sorted by trailer
descending right from ScanInternal. However, the caller isn't
interested in seqnum-based sorting, only kind-based trailer sorting.
This change zeroes seqnums before sorting the keys.

Unblocks cockroachdb#3044.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

metamorphic: integrate IngestWithExcise
3 participants