Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: acceptance/bank/zerosum-splits failed [skipped] #62749

Closed
andreimatei opened this issue Mar 29, 2021 · 2 comments · Fixed by #63073
Closed

roachtest: acceptance/bank/zerosum-splits failed [skipped] #62749

andreimatei opened this issue Mar 29, 2021 · 2 comments · Fixed by #63073
Labels
C-test-failure Broken test (automatically or manually discovered). skipped-test

Comments

@andreimatei
Copy link
Contributor

Unclear why it's skipped; there's a tumultuous history for the test. Last failure I think was #53829 and the test was most recently skipped in #53833.

@andreimatei andreimatei added A-testing Testing tools and infrastructure skipped-test labels Mar 29, 2021
@andreimatei andreimatei added the C-test-failure Broken test (automatically or manually discovered). label Mar 29, 2021
@cockroachdb cockroachdb deleted a comment from blathers-crl bot Mar 29, 2021
@andreimatei andreimatei removed the A-testing Testing tools and infrastructure label Mar 29, 2021
@andreimatei
Copy link
Contributor Author

See #62751 for the skipped sibling bank/zerosum-restart.

@tbg
Copy link
Member

tbg commented Mar 31, 2021

Should we delete this test? kvnemesis does splits, checks invariants (and at a deeper level), and more.

@tbg tbg changed the title bank/zerosum-splits skipped roachtest: acceptance/bank/zerosum-splits failed [skipped] Mar 31, 2021
tbg added a commit to tbg/cockroach that referenced this issue Apr 5, 2021
The `bank` roachtests are a basic form of Jepsen test. We were hoping
that they could provide additional benefit due to being less complex
than a full Jepsen test. In my opinion, this calculus has not worked
out. We've spent many hours staring at this test deadlocked when it
wasn't CockroachDB's fault at all. Besides, it's not adding anything
meaningful on top of our Jepsen tests plus KVNemesis plus the TPCC
invariant checks, so we are removing the `bank` family of tests here.

Should we want to reintroduce these tests, we should do it at the level
of TestCluster, or at least augment the `bank` workload to incorporate
these invariant checks instead.

Closes cockroachdb#62754.
Closes cockroachdb#62749
Closes cockroachdb#53871.

Release note: None
craig bot pushed a commit that referenced this issue Apr 8, 2021
62728: roachtest: don't call t.Fatal in FailOnReplicaDivergence r=erikgrinaker a=tbg

In #61990 we had this method catch a stats divergence on teardown in an
otherwise successful test. The call to `t.Fatal` in that method
unfortunately prevented the logs from being collected, which is not
helpful.

Release note: None


63073: roachtest: retire `bank` tests r=nvanbenschoten,andreimatei a=tbg

The `bank` roachtests are a basic form of Jepsen test. We were hoping
that they could provide additional benefit due to being less complex
than a full Jepsen test. In my opinion, this calculus has not worked
out. We've spent many hours staring at this test deadlocked when it
wasn't CockroachDB's fault at all. Besides, it's not adding anything
meaningful on top of our Jepsen tests plus KVNemesis plus the TPCC
invariant checks, so we are removing the `bank` family of tests here.

Should we want to reintroduce these tests, we should do it at the level
of TestCluster, or at least augment the `bank` workload to incorporate
these invariant checks instead.

Closes #62754.
Closes #62749
Closes #53871.

Release note: None


63214: kvserver: reduce ReplicaGCQueueInactivityThreshold to 12 hours r=ajwerner a=erikgrinaker

`ReplicaGCQueueInactivityThreshold` specifies the interval at which the
GC queue checks whether a replica has been removed from the canonical
range descriptor, and was set to 10 days. This is a fallback for when we
fail to detect the removal and GC the replica immediately. However, this
could occasionally cause stale replicas to linger for 10 days, which
surprises users (e.g. by causing alerts if the stale replica thinks the
range has become underreplicated).

This patch reduces the threshold to 12 hours, which is a more reasonable
timeframe for users to expect things to "sort themselves out". The
operation to read the range descriptor is fairly cheap, so this is not
likely to cause any problems, and the interval is therefore not jittered
either.

Resolves #63212, touches #60259.

Release note (ops change): Replica garbage collection now checks
replicas against the range descriptor every 12 hours (down from 10 days)
to see if they should be removed. Replicas that fail to notice they have
been removed from a range will therefore linger for at most 12 hours
rather than 10 days.

Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>
Co-authored-by: Erik Grinaker <[email protected]>
@craig craig bot closed this as completed in 2cebc9b Apr 8, 2021
nanduu04 pushed a commit to nanduu04/cockroach that referenced this issue Apr 9, 2021
The `bank` roachtests are a basic form of Jepsen test. We were hoping
that they could provide additional benefit due to being less complex
than a full Jepsen test. In my opinion, this calculus has not worked
out. We've spent many hours staring at this test deadlocked when it
wasn't CockroachDB's fault at all. Besides, it's not adding anything
meaningful on top of our Jepsen tests plus KVNemesis plus the TPCC
invariant checks, so we are removing the `bank` family of tests here.

Should we want to reintroduce these tests, we should do it at the level
of TestCluster, or at least augment the `bank` workload to incorporate
these invariant checks instead.

Closes cockroachdb#62754.
Closes cockroachdb#62749
Closes cockroachdb#53871.

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue Apr 26, 2021
The `bank` roachtests are a basic form of Jepsen test. We were hoping
that they could provide additional benefit due to being less complex
than a full Jepsen test. In my opinion, this calculus has not worked
out. We've spent many hours staring at this test deadlocked when it
wasn't CockroachDB's fault at all. Besides, it's not adding anything
meaningful on top of our Jepsen tests plus KVNemesis plus the TPCC
invariant checks, so we are removing the `bank` family of tests here.

Should we want to reintroduce these tests, we should do it at the level
of TestCluster, or at least augment the `bank` workload to incorporate
these invariant checks instead.

Closes cockroachdb#62754.
Closes cockroachdb#62749
Closes cockroachdb#53871.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-test-failure Broken test (automatically or manually discovered). skipped-test
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants