-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: Add sstable-corruption roachtest #70845
roachtest: Add sstable-corruption roachtest #70845
Conversation
d2bfa7a
to
2fd4810
Compare
TFTR! I did the monitor refactor and got rid of the channel. The test also passes more reliably now, as we're no longer tied to the more finicky output of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal and @stevendanna)
pkg/cmd/roachtest/tests/sstable_corruption.go, line 46 at r2 (raw file):
return nil }) m.Wait()
Could you indent this whole block so that m
is in a nested scope?
{
m := ...
...
m.Wait()
}
This prevents reuse of monitors, which has lots of snags (actually using just a single monitor already unfortunately has snags :sadface:)
pkg/cmd/roachtest/tests/sstable_corruption.go, line 47 at r2 (raw file):
}) m.Wait() m.ExpectDeaths(3)
This line isn't doing anything, since you're not using this m
any more and it also isn't watching the nodes any more (since Wait()
returned).
pkg/cmd/roachtest/tests/sstable_corruption.go, line 88 at r2 (raw file):
m.Go(func(ctx context.Context) error { _ = c.RunE(ctx, workloadNode, "./cockroach workload run tpcc --warehouses=100 "+ fmt.Sprintf("--tolerate-errors --duration=%s", 2*time.Minute))
nit: unusual use of Sprintf
here, why don't you Sprintf
the entire args if you're going to do it to part of it?
pkg/cmd/roachtest/tests/sstable_corruption.go, line 89 at r2 (raw file):
_ = c.RunE(ctx, workloadNode, "./cockroach workload run tpcc --warehouses=100 "+ fmt.Sprintf("--tolerate-errors --duration=%s", 2*time.Minute)) return nil
// Don't return an error from the workload. We want outcome of WaitE
to be determined by the monitor noticing that a node died. The workload may also fail, despite --tolerate-errors
, if a node crashes too early.
pkg/cmd/roachtest/tests/sstable_corruption.go, line 92 at r2 (raw file):
}) require.Error(t, m.WaitE()) _ = c.WipeE(ctx, t.L(), c.Node(corruptNode))
// Exempt corrupted node from roachtest harness' post-test liveness checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @stevendanna)
pkg/cmd/roachtest/tests/sstable_corruption.go, line 46 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Could you indent this whole block so that
m
is in a nested scope?{ m := ... ... m.Wait() }This prevents reuse of monitors, which has lots of snags (actually using just a single monitor already unfortunately has snags :sadface:)
I was already putting all uses of m
in this one nested scope, but I outdented the lines below that aren't using m
to make it clear. Hope that works!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @stevendanna)
pkg/cmd/roachtest/tests/sstable_corruption.go, line 46 at r2 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
I was already putting all uses of
m
in this one nested scope, but I outdented the lines below that aren't usingm
to make it clear. Hope that works!
The goal is that m
goes out of scope once m.Wait()
returns. Otherwise I'm just nervously scanning everything after that that has m
visible for errant uses.
2fd4810
to
607db33
Compare
Oops, forgot a push. Here we go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @stevendanna)
607db33
to
5e0f68e
Compare
TFTR! Latest push now corrupts 6 table SSTs on every node, not just on one node. Should eliminate the flakiness. |
5e0f68e
to
022a0c4
Compare
Adds a new roachtest, sstable-corruption/table, that imports TPCC, then finds an sstable that contains table keys (which are replicated) on one node, then calls `dd` to write random bytes into the middle of them. It then checks that the corrupt node either crashes on restart, or crashes after the tpcc workload is run on it. Informs cockroachdb#67568. Release note: None.
022a0c4
to
8f7f7bf
Compare
bors r=tbg |
Build succeeded: |
Adds a new roachtest, sstable-corruption/table, that
imports TPCC, then finds an sstable that contains
table keys (which are replicated) on one node, then
calls
dd
to write random bytes into the middleof them. It then checks that the corrupt node
either crashes on restart, or crashes after the
tpcc workload is run on it.
Informs #67568.
Release note: None.