roachtest: observe ctx cancellation in Run #29178
@petermattis this is just the first commit from #29174 which doesn't need to be blocked.
though do you mind addressing the comment I left regarding `l.stdout == l.stderr`?
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
Oops, missed that - done (I think).
Reviewable status: complete! 1 of 0 LGTMs obtained
Let's hold off here so that I can try out the "panic escalation" idea (https://play.golang.com/p/0WUEJCAr0su)
455b164 to 01400e8
@petermattis ready for another look.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
Much prefer this approach. Minor question about the test you've added.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/cmd/roachtest/cluster.go, line 225 at r1 (raw file):
	debugStdoutBuffer, _ := circbuf.NewBuffer(1024)
	debugStderrBuffer, _ := circbuf.NewBuffer(1024)
Did you find this useful somewhere? Curious where it came up.
pkg/cmd/roachtest/cluster.go, line 1172 at r1 (raw file):
	// Note that the trick here is that we panicked explicitly below,
	// which somehow "overrides" the Goexit which is supposed to be
	// un-recoverable, but we do need to recover to return an error.
Oy! This is confusing. Thanks for the comment.
pkg/cmd/roachtest/test_example_adversarial.go, line 34 at r1 (raw file):
	}

	func testHarness(ctx context.Context, t *test, c *cluster) {
Can this be an actual test (i.e. `TestHarnest(t *testing.T)`)? Yeah, it forks `sleep` and `echo`, but that should be ok from a test.
c221bbd to 05314ad
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/cmd/roachtest/cluster.go, line 225 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Did you find this useful somewhere? Curious where it came up.
I think this will save more time than anything else in this PR - before you'd get something like "command XYZ returned exit status 1" and no details of what the process output. Take a look at the commit message and note how it captures the latest part of stdout/stderr. Basically I want to be able to look at the issue and see the failure instead of clicking through 12 teamcity pages and downloading up to five artifacts. I hope that this achieves that.
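To illustrate the idea of keeping only the latest part of a command's output and folding it into the error, here is a minimal sketch. It does not use the `circbuf` package from the PR; `tailBuffer` and `runE` are hypothetical names invented for this example, and the 1024-byte limit simply mirrors the value quoted in the review.

```go
package main

import (
	"fmt"
	"os/exec"
)

// tailBuffer is a hypothetical stand-in for a circular buffer: an io.Writer
// that retains only the last max bytes written to it.
type tailBuffer struct {
	max int
	buf []byte
}

func (b *tailBuffer) Write(p []byte) (int, error) {
	b.buf = append(b.buf, p...)
	if extra := len(b.buf) - b.max; extra > 0 {
		// Shift the newest max bytes to the front and drop the rest.
		copy(b.buf, b.buf[extra:])
		b.buf = b.buf[:b.max]
	}
	return len(p), nil
}

func (b *tailBuffer) String() string { return string(b.buf) }

// runE runs a shell script and, on failure, includes the tail of stderr and
// stdout in the returned error so the failure can be read in one place.
func runE(script string) error {
	stdout := &tailBuffer{max: 1024}
	stderr := &tailBuffer{max: 1024}
	cmd := exec.Command("sh", "-c", script)
	cmd.Stdout, cmd.Stderr = stdout, stderr
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("%s returned:\nstderr:\n%s\nstdout:\n%s: %w",
			script, stderr, stdout, err)
	}
	return nil
}

func main() {
	// The command fails at "notfound"; the error carries what was printed
	// before the failure.
	fmt.Println(runE("echo foo && echo bar && notfound"))
}
```

Bounding the buffers keeps the error message readable even when the command is chatty, at the cost of losing early output.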
pkg/cmd/roachtest/cluster.go, line 1172 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Oy! This is confusing. Thanks for the comment.
Done.
pkg/cmd/roachtest/test_example_adversarial.go, line 34 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Can this be an actual test (i.e. `TestHarnest(t *testing.T)`)? Yeah, it forks `sleep` and `echo`, but that should be ok from a test.
Done. Feels good to have decent coverage for this, but I seriously pity the next person who'll have to touch it. Probably gonna be me :-)
Before this patch, roachprod invocations would not observe ctx cancellation. Or rather they would, but due to the usual obscure passing of Stdout into the child process of roachprod the Run call would not return until the child had finished. As a result, the test would continue running, which is annoying and also costs money.

Also fixes up the handling of calling `c.t.Fatal` on a monitor goroutine (using what is perhaps unspecified behavior of the Go runtime). Anyway, the result is that you can do basically whatever inside of a monitor and get away with it:

```go
m.Go(func(ctx context.Context) error {
	// Make sure the context cancellation works (not true prior to the PR
	// adding this test).
	return c.RunE(ctx, c.Node(1), "sleep", "2000")
})
m.Go(func(ctx context.Context) error {
	// This will call c.t.Fatal which also used to wreak havoc on the test
	// harness. Now it exits just fine (and all it took were some mean hacks).
	// Note how it will exit with stderr and stdout in the failure message,
	// which is extremely helpful.
	c.Run(ctx, c.Node(1), "echo foo && echo bar && notfound")
	return errors.New("impossible")
})
m.Wait()
```

now returns

```
--- FAIL: tpmc/w=1/nodes=3 (0.24s)
	...,errgroup.go:58: /Users/tschottdorf/go/bin/roachprod run local:1 -- echo foo && echo bar && notfound returned:
		stderr:
		bash: notfound: command not found
		Error: exit status 127
		stdout:
		foo
		bar
		: exit status 1
	...,tpcc.go:661: Goexit() was called
FAIL
```

Release note: None
bors r=petermattis
Build failed
bors r=petermattis TestSplitAt.
29178: roachtest: observe ctx cancellation in Run r=petermattis a=tschottdorf

Before this patch, roachprod invocations would not observe ctx cancellation. Or rather they would, but due to the usual obscure passing of Stdout into the child process of roachprod the Run call would not return until the child had finished. As a result, the test would continue running, which is annoying and also costs money.

Co-authored-by: Tobias Schottdorf <[email protected]>
Build succeeded
Accidentally changed this in cockroachdb#29178. Release note: None
29329: storage: skip TestClosedTimestampCanServe r=a-robinson a=tschottdorf

It won't be the first one I'm looking into.

Release note: None

29339: roachtest: fix flake in TestMonitor r=petermattis a=tschottdorf

Accidentally changed this in #29178.

Fixes #29325.

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>
29390: backport-2.1: storage, roachtest, cli: assorted backports r=petermattis a=tschottdorf

storage: remove test-only data race

The test was stopping the engine before the stopper, so the compactor was sometimes able to use the engine while it was being closed.

Fixes #29302.

roachtest: improve TestMonitor

Add two more test cases about the exit status of the `monitor` invocation itself.

roachtest: fix flake in TestMonitor

Accidentally changed this in #29178.

cli: add hex option to debug keys

This was used in #29252 and I imagine I'll want to use it again whenever we see the consistency checker fail in the future.

storage: skip TestClosedTimestampCanServe

It won't be the first one I'm looking into.

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>