-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: version/mixed/nodes=3 failed #66522
Comments
My hunch would be that this is fallout from #66414, though I don't understand exactly why. I think what we're seeing here and in the sibling issue #66528 is that the monitor thinks a node is dead (with exit status zero) when the test intentionally stops and restarts a node here: cockroach/pkg/cmd/roachtest/version.go Lines 135 to 139 in 7dc3736
I think the problem is that we're doing a graceful stop here, i.e. the process stops with exit status zero, but the monitor now detects this as a dead node. I suspect it didn't prior to #66414. I think this new behavior makes sense. I'll verify and update the test with an appropriate |
I think this is because the restart will cause the node to start with a new PID. We intentionally changed the behavior of We could e.g. change this to emit a |
Hmm, it has to be something like that, but not quite. I realized cockroach/pkg/cmd/roachtest/version.go Lines 119 to 123 in 7dc3736
I see that when we're starting CRDB, there's an extra "dead (exit status 0)" event that shouldn't be there. Will fix in |
66420: ui: disallow imports from cluster-ui sources r=koorosh a=koorosh Db Console depends on local `cluster-ui` package that and it was possible to specify imports to its modules as a paths to source modules or use exported `index` file which is an entry point for bundled module. `cluster-ui` package has to be built before imports in Db Console because it has its own dependencies and build process which isn't compatible with build process of Db Console. To prevent incorrect imports, this change adds es lint rule to prohibit any imports from `@cockroachlabs/cluster-ui/src/*` path. Release note: none 66539: roachprod: avoid extra dead event in roachprod monitor r=erikgrinaker a=tbg Prior to this commit, we'd get an extra dead event when restarting a node: ``` // initially running 1: 5762 // issue roachprod stop 1: dead (exit status 137) // issue roachprod start 1: dead (exit status 0) 1: 6254 ``` This would in turn upset roachtest's `monitor`, as it keeps track of the number of expected death events, and cause bogus test failures. This was introduced in #66414 and is fixed in this commit. As a small extra fix, if we don't observe the exit status of the stopped systemd unit (i.e. if the process cycles rapidly), we now correctly print that we don't know the exit status, where previously we would print the "exit" status of the new running incarnation, i.e. likely zero. I tested this manually by exacerbating the sleeps in the shell snippet: ``` 1: 10885 1: dead (exit status unknown) 1: 11220 ``` I also verfied that repeated restarts produce the expected events in general, i.e. one dead event following one pid event. Closes #66522 Closes #66528 Release note: None 66540: backupccl: add setting to write files in SQL r=dt a=dt This adds a setting -- default off -- to force BACKUP to always ask KV to return files to SQL to write instead of writing them directly. This is currently what it does for tenants but not for the system tenant due to the extra network hop, data copy, cpu and mem overhead. Release note: none. 66542: build: set ROACHPROD_USER in weekly roachtest script r=tbg a=rickystewart This was lost in 6651a08. Release note: None Co-authored-by: Andrii Vorobiov <[email protected]> Co-authored-by: Tobias Grieger <[email protected]> Co-authored-by: David Taylor <[email protected]> Co-authored-by: Ricky Stewart <[email protected]>
Prior to this commit, we'd get an extra dead event when restarting a node: ``` // initially running 1: 5762 // issue roachprod stop 1: dead (exit status 137) // issue roachprod start 1: dead (exit status 0) 1: 6254 ``` This would in turn upset roachtest's `monitor`, as it keeps track of the number of expected death events, and cause bogus test failures. This was introduced in cockroachdb#66414 and is fixed in this commit. As a small extra fix, if we don't observe the exit status of the stopped systemd unit (i.e. if the process cycles rapidly), we now correctly print that we don't know the exit status, where previously we would print the "exit" status of the new running incarnation, i.e. likely zero. I tested this manually by exacerbating the sleeps in the shell snippet: ``` 1: 10885 1: dead (exit status unknown) 1: 11220 ``` I also verfied that repeated restarts produce the expected events in general, i.e. one dead event following one pid event. Closes cockroachdb#66522 Closes cockroachdb#66528 Release note: None
roachtest.version/mixed/nodes=3 failed with artifacts on master @ ec36bc66f5147a8703719a9c890be12dd0c08945:
Reproduce
To reproduce, try:
# From https://go.crdb.dev/p/roachstress, perhaps edited lightly. caffeinate ./roachstress.sh version/mixed/nodes=3
This test on roachdash | Improve this report!
The text was updated successfully, but these errors were encountered: