acceptance: TestBuildInfo panics in Cluster.IsReplicated #29151
Took a look at this just now and both of the instances I investigated showed that the server nodes didn't start because there was already something listening on the desired port:
Because the server didn't start, ... How do we track down what was listening on the port and preventing the server from starting?
What if there is something sitting on the agent machine listening on those ports? I just saw a second failure in a row of
The two failed
I think that any acceptance test that runs in "local" mode will be susceptible to this problem. We currently have 11 such tests:
I already want to rewrite the gossip tests as roachtests (#29115). Seems reasonable to do the rest as well and then to get rid of the "local" cluster mode for acceptance tests. Sort of a rote task, but could probably be knocked out pretty quickly.
Logging into a problematic agent shows that there are processes listening on the cockroach ports used for "local" clusters:
Looks like another run of
Looking at the build history on that agent I see an earlier run of
Node 2 couldn't connect here because node 1 crashed:
(@jordanlewis So it seems like
Nice sleuthing @petermattis. Arguably this problem is hard to avoid when sharing resources (roachtest might run into the same problem, but is probably better about killing stuff out of its way before it starts new things), but we can preemptively kill all processes when such a package/suite starts running.
Yes, I think roachtest will be better about killing off stuff and not leaving it lingering. I'm also taking a look at getting rid of the
Yes, that is the new policy. From now on nobody is allowed to push buggy PRs!
Move the build-info acceptance test to a new acceptance/build-info roachtest. Move the existing cli/node-status to acceptance/cli/node-status. The acceptance roachtests are all going to share the same cluster to amortize the cluster setup/teardown time. Fun fact: the acceptance/build-info roachtest is about twice as fast as the old build-info acceptance test. See cockroachdb#29151 Release note: None
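Sharing one cluster across the acceptance roachtests amounts to creating the expensive fixture lazily, exactly once, and handing the same handle to every test. A minimal Go sketch, assuming a hypothetical `cluster` type and a 4-node size chosen only for illustration (this is not roachtest's actual API):

```go
package main

import "sync"

// cluster is a hypothetical stand-in for the handle a harness uses to talk
// to a provisioned test cluster.
type cluster struct {
	nodes int
}

var (
	sharedOnce    sync.Once
	sharedCluster *cluster
	setupCount    int // how many times the expensive setup actually ran
)

// setupCluster stands in for the slow part: provisioning machines and
// starting nodes. Here it only records that it ran.
func setupCluster(nodes int) *cluster {
	setupCount++
	return &cluster{nodes: nodes}
}

// acceptanceCluster returns the cluster shared by all acceptance tests,
// creating it on first use. sync.Once guarantees setup runs a single time
// even if several tests call this concurrently, so the setup/teardown cost
// is paid once per run instead of once per test.
func acceptanceCluster() *cluster {
	sharedOnce.Do(func() {
		sharedCluster = setupCluster(4)
	})
	return sharedCluster
}
```

The trade-off is isolation: tests that share a cluster must tolerate state left behind by earlier tests, or clean it up themselves.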
29437: roachtest: add acceptance/build-info r=benesch a=petermattis Move the build-info acceptance test to a new acceptance/build-info roachtest. Move the existing cli/node-status to acceptance/cli/node-status. The acceptance roachtests are all going to share the same cluster to amortize the cluster setup/teardown time. Fun fact: the acceptance/build-info roachtest is about twice as fast as the old build-info acceptance test. See #29151 Release note: None Co-authored-by: Peter Mattis <[email protected]>
Move the cluster-recovery and node-restart acceptance tests to acceptance/bank/{cluster-recovery,node-restart} roachtests. See cockroachdb#29151 Release note: None
Move the status-server acceptance test to a new acceptance/status-server roachtest. See cockroachdb#29151 Release note: None
Move the rapid-restart acceptance test to a new acceptance/rapid-restart roachtest. See cockroachdb#29151 Release note: None
Move the gossip-peerings and gossip-restart acceptance tests to new acceptance/gossip/{peerings,restart} roachtests. See cockroachdb#29151 Release note: None
Move the gossip-restart-first-node-needs-incoming acceptance test to a new acceptance/gossip/restart-node-one roachtest. See cockroachdb#29151 Fixes cockroachdb#29115 Release note: None
29449: roachtest: add acceptance/bank/{cluster-recovery,node-restart} r=benesch a=petermattis Move the cluster-recovery and node-restart acceptance tests to acceptance/bank/{cluster-recovery,node-restart} roachtests. See #29151 Release note: None Co-authored-by: Peter Mattis <[email protected]>
Move the version-upgrade acceptance test to a new acceptance/version-upgrade roachtest. See cockroachdb#29151 Release note: None
29324: storage: fix a nasty merge deadlock r=tschottdorf,nvanbenschoten a=benesch
Fix a nasty edge case which could cause a concurrent merge and split to deadlock. See the comment on TestStoreRangeMergeConcurrentSplit for details.
Release note: None

29465: roachtest: add acceptance/gossip/* r=benesch a=petermattis
Move the gossip acceptance tests to roachtests.
See #29151
Fixes #29115

Co-authored-by: Nikhil Benesch <[email protected]>
Co-authored-by: Peter Mattis <[email protected]>
Move the decommission acceptance test to a new acceptance/decommission roachtest. Fixes cockroachdb#29151 Release note: None
29444: changefeedccl: resurrect changefeed benchmark r=mrtracy a=danhhz

name                           time/op
ChangefeedTicks/InitialScan-8  12.3ms ± 3%
ChangefeedTicks/SteadyState-8  52.3ms ± 3%

name                           speed
ChangefeedTicks/InitialScan-8  11.6MB/s ± 3%
ChangefeedTicks/SteadyState-8  2.73MB/s ± 3%

name                           alloc/op
ChangefeedTicks/InitialScan-8  2.08MB ± 0%
ChangefeedTicks/SteadyState-8  5.37MB ± 0%

name                           allocs/op
ChangefeedTicks/InitialScan-8  35.1k ± 0%
ChangefeedTicks/SteadyState-8  68.6k ± 0%

Release note: None

29470: roachtest: add acceptance/version-upgrade r=nvanbenschoten a=petermattis
Move the version-upgrade acceptance test to a new acceptance/version-upgrade roachtest.
See #29151
Release note: None

29511: roachtest: relax space reclamation in drop test r=petermattis a=tschottdorf
We know it fails; we are not looking into it right now due to other priorities. This should be investigated and fixed in the course of #29290.
Closes #29232.
Closes #29327.
Release note: None

Co-authored-by: Daniel Harrison <[email protected]>
Co-authored-by: Peter Mattis <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>
29367: changefeedccl: error when a watched table backfills r=mrtracy,vivekmenezes a=danhhz
When a table is currently being backfilled for a schema change (e.g. adding a column with a default value), it's unclear what the expectation is for any rows that are changed during the backfill. Our current invariant is that rows are emitted with an updated timestamp and a later SELECT ... AS OF SYSTEM TIME for that row would exactly match the emitted data. During the backfill, there is nothing we can emit that would definitely meet that invariant (because the backfill can be aborted and rolled back). In the meantime, this commit makes sure that we error whenever a backfill happens, even if it's fast enough that we never get it from leasing. This also paves the way for switching to RangeFeed, which doesn't have the convenient `fetchSpansForTargets` hook that the ExportRequest based poller was (ab)using.
Closes #28643
Release note (bug fix): CHANGEFEEDs now error when a watched table backfills (instead of undefined behavior)

29427: docs: Fix replace and link in table_ref diagram r=jseldess a=jseldess
Needed for cockroachdb/docs#3682.
Release note: None

29488: roachtest: add acceptance/decommission r=benesch,tschottdorf a=petermattis
Move the decommission acceptance test to a new acceptance/decommission roachtest.
Fixes #29151
Release note: None

29538: stats: document stats-related commands as experimental r=RaduBerinde a=RaduBerinde
Update the documentation inside `sql.y` to designate the stats-related statements as experimental.
Release note: None

29546: roachtest: skip (intentionally) failing Kafka chaos test r=petermattis a=tschottdorf
This test has no chance of passing until Kafka chaos is actually supported (see #28636).
Touches #29196.
Release note: None

29550: testcluster: make manual replication mode disable the merge queue r=petermattis a=benesch
TestClusters have a manual replication mode for use in tests that need to precisely control replication on a cluster. Teach that mode to disable the merge queue in addition to the split and replicate queues. This decreases the number of tests that need to directly disable the merge queue.
Release note: None

29552: ui: add attributes to login form so LastPass will autofill it r=vilterp a=vilterp
LastPass wasn't confident enough to autofill and autologin without these attributes.
Fixes #29529 (fixes for LastPass, but maybe not other PW managers)
Release note (admin ui change): Add attributes to the login form to allow LastPass to properly recognize it.

Co-authored-by: Daniel Harrison <[email protected]>
Co-authored-by: Jesse Seldess <[email protected]>
Co-authored-by: Peter Mattis <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>
Co-authored-by: Nikhil Benesch <[email protected]>
Co-authored-by: Pete Vilter <[email protected]>