roachtest: acceptance/gossip/restart failed #48423
Comments
I think the statement that failed is cockroach/pkg/cmd/roachtest/gossip.go Lines 224 to 226 in f80ec9c
I assume we're getting an ambiguous result all the way to the client because the upsert decomposes into a read-write cycle... somehow? But this should never go back to the client in this case, no? cc @asubiotto
I think the optimizer plans a check separately from the actual upsert operation. I'm not very familiar with this code, but why do you think this shouldn't go back to the client if the transport is exhausted? Based on the
@tbg what were you thinking here?
Oh - looks like I wasn't thinking at all. Sorry about the noise. I guess my thinking was that this particular Unclear what exactly to do with that failure. Do we really need to return an ambiguous result when a breaker is open? We didn't even attempt to send.
(roachtest).acceptance/gossip/restart failed on master@436a82a518b142c3a3212f49bb595956b57ac68c:
Artifacts: /acceptance/gossip/restart
See this test on roachdash
Looked at this in this round of triage. The latest failure still shows the same symptoms.
Looking at this again as part of triage. @asubiotto I think there is something for SQL to do here yet - I believe the error comes from cockroach/pkg/sql/colflow/colrpc/outbox.go Lines 151 to 159 in 7345931
I may be wrong, but it seems to come from SQL. It's this one here: cockroach/pkg/rpc/nodedialer/nodedialer.go Lines 142 to 155 in f80ec9c
This error should not become ambiguous, since no connection attempt has been made (also, this is in the read portion of the query? Maybe I am wrong about the callsite here). If we see this again, we should grab the logs before they're gone to make sure.
(roachtest).acceptance/gossip/restart failed on master@e6c1a2abe0bb9008d904aad0f23eeff9ef217430:
Artifacts: /acceptance/gossip/restart
See this test on roachdash
- TestFollowerReadsWithStaleDescriptor (cockroachdb#56281)
- TestDistSQLRangeCachesIntegrationTest (cockroachdb#56282)
- acceptance/gossip/restart (cockroachdb#48423)

Release note: None
52688: *: skip a couple of flakey tests r=irfansharif a=irfansharif

- TestFollowerReadsWithStaleDescriptor (#52681)
- TestDistSQLRangeCachesIntegrationTest (#52682)
- acceptance/gossip/restart (#48423)

Release note: None

---

+cc @andreimatei, @asubiotto who are the current assignees on those tests.

Co-authored-by: irfan sharif <[email protected]>
100584: roachtest: fix `gossip/restart` tests r=erikgrinaker a=erikgrinaker

**roachtest: add `WaitForReady` helper**

This patch adds a `WaitForReady()` helper that will wait until the given nodes report ready via health checks.

Epic: none
Release note: None

**roachtest: fix `gossip/restart` tests**

This patch fixes `acceptance/gossip/restart` and `gossip/restart` by waiting for all nodes to report ready before restarting nodes, and unskips them.

Resolves #96091.
Touches #48423.

Epic: none
Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
(roachtest).acceptance/gossip/restart failed on master@425eaa8fb05fc32b2c42827b85338daa52f4177c:
Artifacts: /acceptance/gossip/restart
See this test on roachdash
powered by pkg/cmd/internal/issues
Jira issue: CRDB-4327