88492: Roachtest redirect SSH flakes to test-eng r=tbg a=smg260
*See second commit note at the bottom*
This PR inspects the failure output of a roachtest and, if it sees an `SSH_PROBLEM`, overrides the owning team to test-eng when reporting the GitHub issue.
Currently errors are classified as an `SSH` error by roachprod if the exit code is `255` with an accompanying message prefixed with `SSH_PROBLEM` [[1]](https://github.com/cockroachdb/cockroach/blob/ad3bd1355463cefdc07e995765fa82adfe391d05/pkg/roachprod/errors/errors.go#L112). The errors are stringified and saved into `t.mu.output|failureMsg`. Thus in the test_runner at the call site of issue posting, we can check `t.mu.output` for `SSH_PROBLEM` and override the team and issue name accordingly.
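The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `issueTarget`, `routeFailure`, and the `ssh_problem` issue name are hypothetical, and the failure output is passed in as a plain string rather than read from `t.mu.output`.

```go
package main

import (
	"fmt"
	"strings"
)

// sshProblemMarker is the prefix that, per the description above,
// roachprod attaches to errors it classifies as SSH failures.
const sshProblemMarker = "SSH_PROBLEM"

// issueTarget is a hypothetical stand-in for the team and issue name
// used when posting a GitHub issue for a failed test.
type issueTarget struct {
	Team string
	Name string
}

// routeFailure (hypothetical) picks the issue target for a failed test.
// If the stringified failure output mentions SSH_PROBLEM, the owning team
// is overridden to test-eng; otherwise the original owner is kept.
func routeFailure(testName, owner, failureOutput string) issueTarget {
	if strings.Contains(failureOutput, sshProblemMarker) {
		// "ssh_problem" is an assumed issue name, for illustration only.
		return issueTarget{Team: "test-eng", Name: "ssh_problem"}
	}
	return issueTarget{Team: owner, Name: testName}
}

func main() {
	out := "exit status 255: SSH_PROBLEM: connection reset"
	fmt.Printf("%+v\n", routeFailure("kv/splits", "kv", out))
}
```

A substring match on the saved output is coarse, but it works at the call site without needing the original typed error to survive stringification.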
Resolves: #82398
Release justification: test-only change
Release note: none
89913: changefeedccl: job-level retry when error message is about draining r=[miretskiy] a=HonoreDB
See https://github.com/cockroachlabs/support/issues/1839. The flow retryable error marker doesn't survive every path by which the error can bubble up, so we just look for the single word "draining" in the error message; false positives are much better than false negatives.
Fixes #89663
Release note (enterprise change): Fixed a bug that could cause changefeeds to fail during a rolling restart.
Co-authored-by: Miral Gadani <[email protected]>
Co-authored-by: Aaron Zinger <[email protected]>
Our telemetry changefeeds failed with the message:
replica unavailable: (n1,s1):3 unable to serve request to r170780:/Table/104/2/"\t\x{a5\xbc\xb6g\xeeKI\xa5F\x8eb\xfc9}\xab"/1/4/1918-04-19T12:43:26.240716999Z/"sql.misc.started.count"-b0Q\xbcG\x17Ng\xb0\xc8\x16 \xa8\xa7\xca`"/1/3/1918-09-14T16:35:54.879832999Z/"sql.plan.ops.cast.int::string"} [(n14,s14):4, (n3,s3):2, (n1,s1):3, next=5, gen=197]: lost quorum (down: (n14,s14):4,(n3,s3):2); closed timestamp: 1665220705.166109291,0 (2022-10-08 09:18:25); raft status: {"id":"3","term":160,"vote":"2","commit":6282066,"lead":"0","raftState":"StatePreCandidate","applied":6282066,"progress":{},"leadtransferee":"0"}: have been waiting 60.20s for slow proposal RequestLease [/Table/104/2/"\t\xa5\xbc\xb6g\xeeKI\xa5F\x8eb\xfc9}\xab"/1/4/1918-04-19T12:43:26.240716999Z/"sql.misc.started.count",/Min) | 9
We should investigate and address these failures.
See ticket for more info: https://cockroachdb.zendesk.com/agent/tickets/14297
Jira issue: CRDB-20362
Epic CRDB-11732