roachtest: import/mixed-versions failed #87837
Comments
Node 1 has an AdminSplit that has been taking its time:
I'm going to tag KV in, given the above symptom, while continuing to look for what AdminSplit is waiting on. Unfortunately, while 7 of the 8 imports succeeded, one of the jobs was stuck on an admin split for 10+ hours.
For hours on node 3, I see what looks like 1 split a second on what may be the same key:
Related to what Steven mentions above, node 3 is on the older binary version, 22.1 in this case. This is an interesting stack, though it has not been spinning for long:
It corresponds to this retry loop: https://github.com/cockroachdb/cockroach/blob/v22.1.7/pkg/kv/kvserver/replica_command.go#L536
Is it possible we are just constantly racing something else taking the descriptor lease? It seems unlikely that it would keep happening, as we update our lease just before attempting the split. I wonder if our request is such that we are now always going to get a ConditionFailed back for some reason...
@stevendanna mind taking a look at ^. Sounds reasonable to have that generally even after this issue is resolved.
Will we need to backport that for it to work in this test? It sounds like the retry loop was failing to terminate on a v22.1 node.
I think we should backport to both 22.1 and 22.2, if that patch looks reasonable to you.
Informs cockroachdb#87837 Release note: None
88288: kvserver: log retriable errors received during splits r=aayushshah15 a=aayushshah15 Informs #87837 Release note: None Co-authored-by: Aayush Shah <[email protected]>
Informs #87837 Release note: None
88752: kvserver: log details about `ConditionFailedError` encountered by splits r=aayushshah15 a=aayushshah15
Informs #87837
Release justification: logging only change
Release note: None
88790: logictest: remove a duplicate query r=yuzefovich a=yuzefovich
This commit removes a single test query since it is an exact duplicate of another one about 100 lines up in the file. There is also no change in the session variables between the two spots.
Release note: None
88814: changefeedccl: Fix array encoding avro bug. r=miretskiy a=miretskiy
Fix latent array avro encoding bug where previously allocated memo array might contain 'nil' element, while the code assumed that the element must always be a map.
Release note: none.
Co-authored-by: Aayush Shah <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
…lits Informs #87837 Release justification: logging only change Release note: None
The fact that we reached these
I've tried stressing this a lot more over the last week (~700 runs) and have not been able to repro aside from the one time (before #88835 had landed). That one instance of the stall had the split queue spamming the log message from here on a 22.1 node:
Given that the AdminSplit stall, when it occurs, occurs on the 22.1 node, I'm inclined to suggest that we remove the ga-blocker label from this and let this issue stay open until we see it again in the wild.
We have marked this test failure issue as stale because it has been
roachtest.import/mixed-versions failed with artifacts on master @ 773568fbda06ba9be9fb1bc34a331f21c8891ffa:
Parameters:
ROACHTEST_cloud=gce
ROACHTEST_cpu=4
ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
Same failure on other branches
This test on roachdash | Improve this report!
Jira issue: CRDB-19560