-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
logictest: schema_change_in_txn failed with TransactionRetryError #104464
Comments
Error from the logs:
|
Table 3 (the one with the retry error) is the descriptor table. |
### Motivation Hopefully fixes: https://github.com/MaterializeInc/materialize/issues/19931 There seems to be an error in the latest version of CockroachDB, see similar issue: cockroachdb/cockroach#104464. ### Checklist - [ ] This PR has adequate test coverage / QA involvement has been duly considered. - [ ] This PR has an associated up-to-date [design doc](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/README.md), is a design doc ([template](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/design/00000000_template.md)), or is sufficiently small to not require a design. <!-- Reference the design in the description. --> - [ ] If this PR evolves [an existing `$T ⇔ Proto$T` mapping](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/command-and-response-binary-encoding.md) (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label. - [ ] If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label ([example](MaterializeInc/cloud#5021)). <!-- Ask in #team-cloud on Slack if you need help preparing the cloud PR. --> - [x] This PR includes the following [user-facing behavior changes](https://github.com/MaterializeInc/materialize/blob/main/doc/developer/guide-changes.md#what-changes-require-a-release-note): - N/a
Hey folks! We recently upgraded Cockroach in our CI to v23.1.1 and since then have seen sporadic failures similar to the one here. For example:
any thoughts on what might be causing this? Edit: It seems like this issue might be related to #104728 ? |
Hi @ParkMyCar, this is more of an issue with the test than anything else, but you can refer to our docs to diagnose the retry error you're seeing: https://www.cockroachlabs.com/docs/stable/transaction-retry-error-reference.html#retry_serializable |
Hey @DrewKimball! I’m the one who pointed @ParkMyCar at this issue based on the similarity in the implicated keys. I haven’t actually dug into the Git history for this test, but looking at its name it’s for the legacy schema changer, and presumably is therefore quite old? What has my spidey sense tingling is that this test is doing a schema change and then a read in the same transaction, which is exactly what we do in Materialize CI when we initialize a new stash. And the conflict in both is on a key in the descriptor table. Possible we’re barking up the wrong tree here, but I’m really wondering if there was a regression in v23.1 that’s leading to both the failure here and in our CI. Does #104728 look like it’s related to this test failure? Would it be better to continue discussion on that issue? |
Oh I see, I didn't notice it was on the same table, sorry about that. I think it makes sense to keep the discussion here. #104728 could be related, but there are multiple possible reasons why a |
@ParkMyCar I looked into this a bit further - in this test failure the conflicting write was to the jobs table, whereas in the error you provided it was to the descriptor table, which indicates probable contending schema changes. Creating, altering, or dropping a schema in |
Thanks for investigating further @DrewKimball! For each test we do create a new random schema, e.g. But after creating the new schema, we do start a transaction, create tables, and then alter the We're not changing privileges anywhere, we do set |
Are your tests run concurrently? Also, do you know where you're seeing these retry errors - schema creation or table creation? If the former and tests are run concurrently, it sounds likely that the schema setups for each test are contending with one another. If the latter, we'll have to keep digging because it's unexpected that creating/altering tables would touch the defaultdb descriptor. |
As far as deflaking the CRDB logictest, we can use the |
@benesch @ParkMyCar To make it easier for us to help with your issue, would you mind opening a new Github issue with steps on how to reproduce the problem? |
Hey @rafiss! I'm working on generating a minimum repro, my goal is to have something by tomorrow. At that point I'll open up a new issue and tag it here. P.S. Your GitHub picture is great! |
Previously, we have `skip_on_retry` directive for logic test which, when set, it skips the rest of test if a statement fails with TransactionRetryError. However, it won't skip if the statement is expected to fail with certain error message. This PR ensures that whenever we have a TransactionRetryError and `skip_on_retry` is set, we always skip the rest of the test, even if the stmt is expected to fail. Informs cockroachdb#104464 Release note: None
Previously, we have `skip_on_retry` directive for logic test which, when set, it skips the rest of test if a statement fails with TransactionRetryError. However, it won't skip if the statement is expected to fail with certain error message. This PR ensures that whenever we have a TransactionRetryError and `skip_on_retry` is set, we always skip the rest of the test, even if the stmt is expected to fail. Informs #104464 Release note: None
Previously, we have `skip_on_retry` directive for logic test which, when set, it skips the rest of test if a statement fails with TransactionRetryError. However, it won't skip if the statement is expected to fail with certain error message. This PR ensures that whenever we have a TransactionRetryError and `skip_on_retry` is set, we always skip the rest of the test, even if the stmt is expected to fail. Informs cockroachdb#104464 Release note: None
pkg/sql/logictest/tests/local-legacy-schema-changer/local-legacy-schema-changer_test.TestLogic_schema_change_in_txn failed with artifacts on release-23.1 @ 1a5dc2aad16ca4185afcda75d923804324dcfeb9:
Help
See also: How To Investigate a Go Test Failure (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-28541
The text was updated successfully, but these errors were encountered: