-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: add another error to jepsen allowlist #50333
Conversation
Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)
pkg/cmd/roachtest/jepsen.go, line 253 at r1 (raw file):
// lock=true stat=PENDING rts=1592384065.123218152 ,0 wto=false // max=1592384065.128230720,0 `-e "TransactionRetryWithProtoRefreshError"`,
mmm for this one it might be worth asking ben if it's easy to put a retry to whatever schema change causes it
cc @bdarnell |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/cmd/roachtest/jepsen.go, line 253 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
mmm for this one it might be worth asking ben if it's easy to put a retry to whatever schema change causes it
It's really important that simple schema changes like the ones jepsen does not be subject to random retry errors like this. We've put a lot of work into this in the past. Have we had a regression in this area? (ABORT_REASON_NEW_LEASE_PREVENTS_TXN
is new to me)
I was complaining about some apparent regression in the area in #37751 (comment) The underlying reasons for |
Yeah. Specifically, I'm pretty sure jepsen only does its schema changes in implicit transactions (or maybe as the first and only statement in an explicit transaction) so it should be retried on the server side. Assuming this was discovered in #49360, multiregister's DDL statements are trivial |
Yes, it was discovered there. Not frequently - a few times over >400 runs. |
Do we even know that this was a schema change? Could the error have come from a regular txn operation in jepsen or would they not surface to that level? |
During normal operation jepsen catches nearly all errors. Most of the time when an error bubbles up to this level, it's from a schema change. But the jepsen logs should tell us exactly where it's coming from. Do you have a pointer to a log showing this problem? |
Release note: None