sql: fail schema changes on GC threshold errors #24293
This doesn't seem right to me. Many things in the system are classified as |
Thanks for prodding me to do better. I've added a new error type for this and it seems to be ok now.
Index backfills use a single readAsOf time for their entire job. If the backfill took longer than the GC TTL (which defaults to 25h but could easily be set to a few minutes), the schema change would retry forever. This happened because the schema changer didn't treat the resulting GC threshold error as a fatal schema change error, even though the error would recur on every attempt after its first occurrence. This would render the table schema unchangeable, and probably undroppable as well, and the cluster would retry the schema change forever.

Add a specific type for this error so it can be detected.

Release note (bug fix): prevent index backfills from failing in a loop after exceeding the GC TTL of their source table.
ping @jordanlewis
This LGTM with regard to the schema change logic, but I can't say with certainty whether or not the condition for failure is accurate (the code in replica.go). Can someone else vet that part? Reviewed 7 of 7 files at r1.
@nvanbenschoten can you review the replica.go change?
I'm happy to see that error become structured. Review status: all files reviewed at latest revision, all discussions resolved, all commit checks successful.