Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: fail schema changes on GC threshold errors #24293

Merged
merged 1 commit into from
Apr 2, 2018
Merged

sql: fail schema changes on GC threshold errors #24293

merged 1 commit into from
Apr 2, 2018

Conversation

maddyblue
Copy link
Contributor

@maddyblue maddyblue commented Mar 28, 2018

Index backfills use a single readAsOf time for their entire job. If
the backfill took longer than the GC (which defaults to 25h but could
easily be set to a few minutes) then the schema change would retry
forever. This happened because it didn't detect that error as a fatal
schema change error, even though it would happen every time after the
first occurrence. This would render the table schema unchangeable,
and probably undroppable as well, and the cluster would forever be
retrying the schema change.

Add a specific type for this error so it can be detected.

Release note (bug fix): prevent index backfills from failing in a
loop after exceeding the GC TTL of their source table.

@maddyblue maddyblue requested review from jordanlewis, vivekmenezes and a team March 28, 2018 18:07
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jordanlewis
Copy link
Member

This doesn't seem right to me. Many things in the system are classified as internalError, right? I can't figure out by looking at the code what is castable to an internalError and what is not... we'd have to see a list before understanding whether this change is correct or not.

@maddyblue maddyblue changed the title sql: fail schema changes on internal errors sql: fail schema changes on GC threshold errors Mar 29, 2018
@maddyblue
Copy link
Contributor Author

Thanks for prodding me to do better. I've added a new error type for this and it seems to be ok now.

Index backfills use a single readAsOf time for their entire job. If
the backfill took longer than the GC (which defaults to 25h but could
easily be set to a few minutes) then the schema change would retry
forever. This happened because it didn't detect that error as a fatal
schema change error, even though it would happen every time after the
first occurrence. This would render the table schema unchangeable,
and probably undroppable as well, and the cluster would forever be
retrying the schema change.

Add a specific type for this error so it can be detected.

Release note (bug fix): prevent index backfills from failing in a
loop after exceeding the GC TTL of their source table.
@maddyblue
Copy link
Contributor Author

ping @jordanlewis

@jordanlewis
Copy link
Member

This :lgtm: with regards to the schema change logic, but I can't say with certainty whether or not the condition for failure is accurate (the code in replica.go). Can someone else vet that part?


Reviewed 7 of 7 files at r1.
Review status: :shipit: all files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@maddyblue maddyblue requested a review from nvanbenschoten April 2, 2018 17:55
@maddyblue
Copy link
Contributor Author

@nvanbenschoten can you review the replica.go change?

@nvanbenschoten
Copy link
Member

:lgtm: I'm happy to see that error become structured.


Review status: :shipit: all files reviewed at latest revision, all discussions resolved, all commit checks successful.


Comments from Reviewable

@maddyblue maddyblue merged commit 9ba6a6b into cockroachdb:master Apr 2, 2018
@maddyblue maddyblue deleted the schema-gc branch April 2, 2018 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants