-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: TRUNCATE and ongoing index builds are broken #62842
Comments
@fqazi any chance I can enlist you on this? I think the fix is small and the test should be straightforward. This is higher priority than anything else you're currently working on and I believe you have sufficient context. |
Sure I'll take it. |
I took a stab at this #62944. |
Alright, after some thinking about this implementation, there's still a lot of scary edge cases. For now the thinking is that we're going to disallow this hazardous behavior. It's modestly unfortunate as there are some cases which would have allowed you to quickly finish long-running schema changes that take a long time to cancel but getting this right in all of the cases seems too hard. Instead we'll allowlist the cases we can convince ourselves are safe and disallow the rest. |
…ma changes Prior to this change, the truncate implementation since 20.2 was totally broken. The code creates new indexes for all existing indexes. It also makes all mutations public. The problem with this is that it makes the mutations public *after* deciding which indexes to clone. Because of this it just makes currently being backfilled or validated indexes public. This leads to inconsistency between the old secondary index and the newly truncated indexes. The code is also problematic in that it does not do anything about the mutation jobs. The code assumes that these jobs will happily continue and detect that they correspond to no mutations and stop doing things as well as clean themselves up. Unfortunately, it seems that these changes can lead to certain failures. For example, new index validation may now fail. Reverting that index build will now also fail because the index is public. This will leave the descriptor in a state where it has an orphaned mutation job. We fix this by only permitting TRUNCATE to run concurrently with mutations we deem safe and then to fix the behavior for those cases. Fixes cockroachdb#62842. Release note (bug fix): Fixed bugs where TRUNCATE concurrent with index construction and other schema changes could result in corruption.
62944: sql: fix or disallow TRUNCATE and concurrent schema changes r=jordanlewis a=ajwerner Prior to this change, the truncate implementation since 20.2 was totally broken. The code creates new indexes for all existing indexes. It also makes all mutations public. The problem with this is that it makes the mutations public *after* deciding which indexes to clone. Because of this it just makes currently being backfilled or validated indexes public. This leads to inconsistency between the old secondary index and the newly truncated indexes. The code is also problematic in that it does not do anything about the mutation jobs. The code assumes that these jobs will happily continue and detect that they correspond to no mutations and stop doing things as well as clean themselves up. Unfortunately, it seems that these changes can lead to certain failures. For example, new index validation may now fail. Reverting that index build will now also fail because the index is public. This will leave the descriptor in a state where it has an orphaned mutation job. We fix this by only permitting TRUNCATE to run concurrently with mutations we deem safe and then to fix the behavior for those cases. Fixes #62842. Release note (bug fix): Fixed bugs where TRUNCATE concurrent with index construction and other schema changes could result in corruption. 63038: ccl/bulk-io: clean up the processors a bit r=yuzefovich a=yuzefovich This commit removes some redundant implementations of `ConsumerClosed` method that simply call `InternalClose` - the reason is that it is the default implementation provided by the `ProcessorBase`. Release note: None Co-authored-by: Andrew Werner <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]>
…ma changes Prior to this change, the truncate implementation since 20.2 was totally broken. The code creates new indexes for all existing indexes. It also makes all mutations public. The problem with this is that it makes the mutations public *after* deciding which indexes to clone. Because of this it just makes currently being backfilled or validated indexes public. This leads to inconsistency between the old secondary index and the newly truncated indexes. The code is also problematic in that it does not do anything about the mutation jobs. The code assumes that these jobs will happily continue and detect that they correspond to no mutations and stop doing things as well as clean themselves up. Unfortunately, it seems that these changes can lead to certain failures. For example, new index validation may now fail. Reverting that index build will now also fail because the index is public. This will leave the descriptor in a state where it has an orphaned mutation job. We fix this by only permitting TRUNCATE to run concurrently with mutations we deem safe and then to fix the behavior for those cases. Fixes cockroachdb#62842. Release note (bug fix): Fixed bugs where TRUNCATE concurrent with index construction and other schema changes could result in corruption.
…ma changes Prior to this change, the truncate implementation since 20.2 was totally broken. The code creates new indexes for all existing indexes. It also makes all mutations public. The problem with this is that it makes the mutations public *after* deciding which indexes to clone. Because of this it just makes currently being backfilled or validated indexes public. This leads to inconsistency between the old secondary index and the newly truncated indexes. The code is also problematic in that it does not do anything about the mutation jobs. The code assumes that these jobs will happily continue and detect that they correspond to no mutations and stop doing things as well as clean themselves up. Unfortunately, it seems that these changes can lead to certain failures. For example, new index validation may now fail. Reverting that index build will now also fail because the index is public. This will leave the descriptor in a state where it has an orphaned mutation job. We fix this by only permitting TRUNCATE to run concurrently with mutations we deem safe and then to fix the behavior for those cases. Fixes cockroachdb#62842. Release note (bug fix): Fixed bugs where TRUNCATE concurrent with index construction and other schema changes could result in corruption.
You can detect this situation in a cluster by running: WITH descs AS (
SELECT d
FROM (
SELECT crdb_internal.pb_to_json(
'cockroach.sql.sqlbase.Descriptor',
descriptor,
false
)->'table' AS d
FROM system.descriptor
)
WHERE d IS NOT NULL
),
bad_descs AS (
SELECT d, idx
FROM (
SELECT d,
d->'primaryIndex' AS pi,
json_array_elements(d->'indexes') AS idx
FROM descs
)
WHERE (pi->>'id')::INT8 > (idx->>'id')::INT8
)
SELECT database_name,
schema_name,
name AS table_name,
table_id,
idx->>'name' AS index_name,
(idx->>'id')::INT8 AS index_id
FROM crdb_internal.tables JOIN bad_descs ON (table_id = (d->>'id')::INT8); |
Describe the problem
In 20.2 we changed the implementation of TRUNCATE to create all new indexes instead of creating a new table (#52235). This was wonderful in that it made table IDs stable. Unfortunately, it appears that that PR had some severe logic bugs in the face of concurrent index builds.
The logic has two bugs around
cockroach/pkg/sql/truncate.go
Lines 221 to 227 in d145e9f
The first is that it should make the in-progress indexes public prior to calculating the re-mapping. Secondly, it needs to deal with the ongoing jobs.
To Reproduce
Create an index backfill which takes a long time and then truncate the table. The index will become public but this index will contain data from prior to the truncate. The job will also remain in the queue which will prevent subsequent schema changes.
Expected behavior
The index is truncated and made public.
Additional context
This affects all current 20.2 versions and 21.1.
The text was updated successfully, but these errors were encountered: