
sql: column backfills should build a new index instead of mutating the existing one #47989

Closed
thoszhang opened this issue Apr 23, 2020 · 13 comments
Labels
A-schema-changes T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@thoszhang
Contributor

thoszhang commented Apr 23, 2020

Currently, when a backfill is needed for a column being added or dropped, the column backfiller rewrites the existing primary index in place to add or remove the column value. This has some disadvantages compared to the implementation of index backfills (for non-interleaved indexes), which write out a new index rather than mutating an existing one.

The proposal is to change the implementation of column backfills to build a new (primary) index every time, instead of mutating the existing one. There are 2 proposed steps in this change (which are somewhat independent):

  1. Switch the column backfill implementation so that it works like today's index backfiller. Adding or dropping any column would be like a primary key change: we go through the states of adding a new index until we're writing to both indexes, then do an index backfill, then swap the primary index in the final table descriptor update of the schema change.
  2. There are performance concerns with just doing the first step. A proposed optimization: Reorder the steps in adding an index so that the backfill happens before any live traffic to the new index is turned on, then start writing to the new index and use a rangefeed to catch up on writes in the gap between the backfill and when all nodes are writing to the new index. See sql: large latency spikes when creating a large storing index while running tpcc #45888 and schema changes: pre-backfill without online traffic then incremental backfill after #36850 for details. We'd want to do some speed-of-light experiments before committing to this approach.
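The state sequence in step 1 can be sketched as follows. This is a minimal, illustrative sketch: the state names mirror CockroachDB's descriptor states, but the `indexState` type and `columnChangeStates` function are hypothetical, not the real schemachanger API.

```go
package main

import "fmt"

// Sketch of step 1 above: a column add/drop proceeds like a primary key
// change, stepping a new primary index through the usual index-addition
// states before swapping it in as the primary index.

type indexState string

const (
	deleteOnly         indexState = "DELETE_ONLY"           // new index applies deletes only
	deleteAndWriteOnly indexState = "DELETE_AND_WRITE_ONLY" // all writes go to both indexes
	backfilling        indexState = "BACKFILLING"           // index backfill of existing rows
	swapped            indexState = "PUBLIC"                // new index swapped in as primary
)

// columnChangeStates returns the descriptor states a column change walks
// through under the index-swap protocol; each transition is a separate
// table descriptor version that all nodes must converge on first.
func columnChangeStates() []indexState {
	return []indexState{deleteOnly, deleteAndWriteOnly, backfilling, swapped}
}

func main() {
	for i, s := range columnChangeStates() {
		fmt.Printf("step %d: %s\n", i+1, s)
	}
}
```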

Rolling back a column schema change, with this proposed approach, would essentially mean switching back to the pre-existing index. If we rolled back dropping a column, we still have all the column data as of the timestamp when the schema change was started.

This approach would also allow us to do ALTER TYPE in a single index backfill operation (see #46893).

Open questions:

  • What do we do for interleaved tables? Very preliminary thoughts:
    • For interleaved tables with no interleaved children, we should just be able to write the new primary index alongside the old one. I'm not sure what the implications are for cleaning up an index if we don't want to/can't roll it back normally. Presumably we can't just leave abandoned index KVs interleaved in some other table with no indication of this on any of the table descriptors.
    • For interleaved parents, can we do this without rewriting all the child interleaved indexes as well?

Jira issue: CRDB-4366

@petermattis
Collaborator

Big 👍.

Rolling back a column schema change, with this proposed approach, would essentially mean switching back to the pre-existing index. If we rolled back dropping a column, we still have all the column data as of the timestamp when the schema change was started.

Out of curiosity, when do we have to roll back dropping a column?

@thoszhang
Contributor Author

Out of curiosity, when do we have to roll back dropping a column?

The schema change can be cancelled, or another schema change started in the same transaction can fail (e.g., a unique index can't be built), in which case we roll everything back.

@ajwerner
Contributor

This would also fix #35738.

@petermattis
Copy link
Collaborator

Out of curiosity, when do we have to roll back dropping a column?

The schema change can be cancelled, or another schema change started in the same transaction can fail (e.g., a unique index can't be built), in which case we roll everything back.

Got it. #46541 seems not good. Trying to roll back a partially dropped column seems fraught. I wonder if we can arrange to perform the drop column schema changes last. And then not allow them to be rolled back.

@ajwerner
Contributor

I wonder if we can arrange to perform the drop column schema changes last. And then not allow them to be rolled back.

We've talked about something like this in relatively vague terms. It seems like so long as we (1) leave the dropped column public until all other mutations which can fail have completed, (2) remove all constraints before dropping a column, and (3) make the drop-column mutation something which cannot be canceled, then we'll be out of this pickle.

Today we assume that the transaction which performs the drop sees the column in WRITE_ONLY for subsequent statements. My thinking is that this should remain true for that transaction, but we shouldn't actually commit the table descriptor with the column in write-only. This also gives us an opportunity to drop the constraints prior to making the dropped column non-public; that, too, is hazardous to roll back. My sense is that as soon as we start dropping properties, we need to make it impossible to roll back. In short:

  1. Commit Time
    • Create new columns and queue mutations
    • Add new columns in delete-only
  2. Add columns, indexes, and constraints
    • Still can cancel or fail
    • Add new columns in write-only, backfill new indexes, attempt to add constraints
  3. Drop constraints
    • Cannot cancel or fail
    • Drop constraints which were explicitly dropped
    • Drop constraints for columns which are being dropped
  4. Drop columns
    • cannot cancel or fail
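The four phases above can be sketched with the point of no return made explicit: once constraint-dropping begins, the schema change can no longer be canceled or rolled back. The `phase` type and helper below are illustrative, not real CockroachDB code.

```go
package main

import "fmt"

// Sketch of the staged drop-column ordering proposed above. Phases 1-2
// can still be rolled back; phases 3-4 cannot.

type phase struct {
	name        string
	canRollBack bool
}

func dropColumnPhases() []phase {
	return []phase{
		{"commit: create new columns, queue mutations, columns delete-only", true},
		{"add columns/indexes/constraints: write-only, backfill, validate", true},
		{"drop constraints (explicitly dropped, and those on dropped columns)", false},
		{"drop columns", false},
	}
}

// pointOfNoReturn returns the index of the first phase that cannot be
// canceled or rolled back.
func pointOfNoReturn() int {
	for i, p := range dropColumnPhases() {
		if !p.canRollBack {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println("rollback impossible starting at phase", pointOfNoReturn()+1)
}
```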

@thoszhang
Contributor Author

I agree with all this. One thing to clarify: the new index that gets built in step (2) will exclude the columns being dropped, right? So if we have to roll back after validation at the end of step 2, we keep using the original index as though nothing happened. Otherwise, step (4) (which involves multiple table descriptor versions) consists of stepping through the states for the dropped column (but with no backfill needed), and the very last step is swapping to the new index.

@petermattis
Copy link
Collaborator

This all sounds fantastic.

Will we collapse multiple related schema changes into a single step? If we add two columns to a table in a transaction, would we write a single new primary key index, or two? This relates to Lucy's point about dropping a column at the same time we're rewriting the primary index to add a column.

@thoszhang
Contributor Author

Will we collapse multiple related schema changes into a single step? If we add two columns to a table in a transaction, would we write a single new primary key index, or two?

I think we'll always be able to execute multiple schema changes started in the same transaction with just one rewrite of the primary index. That rewrite can encompass all column add/drop operations as well as changes to the primary key.
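The collapsing described here can be sketched as simple set arithmetic: the new primary index's column set is the old set, minus the dropped columns, plus the added ones, all in one backfill. The column names and `newPrimaryColumns` function below are illustrative.

```go
package main

import "fmt"

// Sketch of folding all column adds/drops from one transaction into a
// single primary index rewrite.

func newPrimaryColumns(existing, added, dropped []string) []string {
	droppedSet := make(map[string]bool, len(dropped))
	for _, c := range dropped {
		droppedSet[c] = true
	}
	out := make([]string, 0, len(existing)+len(added))
	for _, c := range existing {
		if !droppedSet[c] {
			out = append(out, c)
		}
	}
	return append(out, added...)
}

func main() {
	// e.g. ALTER TABLE t ADD COLUMN d, ADD COLUMN e, DROP COLUMN b in one
	// transaction: one backfill builds an index over {a, c, d, e}.
	fmt.Println(newPrimaryColumns(
		[]string{"a", "b", "c"}, // existing columns
		[]string{"d", "e"},      // added in this txn
		[]string{"b"},           // dropped in this txn
	))
}
```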

@thoszhang
Contributor Author

@ajwerner @rohany @RichardJCai and I discussed this in person.

First a few notes, then the main problem:

  • This proposal can be thought of as a generalization of primary key changes: given an existing primary index, the index backfiller will be able to produce a new primary index with a specified list of columns and a specified primary key, in addition to being able to produce new secondary indexes (with specified lists of indexed columns and stored columns).
    • Idea for later: In theory, the source for the index backfiller doesn't always need to be the primary index. For instance, when rewriting a secondary index after changing a column's type, we could scan the original secondary index to build a copy with just that column changed. This would allow sorted ingestion for the index backfiller in some cases, which is ideal for producing non-overlapping SSTs, and it would also make primary and secondary indexes more symmetric.
  • We could implement this without changing how column mutations are created and queued. The schema changer would generate a set of new indexes to build based on the set of column mutations (and PK change mutations, etc.). This is a step toward making mutations more of a "logical" representation of pending schema changes as opposed to a place to store execution-related state.
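The generalized contract described in these notes can be sketched as a request type: scan a source index at a fixed timestamp and write out a target index with a specified key and column list. All the types below are illustrative assumptions, not the real descpb or backfill APIs.

```go
package main

import "fmt"

// Sketch of the "generalized index backfiller": one mechanism covers
// replacing the primary index (new column set and/or new PK) and
// building new secondary indexes.

type columnID int

type indexSpec struct {
	keyColumns    []columnID // the index key (the primary key, for a primary index)
	storedColumns []columnID // remaining columns carried in the value
}

// backfillRequest asks the backfiller to scan sourceIndexID at a fixed
// timestamp and produce target. The source is usually the primary index,
// but (per the "idea for later" above) could be a secondary index when
// that permits sorted ingestion.
type backfillRequest struct {
	sourceIndexID int
	target        indexSpec
}

func main() {
	// Rewrite the primary index to key on column 1 and store columns 2
	// and 4, e.g. after adding column 4 and dropping column 3.
	req := backfillRequest{
		sourceIndexID: 1,
		target: indexSpec{
			keyColumns:    []columnID{1},
			storedColumns: []columnID{2, 4},
		},
	}
	fmt.Printf("backfill index %d -> key %v, stored %v\n",
		req.sourceIndexID, req.target.keyColumns, req.target.storedColumns)
}
```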

The main problem with this proposal ("the generalized index backfiller") is with interleaved tables: If an index has interleaved children, it's not possible to write an updated index with a new index ID without also rewriting all its interleaved descendants. Depending on whether we decide this would be prohibitively expensive, we have two options:

  1. Never rewrite the interleaved descendants of a table. We only implement the generalized index backfiller for non-interleaved tables and interleaved tables without children, and fall back to the existing column backfiller for interleave parents.
  2. Implement the generalized index backfiller for all tables, which rewrites descendant tables when necessary.

(NB: The interleaved ancestors of the table would not need to be rewritten. The index backfiller can already backfill secondary indexes interleaved into another table.)

Issues with option 1:

  • The benefits listed above no longer apply for interleaved parent tables. We also introduce complexity by having a separate implementation just for interleaved tables (which, anecdotally, are currently under-tested relative to their complexity).
  • We lose the benefit of getting rid of the existing column backfiller entirely and having a single, unified implementation. @dt points out that the column and index backfillers are so different that having both under the same "backfiller" interface makes it impossible to optimize either.
  • For our current async non-transactional schema changes, the reordering of backfill steps described above (to prohibit rolling back a column drop) will now require 2 column backfills instead of 1 when there are both added and dropped columns in the same transaction. For the same reason, ALTER COLUMN TYPE will require 2 column backfills (as in the original RFC draft).
  • Having an index backfiller that doesn't touch the existing index will likely make transactional schema changes much easier to implement in the future, since we won't have to worry about other transactions seeing backfilled values for a non-public column in the original primary index, and we won't have to do a second backfill for rollbacks.

Issues with option 2:

  • Rewriting all the descendant tables is obviously expensive. If a child table is much larger than its parent table, it'll violate user expectations when the cost of doing a column schema change on the parent table is proportional to the size of the child table.
  • Writes to a child table will miss out on the 1PC optimization for the duration of a backfill initiated by an ancestor table, since writes will go to both the old and new indexes. I think this is no longer an issue if we end up implementing schema changes: pre-backfill without online traffic then incremental backfill after #36850.
  • A backfill that rewrites descendant tables requires coordinating a schema change across multiple tables, so that the table descriptor versions move in lockstep with each other. We'll have to update the schema change job to publish updates to multiple tables at once.

There are a few special performance concerns when backfilling an interleaved index compared to a non-interleaved index (applies to both options, but more so to (2)). We'll always be writing SSTs that overlap with existing keys in the span even if the new keys themselves are sorted, and there's some impact on traffic on the parent tables due to latch acquisition (not sure how significant this is). All this is already true for interleaved secondary indexes and interleaved ALTER PRIMARY KEY.

I think the next step is to figure out whether we can tolerate rewriting all the descendant tables for option (2).

ajwerner added a commit to ajwerner/cockroach that referenced this issue Jan 14, 2021
…ller

This commit was mostly motivated by work on the new schema changer to enable
the behavior described in cockroachdb#47989; however, it also turns out to be a
prerequisite of the work to use virtual computed columns in secondary indexes.
Given we haven't released virtual computed columns, I'm going to omit a
release note for this PR.

Release note: None
@ajwerner
Contributor

ajwerner commented Jan 15, 2021

@thoszhang and I have been working on realizing this vision and have run into a couple of unanticipated things we needed to fix.

  1. The need to support computed expressions in index backfills
  2. The need to ensure that concurrent transactions do not write to the old primary index.
    • The primary index semantics are bespoke. Secondary index descriptors carry the columns they store as values. Primary index descriptors (while serving as the primary index) do not carry any columns in their StoreColumns slice.
    • Fortunately, it seems like the place where logic would need to change is pretty well isolated to
      // skipColumnInPK returns true if the value at column colID does not need
      // to be encoded because it is already part of the primary key. Composite
      // datums are considered too, so a composite datum in a PK will return false.
      // TODO(dan): This logic is common and being moved into TableDescriptor (see
      // #6233). Once it is, use the shared one.
      func (rh *rowHelper) skipColumnInPK(colID descpb.ColumnID, value tree.Datum) (bool, error) {
    • Fixing this will require finding some way to mark that the columns being added should not be included in the existing primary index.
      • There's a bunch of options here but the straw man I'll propose here is that we mark the column descriptor in some way to note that it's been added using the index swap protocol. A boolean will do.
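The straw man can be sketched as follows. The field and type names here are hypothetical (not the actual descpb definitions); the point is only that a single boolean on the column descriptor lets row writers skip the column when encoding the existing primary index.

```go
package main

import "fmt"

// Sketch of the straw man above: mark columns added via the index-swap
// protocol so that writers exclude them from the old primary index.

type columnDescriptor struct {
	id                int
	addedViaIndexSwap bool // hypothetical marker; "a boolean will do"
}

// writeToCurrentPrimary reports whether a value for col belongs in the
// current (soon-to-be-replaced) primary index. Columns added under the
// swap protocol are written only to the new index being built.
func writeToCurrentPrimary(col columnDescriptor) bool {
	return !col.addedViaIndexSwap
}

func main() {
	existing := columnDescriptor{id: 1}
	adding := columnDescriptor{id: 2, addedViaIndexSwap: true}
	fmt.Println(writeToCurrentPrimary(existing), writeToCurrentPrimary(adding))
}
```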

ajwerner added a commit to ajwerner/cockroach that referenced this issue Jan 19, 2021
…ller

This commit was mostly motivated by work on the new schema changer to enable
the behavior described in cockroachdb#47989; however, it also turns out to be a
prerequisite of the work to use virtual computed columns in secondary indexes.
Given we haven't released virtual computed columns, I'm going to omit a
release note for this PR.

The basic idea is that we need to track dependencies for computed columns
to make sure they are retrieved. Default expressions need to be evaluated
first. Much of the code is testing.

Release note: None
@ajwerner
Contributor

Item 2. above has been broken out into #59149.

ajwerner added a commit to ajwerner/cockroach that referenced this issue Jan 28, 2021
…ller

This commit was mostly motivated by work on the new schema changer to enable
the behavior described in cockroachdb#47989; however, it also turns out to be a
prerequisite of the work to use virtual computed columns in secondary indexes.
Given we haven't released virtual computed columns, I'm going to omit a
release note for this PR.

The basic idea is that we need to track dependencies for computed columns
to make sure they are retrieved. Default expressions need to be evaluated
first. Much of the code is testing.

Release note: None
craig bot pushed a commit that referenced this issue Jan 29, 2021
58990: sql: deal with computed and default column generation in index backfi… r=ajwerner a=ajwerner

This commit was mostly motivated by work on the new schema changer to enable
the behavior described in #47989; however, it also turns out to be a
prerequisite of the work to use virtual computed columns in secondary indexes.
Given we haven't released virtual computed columns, I'm going to omit a
release note for this PR.

The basic idea is that we need to track dependencies for computed columns
to make sure they are retrieved. Default expressions need to be evaluated
first. Much of the code is testing.

Release note: None

59261: sql: add full table or index scan count metric r=barryhe2000 a=barryhe2000

Previously, this metric was not available. It is now added so that users can
see when full table scans or full index scans are being used for their queries.

Fixes: #58653

Release note (ui change): Users can see a time series custom chart in the
Admin UI for full table or index scans.

59478: cloudimpl: cache suffix in remote file sst wrapper r=dt a=dt

Reading SSTs starts with multiple tiny reads in offsets near the end of the file.
If we can read that whole region once and fulfill those reads from a cached buffer,
we can avoid repeated RPCs.

Release note: none.

Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Barry He <[email protected]>
Co-authored-by: David Taylor <[email protected]>
craig bot pushed a commit that referenced this issue Feb 2, 2021
58979: schemachanger,*: introduce new schema changer r=lucy-zhang a=ajwerner

This PR introduces the new schema changer, for which a partially written RFC (including a "guide-level explanation") can be found in #59288. It contains a somewhat complete implementation for a subset of `ALTER TABLE ... ADD COLUMN`. Like the existing implementation, these schema changes are planned and partially executed in the user transaction, but most of the work happens in an asynchronous job that runs after the transaction has committed. We use the index backfiller to build new columns, as described in #47989.

See the RFC and individual commits for details.

Missing things that will be in later PRs:

- Making execution changes to support concurrent writes for columns added using the index backfiller. We currently don't have a way to write column values which are not part of the primary index. See #59149. (#58990 takes care of the backfill part).
- Ensuring compatible coexistence with the old schema changer; in particular, polling for queued mutations to finish before any schema changes can start, using child transactions within the user transaction (#56588). The new schema changer assumes it has exclusive rights to perform schema changes on the table, which anticipates the transactional schema changes of the future.
- Reverting schema changes when failed or canceled.
- Properly checkpointing and reading job progress for the backfill. Currently there's a placeholder implementation that doesn't interact with the job at all.

Co-authored-by: Andrew Werner <[email protected]>
@jlinder jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021
craig bot pushed a commit that referenced this issue May 19, 2022
81388: authors: add chinemeremchigbo to authors r=lassenordahl a=ChinemeremChigbo

authors: add chinemeremchigbo to authors

Release note: None
Release justification: non-production code change

81460: acceptance: run `python`, `psql` containers as current uid r=knz a=rickystewart

`postgres`'s permission checking for certificates has gotten more
rigorous since [this commit](https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a59c79564bdc209a5bc7b02d706f0d7352eb82fa).
This has broken a couple `acceptance` tests which do not pin to any
specific `postgres` version (see #81313, #81437).

Here we attempt to solve the problem "once and for all" by ensuring that
these containers run with a UID that is equal to the one that created
the certificates.

Release note: None

81464: ui: Add tests for the TimeScaleDropdown component r=jocrl a=jocrl

This commit splits out tests in `timescale.spec.tsx` and adds additional tests
for the TimeScaleDropdown component, including testing that the custom time
picker is initialized to the currently selected time. This also adds
TimeScaleDropdown stories.

Release note: None

81486: sql/backfill: fix bug adding new columns to new index with volatile default r=ajwerner a=ajwerner

In #58990 we added support for the index backfiller to evaluate expressions. This unblocked the usage of virtual computed columns in secondary indexes, so it wasn't a totally bogus change, but it probably tried to do too much without understanding all of the implications. The original motivation for that change was to unblock the then-nascent declarative schema changer prototype, which wanted to implement only #47989 as a column change protocol. In order to do that, it needed to evaluate default expressions when adding new columns to a new primary index. For whatever reason, probably lack of understanding, I made it such that all default expressions for columns in the adding state were evaluated. This is wrong in cases where the column has already been backfilled into the current primary index: volatile expressions will evaluate to different values, causing the newly backfilled secondary index to be bogus and corrupt.

This change addresses the failure of the above change by being more thoughtful about whether it's sane to evaluate a default expression when backfilling into an index. Note that it's still insane to backfill a volatile expression for a new column into the key of a new primary index. As of writing, there is no way to do this. 

Backports will address #81448.

Release note (bug fix): In 21.1 a bug was introduced whereby default values
were recomputed when populating data in new secondary indexes for columns
which were added in the same transaction as the index. This arises, for example
in cases like `ALTER TABLE t ADD COLUMN f FLOAT8 UNIQUE DEFAULT (random())`.
If the default expression is not volatile, then the recomputation is harmless.
If, however, the default expression is volatile, the data in the secondary
index will not match the data in the primary index: a corrupt index will have
been created. This bug has now been fixed.

81504: acceptance/roachtest: update ORM versions under test r=ecwall a=rafiss

See DXTPT-35

Release note: None

81523: sql: fix typo in warning on using DateStyle/IntervalStyle r=rafiss a=otan

Release note (sql change): Fixed a small typo when using DateStyle and
IntervalStyle.

Co-authored-by: Chinemerem <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Josephine Lee <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
@ajwerner
Contributor

I'm inclined to close this. The only reason not to would be that it only happens in certain cases and not in all cases. We'll keep working to make it happen in all cases in implicit transactions for 23.1 and hopefully in all cases in 23.2.

@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
@drew-vz

drew-vz commented Dec 29, 2023

@ajwerner This specific issue is unrelated but I figured I would ask here. I am looking for a way to backfill for more than just adding or removing a column, but things like altering a table (renaming column). It seems to me that when a column is renamed, we would need all the CDC backfill records for that table with the rows and their new schema, so that they can be reflected properly downstream. Is there any way of accomplishing this that you know of?
