
sql: column backfills should build a new index instead of mutating the existing one #47989

Closed
thoszhang opened this issue Apr 23, 2020 · 13 comments
Labels
A-schema-changes T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@thoszhang
Contributor

thoszhang commented Apr 23, 2020

Currently, when a backfill is needed for a column being added or dropped, the column backfiller rewrites the existing primary index in place to add or remove the column value. This has some disadvantages compared to the implementation of index backfills (for non-interleaved indexes), which write out a new index rather than mutating an existing one.

The proposal is to change the implementation of column backfills to build a new (primary) index every time, instead of mutating the existing one. There are 2 proposed steps in this change (which are somewhat independent):

  1. Switch the column backfill implementation so that it works like today's index backfiller. Adding or dropping any column would be like a primary key change: we go through the states of adding a new index until we're writing to both indexes, then do an index backfill, then swap the primary index in the final table descriptor update of the schema change.
  2. There are performance concerns with just doing the first step. A proposed optimization: Reorder the steps in adding an index so that the backfill happens before any live traffic to the new index is turned on, then start writing to the new index and use a rangefeed to catch up on writes in the gap between the backfill and when all nodes are writing to the new index. See sql: large latency spikes when creating a large storing index while running tpcc #45888 and schema changes: pre-backfill without online traffic then incremental backfill after #36850 for details. We'd want to do some speed-of-light experiments before committing to this approach.
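The state sequence in step 1 can be sketched as follows. This is a minimal, illustrative sketch: the state names mirror CockroachDB's descriptor states, but the `indexState` type and `columnChangeStates` function are hypothetical, not the real schemachanger API.

```go
package main

import "fmt"

// Sketch of step 1 above: a column add/drop proceeds like a primary key
// change, stepping a new primary index through the usual index-addition
// states before swapping it in as the primary index.

type indexState string

const (
	deleteOnly         indexState = "DELETE_ONLY"           // new index applies deletes only
	deleteAndWriteOnly indexState = "DELETE_AND_WRITE_ONLY" // all writes go to both indexes
	backfilling        indexState = "BACKFILLING"           // index backfill of existing rows
	swapped            indexState = "PUBLIC"                // new index swapped in as primary
)

// columnChangeStates returns the descriptor states a column change walks
// through under the index-swap protocol; each transition is a separate
// table descriptor version that all nodes must converge on first.
func columnChangeStates() []indexState {
	return []indexState{deleteOnly, deleteAndWriteOnly, backfilling, swapped}
}

func main() {
	for i, s := range columnChangeStates() {
		fmt.Printf("step %d: %s\n", i+1, s)
	}
}
```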

Rolling back a column schema change, with this proposed approach, would essentially mean switching back to the pre-existing index. If we rolled back dropping a column, we still have all the column data as of the timestamp when the schema change was started.

This approach would also allow us to do ALTER TYPE in a single index backfill operation (see #46893).

Open questions:

  • What do we do for interleaved tables? Very preliminary thoughts:
    • For interleaved tables with no interleaved children, we should just be able to write the new primary index alongside the old one. I'm not sure what the implications are for cleaning up an index if we don't want to/can't roll it back normally. Presumably we can't just leave abandoned index KVs interleaved in some other table with no indication of this on any of the table descriptors.
    • For interleaved parents, can we do this without rewriting all the child interleaved indexes as well?

Jira issue: CRDB-4366

@petermattis
Collaborator

Big 👍.

Rolling back a column schema change, with this proposed approach, would essentially mean switching back to the pre-existing index. If we rolled back dropping a column, we still have all the column data as of the timestamp when the schema change was started.

Out of curiosity, when do we have to roll back dropping a column?

@thoszhang
Contributor Author

Out of curiosity, when do we have to roll back dropping a column?

The schema change can be cancelled, or another schema change started in the same transaction can fail (e.g., a unique index can't be built), in which case we roll everything back.

@ajwerner
Contributor

This would also fix #35738.

@petermattis
Copy link
Collaborator

Out of curiosity, when do we have to roll back dropping a column?

The schema change can be cancelled, or another schema change started in the same transaction can fail (e.g., a unique index can't be built), in which case we roll everything back.

Got it. #46541 seems not good. Trying to roll back a partially dropped column seems fraught. I wonder if we can arrange to perform the drop column schema changes last. And then not allow them to be rolled back.

@ajwerner
Contributor

I wonder if we can arrange to perform the drop column schema changes last. And then not allow them to be rolled back.

We've talked about something like this in relatively vague terms. It seems like so long as we (1) leave the dropped column public until all other mutations which can fail have completed, (2) remove all constraints before dropping a column, and (3) make the drop-column mutation something which cannot be canceled, then we'll be out of this pickle.

Today we assume that the transaction which performs the drop sees the column in WRITE_ONLY for subsequent statements. My thinking is that this should remain true for that transaction, but we shouldn't actually commit the table descriptor with the column in write-only. This also gives us an opportunity to drop the constraints prior to making the dropped column non-public; that, too, is hazardous to roll back. My sense is that as soon as we start dropping properties, we need to make it impossible to roll back. In short:

  1. Commit Time
    • Create new columns and queue mutations
    • Add new columns in delete-only
  2. Add columns, indexes, and constraints
    • Still can cancel or fail
    • Add new columns in write-only, backfill new indexes, attempt to add constraints
  3. Drop constraints
    • Cannot cancel or fail
    • Drop constraints which were explicitly dropped
    • Drop constraints for columns which are being dropped
  4. Drop columns
    • cannot cancel or fail
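The four phases above can be sketched with the point of no return made explicit: once constraint-dropping begins, the schema change can no longer be canceled or rolled back. The `phase` type and helper below are illustrative, not real CockroachDB code.

```go
package main

import "fmt"

// Sketch of the staged drop-column ordering proposed above. Phases 1-2
// can still be rolled back; phases 3-4 cannot.

type phase struct {
	name        string
	canRollBack bool
}

func dropColumnPhases() []phase {
	return []phase{
		{"commit: create new columns, queue mutations, columns delete-only", true},
		{"add columns/indexes/constraints: write-only, backfill, validate", true},
		{"drop constraints (explicitly dropped, and those on dropped columns)", false},
		{"drop columns", false},
	}
}

// pointOfNoReturn returns the index of the first phase that cannot be
// canceled or rolled back.
func pointOfNoReturn() int {
	for i, p := range dropColumnPhases() {
		if !p.canRollBack {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println("rollback impossible starting at phase", pointOfNoReturn()+1)
}
```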

@thoszhang
Contributor Author

I agree with all this. One thing to clarify: the new index that gets built in step (2) will exclude the columns being dropped, right? So if we have to roll back after validation at the end of step 2, we keep using the original index as though nothing happened. Otherwise, step (4) (which involves multiple table descriptor versions) consists of stepping through the states for the dropped column (but with no backfill needed), and the very last step is swapping to the new index.

@petermattis
Copy link
Collaborator

This all sounds fantastic.

Will we collapse multiple related schema changes into a single step? If we add two columns to a table in a transaction, would we write a single new primary key index, or two? This relates to Lucy's point about dropping a column at the same time we're rewriting the primary index to add a column.

@thoszhang
Contributor Author

Will we collapse multiple related schema changes into a single step? If we add two columns to a table in a transaction, would we write a single new primary key index, or two?

I think we'll always be able to execute multiple schema changes started in the same transaction with just one rewrite of the primary index. That rewrite can encompass all column add/drop operations as well as changes to the primary key.
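The collapsing described here can be sketched as simple set arithmetic: the new primary index's column set is the old set, minus the dropped columns, plus the added ones, all in one backfill. The column names and `newPrimaryColumns` function below are illustrative.

```go
package main

import "fmt"

// Sketch of folding all column adds/drops from one transaction into a
// single primary index rewrite.

func newPrimaryColumns(existing, added, dropped []string) []string {
	droppedSet := make(map[string]bool, len(dropped))
	for _, c := range dropped {
		droppedSet[c] = true
	}
	out := make([]string, 0, len(existing)+len(added))
	for _, c := range existing {
		if !droppedSet[c] {
			out = append(out, c)
		}
	}
	return append(out, added...)
}

func main() {
	// e.g. ALTER TABLE t ADD COLUMN d, ADD COLUMN e, DROP COLUMN b in one
	// transaction: one backfill builds an index over {a, c, d, e}.
	fmt.Println(newPrimaryColumns(
		[]string{"a", "b", "c"}, // existing columns
		[]string{"d", "e"},      // added in this txn
		[]string{"b"},           // dropped in this txn
	))
}
```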

@thoszhang
Contributor Author

@ajwerner @rohany @RichardJCai and I discussed this in person.

First a few notes, then the main problem:

  • This proposal can be thought of as a generalization of primary key changes: given an existing primary index, the index backfiller will be able to produce a new primary index with a specified list of columns and a specified primary key, in addition to being able to produce new secondary indexes (with specified lists of indexed columns and stored columns).
    • Idea for later: In theory, the source for the index backfiller doesn't always need to be the primary index. For instance, when rewriting a secondary index after changing a column's type, we could scan the original secondary index to build a copy with just that column changed. This would allow sorted ingestion for the index backfiller in some cases, which is ideal for producing non-overlapping SSTs, and it would also make primary and secondary indexes more symmetric.
  • We could implement this without changing how column mutations are created and queued. The schema changer would generate a set of new indexes to build based on the set of column mutations (and PK change mutations, etc.). This is a step toward making mutations more of a "logical" representation of pending schema changes as opposed to a place to store execution-related state.
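The generalized contract described in these notes can be sketched as a request type: scan a source index at a fixed timestamp and write out a target index with a specified key and column list. All the types below are illustrative assumptions, not the real descpb or backfill APIs.

```go
package main

import "fmt"

// Sketch of the "generalized index backfiller": one mechanism covers
// replacing the primary index (new column set and/or new PK) and
// building new secondary indexes.

type columnID int

type indexSpec struct {
	keyColumns    []columnID // the index key (the primary key, for a primary index)
	storedColumns []columnID // remaining columns carried in the value
}

// backfillRequest asks the backfiller to scan sourceIndexID at a fixed
// timestamp and produce target. The source is usually the primary index,
// but (per the "idea for later" above) could be a secondary index when
// that permits sorted ingestion.
type backfillRequest struct {
	sourceIndexID int
	target        indexSpec
}

func main() {
	// Rewrite the primary index to key on column 1 and store columns 2
	// and 4, e.g. after adding column 4 and dropping column 3.
	req := backfillRequest{
		sourceIndexID: 1,
		target: indexSpec{
			keyColumns:    []columnID{1},
			storedColumns: []columnID{2, 4},
		},
	}
	fmt.Printf("backfill index %d -> key %v, stored %v\n",
		req.sourceIndexID, req.target.keyColumns, req.target.storedColumns)
}
```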

The main problem with this proposal ("the generalized index backfiller") is with interleaved tables: If an index has interleaved children, it's not possible to write an updated index with a new index ID without also rewriting all its interleaved descendants. Depending on whether we decide this would be prohibitively expensive, we have two options:

  1. Never rewrite the interleaved descendants of a table. We only implement the generalized index backfiller for non-interleaved tables and interleaved tables without children, and fall back to the existing column backfiller for interleave parents.
  2. Implement the generalized index backfiller for all tables, which rewrites descendant tables when necessary.

(NB: The interleaved ancestors of the table would not need to be rewritten. The index backfiller can already backfill secondary indexes interleaved into another table.)

Issues with option 1:

  • The benefits listed above no longer apply for interleaved parent tables. We also introduce complexity by having a separate implementation just for interleaved tables (which, anecdotally, are currently under-tested relative to their complexity).
  • We lose the benefit of getting rid of the existing column backfiller entirely and having a single, unified implementation. @dt points out that the column and index backfillers are so different that having both under the same "backfiller" interface makes it impossible to optimize either.
  • For our current async non-transactional schema changes, the reordering of backfill steps described above (to prohibit rolling back a column drop) will now require 2 column backfills instead of 1 when there are both added and dropped columns in the same transaction. For the same reason, ALTER COLUMN TYPE will require 2 column backfills (as in the original RFC draft).
  • Having an index backfiller that doesn't touch the existing index will likely make transactional schema changes much easier to implement in the future, since we won't have to worry about other transactions seeing backfilled values for a non-public column in the original primary index, and we won't have to do a second backfill for rollbacks.

Issues with option 2:

  • Rewriting all the descendant tables is obviously expensive. If a child table is much larger than its parent table, it'll violate user expectations when the cost of doing a column schema change on the parent table is proportional to the size of the child table.
  • Writes to a child table will miss out on the 1PC optimization for the duration of a backfill initiated by an ancestor table, since writes will go to both the old and new indexes. I think this is no longer an issue if we end up implementing schema changes: pre-backfill without online traffic then incremental backfill after #36850.
  • A backfill that rewrites descendant tables requires coordinating a schema change across multiple tables, so that the table descriptor versions move in lockstep with each other. We'll have to update the schema change job to publish updates to multiple tables at once.

There are a few special performance concerns when backfilling an interleaved index compared to a non-interleaved index (applies to both options, but more so to (2)). We'll always be writing SSTs that overlap with existing keys in the span even if the new keys themselves are sorted, and there's some impact on traffic on the parent tables due to latch acquisition (not sure how significant this is). All this is already true for interleaved secondary indexes and interleaved ALTER PRIMARY KEY.

I think the next step is to figure out whether we can tolerate rewriting all the descendant tables for option (2).

ajwerner added a commit to ajwerner/cockroach that referenced this issue Jan 14, 2021
…ller

This commit was mostly motivated by work on the new schema changer to enable
the behavior described in cockroachdb#47989; however, it also turns out to be a
prerequisite of the work to use virtual computed columns in secondary indexes.
Given we haven't released virtual computed columns, I'm going to omit a
release note for this PR.

Release note: None
@ajwerner
Contributor

ajwerner commented Jan 15, 2021

@thoszhang and I have been working on realizing this vision and have run into a couple of unanticipated things we needed to fix.

  1. The need to support computed expressions in index backfills
  2. The need to ensure that concurrent transactions do not write to the old primary index.
    • The primary index semantics are bespoke. Secondary index descriptors carry the columns they store as values. Primary index descriptors (while serving as the primary index) do not carry any columns in their StoreColumns slice.
    • Fortunately, it seems like the place where logic would need to change is pretty well isolated to
      // skipColumnInPK returns true if the value at column colID does not need
      // to be encoded because it is already part of the primary key. Composite
      // datums are considered too, so a composite datum in a PK will return false.
      // TODO(dan): This logic is common and being moved into TableDescriptor (see
      // #6233). Once it is, use the shared one.
      func (rh *rowHelper) skipColumnInPK(colID descpb.ColumnID, value tree.Datum) (bool, error) {
    • Fixing this will require finding some way to mark that the columns being added should not be included in the existing primary index.
      • There's a bunch of options here but the straw man I'll propose here is that we mark the column descriptor in some way to note that it's been added using the index swap protocol. A boolean will do.
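The straw man can be sketched as follows. The field and type names here are hypothetical (not the actual descpb definitions); the point is only that a single boolean on the column descriptor lets row writers skip the column when encoding the existing primary index.

```go
package main

import "fmt"

// Sketch of the straw man above: mark columns added via the index-swap
// protocol so that writers exclude them from the old primary index.

type columnDescriptor struct {
	id                int
	addedViaIndexSwap bool // hypothetical marker; "a boolean will do"
}

// writeToCurrentPrimary reports whether a value for col belongs in the
// current (soon-to-be-replaced) primary index. Columns added under the
// swap protocol are written only to the new index being built.
func writeToCurrentPrimary(col columnDescriptor) bool {
	return !col.addedViaIndexSwap
}

func main() {
	existing := columnDescriptor{id: 1}
	adding := columnDescriptor{id: 2, addedViaIndexSwap: true}
	fmt.Println(writeToCurrentPrimary(existing), writeToCurrentPrimary(adding))
}
```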

ajwerner added a commit to ajwerner/cockroach that referenced this issue Jan 19, 2021
…ller

This commit was mostly motivated by work on the new schema changer to enable
the behavior described in cockroachdb#47989; however, it also turns out to be a
prerequisite of the work to use virtual computed columns in secondary indexes.
Given we haven't released virtual computed columns, I'm going to omit a
release note for this PR.

The basic idea is that we need to track dependencies for computed columns
to make sure they are retrieved. Default expressions need to be evaluated
first. Much of the code is testing.

Release note: None
@ajwerner
Contributor

Item 2. above has been broken out into #59149.

ajwerner added a commit to ajwerner/cockroach that referenced this issue Jan 28, 2021
…ller

This commit was mostly motivated by work on the new schema changer to enable
the behavior described in cockroachdb#47989; however, it also turns out to be a
prerequisite of the work to use virtual computed columns in secondary indexes.
Given we haven't released virtual computed columns, I'm going to omit a
release note for this PR.

The basic idea is that we need to track dependencies for computed columns
to make sure they are retrieved. Default expressions need to be evaluated
first. Much of the code is testing.

Release note: None
craig bot pushed a commit that referenced this issue Jan 29, 2021
58990: sql: deal with computed and default column generation in index backfi… r=ajwerner a=ajwerner

This commit was mostly motivated by work on the new schema changer to enable
the behavior described in #47989; however, it also turns out to be a
prerequisite of the work to use virtual computed columns in secondary indexes.
Given we haven't released virtual computed columns, I'm going to omit a
release note for this PR.

The basic idea is that we need to track dependencies for computed columns
to make sure they are retrieved. Default expressions need to be evaluated
first. Much of the code is testing.

Release note: None

59261: sql: add full table or index scan count metric r=barryhe2000 a=barryhe2000

Previously, this metric was not available. It is now added so that users can
see when full table scans or full index scans are being used for their queries.

Fixes: #58653

Release note (ui change): Users can see a time series custom chart in the
Admin UI for full table or index scans.

59478: cloudimpl: cache suffix in remote file sst wrapper r=dt a=dt

Reading SSTs starts with multiple tiny reads in offsets near the end of the file.
If we can read that whole region once and fulfill those reads from a cached buffer,
we can avoid repeated RPCs.

Release note: none.

Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Barry He <[email protected]>
Co-authored-by: David Taylor <[email protected]>
craig bot pushed a commit that referenced this issue Feb 2, 2021
58979: schemachanger,*: introduce new schema changer r=lucy-zhang a=ajwerner

This PR introduces the new schema changer, for which a partially written RFC (including a "guide-level explanation") can be found in #59288. It contains a somewhat complete implementation for a subset of `ALTER TABLE ... ADD COLUMN`. Like the existing implementation, these schema changes are planned and partially executed in the user transaction, but most of the work happens in an asynchronous job that runs after the transaction has committed. We use the index backfiller to build new columns, as described in #47989.

See the RFC and individual commits for details.

Missing things that will be in later PRs:

- Making execution changes to support concurrent writes for columns added using the index backfiller. We currently don't have a way to write column values which are not part of the primary index. See #59149. (#58990 takes care of the backfill part).
- Ensuring compatible coexistence with the old schema changer; in particular, polling for queued mutations to finish before any schema changes can start, using child transactions within the user transaction (#56588). The new schema changer assumes it has exclusive rights to perform schema changes on the table, which anticipates the transactional schema changes of the future.
- Reverting schema changes when failed or canceled.
- Properly checkpointing and reading job progress for the backfill. Currently there's a placeholder implementation that doesn't interact with the job at all.

Co-authored-by: Andrew Werner <[email protected]>
@jlinder jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021
craig bot pushed a commit that referenced this issue May 19, 2022
81388: authors: add chinemeremchigbo to authors r=lassenordahl a=ChinemeremChigbo

authors: add chinemeremchigbo to authors

Release note: None
Release justification: non-production code change

81460: acceptance: run `python`, `psql` containers as current uid r=knz a=rickystewart

`postgres`'s permission checking for certificates has gotten more
rigorous since [this commit](https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a59c79564bdc209a5bc7b02d706f0d7352eb82fa).
This has broken a couple `acceptance` tests which do not pin to any
specific `postgres` version (see #81313, #81437).

Here we attempt to solve the problem "once and for all" by ensuring that
these containers run with a UID that is equal to the one that created
the certificates.

Release note: None

81464: ui: Add tests for the TimeScaleDropdown component r=jocrl a=jocrl

This commit splits out tests in `timescale.spec.tsx` and adds additional tests
for the TimeScaleDropdown component, including testing that the custom time
picker is initialized to the currently selected time. This also adds
TimeScaleDropdown stories.

Release note: None

81486: sql/backfill: fix bug adding new columns to new index with volatile default r=ajwerner a=ajwerner

In #58990 we added support for the index backfiller to evaluate expressions. This unblocked the usage of virtual computed columns in secondary indexes, so it wasn't a totally bogus change, but it probably tried to do too much without understanding all of the implications. The original motivation for that change was to unblock the then-nascent declarative schema changer prototype, which wanted to implement only #47989 as a column change protocol. In order to do that, it needed to evaluate default expressions when adding new columns to a new primary index. For whatever reason, probably lack of understanding, I made it such that all default expressions for columns in the adding state were evaluated. This is wrong in cases where the column has already been backfilled into the current primary index: volatile expressions will evaluate to different values, causing the newly backfilled secondary index to be bogus and corrupt.

This change addresses the failure of the above change by being more thoughtful about whether it's sane to evaluate a default expression when backfilling into an index. Note that it's still insane to backfill a volatile expression for a new column into the key of a new primary index. As of writing, there is no way to do this. 

Backports will address #81448.

Release note (bug fix): In 21.1 a bug was introduced whereby default values
were recomputed when populating data in new secondary indexes for columns
which were added in the same transaction as the index. This arises, for example
in cases like `ALTER TABLE t ADD COLUMN f FLOAT8 UNIQUE DEFAULT (random())`.
If the default expression is not volatile, then the recomputation is harmless.
If, however, the default expression is volatile, the data in the secondary
index will not match the data in the primary index: a corrupt index will have
been created. This bug has now been fixed.

81504: acceptance/roachtest: update ORM versions under test r=ecwall a=rafiss

See DXTPT-35

Release note: None

81523: sql: fix typo in warning on using DateStyle/IntervalStyle r=rafiss a=otan

Release note (sql change): Fixed a small typo when using DateStyle and
IntervalStyle.

Co-authored-by: Chinemerem <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Josephine Lee <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
@ajwerner
Contributor

I'm inclined to close this. The only reason not to would be that it only happens in certain cases and not in all cases. We'll keep working to make it happen in all cases in implicit transactions for 23.1 and hopefully in all cases in 23.2.

@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
@drew-vz

drew-vz commented Dec 29, 2023

@ajwerner This specific issue is unrelated but I figured I would ask here. I am looking for a way to backfill for more than just adding or removing a column, but things like altering a table (renaming column). It seems to me that when a column is renamed, we would need all the CDC backfill records for that table with the rows and their new schema, so that they can be reflected properly downstream. Is there any way of accomplishing this that you know of?
