
sql: enable adding columns which are not written to the current primary index #59149

Closed
ajwerner opened this issue Jan 19, 2021 · 39 comments
Labels
A-schema-descriptors Relating to SQL table/db descriptor handling. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@ajwerner
Contributor

ajwerner commented Jan 19, 2021

Is your feature request related to a problem? Please describe.

This issue is a breakout of point 2 in #47989 (comment). Column backfills for changing the set of columns in a table have a number of issues, outlined in #47989. The alternative is to build a new primary index and swap to it. We support this in theory with primary key changes; however, there is a slight nuance: primary key changes require that the set of columns in the new and old primary indexes be the same. This issue is concerned with the requirement that concurrent writers not write columns that are being backfilled into the new primary index to the old primary index.

The root of the problem is that primary index descriptors are handled specially. For secondary indexes, the columns stored in the value are the primary index columns specified in ExtraColumnIDs (for unique indexes) plus the columns in StoreColumnIDs. For primary indexes, however, all of the table's columns are stored implicitly.
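
To make the distinction concrete, here is a minimal Go sketch of the descriptor shape under discussion. The struct is a simplification for illustration, not the actual protobuf definition, though the field names mirror the identifiers above:

package sketch

// ColumnID identifies a column within a table descriptor.
type ColumnID uint32

// IndexDescriptor is a simplified stand-in for the real descriptor.
type IndexDescriptor struct {
	// ColumnIDs are the columns making up the index key.
	ColumnIDs []ColumnID
	// ExtraColumnIDs holds primary key columns carried by a secondary
	// index: in the value for unique indexes, in the key otherwise.
	ExtraColumnIDs []ColumnID
	// StoreColumnIDs are additional columns stored in the value of a
	// secondary index. For primary indexes this is empty today: the
	// value implicitly stores all of the table's columns.
	StoreColumnIDs []ColumnID
}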

The hazard is that, if we don't change anything, the column in DELETE_AND_WRITE_ONLY will end up being written to the existing primary index.

Describe the solution you'd like

The thrust of the solution is twofold:

  1. Update the table descriptor to encode the need to not write these columns to the current primary index.
  2. Adopt this change where necessary.

Updated descriptor structure

I have two proposals here: one that I prefer but is riskier, and one that is less invasive but worse.

  • A) Make primary indexes look like secondary indexes and populate their StoreColumnIDs.

    • Pros
      • This is nice for a variety of reasons. It's not obvious to me why we maintain this distinction save for legacy reasons. It should simplify index handling code.
    • Cons
      • It's slightly less compact.
      • It'd be a migration that would affect a lot of code.
  • B) Add another field to the table descriptor to encode the set of columns which should not be written to the primary index (see the sketch after this list)

    • Pros
      • Much less invasive
    • Cons
      • More cruft
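
To illustrate option B, here is a sketch with a hypothetical field name, invented here purely for illustration:

package sketch

type ColumnID uint32
type IndexDescriptor struct{ /* as sketched above */ }

// TableDescriptor, heavily abridged. Option B would add one field.
type TableDescriptor struct {
	PrimaryIndex IndexDescriptor
	Indexes      []IndexDescriptor
	// Hypothetical new field: columns in DELETE_AND_WRITE_ONLY that are
	// being backfilled into a replacement primary index and must not be
	// written to PrimaryIndex in the meantime.
	NonPrimaryIndexColumnIDs []ColumnID
}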

Adopting the change to the descriptor structure

Let's assume we're going to go with B) as it seems more tractable. One thing which will need to change is the code which writes rows. My current reading is that we might be able to isolate this change to legitimately just this function. @yuzefovich could I ask you to verify that claim and generally have a look at this issue?

The bigger unknown for me is what changes would need to be made in the optimizer. @RaduBerinde could I ask you to review this and provide some guidance?

Additional context

Something must be done here to realize #47989. More pressingly, we are working to not include support for column backfills in the new schema change rewrite, so realistically something needs to be done here in the next few weeks.

Epic: CRDB-2356

@ajwerner ajwerner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jan 19, 2021
@ajwerner ajwerner added the A-schema-descriptors Relating to SQL table/db descriptor handling. label Jan 19, 2021
@ajwerner
Contributor Author

@RaduBerinde I'm assigning you only because I'd love to get some intel from you on the implications in the optimizer of the two proposals for the table descriptor structure.

@RaduBerinde
Member

The optimizer's catalog treats the primary index just like any index - we already present all columns as stored columns (with the optimizer catalog objects like optTable making the translation). The difference in the catalog would be rather small (e.g. the primary index could simply not present these mutation columns).

However, the optimizer does assume that a scan of the primary index can produce all necessary columns - this is a core assumption that isn't easy to work around. So from the optimizer side, we will run into problems if we ever have to fetch a value on such a column.

Seems like we shouldn't, but I remember there were surprising cases where the execution engine needed the old value of a mutation column. I don't think I ever exactly understood the details, perhaps @andy-kimball remembers. There are some mentions in #46285

Would we ever have a secondary index on such a mutation column? In that case, it's pretty clear that the optimizer must produce the old value, e.g. for a DELETE. I'm hoping that we sequence these schema changes so we only add such indexes after we finished adding the new column.

@andy-kimball
Contributor

We should never be reading values in mutation columns. However, we've had bugs in the past where we were reading them. I fixed some cases where we were mistakenly reading those values, but there may be others.

@thoszhang
Contributor

Would we ever have a secondary index on such a mutation column? In that case, it's pretty clear that the optimizer must produce the old value, e.g. for a DELETE. I'm hoping that we sequence these schema changes so we only add such indexes after we finished adding the new column.

I'm not sure reordering is possible. If we were to add a column, and then add a secondary index on the column later in the same transaction (which today we do support), I think we do have to keep the column non-public until after we finished backfilling the index on it. The reason is that "finishing adding the new column" means swapping the primary index, but at that point it's difficult to roll back.

Just to clarify, don't we still need to "produce the old value" on deletes even just for the secondary index which is the replacement primary index? It's still not clear to me why this, by itself, isn't already a problem.

@ajwerner
Contributor Author

ajwerner commented Jan 20, 2021

I'm not sure reordering is possible. If we were to add a column, and then add a secondary index on the column later in the same transaction (which today we do support), I think we do have to keep the column non-public until after we finished backfilling the index on it. The reason is that "finishing adding the new column" means swapping the primary index, but at that point it's difficult to roll back.

I think it's sort of possible. Today we support having columns which are not public in the primary index. This happens both when adding and when dropping. I think it may make sense to do the primary index swap before making new columns (or secondary indexes) public. In today's world that leaves us in a somewhat weird place regarding constraints but I've come around to that being okay (actually, sort of good). The summary is that if we keep constraints around for quite a long time, then we can roll things back safely. The downside is that it might break some small amount of schema changes, but things are super broken today. Also, no transaction will need to write anything other than default values to columns which exist in indexes but don't exist in the primary key. There will never be a requirement to read column values for columns not in the primary index.

Below find a riff on #47989 (comment)

Let's consider the following scenario:

CREATE TABLE pt (i INT PRIMARY KEY);
CREATE TABLE t (
     i INT PRIMARY KEY,
     j INT NOT NULL CHECK (j > 0)
);
BEGIN;
ALTER TABLE t DROP COLUMN j;
ALTER TABLE t ADD COLUMN k INT NOT NULL AS (i*i) STORED CHECK (k>0) UNIQUE;
INSERT INTO t VALUES (1), (-1);
COMMIT;

Let's dig into what this might look like in some different scenarios.

Today

Today, the above goes extremely poorly (#46541).

  • User transaction
    • Commit j in DELETE_AND_WRITE_ONLY, k and its unique index in DELETE_ONLY. At this point we're writing 0 for values of j. This is already very bad because concurrent transactions may be using that check constraint.
  • Job
    • Wait for t to drain leases.
    • Move j to DELETE_ONLY and k and its index to DELETE_AND_WRITE_ONLY
    • Column backfill the primary index (deleting all known values of j and populating the values for k).
    • Index backfill k's unique index
    • Validate the index and determine that the unique constraint is violated
    • Attempt to rollback:
      • Try to move k to DELETE_ONLY and j back to DELETE_AND_WRITE_ONLY
      • Column backfill the other way
      • Try to validate the check constraint, fail, explode.

Let's also look at the observed behaviors:

  • Inside the transaction
    • j is gone immediately after the statement that drops it
    • k is not available
    • We permit this user transaction to violate the check constraint on j
  • Concurrent transactions after the commit of the user transaction.
    • Concurrent readers after the user transaction commits will observe that it wrote 0 for j (opt: fix handling of write-only columns during update #46285)
    • Concurrent writers using the new version will observe the check constraint on the new column (i.e. it will not be possible to insert the value 0 for i).

Index swapping (non-transactional, keeping constraints around)

  • User transaction
    • The zero value default for j will fail, so the user transaction will not commit.
  • User transaction assuming we had dropped the check constraint
    • j is moved to DELETE_AND_WRITE_ONLY; k, the new primary index (which has k and not j), and the unique index for k are in DELETE_ONLY
  • Job
    • The DELETE_ONLY things are moved to DELETE_AND_WRITE_ONLY.
    • The new primary index is backfilled (it has no j but it does have k).
    • The unique index is built, violation detected
    • The new indexes and column are dropped.
    • Everything is okay.

This isn't a super happy vision, however. Imagine we didn't drop the check on j but rather just didn't violate it in that transaction, and pre-existing data violated it. In that case, during the entire schema change, no concurrent writers would be able to write to the table. I'm honestly fine with this. You can do other dumb things. At least if you cancel this it will go back to a happy state.

@RaduBerinde
Member

We should never be reading values in mutation columns. However, we've had bugs in the past where we were reading them. I fixed some cases where we were mistakenly reading those values, but there may be others.

@andy-kimball I am referring to this discussion in #46285:

[radu] We should add some comments before the buildScan calls in buildInputForUpsert and buildInputForUpdate explaining why we are reading mutation values, and pointing out that we aren't (and shouldn't be) using them to calculate any "new" values.
[andy] I added some comments to buildScan. We actually need those values for the Delete case as well. The execution engine panics if they're not available, even for Delete.

Here is a current test; notice how it's scanning the o, p, q columns (and they are set as fetch columns so they won't be pruned):
https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/optbuilder/testdata/update#L1338

@RaduBerinde
Member

Just to clarify, don't we still need to "produce the old value" on deletes even just for the secondary index which is the replacement primary index? It's still not clear to me why this, by itself, isn't already a problem.

@lucy-zhang we really only need values for columns that are part of a key; if there are no secondary indexes on the new column we shouldn't need the value.

@ajwerner
Contributor Author

ajwerner commented Jan 20, 2021

Just to clarify, don't we still need to "produce the old value" on deletes even just for the secondary index which is the replacement primary index? It's still not clear to me why this, by itself, isn't already a problem.

@lucy-zhang we really only need values for columns that are part of a key; if there are no secondary indexes on the new column we shouldn't need the value.

To clarify, we may need the values of new columns not in the primary index for the purpose of writing; it's just that we will be able to produce their values without reading them from anywhere. For new columns which are not computed, that will mean using their default value. For computed columns, it will mean computing them. Also, those computed columns may rely on non-computed columns, so default-expression evaluation may need to be performed first. (See the sketch below.)
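
A minimal sketch of that synthesis logic, using hypothetical stand-in types rather than CockroachDB's actual expression evaluation API:

package sketch

// Hypothetical stand-ins for illustration only.
type ColumnID uint32
type Datum interface{}
type ColumnDescriptor struct {
	ID          ColumnID
	DefaultExpr string  // e.g. "unique_rowid()"
	ComputeExpr *string // e.g. "i*i" for a computed column
}

// evalExpr stands in for SQL expression evaluation.
func evalExpr(expr string, row map[ColumnID]Datum) (Datum, error) { panic("sketch") }

// synthesizeValue produces the value of a column that is not present in
// the current primary index, without reading it from anywhere: ordinary
// columns get their default expression; computed columns are computed
// from the other (possibly also synthesized) column values.
func synthesizeValue(col ColumnDescriptor, row map[ColumnID]Datum) (Datum, error) {
	if col.ComputeExpr != nil {
		return evalExpr(*col.ComputeExpr, row)
	}
	return evalExpr(col.DefaultExpr, nil)
}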

@ajwerner
Contributor Author

In fact, it's somewhat complex. I think the way the update will work is that, before the new column is made public, the new primary index will need to be swapped in, so that updates issued at the version preceding the column becoming public retain the values of writes which saw the column.

@ajwerner
Contributor Author

ajwerner commented Jan 20, 2021

For a simpler scenario:

CREATE TABLE t (i INT PRIMARY KEY);
ALTER TABLE t ADD COLUMN j INT UNIQUE;
  1. v1: before schema change
  2. v2: new column, new primary index, new secondary index in DELETE_ONLY
  3. v3: new column, new primary index, new secondary index in DELETE_AND_WRITE_ONLY
    • Writes will need to use the default value to write to the secondary index until the column is in the primary index
  4. v3: index backfill to new primary index occurs
  5. v4: new primary index made public, old primary index made DELETE_AND_WRITE_ONLY; writes to the secondary index now need to read off of the new primary index. The secondary index is still DELETE_AND_WRITE_ONLY.
  6. v3: index backfill to unique index occurs
  7. v5: new column and secondary index are made public, old primary index can move to DELETE_ONLY, ...

@RaduBerinde
Member

In this scenario, during "index backfill to unique index" we will need to read values of j from the new primary index (e.g. if we delete a row, we need to delete any potential entry in the unique index which might have been backfilled). This won't work if our primary index is still the old one. I think we could make the new primary index public right after "index backfill to new primary index", no?

@ajwerner
Contributor Author

I think we could make the new primary index public right after "index backfill to new primary index", no?

Indeed

@ajwerner
Contributor Author

Updated and added numbers for future discussion clarity

@ajwerner
Contributor Author

Maybe the secondary index should not enter DELETE_AND_WRITE_ONLY until the new primary index has been swapped in. Seems like this is what it boils down to:

  • If a column is not in a primary index, then, when synthesizing values for indexes which contain it, the default value should be used. Computed columns not in the primary index, regardless of whether or not they are stored, will need to be computed.
  • Any index that contains columns not in the primary index should be considered a new primary index (and marked as such in its encoding).
  • Columns which are not in the primary index should be in exactly one other writable index (i.e. not DELETE_ONLY or absent).

@ajwerner
Contributor Author

This leaves:

  1. v1: before schema change
  2. v2: new column, new primary index, new secondary index in DELETE_ONLY
  3. v3: new column, new primary index enter DELETE_AND_WRITE_ONLY, new secondary index remains DELETE_ONLY
  4. v3: index backfill to new primary index occurs
  5. v4: new primary index made public, old primary index made DELETE_AND_WRITE_ONLY; writes to the secondary index now read off of the new primary index. The secondary index enters DELETE_AND_WRITE_ONLY.
  6. v4: index backfill to unique index occurs
  7. v5: new column and secondary index is made public, old primary index can move to DELETE_ONLY
  8. v6: old primary index is dropped.
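
For discussion clarity, here is the same sequence written out as data in a Go sketch; the state names and tuple shape are schematic, not the actual schema changer representation:

package sketch

// State is a schematic version of the schema element states discussed
// in this thread.
type State string

const (
	Absent         State = "ABSENT"
	DeleteOnly     State = "DELETE_ONLY"
	DeleteAndWrite State = "DELETE_AND_WRITE_ONLY"
	Public         State = "PUBLIC"
)

// sequence records, per table version, the state of the new column, the
// new primary index, the old primary index, and the new secondary index.
var sequence = []struct {
	version                         int
	column, newPK, oldPK, secondary State
}{
	{1, Absent, Absent, Public, Absent},                         // v1: before the schema change
	{2, DeleteOnly, DeleteOnly, Public, DeleteOnly},             // v2: new elements added
	{3, DeleteAndWrite, DeleteAndWrite, Public, DeleteOnly},     // v3: new PK backfilled at this version
	{4, DeleteAndWrite, Public, DeleteAndWrite, DeleteAndWrite}, // v4: PK swap; unique index backfilled at this version
	{5, Public, Public, DeleteOnly, Public},                     // v5: column and secondary index made public
	{6, Public, Public, Absent, Public},                         // v6: old PK dropped
}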

@thoszhang
Contributor

Questions about this proposal that I have yet to think through:

  • What happens when dropping a column with a unique constraint (backed by a unique index)? As you said, we have to preserve all constraints on the column as long as there's a possibility of rolling back the column drop, so I think that means we can't drop old unique indexes until we've backfilled new ones (which may fail). But then the old unique index will be indexing a non-public column that isn't indexed by the new primary index, so we're back at the same problem.

    Maybe you could solve this particular problem by having two primary index swaps: in a transaction that adds column a and drops column b, you'd first swap to a primary index including both a and b, then build all the new indexes in a state where all columns are public, then swap to a primary index with just a and not b (at which point you'd have reached the point of no return, and can start removing constraints). But that makes some schema changes twice as expensive.

  • I'm not sure I understand this:

    Imagine we didn't drop the check on j but rather just didn't violate it in that transaction and pre-existing data violated it. In that case, during the entire schema change, no concurrent writers would be able to write to the table. I'm honestly fine with this.

    How can pre-existing rows violate the check constraint? If this is about unvalidated check constraints, I think we should feel free to drop and re-add unvalidated constraints at will.

    But I take your point that there are situations where it would be impossible to continue writing to the table (like dropping a column having a default value incompatible with a constraint). I guess what you're outlining isn't worse than our previous plan; even if we built indexes before making new columns public, if we kept enforcing constraints on dropped columns, then we still have the entire duration of the backfill (and before that) during which writes are impossible. I'm starting to think this is bad and we should figure out a different approach.

Ultimately, I still think the "correct" way to solve this would be to allow secondary indexes, at least in this restricted case, to also provide values for non-public columns. I would be disappointed if we ended up codifying this set of schema change state dependencies purely to work around the limitation on indexes on non-public columns (AFAIK that would be the only reason why we'd adopt it). It's true that we can change the state graphs much more easily in the new schema changer, but reasoning about correctness is still not straightforward.

@thoszhang
Contributor

@RaduBerinde Will check and foreign key constraints work as-is for mutation columns not in the primary index, or do we also need to make changes there? (FK constraints aren't supported for computed columns, but they are supported for columns with default values.)

@ajwerner
Contributor Author

Great point, Lucy! I lost sight of the whole drop column debacle motivating this. I think the saving grace in all of this is that new columns, until they are public, only get written with default values (or computed values derived from them). I think we need to keep those new columns in that state until nothing can fail.

I believe that this motivates the original ordering of the proposal (edited away) where the swap to the new primary index happens only after all new indexes have been built and constraints verified.

@RaduBerinde
Member

I don't think we support FKs on mutation columns currently (or at the very least, they're not tested in the optimizer). I was under the impression that this isn't currently possible? I don't see how that would work: you need to read values to check FKs (and you can't just recompute the default value every time, because it could change, e.g. unique_rowid()).

@ajwerner
Contributor Author

I don't think we support FKs on mutation columns currently (or at the very least, they're not tested in the optimizer). I was under the impression that this isn't currently possible? I don't see how that would work: you need to read values to check FKs (and you can't just recompute the default value every time, because it could change, e.g. unique_rowid()).

This comes back to the need to have another state that indicates that the values have been backfilled and are readable.

@thoszhang
Contributor

I don't think we support FKs on mutation columns currently (or at the very least, they're not tested in the optimizer). I was under the impression that this isn't currently possible? I don't see how that would work: you need to read values to check FKs (and you can't just recompute the default value every time, because it could change, e.g. unique_rowid()).

We definitely do support FKs on mutation columns (as Andrew implied, we first get the column into delete-and-write-only, then backfill, then validate, then make the column public). This is how we support statements like ALTER TABLE t ADD COLUMN c INT DEFAULT <some expression> REFERENCES <something>. We've had this since 19.2, I think.

I'll come back to the other points later.

@RaduBerinde
Member

The FK relation only appears (to the optimizer) after the column backfill, correct? If yes, then it works but only because the column is really in this "non-public, but readable" state. It's definitely fragile as far as opt is concerned.

@thoszhang
Contributor

The FK relation only appears (to the optimizer) after the column backfill, correct? If yes, then it works but only because the column is really in this "non-public, but readable" state. It's definitely fragile as far as opt is concerned.

OK, that makes sense, and was my vague impression as well (@andy-kimball and I discussed it in the context of check constraints a long time ago). We do want to introduce a more formal notion of this state (non-public, but readable) in the new schema changer, since it's essential to constraint validation.

@thoszhang
Contributor

I believe that this motivates the original ordering of the proposal (edited away) where the swap to the new primary index happens only after all new indexes have been built and constraints verified.

But Radu's point was that we can't proceed with this approach without extensive changes to the optimizer, right? We have to be able to read the default values that we wrote (since default expressions can use impure functions) in order to delete from the secondary index on the non-public column, but those values aren't in the old primary index.

It seems like we have two unpalatable options:

  1. Make changes to the optimizer to somehow also support scans on the new replacement primary index in addition to the current primary index, and combine those results to get the full set of column values for the table. (@RaduBerinde do you have an estimate of how much effort would be required for this?)
  2. Reorder the schema changer steps to avoid ever having to read from a column not in the primary index. We're also constrained in other ways (e.g., making sure we don't lose data for dropped columns with constraints), so I'm not even 100% sure this is possible.

Are there other options I'm missing?

@RaduBerinde
Member

Option 1 would be a lot of effort, and might need some kind of execution support (for the "combine those results" part).

Regarding the "non-public but readable" column state - from a planning/execution standpoint, is there any difference between this state and a public hidden column (other than not being able to select the column even by name)? If not, the best way to expose that to the optimizer to avoid a lot of unnecessary complication would be to present it as a regular column and add a new kind of "hidden" property ("non-selectable" or whatever).

@ajwerner
Contributor Author

ajwerner commented Mar 9, 2021

This issue turns out to be the biggest blocker for 21.2 and the new schema changer. To recap:

  1. We want to get rid of the column backfiller and in-place rewriting of primary indexes
  2. We want to make rolling back from failed schema changes safe and possible for all schema changes.
  3. We need to support schema changes which involve both adding and dropping columns in the same schema change. Dropped columns may have secondary indexes or constraints.
  4. Schema changes may fail.
  5. After a schema change fails, the state will match the state before the schema change.
  6. Given that, we must uphold the constraints and values for the dropped column[s] until the schema change can no longer fail.
  7. If the only primary indexes are the final one and the original one, then some indexes will require a lookup in order to be maintained.

There are two basic directions we could take:

  1. In transactions where users both add and drop columns, we could create an intermediate primary index which is the union of the columns from before and after the change. We could backfill this new index, swap it in as the intermediate primary index, then backfill any secondary indexes (and the final primary index), validate constraints, and then, after that has all happened, swap to the final primary index and drop the original and intermediate primary indexes.
     • This means that adding and dropping a column will store the table 3 times (rather than the current 2x).
  2. We could support planning the required reads to maintain the new (or old) secondary indexes.
     • This is more complex.
     • This opens up the door to more generally loosening the concept of a primary index.

Additional nuance:

AddSSTable does not interact perfectly with transactions. We can ingest SSTs containing data at timestamps which may be shadowed, but we cannot have concurrent operations read from the index being backfilled. So, say we want to insert or update a row: what value should we use for the new column? Can we always re-evaluate the default expression before it gets used? Is this going to lead to weirdness with sequences?

@RaduBerinde
Member

Note that currently when you do an UPDATE on a row, the optimizer projects the default expression for any mutation column (because the updated row might not get backfilled). So there is precedent for re-evaluating these expressions. Any weirdness with sequences and the like is not a correctness violation (we don't make any guarantees about the order in which we backfill rows, and values generated by sequences are allowed to have gaps).

@ajwerner
Contributor Author

@RaduBerinde here's the related question. Say we do go down this path where we're going to swap to using a new primary index. The main things we'd need to be able to support are:

  1. not writing the new columns we're in the process of backfilling to the existing primary index
  2. writing these new columns to the primary index we're backfilling

The important thing to note here is that we can fully synthesize these values. It seems possible to confine this concept (columns which we do not want to encode into the current primary index) entirely to the row-writing layer. Does that seem correct? My reading of the code (similar to what you just said) is that we synthesize new values for all of these mutation columns any time we insert or update.
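
A minimal sketch of what that row-writing change could look like; the helper is hypothetical, not the actual row package API:

package sketch

type ColumnID uint32
type Datum interface{}

// encodePrimaryIndexValue keeps only the columns that the descriptor
// marks as stored in the current primary index, dropping columns that
// are being backfilled into a replacement primary index, even though
// their values were synthesized for the new indexes.
func encodePrimaryIndexValue(
	stored map[ColumnID]bool, row map[ColumnID]Datum,
) map[ColumnID]Datum {
	out := make(map[ColumnID]Datum, len(row))
	for id, d := range row {
		if !stored[id] {
			continue // omitted from the old primary index during the change
		}
		out[id] = d
	}
	return out
}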

@RaduBerinde
Member

We also read the "old" values on updates, even if we synthesize new ones; this is problematic because the canonical primary index scans have to be able to produce these columns. These values are needed by execution, my guess is for when they are part of an index key.

This probably stems from a disconnect on what "mutation" column/index combinations are possible, compounded with the lack of a "non-public but backfilled" state. If we make it clear that a mutation column can never be part of an index (is that true?), then we could fix this.

@ajwerner
Contributor Author

If we make it clear that a mutation column can never be part of an index (is that true?), then we could fix this.

That's roughly my plan: leave the indexes which involve new columns in DELETE_ONLY until the new column has been backfilled and made part of a new primary index, which is set to PUBLIC even though the new column remains DELETE_AND_WRITE_ONLY.

@RaduBerinde
Member

Even in delete-only index state, if the column is part of the index key, we need to read the old value in order to delete the corresponding KV. So we would only be able to support indexes that store these columns.

@RaduBerinde
Member

I think the best way to evaluate the proposal from the optimizer side is to make a tentative diff to the opt/cat documentation which explains precisely which combinations of mutation states would be possible. It will take some effort and back-and-forth, but it will be invaluable.

@ajwerner
Contributor Author

Even in delete-only index state, if the column is part of the index key, we need to read the old value in order to delete the corresponding KV. So we would only be able to support indexes that store these columns.

We can deal with that; it'll take some descriptor finagling, but we can add yet another descriptor step. We'll just commit the new indexes as not existing at all, and then move them through the relevant steps after the primary index swap occurs.

@ajwerner
Contributor Author

Alright, I think we've reached a plan here. We're going to accept the extra index backfill in some cases. Those cases are:

ALTER TABLE ... ADD COLUMN

and either

ALTER TABLE ... DROP COLUMN

or

ALTER TABLE ... ALTER PRIMARY KEY -- to something that includes a newly added column

For various reasons, this seems fine.

The new concept we need to introduce is a column which we mark as not being stored in the primary index. The critical limitation is that such columns must not appear as a key in any index.

We'll expose this concept to the optimizer just to make sure we don't do anything insane, but the optimizer doesn't need to do anything about it. The main user of this is going to be the row-writing logic, which is going to need to avoid writing omitted columns into the primary index. This will be quite well contained.

We don't need to expose any new absent index concepts because the new schema changer state already represents that.

There's an invariant that we'll enforce: any time a column is a key in a secondary index, it must be present in the current primary index of the table. We'll want to make that rule concrete in the schema change planner (see the sketch below).
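
A sketch of what enforcing that invariant could look like; the shapes are schematic, not the real descriptor validation code:

package sketch

import "fmt"

type ColumnID uint32

// validateSecondaryIndexKeys checks the invariant described above:
// every key column of every secondary index must be present in the
// current primary index.
func validateSecondaryIndexKeys(
	inPrimary map[ColumnID]bool, secondaryKeys [][]ColumnID,
) error {
	for _, key := range secondaryKeys {
		for _, id := range key {
			if !inPrimary[id] {
				return fmt.Errorf("column %d is an index key but not in the primary index", id)
			}
		}
	}
	return nil
}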

I'm going to prototype this tomorrow.

@postamar postamar self-assigned this May 11, 2021
postamar pushed a commit that referenced this issue Jun 16, 2021
In a recent commit, the StoreColumnIDs and StoreColumnNames slices in
primary indexes were populated when previously they had simply been
empty. We simply assumed that all non-virtual columns in a table would
be stored in the primary index: primary key columns in the key, the rest
in the value.

This commit breaks that assumption by using the StoreColumnIDs slice to
determine what goes into the primary index. This makes it possible for
the new schema changer to add columns safely, preventing unwanted writes
to the old primary index while the schema change is underway.

Fixes #59149.

Release note: None
@jlinder jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021
postamar pushed a commit that referenced this issue Jun 16, 2021
postamar pushed a commit to postamar/cockroach that referenced this issue Jun 18, 2021
postamar pushed a commit to postamar/cockroach that referenced this issue Jun 21, 2021
craig bot pushed a commit that referenced this issue Jun 22, 2021
66599: sql: enable adding columns which are not written to the current primary index r=postamar a=postamar

    row,schemachanger: use StoreColumnIDs/Names in primary index
    


    sql,tabledesc: add new IndexDescriptorVersion for primary indexes
    
    Previously, the IndexDescriptorVersion type was only used to describe
    the encoding of secondary indexes. This commit adds a new value for
    use in primary indexes, PrimaryIndexWithStoredColumnsVersion, to signify
    that the StoredColumnIDs and StoredColumnNames slices are populated
    correctly.
    
    Previously, these slices did not need to be populated at all. This is
    because the set of columns comprising the primary index of a table is
    assumed to be all non-virtual columns of that table. Our upcoming work on
    the new schema changer will require us to violate that assumption
    however. This commit is in preparation of that change.
    
    In our effort to make meaningful the concept of stored columns in
    primary indexes, this commit also changes the contents of the
    information_schema.statistics table. As a result, SHOW INDEXES and SHOW
    COLUMNS behave the same way regardless of whether an index is primary or
    secondary.
    
    Release note (sql change): The contents of the statistics table in the
    information schema have changed, therefore so have the results of SHOW
    INDEX and SHOW COLUMNS. A column which is not in the primary key will
    now be listed as belonging to the primary index as a stored column.
    Previously, it was simply not listed as belonging to the primary index.

66664: sql: First round of cleanup of schemachange/random-load r=ajwerner,otan a=ajstorm

This issue addresses several issues uncovered in running the randomized
schema changer. Specifically:

- Makes several errors pgcodes, so that they can be properly added to
  the expected errors list in the randomized schema changer.
- Detects cases where the region column (crdb_region) is used multiple
  times in an index definition.
- Allows for column type changes, which must have the experimental flag
  enable_experimental_alter_column_type_general flag set.

It also disables the testing of setColumnType (tracked with #66662) as
well as making a column nullable/non-nullable due to a timing hole
(tracked with #66663).

Release note: None.

Co-authored-by: Marius Posta <[email protected]>
Co-authored-by: Adam Storm <[email protected]>
@craig craig bot closed this as completed in 3dd87a4 Jun 22, 2021
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023