Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: deal with computed and default column generation in index backfi… #58990

Merged

Conversation

ajwerner
Copy link
Contributor

@ajwerner ajwerner commented Jan 14, 2021

This commit was mostly motivated by work on the new schema changer to enable
the behavior described in #47989, however, it also turns out to be a
prerequisite of the work to use virtual computed columns in secondary indexes.
Given we haven't released virtual computed columns, I'm going to omit a
release not for this PR.

The basic idea is that we need to track dependencies for computed columns
to make sure they are retrieved. Default expressions need to be evaluated
first. Much of the code is testing.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ajwerner
Copy link
Contributor Author

I spent a bit of time firstly working on a low-level test to try to force the index backfill to happen and then also to do the backfill on a new column but that wasn't supported. Hence the small amount of testing here. This code gets exercised in the new schema changer code. Curious to hear takes.

@ajwerner
Copy link
Contributor Author

:( this code doesn't affect the virtual index code at all. I think this just works because of the way that the table fetcher works. In principle what this code enables is:

CREATE TABLE t (i INT PRIMARY KEY);
BEGIN;
    ALTER TABLE t ADD COLUMN j INT AS (i+1) VIRTUAL;
    CREATE INDEX idx ON t(j);
COMMIT;

But the above isn't supported for other reasons. I think what this means is this commit has zero testing. I think the test will need to synthesize some table state that isn't currently possible in order to exercise this code. I started typing it, I can finish it. It may come in handy one day.

Copy link
Contributor

@thoszhang thoszhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the code closely yet. I'm still trying to understand this example with the virtual column and the index added in the same transaction. If the fetcher used by the current index backfiller "just works" on already-existing public virtual columns (presumably because it just computes and returns values for the virtual columns like any other column?), and we currently have the ability to build indexes from non-public columns, then shouldn't your example also work once we get past ADD COLUMN being unimplemented for virtual columns?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)

@ajwerner
Copy link
Contributor Author

I'm still trying to understand this example with the virtual column and the index added in the same transaction. If the fetcher used by the current index backfiller "just works" on already-existing public virtual columns (presumably because it just computes and returns values for the virtual columns like any other column?), and we currently have the ability to build indexes from non-public columns, then shouldn't your example also work once we get past ADD COLUMN being unimplemented for virtual columns?

Yes, I believe so. The missing logic there is that we need to change the add column schema change logic to know that it doesn't need to run a backfill on the primary index. I suppose I could just try to pick that up in order to test that this logic works.

@thoszhang
Copy link
Contributor

I suppose I could just try to pick that up in order to test that this logic works.

"this logic" meaning what's on master, or with this PR? From what you're saying, it sounds like making the required changes to the add column logic on top of master, even without the changes to index backfill in this PR, will make #58990 (comment) work.

@ajwerner
Copy link
Contributor Author

From what you're saying, it sounds like making the required changes to the add column logic on top of master, even without the changes to index backfill in this PR, will make #58990 (comment) work.

I don't think so. The table fetcher doesn't deal with computed expressions for columns that aren't public, hence the need for this change. Though, I guess, now that I say it, computed virtual columns could go straight to public, so then you're right.

@thoszhang
Copy link
Contributor

The table fetcher doesn't deal with computed expressions for columns that aren't public, hence the need for this change.

Oh, that makes sense.

It seems like it'd be beneficial to backfill all indexes on virtual columns in the same way, regardless of whether the column is public. I'm not sure that we'd actually want to make virtual columns public immediately, since it's good to have all the columns become public simultaneously at the end of the schema change, and having (e.g.) SELECT * queries see some but not all columns being added during a schema change might cause new problems. In the new schema changer, I think we'd want to skip ahead to the backfilled state immediately, but not to the public state.

Could we modify the table fetcher to handle non-public virtual columns? It doesn't seem that ridiculous, given that the fetcher already handles non-public physical columns. We'd just need to guarantee that all the physical columns have already been backfilled, right? I don't have a good grasp on the details here.

@thoszhang
Copy link
Contributor

thoszhang commented Jan 14, 2021

Another relevant possibility is something like this:

CREATE TABLE t ();
BEGIN;
    ALTER TABLE t ADD COLUMN i INT DEFAULT 1;
    ALTER TABLE t ADD COLUMN j INT AS (i+1) VIRTUAL;
    CREATE INDEX idx ON t(j);
COMMIT;

@ajwerner
Copy link
Contributor Author

Could we modify the table fetcher to handle non-public virtual columns? It doesn't seem that ridiculous, given that the fetcher already handles non-public physical columns.

It does? I thought it was enforcing the whole thing about the column not being readable.

———

That last possibility doesn’t work because we’ll do the column backfill for the primary index first.

@thoszhang
Copy link
Contributor

Summarizing some offline discussion: We'll start off in 21.1 with all added virtual columns becoming public immediately, and disallowing any other schema changes that would be incompatible with this (e.g., adding virtual columns on non-public columns, or adding constraints on virtual columns that need validation). Since adding secondary indexes on public virtual columns is already supported, there's nothing else to be done for that. So this change is actually addressing an entirely separate concern and is pretty specific to the new schema changer itself.

@thoszhang
Copy link
Contributor

Since adding secondary indexes on public virtual columns is already supported

Actually, maybe not. The table fetcher seems to already handle this, but we need to make sure.

@ajwerner ajwerner force-pushed the ajwerner/generated-columns-index-backfill branch from a9fabad to 45b7e58 Compare January 19, 2021 07:09
@ajwerner
Copy link
Contributor Author

I fixed this up to work with virtual computed columns. I also added testing. That was the harder part. Github seems to be struggling this morning but when it gets its act together the build should pass now.

@ajwerner ajwerner force-pushed the ajwerner/generated-columns-index-backfill branch 2 times, most recently from 3e1da55 to 167be49 Compare January 19, 2021 17:23
@ajwerner
Copy link
Contributor Author

@adityamaru you should probably have a look at this as well.

@ajwerner ajwerner requested a review from adityamaru January 19, 2021 18:08
@ajwerner ajwerner force-pushed the ajwerner/generated-columns-index-backfill branch from 167be49 to 59b52c7 Compare January 19, 2021 18:15
Copy link
Contributor

@thoszhang thoszhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly LGTM, but I do worry about testing more cases. It seems like a pain to write these low-level tests, and maybe our best bet for more comprehensive testing is the random schema change tests (both for indexes on virtual computed columns and for the new index backfiller column changes).

Reviewed 9 of 9 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, and @RaduBerinde)


pkg/sql/indexbackfiller_test.go, line 194 at r2 (raw file):

}

type indexBackfillTestCase struct {

nit: I think this and run should go inside the test function. It is annoying but ultimately I think it's good to have these scoped to the test.


pkg/sql/indexbackfiller_test.go, line 203 at r2 (raw file):

	// setupDesc should mutate the descriptor such that the mutation with
	// id 1.

nit: finish this sentence


pkg/sql/indexbackfiller_test.go, line 209 at r2 (raw file):

}

func (test indexBackfillTestCase) run(t *testing.T) {

I think this function would benefit from a few more comments inline in general.


pkg/sql/indexbackfiller_test.go, line 391 at r2 (raw file):

			mut.Mutations[len(mut.Mutations)-1].State = descpb.DescriptorMutation_DELETE_AND_WRITE_ONLY

			// This tests that we can backfill new computed, stored columns into

I think this is a stray comment.


pkg/sql/indexbackfiller_test.go, line 500 at r2 (raw file):

	))

	require.NoError(t, fetcher.StartScan(ctx, txn, spans, false, 0, true))

I think you'll have a merge conflict with #59208 which updated the fetcher scan arguments.


pkg/sql/backfill/backfill.go, line 409 at r2 (raw file):

	addedCols []descpb.ColumnDescriptor

	// computedCols are the column in this index which are computed and are do

nit: columns


pkg/sql/backfill/backfill.go, line 701 at r2 (raw file):

				isPrimaryIndex := idx.GetEncodingType(desc.GetPrimaryIndexID()) == descpb.PrimaryIndexEncoding
				if (idxContainsColumn || isPrimaryIndex) &&
					!(ib.cols[i].Virtual || i >= len(desc.Columns)) {

nit: I think I'd prefer !ib.cols[i].Virtual && i < len(desc.Columns).

I also find this opaque since it relies on the ordering of ib.cols. Add a comment here explaining what's going on?


pkg/sql/catalog/schemaexpr/computed_column.go, line 172 at r2 (raw file):

// slice of input column descriptors, or nil if none of the input column
// descriptors have computed expressions. The caller provides the set of
// valid to which the expr may refer.

nit: valid columns


pkg/sql/catalog/schemaexpr/computed_column.go, line 179 at r2 (raw file):

//
// Note that the order of columns is critical. Expressions cannot reference
// columns that come after them in cols.

nit: Update this comment with the new variable names

Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there other cases you'd like to see tested?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @lucy-zhang, and @RaduBerinde)


pkg/sql/indexbackfiller_test.go, line 194 at r2 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

nit: I think this and run should go inside the test function. It is annoying but ultimately I think it's good to have these scoped to the test.

sure, that's where it started.


pkg/sql/indexbackfiller_test.go, line 203 at r2 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

nit: finish this sentence

Done.


pkg/sql/indexbackfiller_test.go, line 209 at r2 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

I think this function would benefit from a few more comments inline in general.

Done.


pkg/sql/indexbackfiller_test.go, line 391 at r2 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

I think this is a stray comment.

Removed.


pkg/sql/indexbackfiller_test.go, line 500 at r2 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

I think you'll have a merge conflict with #59208 which updated the fetcher scan arguments.

resolved.

@ajwerner ajwerner force-pushed the ajwerner/generated-columns-index-backfill branch from 59b52c7 to ebc5e1b Compare January 28, 2021 07:45
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adityamaru any chance I could get you to sign off on this guy?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @lucy-zhang, and @RaduBerinde)

@adityamaru
Copy link
Contributor

@adityamaru any chance I could get you to sign off on this guy?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @lucy-zhang, and @RaduBerinde)

sorry, giving it a look right now

@adityamaru
Copy link
Contributor

LGTM! Those unit tests are really cool.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, and @lucy-zhang)


pkg/sql/indexbackfiller_test.go, line 349 at r3 (raw file):

	}

	// fetchIndex fetches the contents of an a table index returning the results

I think there would be a lot of value in exposing something like this in SQL via a special statement or a virtual table. It would output a representation of the KVs that we could check for in various tests (including logic tests).

…ller

This commit was mostly motivated by work on the new schema changer to enable
the behavior described in cockroachdb#47989, however, it also turns out to be a
prerequisite of the work to use virtual computed columns in secondary indexes.
Given we haven't released virtual computed columns, I'm going to omit a
release not for this PR.

The basic idea is that we need to track dependencies for computed columns
to make sure they are retrieved. Default expressions need to be evaluated
first. Much of the code is testing.

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/generated-columns-index-backfill branch from ebc5e1b to 5b2198f Compare January 28, 2021 18:24
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @lucy-zhang, and @RaduBerinde)


pkg/sql/indexbackfiller_test.go, line 349 at r3 (raw file):

Previously, RaduBerinde wrote…

I think there would be a lot of value in exposing something like this in SQL via a special statement or a virtual table. It would output a representation of the KVs that we could check for in various tests (including logic tests).

I agree. I think I'd like to defer that to a later time but I've filed an issue. #59549


pkg/sql/backfill/backfill.go, line 409 at r2 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

nit: columns

Done.


pkg/sql/backfill/backfill.go, line 701 at r2 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

nit: I think I'd prefer !ib.cols[i].Virtual && i < len(desc.Columns).

I also find this opaque since it relies on the ordering of ib.cols. Add a comment here explaining what's going on?

Done.


pkg/sql/catalog/schemaexpr/computed_column.go, line 172 at r2 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

nit: valid columns

Done.


pkg/sql/catalog/schemaexpr/computed_column.go, line 179 at r2 (raw file):

Previously, lucy-zhang (Lucy Zhang) wrote…

nit: Update this comment with the new variable names

Done.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, and @lucy-zhang)


pkg/sql/indexbackfiller_test.go, line 349 at r3 (raw file):

Previously, ajwerner wrote…

I agree. I think I'd like to defer that to a later time but I've filed an issue. #59549

Thanks, yeah I didn't mean in this PR :)

@ajwerner
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 28, 2021

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jan 29, 2021

Build succeeded:

@craig craig bot merged commit 054c618 into cockroachdb:master Jan 29, 2021
craig bot pushed a commit that referenced this pull request Feb 2, 2021
58979: schemachanger,*: introduce new schema changer r=lucy-zhang a=ajwerner

This PR introduces the new schema changer, for which a partially written RFC (including a "guide-level explanation") can be found in #59288. It contains a somewhat complete implementation for a subset of `ALTER TABLE ... ADD COLUMN`. Like the existing implementation, these schema changes are planned and partially executed in the user transaction, but most of the work happens in an asynchronous job that runs after the transaction has committed. We use the index backfiller to build new columns, as described in #47989.

See the RFC and individual commits for details.

Missing things that will be in later PRs:

- Making execution changes to support concurrent writes for columns added using the index backfiller. We currently don't have a way to write column values which are not part of the primary index. See #59149. (#58990 takes care of the backfill part).
- Ensuring compatible coexistence with the old schema changer; in particular, polling for queued mutations to finish before any schema changes can start, using child transactions within the user transaction (#56588). The new schema changer assumes it has exclusive rights to perform schema changes on the table, which anticipates the transactional schema changes of the future.
- Reverting schema changes when failed or canceled.
- Properly checkpointing and reading job progress for the backfill. Currently there's a placeholder implementation that doesn't interact with the job at all.

Co-authored-by: Andrew Werner <[email protected]>
craig bot pushed a commit that referenced this pull request May 19, 2022
81388: authors: add chinemeremchigbo to authors r=lassenordahl a=ChinemeremChigbo

authors: add chinemeremchigbo to authors

Release note: None
Release justification: non-production code change

81460: acceptance: run `python`, `psql` containers as current uid r=knz a=rickystewart

`postgres`'s permission checking for certificates has gotten more
rigorous since [this commit](https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a59c79564bdc209a5bc7b02d706f0d7352eb82fa).
This has broken a couple `acceptance` tests which do not pin to any
specific `postgres` version (see #81313, #81437).

Here we attempt to solve the problem "once and for all" by ensuring that
these containers run with a UID that is equal to the one that created
the certificates.

Release note: None

81464: ui: Add tests for the TimeScaleDropdown component r=jocrl a=jocrl

This commit splits out tests in `timescale.spec.tsx` and adds additional tests
for the TimeScaleDropdown component, including testing that the custom time
picker is initialized to the currently selected time. This also adds
TimeScaleDropdown stories.

Release note: None

81486: sql/backfill: fix bug adding new columns to new index with volatile default r=ajwerner a=ajwerner

In #58990 we added support for the index backfiller to evaluate expressions. This unblocked the usage of virtual computed columns in secondary expressions, so wasn't a totally bogus change, but probably was trying to do too much without understanding all of the implications. The original motivation for that change was to unblock the then nascent declarative schema changer prototype which wanted to only implement #47989 as a column change protocol. In order to do that, it needed to evaluate default expressions when adding new columns to a new primary index. For whatever reason, probably lack of understanding, I made it such that all default expressions for columns which were in the adding state were evaluated. This is wrong in cases where the column has already been backfilled into the current primary index; it's wrong because volatile expressions will evaluate to different values causing the newly backfilled secondary index to be bogus and corrupt. 

This change addresses the failure of the above change by being more thoughtful about whether it's sane to evaluate a default expression when backfilling into an index. Note that it's still insane to backfill a volatile expression for a new column into the key of a new primary index. As of writing, there is no way to do this. 

Backports will address #81448.

Release note (bug fix): In 21.1 a bug was introduced whereby default values
were recomputed when populating data in new secondary indexes for columns
which were added in the same transaction as the index. This arises, for example
in cases like `ALTER TABLE t ADD COLUMN f FLOAT8 UNIQUE DEFAULT (random())`.
If the default expression is not volatile, then the recomputation is harmless.
If, however, the default expression is volatile, the data in the secondary
index will not match the data in the primary index: a corrupt index will have
been created. This bug has now been fixed.

81504: acceptance/roachtest: update ORM versions under test r=ecwall a=rafiss

See DXTPT-35

Release note: None

81523: sql: fix typo in warning on using DateStyle/IntervalStyle r=rafiss a=otan

Release note (sql change): Fixed a small typo when using DateStyle and
IntervalStyle.

Co-authored-by: Chinemerem <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Josephine Lee <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants