-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: deal with computed and default column generation in index backfi… #58990
sql: deal with computed and default column generation in index backfi… #58990
Conversation
I spent a bit of time firstly working on a low-level test to try to force the index backfill to happen and then also to do the backfill on a new column but that wasn't supported. Hence the small amount of testing here. This code gets exercised in the new schema changer code. Curious to hear takes. |
:( this code doesn't affect the virtual index code at all. I think this just works because of the way that the table fetcher works. In principle what this code enables is:
But the above isn't supported for other reasons. I think what this means is this commit has zero testing. I think the test will need to synthesize some table state that isn't currently possible in order to exercise this code. I started typing it, I can finish it. It may come in handy one day. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at the code closely yet. I'm still trying to understand this example with the virtual column and the index added in the same transaction. If the fetcher used by the current index backfiller "just works" on already-existing public virtual columns (presumably because it just computes and returns values for the virtual columns like any other column?), and we currently have the ability to build indexes from non-public columns, then shouldn't your example also work once we get past ADD COLUMN
being unimplemented for virtual columns?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde)
Yes, I believe so. The missing logic there is that we need to change the add column schema change logic to know that it doesn't need to run a backfill on the primary index. I suppose I could just try to pick that up in order to test that this logic works. |
"this logic" meaning what's on master, or with this PR? From what you're saying, it sounds like making the required changes to the add column logic on top of master, even without the changes to index backfill in this PR, will make #58990 (comment) work. |
I don't think so. The table fetcher doesn't deal with computed expressions for columns that aren't public, hence the need for this change. Though, I guess, now that I say it, computed virtual columns could go straight to public, so then you're right. |
Oh, that makes sense. It seems like it'd be beneficial to backfill all indexes on virtual columns in the same way, regardless of whether the column is public. I'm not sure that we'd actually want to make virtual columns public immediately, since it's good to have all the columns become public simultaneously at the end of the schema change, and having (e.g.) Could we modify the table fetcher to handle non-public virtual columns? It doesn't seem that ridiculous, given that the fetcher already handles non-public physical columns. We'd just need to guarantee that all the physical columns have already been backfilled, right? I don't have a good grasp on the details here. |
Another relevant possibility is something like this:
|
It does? I thought it was enforcing the whole thing about the column not being readable. ——— That last possibility doesn’t work because we’ll do the column backfill for the primary index first. |
Summarizing some offline discussion: We'll start off in 21.1 with all added virtual columns becoming public immediately, and disallowing any other schema changes that would be incompatible with this (e.g., adding virtual columns on non-public columns, or adding constraints on virtual columns that need validation). Since adding secondary indexes on public virtual columns is already supported, there's nothing else to be done for that. So this change is actually addressing an entirely separate concern and is pretty specific to the new schema changer itself. |
Actually, maybe not. The table fetcher seems to already handle this, but we need to make sure. |
a9fabad
to
45b7e58
Compare
I fixed this up to work with virtual computed columns. I also added testing. That was the harder part. Github seems to be struggling this morning but when it gets its act together the build should pass now. |
3e1da55
to
167be49
Compare
@adityamaru you should probably have a look at this as well. |
167be49
to
59b52c7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mostly LGTM, but I do worry about testing more cases. It seems like a pain to write these low-level tests, and maybe our best bet for more comprehensive testing is the random schema change tests (both for indexes on virtual computed columns and for the new index backfiller column changes).
Reviewed 9 of 9 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, and @RaduBerinde)
pkg/sql/indexbackfiller_test.go, line 194 at r2 (raw file):
} type indexBackfillTestCase struct {
nit: I think this and run
should go inside the test function. It is annoying but ultimately I think it's good to have these scoped to the test.
pkg/sql/indexbackfiller_test.go, line 203 at r2 (raw file):
// setupDesc should mutate the descriptor such that the mutation with // id 1.
nit: finish this sentence
pkg/sql/indexbackfiller_test.go, line 209 at r2 (raw file):
} func (test indexBackfillTestCase) run(t *testing.T) {
I think this function would benefit from a few more comments inline in general.
pkg/sql/indexbackfiller_test.go, line 391 at r2 (raw file):
mut.Mutations[len(mut.Mutations)-1].State = descpb.DescriptorMutation_DELETE_AND_WRITE_ONLY // This tests that we can backfill new computed, stored columns into
I think this is a stray comment.
pkg/sql/indexbackfiller_test.go, line 500 at r2 (raw file):
)) require.NoError(t, fetcher.StartScan(ctx, txn, spans, false, 0, true))
I think you'll have a merge conflict with #59208 which updated the fetcher scan arguments.
pkg/sql/backfill/backfill.go, line 409 at r2 (raw file):
addedCols []descpb.ColumnDescriptor // computedCols are the column in this index which are computed and are do
nit: columns
pkg/sql/backfill/backfill.go, line 701 at r2 (raw file):
isPrimaryIndex := idx.GetEncodingType(desc.GetPrimaryIndexID()) == descpb.PrimaryIndexEncoding if (idxContainsColumn || isPrimaryIndex) && !(ib.cols[i].Virtual || i >= len(desc.Columns)) {
nit: I think I'd prefer !ib.cols[i].Virtual && i < len(desc.Columns)
.
I also find this opaque since it relies on the ordering of ib.cols
. Add a comment here explaining what's going on?
pkg/sql/catalog/schemaexpr/computed_column.go, line 172 at r2 (raw file):
// slice of input column descriptors, or nil if none of the input column // descriptors have computed expressions. The caller provides the set of // valid to which the expr may refer.
nit: valid columns
pkg/sql/catalog/schemaexpr/computed_column.go, line 179 at r2 (raw file):
// // Note that the order of columns is critical. Expressions cannot reference // columns that come after them in cols.
nit: Update this comment with the new variable names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there other cases you'd like to see tested?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @lucy-zhang, and @RaduBerinde)
pkg/sql/indexbackfiller_test.go, line 194 at r2 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
nit: I think this and
run
should go inside the test function. It is annoying but ultimately I think it's good to have these scoped to the test.
sure, that's where it started.
pkg/sql/indexbackfiller_test.go, line 203 at r2 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
nit: finish this sentence
Done.
pkg/sql/indexbackfiller_test.go, line 209 at r2 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
I think this function would benefit from a few more comments inline in general.
Done.
pkg/sql/indexbackfiller_test.go, line 391 at r2 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
I think this is a stray comment.
Removed.
pkg/sql/indexbackfiller_test.go, line 500 at r2 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
I think you'll have a merge conflict with #59208 which updated the fetcher scan arguments.
resolved.
59b52c7
to
ebc5e1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adityamaru any chance I could get you to sign off on this guy?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @lucy-zhang, and @RaduBerinde)
sorry, giving it a look right now |
LGTM! Those unit tests are really cool. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, and @lucy-zhang)
pkg/sql/indexbackfiller_test.go, line 349 at r3 (raw file):
} // fetchIndex fetches the contents of an a table index returning the results
I think there would be a lot of value in exposing something like this in SQL via a special statement or a virtual table. It would output a representation of the KVs that we could check for in various tests (including logic tests).
…ller This commit was mostly motivated by work on the new schema changer to enable the behavior described in cockroachdb#47989, however, it also turns out to be a prerequisite of the work to use virtual computed columns in secondary indexes. Given we haven't released virtual computed columns, I'm going to omit a release not for this PR. The basic idea is that we need to track dependencies for computed columns to make sure they are retrieved. Default expressions need to be evaluated first. Much of the code is testing. Release note: None
ebc5e1b
to
5b2198f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @lucy-zhang, and @RaduBerinde)
pkg/sql/indexbackfiller_test.go, line 349 at r3 (raw file):
Previously, RaduBerinde wrote…
I think there would be a lot of value in exposing something like this in SQL via a special statement or a virtual table. It would output a representation of the KVs that we could check for in various tests (including logic tests).
I agree. I think I'd like to defer that to a later time but I've filed an issue. #59549
pkg/sql/backfill/backfill.go, line 409 at r2 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
nit:
columns
Done.
pkg/sql/backfill/backfill.go, line 701 at r2 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
nit: I think I'd prefer
!ib.cols[i].Virtual && i < len(desc.Columns)
.I also find this opaque since it relies on the ordering of
ib.cols
. Add a comment here explaining what's going on?
Done.
pkg/sql/catalog/schemaexpr/computed_column.go, line 172 at r2 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
nit:
valid columns
Done.
pkg/sql/catalog/schemaexpr/computed_column.go, line 179 at r2 (raw file):
Previously, lucy-zhang (Lucy Zhang) wrote…
nit: Update this comment with the new variable names
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, and @lucy-zhang)
pkg/sql/indexbackfiller_test.go, line 349 at r3 (raw file):
Previously, ajwerner wrote…
I agree. I think I'd like to defer that to a later time but I've filed an issue. #59549
Thanks, yeah I didn't mean in this PR :)
bors r+ |
Build failed (retrying...): |
Build succeeded: |
58979: schemachanger,*: introduce new schema changer r=lucy-zhang a=ajwerner This PR introduces the new schema changer, for which a partially written RFC (including a "guide-level explanation") can be found in #59288. It contains a somewhat complete implementation for a subset of `ALTER TABLE ... ADD COLUMN`. Like the existing implementation, these schema changes are planned and partially executed in the user transaction, but most of the work happens in an asynchronous job that runs after the transaction has committed. We use the index backfiller to build new columns, as described in #47989. See the RFC and individual commits for details. Missing things that will be in later PRs: - Making execution changes to support concurrent writes for columns added using the index backfiller. We currently don't have a way to write column values which are not part of the primary index. See #59149. (#58990 takes care of the backfill part). - Ensuring compatible coexistence with the old schema changer; in particular, polling for queued mutations to finish before any schema changes can start, using child transactions within the user transaction (#56588). The new schema changer assumes it has exclusive rights to perform schema changes on the table, which anticipates the transactional schema changes of the future. - Reverting schema changes when failed or canceled. - Properly checkpointing and reading job progress for the backfill. Currently there's a placeholder implementation that doesn't interact with the job at all. Co-authored-by: Andrew Werner <[email protected]>
81388: authors: add chinemeremchigbo to authors r=lassenordahl a=ChinemeremChigbo authors: add chinemeremchigbo to authors Release note: None Release justification: non-production code change 81460: acceptance: run `python`, `psql` containers as current uid r=knz a=rickystewart `postgres`'s permission checking for certificates has gotten more rigorous since [this commit](https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a59c79564bdc209a5bc7b02d706f0d7352eb82fa). This has broken a couple `acceptance` tests which do not pin to any specific `postgres` version (see #81313, #81437). Here we attempt to solve the problem "once and for all" by ensuring that these containers run with a UID that is equal to the one that created the certificates. Release note: None 81464: ui: Add tests for the TimeScaleDropdown component r=jocrl a=jocrl This commit splits out tests in `timescale.spec.tsx` and adds additional tests for the TimeScaleDropdown component, including testing that the custom time picker is initialized to the currently selected time. This also adds TimeScaleDropdown stories. Release note: None 81486: sql/backfill: fix bug adding new columns to new index with volatile default r=ajwerner a=ajwerner In #58990 we added support for the index backfiller to evaluate expressions. This unblocked the usage of virtual computed columns in secondary expressions, so wasn't a totally bogus change, but probably was trying to do too much without understanding all of the implications. The original motivation for that change was to unblock the then nascent declarative schema changer prototype which wanted to only implement #47989 as a column change protocol. In order to do that, it needed to evaluate default expressions when adding new columns to a new primary index. For whatever reason, probably lack of understanding, I made it such that all default expressions for columns which were in the adding state were evaluated. This is wrong in cases where the column has already been backfilled into the current primary index; it's wrong because volatile expressions will evaluate to different values causing the newly backfilled secondary index to be bogus and corrupt. This change addresses the failure of the above change by being more thoughtful about whether it's sane to evaluate a default expression when backfilling into an index. Note that it's still insane to backfill a volatile expression for a new column into the key of a new primary index. As of writing, there is no way to do this. Backports will address #81448. Release note (bug fix): In 21.1 a bug was introduced whereby default values were recomputed when populating data in new secondary indexes for columns which were added in the same transaction as the index. This arises, for example in cases like `ALTER TABLE t ADD COLUMN f FLOAT8 UNIQUE DEFAULT (random())`. If the default expression is not volatile, then the recomputation is harmless. If, however, the default expression is volatile, the data in the secondary index will not match the data in the primary index: a corrupt index will have been created. This bug has now been fixed. 81504: acceptance/roachtest: update ORM versions under test r=ecwall a=rafiss See DXTPT-35 Release note: None 81523: sql: fix typo in warning on using DateStyle/IntervalStyle r=rafiss a=otan Release note (sql change): Fixed a small typo when using DateStyle and IntervalStyle. Co-authored-by: Chinemerem <[email protected]> Co-authored-by: Ricky Stewart <[email protected]> Co-authored-by: Josephine Lee <[email protected]> Co-authored-by: Andrew Werner <[email protected]> Co-authored-by: Rafi Shamim <[email protected]> Co-authored-by: Oliver Tan <[email protected]>
This commit was mostly motivated by work on the new schema changer to enable
the behavior described in #47989, however, it also turns out to be a
prerequisite of the work to use virtual computed columns in secondary indexes.
Given we haven't released virtual computed columns, I'm going to omit a
release not for this PR.
The basic idea is that we need to track dependencies for computed columns
to make sure they are retrieved. Default expressions need to be evaluated
first. Much of the code is testing.
Release note: None