Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: enable adding columns which are not written to the current primary index #66599

Merged

Conversation

postamar
Copy link
Contributor

@postamar postamar commented Jun 17, 2021

row,schemachanger: use StoreColumnIDs/Names in primary index

In a recent commit, the StoreColumnIDs and StoreColumnNames slices in
primary indexes were populated when previously they had simply been
empty. We simply assumed that all non-virtual columns in a table would
be stored in the primary index: primary key columns in the key, the rest
in the value.

This commit breaks that assumption by using the StoreColumnIDs slice to
determine what goes into the primary index. This makes it possible for
the new schema changer to add columns safely, preventing unwanted writes
to the old primary index while the schema change is underway.

Fixes #59149.

Release note: None


sql,tabledesc: add new IndexDescriptorVersion for primary indexes

Previously, the IndexDescriptorVersion type was only used to describe
the encoding of secondary indexes. This commit adds a new value for
use in primary indexes, PrimaryIndexWithStoredColumnsVersion, to signify
that the StoredColumnIDs and StoredColumnNames slices are populated
correctly.

Previously, these slices did not need to be populated at all. This is
because the set of columns comprising the primary index of a table is
assumed to be all non-virtual columns of that table. Our upcoming work on
the new schema changer will require us to violate that assumption
however. This commit is in preparation of that change.

In our effort to make meaningful the concept of stored columns in
primary indexes, this commit also changes the contents of the
information_schema.statistics table. As a result, SHOW INDEXES and SHOW
COLUMNS behave the same way regardless of whether an index is primary or
secondary.

Release note (sql change): The contents of the statistics table in the
information schema have changed, therefore so have the results of SHOW
INDEX and SHOW COLUMNS. A column which is not in the primary key will
now be listed as belonging to the primary index as a stored column.
Previously, it was simply not listed as belonging to the primary index.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@postamar postamar force-pushed the primary-index-store-columns-index-format branch 3 times, most recently from 3fae6b5 to 3efb885 Compare June 18, 2021 02:03
@postamar postamar marked this pull request as ready for review June 18, 2021 02:58
@postamar postamar requested a review from a team as a code owner June 18, 2021 02:58
@postamar postamar requested review from a team and removed request for a team June 18, 2021 02:58
@postamar postamar force-pushed the primary-index-store-columns-index-format branch from 3efb885 to d8ebd2b Compare June 21, 2021 13:52
@postamar
Copy link
Contributor Author

Rebased to fix merge conflicts.

@postamar postamar requested a review from ajwerner June 21, 2021 13:52
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: just small things. Also looks like you've still got some merge skew.

Reviewed 65 of 68 files at r1, 4 of 12 files at r2, 14 of 14 files at r3, 12 of 12 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @postamar)


pkg/sql/truncate.go, line 226 at r2 (raw file):

Quoted 8 lines of code…
	// Exit early with an error if the table is undergoing a new-style schema
	// change, before we try to get job IDs and update job statuses later. See
	// createOrUpdateSchemaChangeJob.
	if tableDesc.NewSchemaChangeJobID != 0 {
		return pgerror.Newf(pgcode.ObjectNotInPrerequisiteState,
			"cannot perform a schema change on table %q while it is undergoing a new-style schema change",
			tableDesc.GetName(),
		)

Should this get pushed into checkTableForDisallowedMutationsWithTruncate?


pkg/sql/catalog/table_elements.go, line 168 at r2 (raw file):

	GetStoredColumnID(storedColumnOrdinal int) descpb.ColumnID
	GetStoredColumnName(storedColumnOrdinal int) string
	HasOldStoredColumns() bool

I wish this concept (which I understand is not new) was better commented and explained somewhere. There's a small comment in tabledesc but it's not satisfying.


pkg/sql/catalog/tabledesc/structured.go, line 680 at r2 (raw file):

		indexHasOldStoredColumns := idx.HasOldStoredColumns()
		// Need to clear KeySuffixColumnIDs and StoreColumnIDs

Can you use more words to explain what's going on here? I know this existed before but this code is hard to follow.


pkg/sql/catalog/tabledesc/table_desc.go, line 190 at r2 (raw file):

		return true
	}

Should this be asserting something about the format of idx?


pkg/sql/schemachanger/schemachanger_test.go, line 417 at r4 (raw file):

func TestInsertDuringAddColumnNotWritingToCurrentPrimaryIndex(t *testing.T) {
	defer leaktest.AfterTest(t)()
	defer sqltestutils.SetTestJobsAdoptInterval()()

This no longer exists.

@postamar postamar force-pushed the primary-index-store-columns-index-format branch from d8ebd2b to f98f401 Compare June 21, 2021 14:28
@postamar
Copy link
Contributor Author

Thanks for the review. I pushed a fix to the merge skew concurrently to you posting your review. I'll address your comments ASAP.

Copy link
Contributor Author

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner)


pkg/sql/truncate.go, line 226 at r2 (raw file):

Previously, ajwerner wrote…
	// Exit early with an error if the table is undergoing a new-style schema
	// change, before we try to get job IDs and update job statuses later. See
	// createOrUpdateSchemaChangeJob.
	if tableDesc.NewSchemaChangeJobID != 0 {
		return pgerror.Newf(pgcode.ObjectNotInPrerequisiteState,
			"cannot perform a schema change on table %q while it is undergoing a new-style schema change",
			tableDesc.GetName(),
		)

Should this get pushed into checkTableForDisallowedMutationsWithTruncate?

I thought about this and my rationale for not putting it in there was that this test didn't really involve mutations. Also it's new-schemachanger stuff and that function is old-schemachanger stuff, I guessed that it would be better to keep them separate.

I don't feel strongly about any of this so if you prefer I'll push it there.


pkg/sql/catalog/table_elements.go, line 168 at r2 (raw file):

Previously, ajwerner wrote…

I wish this concept (which I understand is not new) was better commented and explained somewhere. There's a small comment in tabledesc but it's not satisfying.

I completely agree. I poked around and found this, which has a bit more substance to it:

// Columns in the primary key that aren't in index.KeyColumnNames or
// index.StoreColumnNames are implicit columns in the index.
var implicitCols map[string]struct{}
var hasImplicitCols bool
if index.HasOldStoredColumns() {
// Old STORING format: implicit columns are extra columns minus stored
// columns.
hasImplicitCols = index.NumKeySuffixColumns() > index.NumSecondaryStoredColumns()
} else {
// New STORING format: implicit columns are extra columns.
hasImplicitCols = index.NumKeySuffixColumns() > 0
}

I'll improve the comment on HasOldStoredColumns in light of this.


pkg/sql/catalog/tabledesc/structured.go, line 680 at r2 (raw file):

Previously, ajwerner wrote…

Can you use more words to explain what's going on here? I know this existed before but this code is hard to follow.

Yes this is almost certainly an omission on my part, sorry.


pkg/sql/catalog/tabledesc/table_desc.go, line 190 at r2 (raw file):

Previously, ajwerner wrote…

Should this be asserting something about the format of idx?

At this point the index is primary and is therefore assumed to be in the upgraded version. Adding an assertion here would be redundant due to on-commit descriptor validation... aaaand I just realized that we don't actually have a validation check for that! I'll address this.


pkg/sql/schemachanger/schemachanger_test.go, line 417 at r4 (raw file):

Previously, ajwerner wrote…

This no longer exists.

Thanks.

@postamar postamar force-pushed the primary-index-store-columns-index-format branch from bc925e8 to 12a8df4 Compare June 21, 2021 18:43
Marius Posta added 2 commits June 22, 2021 09:38
Previously, the IndexDescriptorVersion type was only used to describe
the encoding of secondary indexes. This commit adds a new value for
use in primary indexes, PrimaryIndexWithStoredColumnsVersion, to signify
that the StoredColumnIDs and StoredColumnNames slices are populated
correctly.

Previously, these slices did not need to be populated at all. This is
because the set of columns comprising the primary index of a table is
assumed to be all non-virtual columns of that table. Our upcoming work on
the new schema changer will require us to violate that assumption
however. This commit is in preparation of that change.

In our effort to make meaningful the concept of stored columns in
primary indexes, this commit also changes the contents of the
information_schema.statistics table. As a result, SHOW INDEXES and SHOW
COLUMNS behave the same way regardless of whether an index is primary or
secondary.

Release note (sql change): The contents of the statistics table in the
information schema have changed, therefore so have the results of SHOW
INDEX and SHOW COLUMNS. A column which is not in the primary key will
now be listed as belonging to the primary index as a stored column.
Previously, it was simply not listed as belonging to the primary index.
In a recent commit, the StoreColumnIDs and StoreColumnNames slices in
primary indexes were populated when previously they had simply been
empty. We simply assumed that all non-virtual columns in a table would
be stored in the primary index: primary key columns in the key, the rest
in the value.

This commit breaks that assumption by using the StoreColumnIDs slice to
determine what goes into the primary index. This makes it possible for
the new schema changer to add columns safely, preventing unwanted writes
to the old primary index while the schema change is underway.

Fixes cockroachdb#59149.

Release note: None
@postamar postamar force-pushed the primary-index-store-columns-index-format branch from 12a8df4 to 3dd87a4 Compare June 22, 2021 13:39
@postamar
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jun 22, 2021

Build succeeded:

@craig craig bot merged commit 2cafdfb into cockroachdb:master Jun 22, 2021
mgartner added a commit to mgartner/cockroach that referenced this pull request Jun 23, 2021
This commit fixes the `sqlsmith.TestGenerateParse` test that has been
flaking since cockroachdb#66599 when primary index stored columns were added to the
output of `SHOW INDEXES`.

sqlsmith uses `SHOW INDEXES` to build an in-memory representation of
indexes to help generate interesting randomized queries. It is not
equipped to handle `rowid` primary indexes, so rows output from
`SHOW INDEXES` with a `column_name` of `rowid` are filtered out. Stored
columns of primary indexes, now included in `SHOW INDEXES`, are not
filtered out, causing sqlsmith to build an in-memory representation of
`rowid` indexes that have only stored columns and no indexed columns.
This breaks the assumption within sqlsmith that each index should have
at least one indexed column, causing panics.

This commit fixes the issue by removing indexes with no indexed columns
from sqlsmith's in-memory index map.

Fixes cockroachdb#66723

Release note: None
craig bot pushed a commit that referenced this pull request Jun 24, 2021
66743: sqlsmith: fix flaky TestGenerateParse r=ajwerner a=mgartner

This commit fixes the `sqlsmith.TestGenerateParse` test that has been
flaking since #66599 when primary index stored columns were added to the
output of `SHOW INDEXES`.

sqlsmith uses `SHOW INDEXES` to build an in-memory representation of
indexes to help generate interesting randomized queries. It is not
equipped to handle `rowid` primary indexes, so rows output from
`SHOW INDEXES` with a `column_name` of `rowid` are filtered out. Stored
columns of primary indexes, now included in `SHOW INDEXES`, are not
filtered out, causing sqlsmith to build an in-memory representation of
`rowid` indexes that have only stored columns and no indexed columns.
This breaks the assumption within sqlsmith that each index should have
at least one indexed column, causing panics.

This commit fixes the issue by removing indexes with no indexed columns
from sqlsmith's in-memory index map.

Fixes #66723

Release note: None

Co-authored-by: Marcus Gartner <[email protected]>
pawalt pushed a commit to pawalt/cockroach that referenced this pull request Jul 6, 2021
This commit fixes the `sqlsmith.TestGenerateParse` test that has been
flaking since cockroachdb#66599 when primary index stored columns were added to the
output of `SHOW INDEXES`.

sqlsmith uses `SHOW INDEXES` to build an in-memory representation of
indexes to help generate interesting randomized queries. It is not
equipped to handle `rowid` primary indexes, so rows output from
`SHOW INDEXES` with a `column_name` of `rowid` are filtered out. Stored
columns of primary indexes, now included in `SHOW INDEXES`, are not
filtered out, causing sqlsmith to build an in-memory representation of
`rowid` indexes that have only stored columns and no indexed columns.
This breaks the assumption within sqlsmith that each index should have
at least one indexed column, causing panics.

This commit fixes the issue by removing indexes with no indexed columns
from sqlsmith's in-memory index map.

Fixes cockroachdb#66723

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants