Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: remove dead interleave code #70618

Merged
merged 2 commits into from
Nov 4, 2021

Conversation

postamar
Copy link
Contributor

@postamar postamar commented Sep 23, 2021

sql: remove dead interleave code

In the previous release, we disabled interleaved tables and indexes.
Therefore, at this point in time, all code pertaining to this feature is
effectively dead. This commit removes a lot of it by removing the
catalog methods and their callers.

The remaining dead code is now located in the SQL grammar definition and
the plan nodes, and is removed in the subsequent commit.

Release note: None


sql: remove INTERLEAVE IN PARENT from grammar

This commit follows up on the previous one by removing INTERLEAVE
clauses from the SQL grammar itself.

Release note (sql change): Removed INTERLEAVE IN PARENT.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@postamar postamar force-pushed the schema-remove-interleave branch 3 times, most recently from cac6d53 to 649dc18 Compare September 23, 2021 17:01
@postamar postamar changed the title [WIP] sql: remove most dead interleave code [DNM] sql: remove most dead interleave code Sep 23, 2021
@postamar postamar force-pushed the schema-remove-interleave branch 2 times, most recently from dd50ce7 to 54da9ee Compare September 23, 2021 21:04
@postamar
Copy link
Contributor Author

This is not ready to merge, but it's ready to review, if anyone dares. One aspect of my change I'm not too sure of is if I was supposed to do something in particular to deprecate the cluster settings instead of outright removing those which I removed.

I appreciate that this diff is hard to review. I've broken it down into two commits but the substance of the change is in the first one. There's a lot of noise in there, but there are also test cases which I outright removed because I felt they no longer made sense, that kind of thing. What I'm trying to say is that this should probably be reviewed quite carefully despite the noise. I tried to break the change down further but everything is quite intertwined.

@postamar postamar marked this pull request as ready for review September 23, 2021 21:09
@postamar postamar requested a review from a team September 23, 2021 21:09
@postamar postamar requested review from a team as code owners September 23, 2021 21:09
@postamar postamar requested a review from a team September 23, 2021 21:09
@postamar postamar requested a review from a team as a code owner September 23, 2021 21:09
@postamar postamar requested a review from a team September 23, 2021 21:09
@postamar postamar requested review from a team as code owners September 23, 2021 21:09
@postamar postamar requested review from adityamaru and tbg and removed request for a team September 23, 2021 21:09
@yuzefovich
Copy link
Member

I have #70239 that removes the interleaved logic from the fetchers - it'd be probably good to rebase this PR on top of that one since #70239 has already been approved (I was thinking of waiting till 21.2.0 is out, just in case we need to backport something).

@postamar
Copy link
Contributor Author

Excellent! I'll do just that, once the release is done. I may even hold off until your PR is merged, depending on circumstances.

@postamar postamar added the do-not-merge bors won't merge a PR with this label. label Oct 5, 2021
@postamar postamar marked this pull request as draft October 26, 2021 18:14
@postamar postamar changed the title [DNM] sql: remove most dead interleave code sql: remove dead interleave code Oct 26, 2021
@postamar postamar force-pushed the schema-remove-interleave branch from 54da9ee to 8b900d8 Compare October 26, 2021 18:17
@postamar postamar removed the do-not-merge bors won't merge a PR with this label. label Oct 26, 2021
@yuzefovich
Copy link
Member

One aspect of my change I'm not too sure of is if I was supposed to do something in particular to deprecate the cluster settings instead of outright removing those which I removed.

All cluster settings that are being removed should be added to retiredSettings in setting/registry.go.

@postamar
Copy link
Contributor Author

Thanks @yuzefovich! Will do.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! It's exciting to see all this code go away.

Reviewed 110 of 111 files at r1, 48 of 48 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @postamar, and @tbg)


docs/generated/sql/bnf/BUILD.bazel, line 1 at r2 (raw file):

# There should be an entry in this list for every .bnf in this directory (omit

Seems like this changed more than necessary, what was the reason?


pkg/ccl/backupccl/backupresolver/targets.go, line 635 at r1 (raw file):

	}

	sort.Slice(matched.Descs, func(i, j int) bool { return matched.Descs[i].GetID() < matched.Descs[j].GetID() })

Do we still need to sort here? The old comment implied the sort was only needed because of the interleaved tables.


pkg/clusterversion/cockroach_versions.go, line 394 at r1 (raw file):

	},
	{
		Key:     PreventNewInterleavedTables,

I don't know anything about removing versions, so I'll defer to someone else to look at this. Maybe @irfansharif or @ajwerner.


pkg/sql/backfill.go, line 859 at r1 (raw file):

	}
	for _, idx := range dropped {
		var resume roachpb.Span

IIUC correctly resume can be removed too - this was introduced only to support dropping of the interleaved indexes, and it'll be always unset. Additionally, the for loop also now seems unnecessary since we always will do just a single iteration.


pkg/sql/crdb_internal.go, line 2419 at r1 (raw file):

	})

func showAlterStatement(

nit: I think this method is called in a single place, should we inline it then?


pkg/sql/show_create.go, line 62 at r1 (raw file):

// TABLE statement used to create the given table.
//
// The names of the tables references by foreign keys are prefixed by their own

nit: s/references/referenced/.


pkg/sql/tablewriter_delete.go, line 67 at r1 (raw file):

// zero resume-span. After a chunk is deleted a new resume-span is returned.
//
// limit is a limit on either the number of keys deleted in the operation.

nit: s/on either the number/on the number/.


pkg/sql/catalog/descriptor.go, line 488 at r1 (raw file):

	GetNextFamilyID() descpb.FamilyID

	// HasColumnBackfillMutation returns whether the table has any queued column

nit: why remove the comment?


pkg/sql/catalog/descpb/structured.proto, line 516 at r1 (raw file):

  // Interleave, if it's not the zero value, describes how this index's data is
  // interleaved into another index's data.
  optional InterleaveDescriptor interleave = 11 [(gogoproto.nullable) = false, deprecated = true];

Can we just delete these two fields and mark the corresponding ordinals as reserved? I'm guessing not because of the backups.


pkg/sql/logictest/testdata/logic_test/pg_builtins, line 541 at r1 (raw file):

indexed_b_d_idx  {2,4}  4  2

# information_schema._pg_numeric_precision and information_schema._pg_numeric_precision_radix

Why is this chunk removed?


pkg/sql/logictest/testdata/logic_test/pg_catalog, line 3436 at r1 (raw file):

oid     typname        typnamespace  typowner  typlen  typbyval  typtype

query OTOOIBT colnames

Why is this chunk removed?


pkg/sql/logictest/testdata/logic_test/pg_catalog, line 5420 at r1 (raw file):

regression_70180  NULL

# Testing pg_statistic_ext

ditto


pkg/sql/opt/exec/execbuilder/mutation.go, line 603 at r1 (raw file):

		if maxRows, ok := b.indexConstraintMaxResults(&scan.ScanPrivate, scan.Relational()); ok {
			if maxKeys := maxRows * uint64(tab.FamilyCount()); maxKeys <= row.TableTruncateChunkSize {

nit: an extra new line.


pkg/sql/opt/testutils/opttester/testfixtures/tpcc_schema, line 62 at r1 (raw file):

    index customer_idx (c_w_id, c_d_id, c_last, c_first),
    foreign key (c_w_id, c_d_id) references district (d_w_id, d_id)
) interleave in parent district (c_w_id, c_d_id)

super nit: seems like these couple of changes belong better in the first commit.


pkg/sql/opt/xform/testdata/rules/join, line 255 at r1 (raw file):

  PRIMARY KEY(pid1, cid1)
)
INTERLEAVE IN PARENT parent1 (pid1)

I wonder whether these tests should be converted to use foreign keys instead of interleaving?


pkg/sql/randgen/schema.go, line 80 at r1 (raw file):

	tables := make([]tree.Statement, num)
	for i := 0; i < num; i++ {
		t := RandCreateTableWithColumnIndexNumberGenerator(rng, prefix, i+1, nil /* generateColumnIndexNumber */)

nit: should we use RandCreateTable here?


pkg/sql/row/fetcher.go, line 212 at r1 (raw file):

	// True if the index key must be decoded.
	// If there is more than one table, the index key must always be decoded.

nit: no need to change this comment since row.Fetcher now supports only a single table.


pkg/sql/rowenc/index_encoding.go, line 506 at r1 (raw file):

	// bookkeeping. We could improve it to n with a prefix tree of components.

	interleaves := append(make([]catalog.Index, 0, len(desc.ActiveIndexes())), desc.ActiveIndexes()...)

I think the logic here could be simplified further (maybe the for loop below can be removed entirely?).


pkg/sql/rowenc/index_encoding.go, line 606 at r1 (raw file):

	// interleavedSentinel is actually next, then this key is for a child
	// table.
	if _, ok := encoding.DecodeIfInterleavedSentinel(key); ok {

I think this can be removed. As a result we could remove DecodeIndexKeyWithoutTableIDIndexIDPrefix entirely and just inline DecodeKeyVals instead.


pkg/sql/rowenc/index_encoding.go, line 1527 at r1 (raw file):

	desc catalog.TableDescriptor, index catalog.Index,
) (signatures [][]byte, err error) {
	// signatures contains the slice reference to the signature of every

nit: the comment needs an update.


pkg/sql/rowenc/index_encoding.go, line 1533 at r1 (raw file):

	// fullSignature is the backing byte slice for each individual signature
	// as it buffers each block of table and index IDs.
	// We eagerly allocate 4 bytes for each of the two IDs per ancestor

ditto


pkg/sql/rowenc/index_encoding.go, line 1554 at r1 (raw file):

// if uncertain, pass in true to 'overestimate' the maxKeyTokens.
//
// In general, a key belonging to an interleaved index grandchild is encoded as:

ditto


pkg/sql/span/span_builder.go, line 87 at r1 (raw file):

	// Set up the interstices for encoding interleaved tables later.
	s.interstices[0] = s.KeyPrefix

I think we can simplify these interstices further. Feel free to leave TODO(yuzefovich) about it.

@tbg tbg requested review from ajwerner and irfansharif October 27, 2021 20:37
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @irfansharif, and @postamar)


pkg/clusterversion/cockroach_versions.go, line 394 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I don't know anything about removing versions, so I'll defer to someone else to look at this. Maybe @irfansharif or @ajwerner.

In serverless, tenant pods can trail the KV host cluster by one release. So when KV runs 21.2, tenants can still run 21.1. I am wagering a guess that the migrations attached to these versions are entirely on the tenant side, and so things should be fine (a 21.1 tenant can have interleaved tables, but the 21.2 host cluster couldn't care less).

As for removing settings, current master is going to be the 22.1 release and so will only ever be run in cluster versions 21.2 and up. The version here is pre-21.2, so this should be good to remove.

@postamar postamar force-pushed the schema-remove-interleave branch from 8b900d8 to a3c15a6 Compare October 28, 2021 19:52
@postamar postamar marked this pull request as ready for review October 28, 2021 19:53
@postamar postamar force-pushed the schema-remove-interleave branch from a3c15a6 to 4180799 Compare October 28, 2021 20:17
Copy link
Contributor Author

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking time to review! I bet the excitement of seeing this go away is largely tempered by the drudgery of plowing through 10k line diff.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @irfansharif, and @yuzefovich)


docs/generated/sql/bnf/BUILD.bazel, line 1 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Seems like this changed more than necessary, what was the reason?

The reason was make bazel-generate followed by a careless git add. Fixed.


pkg/ccl/backupccl/backupresolver/targets.go, line 635 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Do we still need to sort here? The old comment implied the sort was only needed because of the interleaved tables.

Agreed. I had a look at the call sites just to be sure: I'm reasonably confident that it's of no use now. Removed.


pkg/clusterversion/cockroach_versions.go, line 394 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

In serverless, tenant pods can trail the KV host cluster by one release. So when KV runs 21.2, tenants can still run 21.1. I am wagering a guess that the migrations attached to these versions are entirely on the tenant side, and so things should be fine (a 21.1 tenant can have interleaved tables, but the 21.2 host cluster couldn't care less).

As for removing settings, current master is going to be the 22.1 release and so will only ever be run in cluster versions 21.2 and up. The version here is pre-21.2, so this should be good to remove.

Correct, these migrations are tenant migrations. This seems to be the case for all SQL-related migrations, from what I can tell.


pkg/sql/backfill.go, line 859 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

IIUC correctly resume can be removed too - this was introduced only to support dropping of the interleaved indexes, and it'll be always unset. Additionally, the for loop also now seems unnecessary since we always will do just a single iteration.

Nice catch! Done.


pkg/sql/crdb_internal.go, line 2419 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: I think this method is called in a single place, should we inline it then?

The call site is already quite deeply nested, I'll leave it as it is, if you don't mind.


pkg/sql/show_create.go, line 62 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/references/referenced/.

Done.


pkg/sql/tablewriter_delete.go, line 67 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/on either the number/on the number/.

Done.


pkg/sql/catalog/descriptor.go, line 488 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: why remove the comment?

No good reason, this was a mistake. Good catch.


pkg/sql/catalog/descpb/structured.proto, line 516 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Can we just delete these two fields and mark the corresponding ordinals as reserved? I'm guessing not because of the backups.

Yes, it's because of the backups. More generally, we need to check dead fields such as this one during descriptor validation to make sure that it's empty. Consequently nothing in structured.proto can ever be deleted. It's not such a big deal as it sounds, this file doesn't change that much these days.


pkg/sql/logictest/testdata/logic_test/pg_builtins, line 541 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Why is this chunk removed?

No good reason. Good catch.


pkg/sql/logictest/testdata/logic_test/pg_catalog, line 3436 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Why is this chunk removed?

No good reason. Good catch.


pkg/sql/logictest/testdata/logic_test/pg_catalog, line 5420 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

ditto

Thanks.


pkg/sql/opt/testutils/opttester/testfixtures/tpcc_schema, line 62 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

super nit: seems like these couple of changes belong better in the first commit.

I'm not sure I agree, but don't feel too strongly about this. I'll leave it in the current commit at least out of laziness, unless someone has a strong opinion.


pkg/sql/opt/xform/testdata/rules/join, line 255 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I wonder whether these tests should be converted to use foreign keys instead of interleaving?

I'm surprised they weren't broken in the first place, DDLs with INTERLEAVE should trigger errors nowadays.


pkg/sql/randgen/schema.go, line 80 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: should we use RandCreateTable here?

You're right. Done.


pkg/sql/row/fetcher.go, line 212 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: no need to change this comment since row.Fetcher now supports only a single table.

Fixed.


pkg/sql/rowenc/index_encoding.go, line 506 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I think the logic here could be simplified further (maybe the for loop below can be removed entirely?).

I think so too. I don't know why I didn't do it. Perhaps I was waiting for your changes? In any case, it's done.


pkg/sql/rowenc/index_encoding.go, line 606 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I think this can be removed. As a result we could remove DecodeIndexKeyWithoutTableIDIndexIDPrefix entirely and just inline DecodeKeyVals instead.

Done.


pkg/sql/rowenc/index_encoding.go, line 1527 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: the comment needs an update.

As a matter of fact, this function isn't used anywhere except in tests, thanks to your recent changes. I removed the lot.


pkg/sql/rowenc/index_encoding.go, line 1533 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

ditto

See above.


pkg/sql/rowenc/index_encoding.go, line 1554 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

ditto

Done.


pkg/sql/span/span_builder.go, line 87 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I think we can simplify these interstices further. Feel free to leave TODO(yuzefovich) about it.

Done. I don't know what these are, so I don't dare touch them.


pkg/sql/opt/exec/execbuilder/mutation.go, line 603 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: an extra new line.

Fixed.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL

I think last thing is that we need to add the removed cluster setting into retiredSettings map. Otherwise, :lgtm:

Reviewed 1 of 111 files at r1, 76 of 76 files at r3, 49 of 49 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @ajwerner, @irfansharif, and @postamar)


pkg/sql/opt/xform/testdata/rules/join, line 255 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

I'm surprised they weren't broken in the first place, DDLs with INTERLEAVE should trigger errors nowadays.

This test file probably runs through a special optimizer setup, without really executing the queries, so we weren't hitting those errors.


pkg/sql/rowenc/index_encoding.go, line 1527 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

As a matter of fact, this function isn't used anywhere except in tests, thanks to your recent changes. I removed the lot.

👍


pkg/sql/rowenc/index_encoding.go, line 509 at r3 (raw file):

	if tableID != desc.GetID() {
		return 0, nil, errors.Errorf(
			"Unexpected table ID %d, expected %d instead.", tableID, desc.GetID())

super nit: I think we use lower-case first word in the main error message.

@postamar postamar force-pushed the schema-remove-interleave branch from 4180799 to 95653c1 Compare October 29, 2021 13:53
Copy link
Contributor Author

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I forgot all about retiredSettings, my bad. Done.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @adityamaru, @ajwerner, @irfansharif, and @yuzefovich)


pkg/sql/opt/xform/testdata/rules/join, line 255 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

This test file probably runs through a special optimizer setup, without really executing the queries, so we weren't hitting those errors.

That makes sense.


pkg/sql/rowenc/index_encoding.go, line 509 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

super nit: I think we use lower-case first word in the main error message.

I'm never sure about this. Done.

In fact in that case I should remove the period also. I'll push that change at the next opportunity.

@postamar
Copy link
Contributor Author

I'll leave this PR up a few more days to allow for further comments. If there are none I'll go ahead and merge towards the end of next week.

Although I'm confident most of the interleave code will be removed, a quick search tells me that there are still plenty of comments mentioning interleaves. I suppose we'll all just have to edit these gradually over time.

@postamar postamar force-pushed the schema-remove-interleave branch from 95653c1 to bb0da77 Compare November 3, 2021 21:02
Marius Posta added 2 commits November 3, 2021 18:22
In the previous release, we disabled interleaved tables and indexes.
Therefore, at this point in time, all code pertaining to this feature is
effectively dead. This commit removes a lot of it by removing the
catalog methods and their callers.

The remaining dead code is now located in the SQL grammar definition and
the plan nodes, and is removed in the subsequent commit.

Release note: None
This commit follows up on the previous one by removing INTERLEAVE
clauses from the SQL grammar itself.

Release note (sql change): Removed INTERLEAVE IN PARENT.
@postamar postamar force-pushed the schema-remove-interleave branch from bb0da77 to 4c0a5c7 Compare November 3, 2021 22:24
@postamar
Copy link
Contributor Author

postamar commented Nov 3, 2021

bors r+

@craig
Copy link
Contributor

craig bot commented Nov 4, 2021

Build succeeded:

@craig craig bot merged commit 4493953 into cockroachdb:master Nov 4, 2021
@postamar
Copy link
Contributor Author

postamar commented Nov 4, 2021

This is so cathartic. Thanks @yuzefovich !

@postamar postamar deleted the schema-remove-interleave branch November 4, 2021 13:05
@yuzefovich
Copy link
Member

I wonder whether we should close #69150 given that this has been merged?

@postamar
Copy link
Contributor Author

postamar commented Nov 4, 2021

You're right, we should. At the time that I started working on this PR, I didn't intend to remove all remaining interleave code, only that which involved NumAncestors() and other interleave-related catalog descriptor methods, so I didn't issue-link. I since forgot to update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants