Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: check equivalent constraint when creating hash index #74179

Conversation

chengxiong-ruan
Copy link
Contributor

@chengxiong-ruan chengxiong-ruan commented Dec 21, 2021

Fixes #68031

Previously we only try to create constraint for shard column if
it's newly created. We check duplicate constraint for shard
column when Alter Primary Key and Create Index, however the
check is simply a name check. This pr adds logic to check
equivalent constraint by checking if the formatted expression
string is the same. With this logic we can try to create the
constraint no matter if a shard column is newly created or not.
With this fix, we also don't need to expose the constraint through
SHOW CREATE TABLE result since we make sure the constraint is
created or skipped if one already exists.

Release note (sql change): Before this change, the check constraint
on shard column used by hash sharded index was printed in the
corresponding SHOW CREATE TABLE. The constraint had been shown
because cockroach lacked logic to ensure that shard columns which
are part of hash sharded indexs always have the check constraint
which the optimizer relies on to achieve properly optimized plans
on hash sharded indexes. We no longer display this constraint in
SHOW CREATE TABLE as it is now implied by the USING HASH clause
on the relevant index.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@chengxiong-ruan chengxiong-ruan force-pushed the check-equivalent-shard-col-constraint branch 4 times, most recently from fc13e13 to 0263058 Compare December 22, 2021 04:42
Copy link
Contributor Author

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/create_table.go, line 1645 at r1 (raw file):

				}
				if checkConstraintDesc.Expr == inputCheckConstraintDesc.Expr {
					inputCheckConstraint.Hidden = true

I kept questioning myself why we want to make a user defined constraint hidden.

I think my answer is that we may want to have a consistent SHOW CREATE experience for tables with hash sharded index, and it should be super rare that a super smart user defined a equivalent constraint for a shard column if we don't show the constraint at all >_> But if there would be a super smart user, showing the constraint would give them a chance to drop the constraint, do we want that? Dropping the constraint in this case could be fine since we trust the shard column so much that we skip validation during backfill already. The only thing is how optimizer gonna feel, I think it's also fine since it's user's choice.

However step back a bit I also questioned myself, why do we need all these equivalent constraint check at all. Why can't we just try to create the constraint anyway without checking? In this case, if a smart user defined an exactly same constraint, it would fail due to duplicated constraint. I think it's also an ok experience for user and this whole block of code would be gone.


pkg/sql/show_create_clauses.go, line 566 at r1 (raw file):

) error {
	for _, e := range desc.AllActiveAndInactiveChecks() {
		if e.Hidden {

Let me know if you have a strong feeling that it should be in a separate PR.

@chengxiong-ruan chengxiong-ruan marked this pull request as ready for review December 22, 2021 14:09
@chengxiong-ruan chengxiong-ruan requested review from a team, ajwerner and postamar December 22, 2021 14:09
@chengxiong-ruan chengxiong-ruan force-pushed the check-equivalent-shard-col-constraint branch from 0263058 to 675f6e4 Compare December 22, 2021 19:45
@chengxiong-ruan chengxiong-ruan force-pushed the check-equivalent-shard-col-constraint branch 2 times, most recently from 02f2f77 to 7ac4cb8 Compare January 10, 2022 15:27
Copy link
Contributor

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look at this. Please bear in mind that my review is very superficial. I didn't find anything obviously wrong in this PR. You should probably have someone more knowledgeable take a look. Please bear in mind that it's OK to annoy people for reviews 👍

Reviewed 1 of 8 files at r1, 1 of 4 files at r2, 1 of 2 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @chengxiong-ruan)


-- commits, line 2 at r3:
s/quivalent/equivalent/


-- commits, line 16 at r3:
Can you please reword this sentence?


pkg/sql/create_table.go, line 1645 at r1 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

I kept questioning myself why we want to make a user defined constraint hidden.

I think my answer is that we may want to have a consistent SHOW CREATE experience for tables with hash sharded index, and it should be super rare that a super smart user defined a equivalent constraint for a shard column if we don't show the constraint at all >_> But if there would be a super smart user, showing the constraint would give them a chance to drop the constraint, do we want that? Dropping the constraint in this case could be fine since we trust the shard column so much that we skip validation during backfill already. The only thing is how optimizer gonna feel, I think it's also fine since it's user's choice.

However step back a bit I also questioned myself, why do we need all these equivalent constraint check at all. Why can't we just try to create the constraint anyway without checking? In this case, if a smart user defined an exactly same constraint, it would fail due to duplicated constraint. I think it's also an ok experience for user and this whole block of code would be gone.

I may be wrong here, and I don't understand any of this too deeply, however I believe that anything that has crdb_internal in its name is assumed to not have been created by a user. This isn't enforced anywhere, but it is a convention.

I believe the goal here is that we want the output of a SHOW CREATE TABLE statement for a hash-sharded table to round-trip correctly. In other words, that executing the resulting CREATE TABLE statement would produce the same table.

I may be completely wrong here.


pkg/sql/create_table.go, line 1623 at r3 (raw file):

		}

		// if there is an equivalent check constraint from the CREATE TABLE (should be rare since we hide the constraint of shard column)

style: comments should wrap around at the 80th character column

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, and @postamar)


pkg/sql/create_index.go, line 237 at r3 (raw file):

		}
		columns = newColumns
		if err := params.p.maybeSetupConstraintForShard(params.ctx, tableDesc, shardCol, indexDesc.Sharded.ShardBuckets); err != nil {

nit: wrap this after maybeSetupConstraintForShard(

if err := params.p.maybeSetupConstraintForShard(
	params.ctx, tableDesc, shardCol, indexDesc.Sharded.ShardBuckets,
); err != nil {

pkg/sql/create_index.go, line 476 at r3 (raw file):

// setupShardedIndex creates a shard column for the given index descriptor. It
// returns the shard column and the new column list for the index If the shard

nit: missing period. Can you say "is new, it is added to the tableDesc as a mutation in DELETE_ONLY"?


pkg/sql/create_table.go, line 1645 at r1 (raw file):

The only thing is how optimizer gonna feel, I think it's also fine since it's user's choice.

The optimizer is going to feel sad. It absolutely relies on the constraint. If a user drops it, I'm inclined to not care that much. Alternatively, one rather dire thing we could do is fail validation if we're missing the check constraint. If we do that, we should also add code to disallow the user from dropping it.

I believe the goal here is that we want the output of a SHOW CREATE TABLE statement for a hash-sharded table to round-trip correctly. In other words, that executing the resulting CREATE TABLE statement would produce the same table.

This is definitely a goal.


pkg/sql/show_create_clauses.go, line 566 at r1 (raw file):

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

Let me know if you have a strong feeling that it should be in a separate PR.

I'm happy with this.

@chengxiong-ruan
Copy link
Contributor Author


pkg/sql/create_table.go, line 1645 at r1 (raw file):

Previously, ajwerner wrote…

The only thing is how optimizer gonna feel, I think it's also fine since it's user's choice.

The optimizer is going to feel sad. It absolutely relies on the constraint. If a user drops it, I'm inclined to not care that much. Alternatively, one rather dire thing we could do is fail validation if we're missing the check constraint. If we do that, we should also add code to disallow the user from dropping it.

I believe the goal here is that we want the output of a SHOW CREATE TABLE statement for a hash-sharded table to round-trip correctly. In other words, that executing the resulting CREATE TABLE statement would produce the same table.

This is definitely a goal.

Thanks @postamar and @ajwerner , yeah I think I'm also inclined to not care too much if a user drop the constraint, I'll remove this line of code to make user defined constraint visible.

@chengxiong-ruan chengxiong-ruan force-pushed the check-equivalent-shard-col-constraint branch from 7ac4cb8 to 0bfea0f Compare January 11, 2022 21:47
Copy link
Contributor Author

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @postamar)


-- commits, line 2 at r3:

Previously, postamar (Marius Posta) wrote…

s/quivalent/equivalent/

Done.


-- commits, line 16 at r3:

Previously, postamar (Marius Posta) wrote…

Can you please reword this sentence?

Reworded. Let me know if you're happy with it.


pkg/sql/create_index.go, line 476 at r3 (raw file):

Previously, ajwerner wrote…

nit: missing period. Can you say "is new, it is added to the tableDesc as a mutation in DELETE_ONLY"?

Done. It actually has two different cases.


pkg/sql/create_table.go, line 1623 at r3 (raw file):

Previously, postamar (Marius Posta) wrote…

style: comments should wrap around at the 80th character column

Done, thanks for catching this~

@chengxiong-ruan chengxiong-ruan changed the title sql: check quivalent constraint when creating hash index sql: check equivalent constraint when creating hash index Jan 11, 2022
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, and @postamar)


pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 805 at r4 (raw file):

    CONSTRAINT t_pkey PRIMARY KEY (a ASC) USING HASH WITH BUCKET_COUNT = 8,
    FAMILY "primary" (a)
)

I feel like this test is setting out to demonstrate that the table created from show create also gets a good plan. That's nice and what you want, but I think it can be done more elegantly. Also, please comment the intention of the test.

Copy the statement to a variable

let $create_statement
SELECT create_statement FROM [SHOW CREATE TABLE t]

Use the statement

statement ok
$create_statement

Code quote:


SELECT @2 FROM [SHOW CREATE TABLE t]
----
CREATE TABLE public.t (
    crdb_internal_a_shard_8 INT4 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(a)), 8:::INT8)) VIRTUAL,
    a INT8 NOT NULL,
    CONSTRAINT t_pkey PRIMARY KEY (a ASC) USING HASH WITH BUCKET_COUNT = 8,
    FAMILY "primary" (a)
)

@chengxiong-ruan chengxiong-ruan force-pushed the check-equivalent-shard-col-constraint branch from 0bfea0f to 8755407 Compare January 11, 2022 23:42
Copy link
Contributor Author

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @postamar)


pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 805 at r4 (raw file):

Previously, ajwerner wrote…

I feel like this test is setting out to demonstrate that the table created from show create also gets a good plan. That's nice and what you want, but I think it can be done more elegantly. Also, please comment the intention of the test.

Copy the statement to a variable

let $create_statement
SELECT create_statement FROM [SHOW CREATE TABLE t]

Use the statement

statement ok
$create_statement

nice, thanks for suggesting that

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 8 files at r1, 1 of 4 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @chengxiong-ruan, and @postamar)


-- commits, line 16 at r3:

Previously, chengxiong-ruan (Chengxiong Ruan) wrote…

Reworded. Let me know if you're happy with it.

I still think the docs writers will struggle with this. Consider:

  Release note (sql change): Before this change, the check constraint on shard
  columns used by hash sharded index was printed in the corresponding
  `SHOW CREATE TABLE`. The constraint had been shown because cockroach
  lacked logic to ensure that such a constraint always exists for shard columns.
  The database now ensures that shard columns which are part of hash sharded
  indexes always have the corresponding check constraint which the optimizer relies
  on to achieve properly optimized query plans on hash sharded indexes. We no
  longer display this constraint in `SHOW CREATE TABLE` as it is now implied by
  the `USING HASH` clause on the relevant index.

-- commits, line 11 at r5:
Grammar nits:

constraint no matter if a shard column is newly created or not. With

Code quote:

  string is the same. With this logic we can try to create the
  constraint no matter a shard column is newly created or not. With

-- commits, line 15 at r5:
created, or skipped if one already exists

Fixes cockroachdb#68031

Previously we only try to create constraint for shard column if
it's newly created. We check duplicate constraint for shard
column when `Alter Primary Key` and `Create Index`, however the
check is simply a name check. This pr adds logic to check
equivalent constraint by checking if the formatted expression
string is the same. With this logic we can try to create the
constraint no matter if a shard column is newly created or not.
With this fix, we also don't need to expose the constraint through
`SHOW CREATE TABLE` result since we make sure the constraint is
created or skipped if one already exists.

Release note (sql change): Before this change, the check constraint
on shard column used by hash sharded index was printed in the
corresponding `SHOW CREATE TABLE`. The constraint had been shown
because cockroach lacked logic to ensure that shard columns which
are part of hash sharded indexs always have the check constraint
which the optimizer relies on to achieve properly optimized plans
on hash sharded indexes. We no longer display this constraint in
`SHOW CREATE TABLE` as it is now implied by the `USING HASH` clause
on the relevant index.
@chengxiong-ruan chengxiong-ruan force-pushed the check-equivalent-shard-col-constraint branch from 8755407 to 053d361 Compare January 12, 2022 19:03
Copy link
Contributor Author

@chengxiong-ruan chengxiong-ruan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @postamar)


-- commits, line 16 at r3:

Previously, ajwerner wrote…

I still think the docs writers will struggle with this. Consider:

  Release note (sql change): Before this change, the check constraint on shard
  columns used by hash sharded index was printed in the corresponding
  `SHOW CREATE TABLE`. The constraint had been shown because cockroach
  lacked logic to ensure that such a constraint always exists for shard columns.
  The database now ensures that shard columns which are part of hash sharded
  indexes always have the corresponding check constraint which the optimizer relies
  on to achieve properly optimized query plans on hash sharded indexes. We no
  longer display this constraint in `SHOW CREATE TABLE` as it is now implied by
  the `USING HASH` clause on the relevant index.

Thank you!

@chengxiong-ruan
Copy link
Contributor Author

bors r+

@chengxiong-ruan
Copy link
Contributor Author

TFTR!

@craig
Copy link
Contributor

craig bot commented Jan 12, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jan 12, 2022

Build succeeded:

@craig craig bot merged commit a984bb3 into cockroachdb:master Jan 12, 2022
@chengxiong-ruan chengxiong-ruan deleted the check-equivalent-shard-col-constraint branch January 18, 2022 02:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: hash sharded column shows up in CREATE TABLE, interferes with proper sharded index creation
4 participants