UNIQUE CONSTRAINT needs the ability to create HASH index to enforce uniqueness #107398

glennfawcett · 2023-07-21T21:52:48Z (edited by cockroach-jira-scripts)

Is your feature request related to a problem? Please describe.
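ALTER TABLE ... ADD CONSTRAINT ... UNIQUE ...
creates a regular (non-hash-sharded) UNIQUE INDEX to enforce the constraint. Sometimes this index becomes a write hot spot, for example when the constrained column receives sequential values and inserts all land at the end of the index.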
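The hot-spotting concern can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the index keyspace is split into ranges at hypothetical split points 100, 200, and 300, and that the constrained column receives monotonically increasing values. Because an ordered index places each key by its value, every new key sorts after the existing ones and lands in the same (last) range.

```python
import bisect
from collections.abc import Iterable

# Hypothetical split points (not taken from any real cluster): the ordered
# index is divided into ranges [-inf, 100), [100, 200), [200, 300), [300, +inf).
SPLIT_POINTS = [100, 200, 300]


def range_for_key(key: int) -> int:
    """Return the index of the range that holds `key` in an ordered index."""
    return bisect.bisect_right(SPLIT_POINTS, key)


def ranges_touched(keys: Iterable[int]) -> set[int]:
    """Return the set of ranges that a batch of inserted keys writes to."""
    return {range_for_key(k) for k in keys}


if __name__ == "__main__":
    # Monotonically increasing values, e.g. from a sequence or a timestamp.
    new_keys = range(1000, 1100)
    print(ranges_touched(new_keys))
```

Running it prints `{3}`: all 100 inserts target a single range, which is the pattern a hash-sharded index is meant to break up.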

Describe the solution you'd like
I would like syntax that creates a hash-sharded index to enforce uniqueness, something like:

ALTER TABLE ... ADD CONSTRAINT ... UNIQUE ... USING HASH

Describe alternatives you've considered
As a workaround, you can create a unique hash-sharded index, which implicitly creates a UNIQUE constraint backed by that index. So a workaround exists, but it would be nice to have this in the ADD CONSTRAINT syntax.

Workaround used:

root@localhost:26257/defaultdb> show create z;
  table_name |                                                         create_statement
-------------+-----------------------------------------------------------------------------------------------------------------------------------
  z          | CREATE TABLE public.z (
             |     id INT8 NOT NULL DEFAULT unique_rowid(),
             |     id2 INT8 NULL,
             |     crdb_internal_id2_shard_16 INT8 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(id2)), 16:::INT8)) VIRTUAL,
             |     CONSTRAINT z_pkey PRIMARY KEY (id ASC),
             |     UNIQUE INDEX id2_hash_uniq (id2 ASC) USING HASH WITH (bucket_count=16)
             | )
(1 row)

root@localhost:26257/defaultdb> show constraints from z;
  table_name |         constraint_name          | constraint_type |                                            details                                             | validated
-------------+----------------------------------+-----------------+------------------------------------------------------------------------------------------------+------------
  z          | check_crdb_internal_id2_shard_16 | CHECK           | CHECK ((crdb_internal_id2_shard_16 IN (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15))) |     t
  z          | id2_hash_uniq                    | UNIQUE          | UNIQUE (id2 ASC)                                                                               |     t
  z          | z_pkey                           | PRIMARY KEY     | PRIMARY KEY (id ASC)                                                                           |     t
(3 rows)


Time: 13ms total (execution 13ms / network 0ms)

root@localhost:26257/defaultdb> insert into z values (1,1);
INSERT 0 1


Time: 6ms total (execution 6ms / network 0ms)

root@localhost:26257/defaultdb> insert into z values (2,1);
ERROR: duplicate key value violates unique constraint "id2_hash_uniq"
SQLSTATE: 23505
DETAIL: Key (id2)=(1) already exists.
CONSTRAINT: id2_hash_uniq
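The shard column in the SHOW CREATE output is computed as mod(fnv32(crdb_internal.datums_to_bytes(id2)), 16), and the CHECK constraint limits it to 0 through 15. The Python sketch below mimics that computation under two assumptions: that fnv32 is the 32-bit FNV-1 hash (our understanding of the builtin; CockroachDB also provides fnv32a for FNV-1a), and that a plain 8-byte big-endian integer can stand in for crdb_internal.datums_to_bytes. The encoding helper is hypothetical, so the bucket numbers will not match what CockroachDB stores; the point is only that adjacent values are scattered across buckets instead of sorting next to each other.

```python
# Sketch of the shard column expression from the SHOW CREATE output:
#   mod(fnv32(crdb_internal.datums_to_bytes(id2)), 16)
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193
BUCKET_COUNT = 16  # matches bucket_count=16 in the transcript


def fnv32(data: bytes) -> int:
    """32-bit FNV-1 hash: for each byte, multiply by the prime, then XOR."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
        h ^= byte
    return h


def encode_key(value: int) -> bytes:
    """Hypothetical stand-in for crdb_internal.datums_to_bytes.

    CockroachDB uses its own key encoding; an 8-byte big-endian signed
    integer is used here only to keep the sketch self-contained.
    """
    return value.to_bytes(8, "big", signed=True)


def shard(value: int, bucket_count: int = BUCKET_COUNT) -> int:
    """Bucket in [0, bucket_count) for a value, like the virtual shard column."""
    return fnv32(encode_key(value)) % bucket_count


if __name__ == "__main__":
    for v in range(1, 9):
        print(v, shard(v))
```

With this stand-in encoding, the values 0 through 15 each map to a different bucket, so 16 sequential inserts would be spread over all 16 buckets rather than appended to one end of the index.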

Jira issue: CRDB-30022


rafiss · 2023-07-25T18:43:04Z

@glennfawcett Is the workaround of using a unique index sufficient, or is there a higher-priority reason that we need the ALTER TABLE ADD CONSTRAINT ... UNIQUE ... USING HASH syntax?

Random column family mutations are now disabled for `CREATE TABLE` statements with unique, hash-sharded indexes. This prevents the AST from being reserialized as a `UNIQUE` constraint with invalid options instead of the original `UNIQUE INDEX`. See cockroachdb#65929 and cockroachdb#107398. Release note: None

glennfawcett added the C-enhancement (solution expected to add code/behavior + preserve backward-compat; pg compat issues are an exception) and A-sql-syntax (issues strictly related to the SQL grammar, with no semantic aspect) labels Jul 21, 2023

blathers-crl bot added the T-sql-foundations label (SQL Foundations Team, formerly SQL Schema + SQL Sessions) Jul 21, 2023

mgartner mentioned this issue Jan 14, 2025 in sql/logictest: disable column family mutations in some cases #139029 (merged)

mohini-crl mentioned this issue Jan 17, 2025 in [Replicated] sql/logictest: disable column family mutations in some cases mohini-crl/cockroach#177 (merged)
