UNIQUE CONSTRAINT needs the ability to create HASH index to enforce uniqueness #107398

Open
glennfawcett opened this issue Jul 21, 2023 · 1 comment
Labels
A-sql-syntax Issues strictly related to the SQL grammar, with no semantic aspect C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

glennfawcett commented Jul 21, 2023

Is your feature request related to a problem? Please describe.
ALTER TABLE ADD CONSTRAINT ... UNIQUE...
creates a UNIQUE INDEX to enforce the constraint. Sometimes this can lead to hot-spotting.

Describe the solution you'd like
I would like some syntax to create a hash-sharded index to enforce uniqueness, something like:

ALTER TABLE ADD CONSTRAINT ... UNIQUE...... USING HASH

Describe alternatives you've considered
As a workaround, you can create a UNIQUE hash-sharded index, which implicitly creates a unique constraint backed by that index. So a workaround exists, but it would be nice to have this in the CONSTRAINT syntax.

work-around used:

root@localhost:26257/defaultdb> show create z;
  table_name |                                                         create_statement
-------------+-----------------------------------------------------------------------------------------------------------------------------------
  z          | CREATE TABLE public.z (
             |     id INT8 NOT NULL DEFAULT unique_rowid(),
             |     id2 INT8 NULL,
             |     crdb_internal_id2_shard_16 INT8 NOT VISIBLE NOT NULL AS (mod(fnv32(crdb_internal.datums_to_bytes(id2)), 16:::INT8)) VIRTUAL,
             |     CONSTRAINT z_pkey PRIMARY KEY (id ASC),
             |     UNIQUE INDEX id2_hash_uniq (id2 ASC) USING HASH WITH (bucket_count=16)
             | )
(1 row)

root@localhost:26257/defaultdb> show constraints from z;
  table_name |         constraint_name          | constraint_type |                                            details                                             | validated
-------------+----------------------------------+-----------------+------------------------------------------------------------------------------------------------+------------
  z          | check_crdb_internal_id2_shard_16 | CHECK           | CHECK ((crdb_internal_id2_shard_16 IN (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15))) |     t
  z          | id2_hash_uniq                    | UNIQUE          | UNIQUE (id2 ASC)                                                                               |     t
  z          | z_pkey                           | PRIMARY KEY     | PRIMARY KEY (id ASC)                                                                           |     t
(3 rows)


Time: 13ms total (execution 13ms / network 0ms)

root@localhost:26257/defaultdb> insert into z values (1,1);
INSERT 0 1


Time: 6ms total (execution 6ms / network 0ms)

root@localhost:26257/defaultdb> insert into z values (2,1);
ERROR: duplicate key value violates unique constraint "id2_hash_uniq"
SQLSTATE: 23505
DETAIL: Key (id2)=(1) already exists.
CONSTRAINT: id2_hash_uniq
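
The shard column in the output above is `mod(fnv32(crdb_internal.datums_to_bytes(id2)), 16)`. A minimal Python sketch of that idea follows, under stated assumptions: `fnv32` is taken to be the 32-bit FNV-1 hash, and an 8-byte big-endian encoding stands in for `crdb_internal.datums_to_bytes` (the real encoding differs). `shard_bucket` and `ShardedUniqueIndex` are hypothetical names for illustration only. The sketch shows why uniqueness can still be enforced: equal `id2` values always hash to the same bucket, so a duplicate is found by checking that one bucket.

```python
import struct

FNV32_OFFSET_BASIS = 2166136261
FNV32_PRIME = 16777619


def fnv32(data: bytes) -> int:
    """32-bit FNV-1: multiply by the prime, then XOR in each byte."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
        h ^= byte
    return h


def shard_bucket(id2: int, bucket_count: int = 16) -> int:
    # ASSUMPTION: 8-byte big-endian bytes stand in for
    # crdb_internal.datums_to_bytes; the real encoding is different.
    encoded = struct.pack(">q", id2)
    return fnv32(encoded) % bucket_count


class ShardedUniqueIndex:
    """Toy model: one key set per bucket, uniqueness checked within a bucket."""

    def __init__(self, bucket_count: int = 16):
        self.bucket_count = bucket_count
        self.buckets = [set() for _ in range(bucket_count)]

    def insert(self, id2: int) -> None:
        # Equal keys always land in the same bucket, so checking that
        # single bucket is enough to detect a duplicate.
        bucket = self.buckets[shard_bucket(id2, self.bucket_count)]
        if id2 in bucket:
            raise ValueError(f"Key (id2)=({id2}) already exists.")
        bucket.add(id2)
```

In this toy model, sequential `id2` values are scattered across buckets rather than appended next to each other, which is the property that reduces hot-spotting.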

Jira issue: CRDB-30022

@glennfawcett glennfawcett added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-syntax Issues strictly related to the SQL grammar, with no semantic aspect labels Jul 21, 2023
@blathers-crl blathers-crl bot added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label Jul 21, 2023
Collaborator

rafiss commented Jul 25, 2023

@glennfawcett Is the workaround to use a unique index sufficient, or is there a higher priority reason that we need the ALTER TABLE ADD CONSTRAINT ... UNIQUE...... USING HASH syntax?

mgartner added a commit to mgartner/cockroach that referenced this issue Jan 14, 2025
Random column family mutations are now disabled for `CREATE TABLE`
statements with unique, hash-sharded indexes. This prevents the AST
from being reserialized with a `UNIQUE` constraint with invalid options,
instead of the original `UNIQUE INDEX`. See cockroachdb#65929 and cockroachdb#107398.

Release note: None
craig bot pushed a commit that referenced this issue Jan 14, 2025
137805: sql/row: use Put instead of CPut when updating value of secondary index r=yuzefovich,stevendanna a=michae2

**sql/row: use Put instead of CPut when updating value of secondary index**

When an UPDATE statement changes the value but not the key of a secondary index (e.g. an update to the stored columns of a secondary index) we need to write a new version of the secondary index KV with the new value.

We were using a CPutAllowingIfNotExists to do this, which verified that if the KV existed, the expected value was the value before update. But there's no need for this verification. We have other mechanisms to detect a write-write conflict with any other transaction that could have changed the value concurrently. We can simply use a Put to overwrite the previous value.

This also matches what we do for the primary index when the PK doesn't change.

Epic: None

Release note: None

---

**sql: change CPutAllowingIfNotExists with nil expValue to CPut**

CPutAllowingIfNotExists with empty expValue is equivalent to CPut with empty expValue, so change a few spots to use regular CPut. This almost gets rid of CPutAllowingIfNotExists entirely, but there is at least one spot in backfill (introduced in #138707) that needs to allow for both a non-empty expValue and non-existence of the KV.

Also drop "(expecting does not exist)" from CPut tracing, as CPut with empty expValue is now overwhelmingly the most common use of CPut. And this matches the tracing in #138707.
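
The difference between the three write operations discussed in these two commits can be sketched with a toy in-memory store. This is only an illustration of the semantics as described in the commit messages, not the actual KV API: `ToyKV`, its method names, and `ConditionFailedError` are hypothetical, and values are assumed to be `bytes` (never `None`).

```python
class ConditionFailedError(Exception):
    """Raised when a conditional put finds an unexpected existing value."""


class ToyKV:
    # ASSUMPTION: a plain dict stands in for the KV layer; stored values
    # are bytes and never None, so None can mean "key does not exist".
    def __init__(self):
        self.data = {}

    def put(self, key, value):
        """Unconditional write: overwrite whatever is there."""
        self.data[key] = value

    def cput(self, key, value, exp_value=None):
        """Write only if the current value equals exp_value.

        exp_value=None means the key is expected not to exist.
        """
        if self.data.get(key) != exp_value:
            raise ConditionFailedError(key)
        self.data[key] = value

    def cput_allowing_if_not_exists(self, key, value, exp_value=None):
        """Write if the key is absent, or if its value equals exp_value."""
        if key in self.data and self.data[key] != exp_value:
            raise ConditionFailedError(key)
        self.data[key] = value
```

With `exp_value=None`, both conditional variants succeed exactly when the key is absent, which is the equivalence the second commit relies on. They differ only when a non-empty expected value is supplied and the key is missing.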

Epic: None

Release note: None

138243: changefeedccl: fix PTS test r=stevendanna a=asg0451

Fix failing TestPTSRecordProtectsTargetsAndSystemTables test

Fixes: #135639
Fixes: #138066
Fixes: #137885
Fixes: #137505
Fixes: #136396
Fixes: #135805
Fixes: #135639

Release note: None

138696: sql: add CHECK EXTERNAL CONNECTION command r=kev-cao a=kev-cao

This patch adds the `CHECK EXTERNAL CONNECTION` command and replaces the old `SHOW BACKUP CONNECTION` syntax.

Epic: None

Release note: None

138740: opt/bench: fix benchmark regression r=mgartner a=mgartner

#### opt/bench: fix benchmark regression

PR #138641 caused extra allocations for plan gist factories in optimizer
benchmarks. These allocations should not be included in benchmark
results, so they have been eliminated.

Release note: None

#### util/base64: use `bytes.Buffer` instead of `strings.Builder` in `Encoder`

For our purposes in base64-encoding plan gists, using `bytes.Buffer` in
`Encoder` causes fewer allocations, presumably because of a more
aggressive growth algorithm.

Epic: None

Release note: None


138877: opt: reduce allocations when filtering histogram buckets r=mgartner a=mgartner

`cat.HistogramBuckets` are now returned and passed by value in
`getFilteredBucket` and `(*Histogram).addBucket`, respectively,
eliminating some heap allocations.

Also, two allocations when building spans from buckets via the
`spanBuilder` have been combined into one. The new `(*spanBuilder).init`
method simplifies the API by no longer requiring that prefix datums are
passed to every invocation of `makeSpanFromBucket`. This also reduces
redundant copying of the prefix.

Epic: None

Release note: None


139029: sql/logictest: disable column family mutations in some cases r=mgartner a=mgartner

Random column family mutations are now disabled for `CREATE TABLE`
statements with unique, hash-sharded indexes. This prevents the AST
from being reserialized with a `UNIQUE` constraint with invalid options,
instead of the original `UNIQUE INDEX`. See #65929 and #107398.

Epic: None

Release note: None


139039: ccl/serverccl: revise `TestTenantVars` cpu time checks r=xinhaoz a=xinhaoz

Previously, this test verified that a portion of the test's user cpu time would be less than or equal to the entire tenant user cpu time up to that point. This check is flaky because there's no guarantee that the inactive tenant's cpu time will surpass the test cpu time. We now simply verify that the test cpu times are greater than or equal to the tenant metrics.

The test was likely passing before 331596c because the reported tenant cpu time was accounting for the sql server prestart. A tenant's user cpu metrics are tracked from the time the `_status/load` endpoint is registered, and the commit above moved the router setup to occur just after the prestart.

Epic: none
Fixes: #119329

Release note: None

Co-authored-by: Michael Erickson <[email protected]>
Co-authored-by: Miles Frankel <[email protected]>
Co-authored-by: Kevin Cao <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
craig bot pushed a commit that referenced this issue Jan 14, 2025
137947: ccl/changeedccl: Add changefeed options into nemesis tests r=wenyihu6 a=aerfrei

This work makes sure our nemesis tests for changefeeds randomize
over the options we use upon changefeed creation. This randomly adds
the key_in_value option (see below) and full_table_name option half
of the time and checks that the changefeed messages respect them in
the beforeAfter validator.

Note the following limitations: the full_table_name option, when on,
asserts that the topic in the output will be d.public.{table_name}
instead of checking for the actual name of the database/schema.

This change also does not add the key_in_value option for the
webhook and cloudstorage sinks. Even before this change, since
key_in_value is on by default for those sinks, we remove the key
from the value in those testfeed messages for ease of testing.
Unfortunately, this makes these cases hard to test, so we leave them
out for now.
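
The option randomization described above might look roughly like the following sketch. It is not the nemesis test code: `random_changefeed_options` and `expected_topic` are hypothetical helpers, each option is assumed to be chosen independently with probability one half, and the topic without `full_table_name` is assumed to be the bare table name.

```python
import random

# Sinks where key_in_value is on by default, per the commit message.
SINKS_WITH_DEFAULT_KEY_IN_VALUE = {"webhook", "cloudstorage"}


def random_changefeed_options(sink: str, rng: random.Random) -> set:
    """Pick a random subset of the two options for a changefeed on `sink`."""
    options = set()
    # ASSUMPTION: each option is an independent coin flip.
    if sink not in SINKS_WITH_DEFAULT_KEY_IN_VALUE and rng.random() < 0.5:
        options.add("key_in_value")
    if rng.random() < 0.5:
        options.add("full_table_name")
    return options


def expected_topic(table_name: str, options: set) -> str:
    """Topic the validator would expect for a message from `table_name`."""
    if "full_table_name" in options:
        # The database and schema are hard-coded, matching the stated limitation.
        return f"d.public.{table_name}"
    # ASSUMPTION: without the option, the topic is the bare table name.
    return table_name
```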

See also: #134119

Epic: [CRDB-42866](https://cockroachlabs.atlassian.net/browse/CRDB-42866)

Release note: None

138243: changefeedccl: fix PTS test r=stevendanna a=asg0451

Fix failing TestPTSRecordProtectsTargetsAndSystemTables test

Fixes: #135639
Fixes: #138066
Fixes: #137885
Fixes: #137505
Fixes: #136396
Fixes: #135805
Fixes: #135639

Release note: None

138697: crosscluster: add crdb_route parameter for LDR and PCR r=jeffswenson a=jeffswenson

The `crdb_route` query parameter determines how the destination
cluster's stream processor connects to the source cluster. There are two
options for the query parameter: "node" and "gateway". Here is an
example of using the route parameter to create an external connection
that is usable for LDR or PCR.

```SQL
-- A connection that routes all replication traffic via the configured
-- connection URI.
CREATE EXTERNAL CONNECTION 'external://source-db' AS
'postgresql://user:[email protected]:26257/sslmode=verify-full&crdb_route=gateway'

-- A connection that enumerates nodes in the source cluster and connects
-- directly to nodes.
CREATE EXTERNAL CONNECTION 'external://source-db' AS
'postgresql://user:[email protected]:26257/sslmode=verify-full&crdb_route=node'
```

The "node" option is the original and default behavior. The "node"
option requires the source and destination clusters to be in the same IP
network. The way it works is the connection string supplied to LDR and
PCR is used to connect to the source cluster and generate a physical sql
plan for the replication. The physical plan includes the
`--advertise-sql-addr` for nodes in the source cluster and processors
in the destination cluster connect directly to the nodes. Using the
"node" routing is ideal because there are no extra network hops and the
source cluster can control how load is distributed across its nodes.

The "gateway" option is a new option that is introduced in order to
support routing PCR and LDR over a load balancer. When specified, the
destination cluster ignores the node addresses returned by the physical
plan and instead opens a connection for each processor to the URI
supplied by the user. This introduces an extra network hop and does not
distribute load as evenly, but it works in deployments where the source
cluster is only reachable over a load balancer.

Routing over a load balancer only requires changing the destination
cluster's behavior. Nodes in the source cluster were always implemented to
act as a gateway and serve rangefeeds that are backed by data stored on
different nodes. This support exists so that the cross cluster
replication does not need to re-plan every time a range moves to a
different node.
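
As a rough illustration of the routing choice described above, the sketch below reads `crdb_route` from a connection URI and decides which address each destination processor would dial. It is not the crosscluster implementation: `route_mode` and `processor_targets` are hypothetical names, the URIs used with it are placeholders, and rejecting unknown values with an error is an assumption.

```python
from urllib.parse import parse_qs, urlsplit

VALID_ROUTES = ("node", "gateway")


def route_mode(uri: str) -> str:
    """Return the crdb_route value of `uri`, defaulting to "node"."""
    params = parse_qs(urlsplit(uri).query)
    mode = params.get("crdb_route", ["node"])[-1]
    if mode not in VALID_ROUTES:
        # ASSUMPTION: unknown values are rejected rather than ignored.
        raise ValueError(f"unknown crdb_route value: {mode!r}")
    return mode


def processor_targets(uri: str, planned_node_addrs: list) -> list:
    """Return the address each destination processor would connect to.

    planned_node_addrs holds one source-node address per processor, as
    returned by the physical plan.
    """
    if route_mode(uri) == "gateway":
        # Ignore the planned node addresses; every processor dials the
        # user-supplied URI (for example, a load balancer).
        return [uri for _ in planned_node_addrs]
    # "node": processors connect directly to the source nodes.
    return list(planned_node_addrs)
```

For example, with a placeholder URI ending in `?sslmode=verify-full&crdb_route=gateway` and three planned node addresses, `processor_targets` returns the same URI three times; without the parameter it returns the three node addresses unchanged.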

Release note (sql change): LDR and PCR may use the `crdb_route=gateway`
query option to route the replication streams over a load balancer.

Epic: [CRDB-40896](https://cockroachlabs.atlassian.net/browse/CRDB-40896)

138877: opt: reduce allocations when filtering histogram buckets r=mgartner a=mgartner

`cat.HistogramBuckets` are now returned and passed by value in
`getFilteredBucket` and `(*Histogram).addBucket`, respectively,
eliminating some heap allocations.

Also, two allocations when building spans from buckets via the
`spanBuilder` have been combined into one. The new `(*spanBuilder).init`
method simplifies the API by no longer requiring that prefix datums are
passed to every invocation of `makeSpanFromBucket`. This also reduces
redundant copying of the prefix.

Epic: None

Release note: None


139029: sql/logictest: disable column family mutations in some cases r=mgartner a=mgartner

Random column family mutations are now disabled for `CREATE TABLE`
statements with unique, hash-sharded indexes. This prevents the AST
from being reserialized with a `UNIQUE` constraint with invalid options,
instead of the original `UNIQUE INDEX`. See #65929 and #107398.

Epic: None

Release note: None


139036: testutils,kvserver: add StartExecTrace and adopt in TestPromoteNonVoterInAddVoter r=tbg a=tbg

Every now and then we end up with tests that fail every once in a blue moon, and we can't reproduce at will.
#138864 was one of them, and execution traces helped a great deal.

This PR introduces a helper for unit tests that records an execution trace of the test and keeps the trace on failure, and adopts it for one of these pesky unit tests.

The trace contains the goroutine ID in the filename. Additionally, the test's main goroutine is marked via a trace region. Sample below:

(Image: sample execution trace.)

Closes #134383.

Epic: None
Release note: None


Co-authored-by: Aerin Freilich <[email protected]>
Co-authored-by: Miles Frankel <[email protected]>
Co-authored-by: Jeff Swenson <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
InManuBytes pushed a commit to InManuBytes/cockroach that referenced this issue Jan 15, 2025
Random column family mutations are now disabled for `CREATE TABLE`
statements with unique, hash-sharded indexes. This prevents the AST
from being reserialized with a `UNIQUE` constraint with invalid options,
instead of the original `UNIQUE INDEX`. See cockroachdb#65929 and cockroachdb#107398.

Release note: None
kvoli pushed a commit to kvoli/cockroach that referenced this issue Jan 16, 2025
Random column family mutations are now disabled for `CREATE TABLE`
statements with unique, hash-sharded indexes. This prevents the AST
from being reserialized with a `UNIQUE` constraint with invalid options,
instead of the original `UNIQUE INDEX`. See cockroachdb#65929 and cockroachdb#107398.

Release note: None
mohini-crl pushed a commit to mohini-crl/cockroach that referenced this issue Jan 17, 2025
Random column family mutations are now disabled for `CREATE TABLE`
statements with unique, hash-sharded indexes. This prevents the AST
from being reserialized with a `UNIQUE` constraint with invalid options,
instead of the original `UNIQUE INDEX`. See cockroachdb#65929 and cockroachdb#107398.

Release note: None