-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: Support partitioned uniqueness checks #41535
Comments
I really like this idea. How does one address this if they don't get it right the first time? Or if they are already in production? I assume that its equivalent to needing to change the primary key and will need to be considered for that work @solongordon |
That's an easy one: ALTER TABLE t
ALTER CONSTRAINT x_unique UNIQUE VIRTUAL;
DROP INDEX t@t_x_unique; |
So I'm all for the feature, but I think the keyword "VIRTUAL" is misguided. See at the bottom for the alternative proposal. The way I think about this is to remind myself that uniqueness is treated by both pg and the SQL standard as a constraint, like CHECK and foreign key relationships. Constraints are really stored in a table's metadata separately from the columns. Constraints have two things of interest in pg's world, that crdb is part of:
In fact, pg also considers UNIQUE in CREATE to implicitly declare a constraint. It's not a property of the column; it's certainly not just an implicit unique index declaration: it's an entry in the constraint list, which happens to also be supported by an index. I'll remind you that this syntax is canonical: CREATE TABLE t(
x INT,
CONSTRAINT x_unique UNIQUE (x)
); and that With this in mind, the feature requested at the top is not "virtual uniqueness". There is absolutely nothing virtual about this. It's simply a constraint declaration without a supporting index ("indexless uniqueness check"). The "new" thing here is that indexless-ness in pg, when supported, is opt-out (the app must add the index for constraints without one, if it wants them), so there is no syntax to force it. However we do have already two perfect keywords for "I don't want an index": Hence: CREATE TABLE t(
x INT,
CONSTRAINT x_unique UNIQUE WITHOUT INDEX (x)
); (possibly aliased to |
PG already has |
Here you are: although it seems analogous there's a key distinction: in the distinction virtual/stored, the data either exists or does not exist on disk, in the physical world. This appeals to the fundamental meaning of the word "virtual" in English where there is no physical reality. With uniqueness checking, in either case the uniqueness is not a "thing" that exists. It is not "data"—it is a property of other data. Think about it: if you create a table with a unique constraint, then drop the constraint and the supporting index without updating (or adding) rows, did the column group become magically non-unique? Arguably, given that uniqueness is an emergent property of a dataset, we could say that it is always virtual regardless of whether there is a supporting index or even a constraint declaration. To come back to the proposal, let's remember that constraint declarations are primarily a specification about which checks to performs during SQL mutations (the exploitation of FK declarations for planning read-only queries was, historically, an unforeseen late addition). They are specifications for mutation mechanisms just as much as—and, in my opinion, even more than—descriptions of the properties of relationships between rows/tables. In that light, "WITHOUT INDEX" is more informative of what is going on. |
So to make the syntax proposal even more concrete:
Where STORING, INTERLEAVE, PARTITION BY, etc. are not allowed, because they only make sense with a real backing index. Also, it'd just be a simple name list, since ASC/DESC don't make sense. |
Something like that, yes. There are two cases to consider: the For the first case it's rather straightforward and I'd do this as follows:
For the second case there's a trade-off. You can either add a Or you can split
My opinion would be to go with the 1 AST node (and tell users in doc "ASC and DESC are meaningless and will be ignored") until we have enough experimental evidence that the other solution is preferable. |
I missed the initial discussion of this, but am in strong support of the idea. It seems like the natural mechanism to enforce uniqueness across a geopartitioned table without violating the constraint that all of a row's datums are "homed" within a single region. This is an important invariant that will help us provide much stronger latency guarantees than we would otherwise be able to. For instance, it allows us to guarantee that DELETEing a local row will always be a local operation (mod fk checks) and that an INSERT of a row will always write to the local region, even if it needs to read from other regions. It also keeps the door open for data domiciling and datum rehoming in the future. However, I'll echo @knz's thoughts from #41535 (comment). I'm not convinced that we need to introduce new syntax for this functionality, or even that it needs to be exposed to users at all. Instead, I think this is how we'll want to implement all UNIQUE constraints in REGIONAL tables (including the primary key constraint [1]). This fits with the model to prefix all of a REGIONAL table's indexes with the @rytaft, @RaduBerinde, and I just met to discuss this and came to an interesting realization. It turns out that the idea to "look for a row in the local region and fall back to a fanout query to all other regions" (do we have a name for this? "locality-optimized search"?) depends directly on this functionality. Without a UNIQUE constraint on the logical primary key column(s), the optimizer can't conclude that a "hit" in the local region is sufficient to skip the remote fanout. Such behavior relies on uniqueness being enforced for the logical primary key column itself. [1] though we will still explore a data type that can optimize away the uniqueness query. |
On the syntax, I just want to point out that the way to create a unique index is with |
I agree. We should absoultely create an index for each unique constraint like we currently do – this index should just be partitioned on the region column and enforced using a fan-out query across the closed set of possible regions. (It's a little tough for me to follow what's still being considered here, maybe we're all in agreement already with this) |
Random thought - have we ever considered optimizing the solution to this problem through the use of bloom filters? The thinking is that in addition to the index we'd have a set of bloom filters on the uniqueness column. While the actual data has sovereignty to a given region, it may be acceptable to replicate the bloom filters (which never give precise information as to the underlying data). Then, with the bloom filters in place, and replicated using non-voting replicas, we could perform uniqueness checking in the vast majority of cases without going cross-region (on the assumption that in most cases, uniqueness will not be violated and the bloom filter is large enough to reliably tell us that). There are still questions around whether or not the bloom filter replication will be acceptable, and whether or not its writing and replication would be justified to save the cost of the cross-region uniqueness check, but if those work out, I could see this significantly improving transaction times. |
This is interesting. I don't think I fully understand what we get from the bloom filter if it's being pushed asynchronously to non-voting replicas. Even if the bloom filter avoided all false positives, it seems like it would still have false negatives as an artifact of it being accessed inconsistently. If we're ok with uniqueness violations due to such inconsistencies then we could probably just perform the full partitioned uniqueness check on local non-voting replicas, since we expect to have a full copy of the data in every region. But I don't think we want to get in the business of allowing uniqueness violations. It's possible that I'm missing something here. |
After talking this over with @nvanbenschoten, I'm not sure bloom filters would help here, as we'd get a potential savings on the uniqueness check read path, at the expense of additional latency on the write path (to replicate the bloom filters and ensure consistency). Probably not the best trade-off. |
I pulled the locality optimized search portion of this into a separate issue: #55185. So this issue can remain focused on the creation and use of partitioned unique indexes. |
Can we prevent creating a |
This commit adds support for the syntax `... UNIQUE WITHOUT INDEX ...`, both when adding UNIQUE constraints and when adding UNIQUE columns. Using this syntax will currently return the error "unique constraints without an index are not yet supported", but support for the syntax serves as a starting point for adding support for these unique constraints. Informs cockroachdb#41535 Release note (sql change): Added support for using the syntax `... UNIQUE WITHOUT INDEX ...` in CREATE TABLE and ALTER TABLE statements, both when defining columns and unique constraints. Although this syntax can now be parsed successfully, using this syntax currently returns an error "unique constraints without an index are not yet supported".
One option is to print a |
This commit adds support for the syntax `... UNIQUE WITHOUT INDEX ...`, both when adding UNIQUE constraints and when adding UNIQUE columns. Using this syntax will currently return the error "unique constraints without an index are not yet supported", but support for the syntax serves as a starting point for adding support for these unique constraints. Informs cockroachdb#41535 Release note (sql change): Added support for using the syntax `... UNIQUE WITHOUT INDEX ...` in CREATE TABLE and ALTER TABLE statements, both when defining columns and unique constraints. Although this syntax can now be parsed successfully, using this syntax currently returns an error "unique constraints without an index are not yet supported".
55700: sql: add syntax to create unique constraints without an index r=rytaft a=rytaft This commit adds support for the syntax `... UNIQUE WITHOUT INDEX ...`, both when adding `UNIQUE` constraints and when adding `UNIQUE` columns. Using this syntax will currently return the error "unique constraints without an index are not yet supported", but support for the syntax serves as a starting point for adding support for these unique constraints. Informs #41535 Release note (sql change): Added support for using the syntax `... UNIQUE WITHOUT INDEX ...` in CREATE TABLE and ALTER TABLE statements, both when defining columns and unique constraints. Although this syntax can now be parsed successfully, using this syntax currently returns an error "unique constraints without an index are not yet supported". Co-authored-by: Rebecca Taft <[email protected]>
This commit adds checks for unique constraints when planning insertions in the optimizer. This does not yet impact anything outside of the optimizer tests, since UNIQUE WITHOUT INDEX is still not supported outside of the optimizer test catalog. Informs cockroachdb#41535 Release note: None
This commit adds checks for unique constraints when planning insertions in the optimizer. This does not yet impact anything outside of the optimizer tests, since UNIQUE WITHOUT INDEX is still not supported outside of the optimizer test catalog. Informs cockroachdb#41535 Release note: None
This commit adds checks for unique constraints when planning insertions in the optimizer. This does not yet impact anything outside of the optimizer tests, since UNIQUE WITHOUT INDEX is still not supported outside of the optimizer test catalog. Informs cockroachdb#41535 Release note: None
This commit adds checks for unique constraints when planning insertions in the optimizer. This does not yet impact anything outside of the optimizer tests, since UNIQUE WITHOUT INDEX is still not supported outside of the optimizer test catalog. Informs cockroachdb#41535 Release note: None
56777: opt: add insertion checks for unique constraints r=rytaft a=rytaft This commit adds checks for unique constraints when planning insertions in the optimizer. This does not yet impact anything outside of the optimizer tests, since `UNIQUE WITHOUT INDEX` is still not supported outside of the optimizer test catalog. Informs #41535 Release note: None 56834: opt: build statistics for multi-column inverted index scans r=mgartner a=mgartner This commit updates the statistic build so that both a `Scan.Constraint` and `Scan.InvertedConstraint` alter the selectivity of a constrained scan. These two fields are non-nil when the scan operates over a multi-column inverted index. Release note: None Co-authored-by: Rebecca Taft <[email protected]> Co-authored-by: Marcus Gartner <[email protected]>
If a user adds a Edit: This question is addressed by #56201. |
This commit adds checks for unique constraints when planning updates in the optimizer. This does not yet impact anything outside of the optimizer tests, since UNIQUE WITHOUT INDEX is still not fully supported outside of the optimizer test catalog. Informs cockroachdb#41535 Release note: None
This commit adds checks for unique constraints when planning updates in the optimizer. This does not yet impact anything outside of the optimizer tests, since UNIQUE WITHOUT INDEX is still not fully supported outside of the optimizer test catalog. Informs cockroachdb#41535 Release note: None
57930: opt: add update checks for unique constraints r=rytaft a=rytaft This commit adds checks for unique constraints when planning updates in the optimizer. This does not yet impact anything outside of the optimizer tests, since `UNIQUE WITHOUT INDEX` is still not fully supported outside of the optimizer test catalog. Informs #41535 Release note: None Co-authored-by: Rebecca Taft <[email protected]>
This commit updates the execbuilder to build uniqueness checks for insert and update queries when needed. It also fixes a few other bugs in the existing code that were preventing the checks from being built. As of this commit, uniqueness checks for insert and update now work end-to-end. This commit also includes a number of execbuilder and logic tests for uniqueness checks. There is no release note since these checks are still gated behind the experimental_enable_unique_without_index_constraints session variable. Informs cockroachdb#41535 Release note: None
This commit updates the execbuilder to build uniqueness checks for insert and update queries when needed. It also fixes a few other bugs in the existing code that were preventing the checks from being built. As of this commit, uniqueness checks for insert and update now work end-to-end. This commit also includes a number of execbuilder and logic tests for uniqueness checks. There is no release note since these checks are still gated behind the experimental_enable_unique_without_index_constraints session variable. Informs cockroachdb#41535 Release note: None
This commit updates the execbuilder to build uniqueness checks for insert and update queries when needed. It also fixes a few other bugs in the existing code that were preventing the checks from being built. As of this commit, uniqueness checks for insert and update now work end-to-end. This commit also includes a number of execbuilder and logic tests for uniqueness checks. There is no release note since these checks are still gated behind the experimental_enable_unique_without_index_constraints session variable. Informs cockroachdb#41535 Release note: None
58053: opt, sql: add end-to-end support for uniqueness checks r=rytaft a=rytaft **opt: generalize explain output for constraint checks** This commit changes the explain output for constraint checks to say "constraint-check" rather than "fk-check" so that it can be used for uniqueness checks in addition to foreign key checks. Release note (sql change): The explain output for foreign key checks is now labeled constraint-check rather than fk-check. This change is in preparation for adding support for unique constraint checks, which will use the same label. **sql: update unique violation error to be Postgres compatible** This commit fixes the violation error for unique constraints to match the message used by Postres. Previously, the error would have been printed as: `duplicate key value (k)=(1) violates unique constraint "primary"` Now this message is printed as: ``` ERROR: duplicate key value violates unique constraint "primary" DETAIL: Key (k)=(1) already exists. ``` Release note (sql change): The error message for unique constraint violations now matches the error used by Postgres. For example, the new error message looks like this: ERROR: duplicate key value violates unique constraint "primary" DETAIL: Key (k)=(1) already exists. **opt: add end-to-end support for uniqueness checks** This commit updates the `execbuilder` to build uniqueness checks for insert and update queries when needed. It also fixes a few other bugs in the existing code that were preventing the checks from being built. As of this commit, uniqueness checks for insert and update now work end-to-end. This commit also includes a number of `execbuilder` and logic tests for uniqueness checks. There is no release note since these checks are still gated behind the `experimental_enable_unique_without_index_constraints` session variable. Informs #41535 Release note: None Co-authored-by: Rebecca Taft <[email protected]>
This commit adds checks for unique constraints when planning upserts in the optimizer. This commit also includes a number of execbuilder and logic tests for upsert uniqueness checks. There is no release note since these checks are still gated behind the experimental_enable_unique_without_index_constraints session variable. Informs cockroachdb#41535 Release note: None
58301: opt: add upsert checks for unique constraints r=rytaft a=rytaft This commit adds checks for unique constraints when planning upserts in the optimizer. This commit also includes a number of `execbuilder` and logic tests for upsert uniqueness checks. There is no release note since these checks are still gated behind the `experimental_enable_unique_without_index_constraints` session variable. Informs #41535 Release note: None Co-authored-by: Rebecca Taft <[email protected]>
I think this issue can now be closed. Partitioned uniqueness checks are now fully supported for all mutation types and all unique indexes. |
This is an idea advanced by @danhhz and @sumeerbhola that is relevant to our efforts to make multi-region applications easier. I've added some thoughts on how we might support this in the optimizer.
Consider this schema:
Assume there is a requirement that email addresses are domiciled in the corresponding country of origin (because they are PII). However, there is also a requirement that email addresses are globally unique.
One possible approach is to create a global secondary index that enforces uniqueness:
However, this would put copies of the index in each region, which would violate the domiciling requirement. In addition, inserting a new user record (or updating the email) would require writing to every region, which adds I/O and multiplies storage requirements.
Note that any attempt to create a partitioned index on email that domiciles properly fails to ensure uniqueness:
Here, email uniqueness is only enforced per region, not globally.
So what is the solution? Enforcing email uniqueness using either a global or partitioned index does not work. Instead, we need to do a "virtual" uniqueness check:
Similar to virtual computed columns, a virtual unique constraint does not trigger the creation of an underlying storage artifact (i.e. column or index). Instead, before the
email
column is inserted or updated, a check query would be run to ensure that the new email value does not yet exist in any region. This would typically be planned as a fan-out query by the optimizer, resulting in a point lookup on each region using the(region, email)
index. Note that if a good index does not exist, the optimizer would plan a table scan, so it's important that an appropriate partitioned index exists when usingUNIQUE VIRTUAL
constraints (we could consider automatically creating one).Summing up the behavior of a
UNIQUE VIRTUAL
constraint:email
column in this example) are inserted/updated, a fan-out query must be run against all regions. However, no writes need to be made to other regions, as with a global index.The optimizer would likely implement
UNIQUE VIRTUAL
constraints in a way very similar to how it implements foreign key check constraints today. It attaches a list of check queries to the mutation operator, and those are run in parallel. If any fail, it results in an error that aborts the transaction. The uniqueness check would just be another check of that sort.Adding support for
UNIQUE VIRTUAL
columns would be a building block to a future where multi-region applications are easy(ier):Syntax similar to this would:
region
column that is filled by default with the gateway node's region onINSERT
.region
partitioning key column.UNIQUE VIRTUAL
constraint check on any unique columns (or sets of columns).The text was updated successfully, but these errors were encountered: