-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YSQL] Updates fails on relation having columns with no comparator + before row trigger #23490
Closed
1 task done
Labels
Comments
karthik-ramanathan-3006
added
area/ysql
Yugabyte SQL (YSQL)
status/awaiting-triage
Issue awaiting triage
labels
Aug 13, 2024
yugabyte-ci
added
kind/bug
This issue is a bug
priority/medium
Medium priority issue
labels
Aug 13, 2024
karthik-ramanathan-3006
added a commit
that referenced
this issue
Sep 3, 2024
Summary: ### Background Data undergoes multiple transformations during the lifetime of a query. The data that is input by a query may be type modified (`typmod`), padded and aligned (`typalign`), type casted, compressed (`typstorage`) and normalized before being stored in a Datum as part of a Postgres Tuple. This Datum is then massaged into a network format (`protobuf`) as it is sent over the network to a tserver. Finally, the Datum is unpacked and stored in persistent storage in a format that is supported by DocDB/RocksDB. The reverse process happens when the same piece of data needs to be output back to the user. Each of these formats have their own notion of equality: Postgres has the notion of semantic/logical equality (`{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` are equivalent) and storage/binary equality (the binary representations of `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` maybe different if alternative representations of a value are not normalized). Similarly DocDB has its own notion of semantic and storage equality. In most cases, these notions of equality can be used interchangeably and with good reason: - There is a 1:1 correspondence between the data stored in different formats. That is, a datum’s representation in postgres has exactly one corresponding representation in DocDB that allows the datum to be transformed seamlessly between the formats. This also implies that two datums whose binary representations are identical in postgres will also have identical representations in DocDB. This property is ubiquitously exploited to pushdown postgres operations to DocDB, and to indeed use DocDB as a storage engine for postgres. - Postgres also normalizes the representation of a datum’s value when it is packed into a Datum prior to storage. That is, if a given value of a given data type has multiple representations (the json from the example above), postgres converts the value into a normalized representation, which allows semantic equality to be interchangeably used with storage equality (if multiple representations of a value are represented in-memory/on-disk identically, they will also be stored identically). For data types that are not normalized, postgres does not define an equality operator (`json` data type is not normalized and does not have an equality operator, while `jsonb` data type is normalized and has an equality operator) This leads to a couple of problems: - There are occasions where we may want to know if two datums are stored identically when the data type that the datums belong to, does not have an equality operator. On such occasions, there is a distinction between semantic (not defined) and storage equality (defined). - Postgres is a highly extensible database that allows users to define custom data types and equality operators. In user-defined scenarios, it is also possible to end up with a difference between semantic and storage equality. ### This revision We perform the following optimizations on UPDATE queries that rely on *some* notion of equality: 1. If a BEFORE UPDATE FOR EACH ROW trigger is defined, we skip redundant index updates by comparing the old (pre-update) and new (post-update) values of a column. 2. With D34040, we also have a framework to skip index updates and constraint checking in cases where the value of a column remains unchanged by the update process. Both of these optimizations rely on semantic equality today. However, they should rely on storage/binary equality to correctly handle the problems mentioned above: - A given data type may not define an equality operator. In the absence of storage equality, for correctness in such cases, we must assume that the columns of such data types always change in value. - A user-defined data type may have funky notions of semantic equality (and set membership). This can lead to correctness issues in cases such as partial indexes, when two representations of a given value are considered semantically equal, but are not stored identically (not normalized) and membership to the partial index relies on a membership function that is sensitive to the storage representation. (eg: `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` are not stored identically and a partial index is defined on `begins with '{”a”: 1’`) This revision switches to the use of storage equality for the above optimizations with the caveat that the function used for the comparison (`datumIsEqual`) does not support TOASTed storage. Jira: DB-12404 Test Plan: ``` ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressUpdateOptimized#schedule' ``` Reviewers: amartsinchyk, mihnea, smishra Reviewed By: amartsinchyk Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D37384
karthik-ramanathan-3006
added a commit
that referenced
this issue
Sep 4, 2024
…for update optimizations Summary: **Conflict resolution for PG 15 cherrypick** - src/postgres/src/backend/executor/ybOptimizeModifyTable.c - Location: YBAreDatumsStoredIdentically (formerly called YBEqualDatums) top half - YB master b57e3c6 removes type cache lookup, and adds a comment explaining how values are stored in datums in a YB context - In YB pg15 be8504d, the initialization of `FunctionCallInfoData` has changed in YB-PG 15 for call information to be variable length (refer be8504d) - Merge resolution: `FunctionCallInfoData` is no longer required. Keep master commit changes, discard YB-PG 15 changes. - Location: YBAreDatumsStoredIdentically (formerly called YBEqualDatums) bottom half - YB master b57e3c6 removes call to data type’s comparator procedure and adds a call to datumIsEqual - In YB pg15 be8504d, the structure of `FunctionCallInfoData` has changed (refer above) - Merge resolution: `FunctionCallInfoData` is no longer required. Keep master commit changes, discard YB-PG 15 changes. Data undergoes multiple transformations during the lifetime of a query. The data that is input by a query may be type modified (`typmod`), padded and aligned (`typalign`), type casted, compressed (`typstorage`) and normalized before being stored in a Datum as part of a Postgres Tuple. This Datum is then massaged into a network format (`protobuf`) as it is sent over the network to a tserver. Finally, the Datum is unpacked and stored in persistent storage in a format that is supported by DocDB/RocksDB. The reverse process happens when the same piece of data needs to be output back to the user. Each of these formats have their own notion of equality: Postgres has the notion of semantic/logical equality (`{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` are equivalent) and storage/binary equality (the binary representations of `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` maybe different if alternative representations of a value are not normalized). Similarly DocDB has its own notion of semantic and storage equality. In most cases, these notions of equality can be used interchangeably and with good reason: - There is a 1:1 correspondence between the data stored in different formats. That is, a datum’s representation in postgres has exactly one corresponding representation in DocDB that allows the datum to be transformed seamlessly between the formats. This also implies that two datums whose binary representations are identical in postgres will also have identical representations in DocDB. This property is ubiquitously exploited to pushdown postgres operations to DocDB, and to indeed use DocDB as a storage engine for postgres. - Postgres also normalizes the representation of a datum’s value when it is packed into a Datum prior to storage. That is, if a given value of a given data type has multiple representations (the json from the example above), postgres converts the value into a normalized representation, which allows semantic equality to be interchangeably used with storage equality (if multiple representations of a value are represented in-memory/on-disk identically, they will also be stored identically). For data types that are not normalized, postgres does not define an equality operator (`json` data type is not normalized and does not have an equality operator, while `jsonb` data type is normalized and has an equality operator) This leads to a couple of problems: - There are occasions where we may want to know if two datums are stored identically when the data type that the datums belong to, does not have an equality operator. On such occasions, there is a distinction between semantic (not defined) and storage equality (defined). - Postgres is a highly extensible database that allows users to define custom data types and equality operators. In user-defined scenarios, it is also possible to end up with a difference between semantic and storage equality. We perform the following optimizations on UPDATE queries that rely on *some* notion of equality: 1. If a BEFORE UPDATE FOR EACH ROW trigger is defined, we skip redundant index updates by comparing the old (pre-update) and new (post-update) values of a column. 2. With D34040, we also have a framework to skip index updates and constraint checking in cases where the value of a column remains unchanged by the update process. Both of these optimizations rely on semantic equality today. However, they should rely on storage/binary equality to correctly handle the problems mentioned above: - A given data type may not define an equality operator. In the absence of storage equality, for correctness in such cases, we must assume that the columns of such data types always change in value. - A user-defined data type may have funky notions of semantic equality (and set membership). This can lead to correctness issues in cases such as partial indexes, when two representations of a given value are considered semantically equal, but are not stored identically (not normalized) and membership to the partial index relies on a membership function that is sensitive to the storage representation. (eg: `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` are not stored identically and a partial index is defined on `begins with '{”a”: 1’`) This revision switches to the use of storage equality for the above optimizations with the caveat that the function used for the comparison (`datumIsEqual`) does not support TOASTed storage. Jira: DB-12404 Original commit: b57e3c6 / D37384 Test Plan: ``` ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressUpdateOptimized#schedule' ``` Reviewers: jason, tfoucher Reviewed By: jason Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D37753
jasonyb
pushed a commit
that referenced
this issue
Sep 4, 2024
Summary: Manually excluded: f77dd6a [#23251] YCQL, ASH: Fix CQL test failures with ASH 1ebc289 [PLAT-13833]Upgrade ion java library 1302a2b [PLAT-15087] Connection pooling fails when ysqlAuth is enabled 1050ec4 [docs] Add a limitation for CDC in docs (#23586) Excluded: b57e3c6 [#23490] YSQL: Tighten notion of equality for update optimizations 9b4c4b5 [PLAT-10264] collect audit logs, ha config and xcluster data in support bundle Excluded: 9e0c569 Revert "[PLAT-14786] Add support to node_agent install to use bind ip and node_external_fqdn" 920989b [PLAT-14788]mask SAS token in backup config response and logs Excluded: 0ea4f54 [#23367] CDCSDK: Cleanup expired and not of interest tables from CDC stream Excluded: 0dc3a4a [#23737] YSQL: Change ysql conn mgr tests to fix them with warmup random mode eba9b49 [PLAT-13845] Upgrade aws sdk to 1.52+ f44c92e [PLAT-12933] [k8s] Ability to roll N nodes at a time during upgrades for multi-AZ(region) universes b8f0308 [PLAT-14008] Avoid rolling YBA managed n2n certificates when not needed (e.g. during Gflag Upgrades) Test Plan: Jenkins: rebase: pg15-cherrypicks Reviewers: jason, tfoucher Differential Revision: https://phorge.dev.yugabyte.com/D37738
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Jira Link: DB-12404
Description
Consider the following schema:
The following query fails:
The query fails because of the following:
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information
The text was updated successfully, but these errors were encountered: