[YSQL] Updates fails on relation having columns with no comparator + before row trigger #23490

karthik-ramanathan-3006 · 2024-08-13T18:21:49Z

Description

Consider the following schema:

CREATE TABLE test (k INT PRIMARY KEY, v1 INT, v2 JSON);

CREATE OR REPLACE FUNCTION no_update() RETURNS TRIGGER
LANGUAGE PLPGSQL AS $$
BEGIN
  RAISE NOTICE 'Trigger "no_update" invoked';
  RETURN NEW;
END;
$$;

CREATE TRIGGER test_trigger BEFORE UPDATE ON test FOR EACH ROW EXECUTE FUNCTION no_update();

-- Insert a sample row
INSERT INTO test VALUES (1, 1, '{}'::json);

The following query fails:

yugabyte=# UPDATE test SET v1 = v1 + 1;
NOTICE:  00000: Trigger "no_update" invoked
ERROR:  42883: could not identify a comparison function for type json

The query fails because of the following:

To identify the columns modified by the before row trigger, we try to compare the old and new value of each column.
To do the comparison, we look up the comparator function for the type associated with the column.
Column 'v2' is of type json which has no natural comparator function, causing the comparison and the query to fail.

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

I confirm this issue does not contain any sensitive information.

The text was updated successfully, but these errors were encountered:

Summary: ### Background Data undergoes multiple transformations during the lifetime of a query. The data that is input by a query may be type modified (`typmod`), padded and aligned (`typalign`), type casted, compressed (`typstorage`) and normalized before being stored in a Datum as part of a Postgres Tuple. This Datum is then massaged into a network format (`protobuf`) as it is sent over the network to a tserver. Finally, the Datum is unpacked and stored in persistent storage in a format that is supported by DocDB/RocksDB. The reverse process happens when the same piece of data needs to be output back to the user. Each of these formats have their own notion of equality: Postgres has the notion of semantic/logical equality (`{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` are equivalent) and storage/binary equality (the binary representations of `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` maybe different if alternative representations of a value are not normalized). Similarly DocDB has its own notion of semantic and storage equality. In most cases, these notions of equality can be used interchangeably and with good reason: - There is a 1:1 correspondence between the data stored in different formats. That is, a datum’s representation in postgres has exactly one corresponding representation in DocDB that allows the datum to be transformed seamlessly between the formats. This also implies that two datums whose binary representations are identical in postgres will also have identical representations in DocDB. This property is ubiquitously exploited to pushdown postgres operations to DocDB, and to indeed use DocDB as a storage engine for postgres. - Postgres also normalizes the representation of a datum’s value when it is packed into a Datum prior to storage. That is, if a given value of a given data type has multiple representations (the json from the example above), postgres converts the value into a normalized representation, which allows semantic equality to be interchangeably used with storage equality (if multiple representations of a value are represented in-memory/on-disk identically, they will also be stored identically). For data types that are not normalized, postgres does not define an equality operator (`json` data type is not normalized and does not have an equality operator, while `jsonb` data type is normalized and has an equality operator) This leads to a couple of problems: - There are occasions where we may want to know if two datums are stored identically when the data type that the datums belong to, does not have an equality operator. On such occasions, there is a distinction between semantic (not defined) and storage equality (defined). - Postgres is a highly extensible database that allows users to define custom data types and equality operators. In user-defined scenarios, it is also possible to end up with a difference between semantic and storage equality. ### This revision We perform the following optimizations on UPDATE queries that rely on *some* notion of equality: 1. If a BEFORE UPDATE FOR EACH ROW trigger is defined, we skip redundant index updates by comparing the old (pre-update) and new (post-update) values of a column. 2. With D34040, we also have a framework to skip index updates and constraint checking in cases where the value of a column remains unchanged by the update process. Both of these optimizations rely on semantic equality today. However, they should rely on storage/binary equality to correctly handle the problems mentioned above: - A given data type may not define an equality operator. In the absence of storage equality, for correctness in such cases, we must assume that the columns of such data types always change in value. - A user-defined data type may have funky notions of semantic equality (and set membership). This can lead to correctness issues in cases such as partial indexes, when two representations of a given value are considered semantically equal, but are not stored identically (not normalized) and membership to the partial index relies on a membership function that is sensitive to the storage representation. (eg: `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` are not stored identically and a partial index is defined on `begins with '{”a”: 1’`) This revision switches to the use of storage equality for the above optimizations with the caveat that the function used for the comparison (`datumIsEqual`) does not support TOASTed storage. Jira: DB-12404 Test Plan: ``` ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressUpdateOptimized#schedule' ``` Reviewers: amartsinchyk, mihnea, smishra Reviewed By: amartsinchyk Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D37384

…for update optimizations Summary: **Conflict resolution for PG 15 cherrypick** - src/postgres/src/backend/executor/ybOptimizeModifyTable.c - Location: YBAreDatumsStoredIdentically (formerly called YBEqualDatums) top half - YB master b57e3c6 removes type cache lookup, and adds a comment explaining how values are stored in datums in a YB context - In YB pg15 be8504d, the initialization of `FunctionCallInfoData` has changed in YB-PG 15 for call information to be variable length (refer be8504d) - Merge resolution: `FunctionCallInfoData` is no longer required. Keep master commit changes, discard YB-PG 15 changes. - Location: YBAreDatumsStoredIdentically (formerly called YBEqualDatums) bottom half - YB master b57e3c6 removes call to data type’s comparator procedure and adds a call to datumIsEqual - In YB pg15 be8504d, the structure of `FunctionCallInfoData` has changed (refer above) - Merge resolution: `FunctionCallInfoData` is no longer required. Keep master commit changes, discard YB-PG 15 changes. Data undergoes multiple transformations during the lifetime of a query. The data that is input by a query may be type modified (`typmod`), padded and aligned (`typalign`), type casted, compressed (`typstorage`) and normalized before being stored in a Datum as part of a Postgres Tuple. This Datum is then massaged into a network format (`protobuf`) as it is sent over the network to a tserver. Finally, the Datum is unpacked and stored in persistent storage in a format that is supported by DocDB/RocksDB. The reverse process happens when the same piece of data needs to be output back to the user. Each of these formats have their own notion of equality: Postgres has the notion of semantic/logical equality (`{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` are equivalent) and storage/binary equality (the binary representations of `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` maybe different if alternative representations of a value are not normalized). Similarly DocDB has its own notion of semantic and storage equality. In most cases, these notions of equality can be used interchangeably and with good reason: - There is a 1:1 correspondence between the data stored in different formats. That is, a datum’s representation in postgres has exactly one corresponding representation in DocDB that allows the datum to be transformed seamlessly between the formats. This also implies that two datums whose binary representations are identical in postgres will also have identical representations in DocDB. This property is ubiquitously exploited to pushdown postgres operations to DocDB, and to indeed use DocDB as a storage engine for postgres. - Postgres also normalizes the representation of a datum’s value when it is packed into a Datum prior to storage. That is, if a given value of a given data type has multiple representations (the json from the example above), postgres converts the value into a normalized representation, which allows semantic equality to be interchangeably used with storage equality (if multiple representations of a value are represented in-memory/on-disk identically, they will also be stored identically). For data types that are not normalized, postgres does not define an equality operator (`json` data type is not normalized and does not have an equality operator, while `jsonb` data type is normalized and has an equality operator) This leads to a couple of problems: - There are occasions where we may want to know if two datums are stored identically when the data type that the datums belong to, does not have an equality operator. On such occasions, there is a distinction between semantic (not defined) and storage equality (defined). - Postgres is a highly extensible database that allows users to define custom data types and equality operators. In user-defined scenarios, it is also possible to end up with a difference between semantic and storage equality. We perform the following optimizations on UPDATE queries that rely on *some* notion of equality: 1. If a BEFORE UPDATE FOR EACH ROW trigger is defined, we skip redundant index updates by comparing the old (pre-update) and new (post-update) values of a column. 2. With D34040, we also have a framework to skip index updates and constraint checking in cases where the value of a column remains unchanged by the update process. Both of these optimizations rely on semantic equality today. However, they should rely on storage/binary equality to correctly handle the problems mentioned above: - A given data type may not define an equality operator. In the absence of storage equality, for correctness in such cases, we must assume that the columns of such data types always change in value. - A user-defined data type may have funky notions of semantic equality (and set membership). This can lead to correctness issues in cases such as partial indexes, when two representations of a given value are considered semantically equal, but are not stored identically (not normalized) and membership to the partial index relies on a membership function that is sensitive to the storage representation. (eg: `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` are not stored identically and a partial index is defined on `begins with '{”a”: 1’`) This revision switches to the use of storage equality for the above optimizations with the caveat that the function used for the comparison (`datumIsEqual`) does not support TOASTed storage. Jira: DB-12404 Original commit: b57e3c6 / D37384 Test Plan: ``` ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressUpdateOptimized#schedule' ``` Reviewers: jason, tfoucher Reviewed By: jason Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D37753

Summary: Manually excluded: f77dd6a [#23251] YCQL, ASH: Fix CQL test failures with ASH 1ebc289 [PLAT-13833]Upgrade ion java library 1302a2b [PLAT-15087] Connection pooling fails when ysqlAuth is enabled 1050ec4 [docs] Add a limitation for CDC in docs (#23586) Excluded: b57e3c6 [#23490] YSQL: Tighten notion of equality for update optimizations 9b4c4b5 [PLAT-10264] collect audit logs, ha config and xcluster data in support bundle Excluded: 9e0c569 Revert "[PLAT-14786] Add support to node_agent install to use bind ip and node_external_fqdn" 920989b [PLAT-14788]mask SAS token in backup config response and logs Excluded: 0ea4f54 [#23367] CDCSDK: Cleanup expired and not of interest tables from CDC stream Excluded: 0dc3a4a [#23737] YSQL: Change ysql conn mgr tests to fix them with warmup random mode eba9b49 [PLAT-13845] Upgrade aws sdk to 1.52+ f44c92e [PLAT-12933] [k8s] Ability to roll N nodes at a time during upgrades for multi-AZ(region) universes b8f0308 [PLAT-14008] Avoid rolling YBA managed n2n certificates when not needed (e.g. during Gflag Upgrades) Test Plan: Jenkins: rebase: pg15-cherrypicks Reviewers: jason, tfoucher Differential Revision: https://phorge.dev.yugabyte.com/D37738

karthik-ramanathan-3006 added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Aug 13, 2024

yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Aug 13, 2024

sushantrmishra removed the status/awaiting-triage Issue awaiting triage label Aug 25, 2024

sushantrmishra closed this as completed Sep 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[YSQL] Updates fails on relation having columns with no comparator + before row trigger #23490

[YSQL] Updates fails on relation having columns with no comparator + before row trigger #23490

karthik-ramanathan-3006 commented Aug 13, 2024 •

edited by jira bot

Loading

[YSQL] Updates fails on relation having columns with no comparator + before row trigger #23490

[YSQL] Updates fails on relation having columns with no comparator + before row trigger #23490

Comments

karthik-ramanathan-3006 commented Aug 13, 2024 • edited by jira bot Loading

Description

Issue Type

Warning: Please confirm that this issue does not contain any sensitive information

karthik-ramanathan-3006 commented Aug 13, 2024 •

edited by jira bot

Loading