Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Browse files
Browse the repository at this point in the history
…t 1 + Bug Fixes Summary: D38354 introduced batching support for INSERT ... ON CONFLICT queries. This revision seeks to address a subset of the design comments on that revision. In particular this refactor aims to: - Better integrate with/reuse existing YBCTID libraries for identifying unique index tuples. - Fix inconsistent results produced by different batch sizes. - Fix visibility issues observed in queries with a pipeline pattern (multiple CTEs, each doing a ModifyTable, each with a returning clause, feeding into the next) **Summary of changes** - Reuse existing YBCTID container infrastructure in pggate for storing tuple ID of “on-conflict” tuples. - Use separate containers for storing batch-read tuples and “just-inserted” tuples. - Move logic to update map of “just-inserted” tuples from execIndexing (at the time of enqueueing the index request) to nodeModifyTable **Reusing existing YBCTID container infrastructure** This revision reuses the YBCTID infrastructure that is used by the foreign key and explicit row locking buffers that already exists today. To do this, the key columns of unique (and non-NULL) indexes are treated as YBCTIDs. A session-level container is created to store the mapping between the IDs of tuples read by the index scan and their corresponding table slots. This design allows us to: - Reuse existing YBCTID helper functions for equality comparisons. - Mitigate tuple visibility issues through the creation of the global map when multiple INSERT … ON CONFLICT CTEs are chained together. - Inherit benefits from improvements to YBCTID processing logic in the future. **Use of separate containers to store batch-read tuples and “just-inserted” tuples** The index batch-read workflow requires mapping key columns of a unique index to the tuple returned by the index scan. To check if the same row is modified twice, identifiers (ie. key columns) to index tuples are required to be stored. While the data stored is similar (hash of the key columns of the index), the data can be stored in two separate structures: a **hash map** for the index batch-read, and a **hash set** for the list of modified tuples. This design allows us to: - Separate the scaling of the two data structures. If no restriction is placed on the size of the hash set, we can detect "command cannot affect row a second time" errors across batches of writes. - Remain consistent with the design of other buffers such as the foreign key buffer. The foreign key buffer currently has no restriction on how many YBCTID intents are placed within its hash set. - Cleanly remove the **hash set** when DocDB natively supports detection of tuples that are modified multiple times by the same query. **Rework logic of when entries are added/deleted to on-conflict map** The previous design adds and deletes entries to/from the on-conflict map when the payload for the index insert/delete operation is prepared (in ExecIndexing.c). This suffers from the following drawbacks: - Index updates (DELETEs + INSERTs) may be skipped. If the update of an arbiter index is skipped, it is incorrectly not added/deleted from the on-conflict map. - Primary key and secondary index modifications (insert/delete/update) have two distinct paths. This means that the previous design would have had to replicate this logic in both code paths, which it did not. This duplication is not required in the current design. This revision reworks the above design to add/delete entries to the map in two different places (in nodeModifyTable.c): - During on-conflict checking IF it is decided if the tuple will be inserted. Note that at this stage, the DO UPDATE projections have not been carried out (so we don’t know whether the updated values change the index keys). - After the updated values have been projected if it is decided that the tuple will be updated. This design currently has the drawback that the index tuple will have to be constructed one extra time (`FormIndexDatum()`) per tuple for tuples that are to be updated. This is particularly significant for expression indexes as it would involve evaluating the expression an extra time. This can be fixed for non-expression indexes by directly copying over the index key columns from the tuple slot. However, the fix for expression indexes requires storing and propagating the index tuple which is a large change relative to the benefit it offers. **Future Work** A second revision is planned that will: - Add support for RETURNING clause - Add limited support for foreign key relationships - Fix bugs related to partitioned tables Jira: DB-13766, DB-13883 Test Plan: On Almalinux 8: ``` #!/usr/bin/env bash set -euo pipefail ./yb_build.sh fastdebug --gcc11 find java/yb-pgsql/src/test/java/org/yb/pgsql -name 'TestPgRegressInsertOnConflict*' \ | grep -oE 'TestPgRegress\w+' \ | while read -r testname; do ./yb_build.sh fastdebug --gcc11 --java-test "$testname" --sj done ``` Reviewers: amartsinchyk, smishra, jason, telgersma Reviewed By: jason, telgersma Subscribers: jason, smishra, yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D39023
- Loading branch information