2024.2.1.0-b160
jaki
tagged this
08 Jan 23:44
Summary: Commit 59b3636f505a7dc75ae91ab76348b20a89528e90 introduces read batching for INSERT ON CONFLICT. Commit 3f2ac8c27617a8775320f4edb0c829f059bd1eff refactors the map part of the design. This commit refactors the ExecInsert part of the design and introduces RETURNING support. - Decoupling of YB's read batching from PG's foreign data wrapper batching: before this change, the INSERT ON CONFLICT read batching flow sticks closely to the INSERT FDW batching. - Slots were added to per-relinfo buffers, and flushing happened whenever any of these buffers hit the batch size. This caused some inconsistency in the order rows were inserted, and depending on the batch size, it could give different results. Change these buffers to be per-plan so that an INSERT that gets routed to multiple partitioned tables still maintains insertion order. One unobvious behavior change is that, now, in case of a mixed stream of input values, the write RPCs will increase as each batch touches a lot more tables whereas previously, each batch touched one table. Furthermore, since only one buffer exists per plan now rather than one buffer per partitioned table, the memory usage of storing these buffers reduces by that much. - The flushing also used to happen after we exceeded the batch size, not beforehand. For example, if the batch size is 3, flushing happened when trying to insert the fourth value. This causes some headache when trying to support RETURNING as that fourth value is already consumed from input while we are trying to flush the previous first to third values and trying to return those values. Restructure the flow to remove this odd behavior: now, the flush would happen after batching the third value, before reading the fourth value. This involves a major code refactor, splitting ExecInsert into YbExecInsertPrologue and YbExecInsertAct. The previous code duplicated part of ExecInsert with YbFlushSlotsFromBatch. Now, that is deduplicated by YbExecInsertAct, which is called from both ExecInsert and YbFlushSlotsFromBatch. While there is this deduplication, this restructuring introduces some duplication elsewhere, such as rechecking indexes whether they should be considered for ON CONFLICT between YbBatchFetchConflictingRows and ExecCheckIndexConstraints. Hopefully, these are minor enough that they can be deduplicated in the future, and the bigger goal is to reuse the pieces of this new structure for supporting read batching of the MERGE command in the future. - INSERT FDW batching calls ExecPendingInserts to flush _all_ buffers, including those for other tables in other plan nodes (see estate->es_insert_pending_result_relations), right before a BEFORE INSERT or BEFORE UPDATE trigger or when the current plan node's ExecModifyTable is exausted and we are flushing out the remainder. This is not necessarily correct if we want to stick close to non-batched behavior. Neither is it always correct to flush only the current buffer and ignore other plan nodes' buffers. This is because, in the lifetime of a batching cycle, we could get context switched out during the batching phase (which calls for flushing other plan nodes' buffers) or during the flushing phase (which calls for not flushing other plan nodes' buffers). Two simple examples of these cases is a WITH CTE getting referenced (for the first time) in the values being inserted or it getting referenced (for the first time) in an ON CONFLICT DO UPDATE expression, respectively. Since it is not simple to always determine which buffers ought to be flushed, default to flushing only the current buffer as that is simpler to implement (don't have to worry about infinite loops). - Stop reusing the fields that INSERT FDW batching uses for keeping track of batching, such as ri_NumSlots and ri_Slots, except for ri_BatchSize. Store the rest of the information in a new struct YbInsertOnConflictBatchState. - RETURNING support: besides the restructuring to make INSERT ON CONFLICT read batching with RETURNING easier to support, make several additional notable changes: - To support context switching out after having consumed a value from input, store the slot and plan slot in new fields in YbInsertOnConflictBatchState called pickup_slot and pickup_plan_slot. Once flushing iscompleted, context is restored using these pickup slots. - Change the conflict map to be per-plan instead of global. This is necessary in case we get context-switched out during flushing phase and interact with the conflict map of a different plan. Keep the intent set global as we want to detect whether the same row is inserted twice across plans. Truthfully, such detection should not be fully global as it should not mix into plans created by triggers, but that can be loosened in the future. Along the same lines, there are other deficiencies with the map design such as how it is not updated when a plan does INSERT, UPDATE, or DELETE without involving INSERT ON CONFLICT as the map/set alteration only happens within INSERT ON CONFLICT code. Maybe it is better to go back to design closer to before 3f2ac8c27617a8775320f4edb0c829f059bd1eff where the map/set updates happen in the index INSERT/DELETE/UPDATE paths to capture all the situations. Push these finer details to be future work. Add a bunch of test coverage for these finer cases involving WITH CTEs and triggers. Initial code by Karthik; finalizing design and adding tests by me. Jira: DB-13712 [BACKPORT 2024.2][#25296] YSQL: remove bad assertion in INSERT ON CONFLICT In an INSERT ON CONFLICT read batching code path, there is an assertion that can fail: Assert(state->num_slots >= key_count); state->num_slots represents the number of input slots being processed this batch. This is typically equal to the batch size. key_count represents the number of keys read from the indexes. Note index"es". Assuming there's only one index, worst case, key_count equals num_slots. But if there is more than one index, the worst case is that many times num_slots. So this assertion is faulty. Remove it. Jira: DB-14494 Merge: - YbExecInsertPrologue - proute: not relevant in PG 11.2 - BR trigger: ExecPendingInserts does not exist here in this PG 11.2 based version, so disregard that. Elsewhere, follow the same spirit of changing slot to *slot. - generated columns: does not exist in PG 11.2, so disregard - ExecInsert - signature: preserve existing comment and signature, adding the one YB comment. - fdw: disregard since PG 11.2 doesn't support fdw batching. - tableOid: keep for PG 11.2. Disregard generated columns comment. - ExecPendingInserts: remove entirely - ExecDeletePrologue, ExecDeleteAct, ExecDeleteEpilogue: rm - ExecUpdatePrologue, ExecUpdatePrepareSlot, ExecUpdateAct, YBExecUpdateAct, ExecUpdateEpilogue, ExecCrossPartitionUpdateForeignKey: rm. It merge conflicts because YBExecUpdateAct modifies YbExecCheckIndexConstraints function call. Apply that same change to the corresponding call in ExecUpdate. - YbAddSlotToBatch: if (resultRelInfo->ri_NumSlots == resultRelInfo->ri_BatchSize): this is relevant to fdw code, so discard. - ExecInitModifyTable: add comment related to ri_BatchSize like in master. - Introduce YbInsertOnConflictBatchState.tableOids because PG 15 -> PG 11 loses table oid information from the slots. - Put planSlot in ModifyTableContext. - forming tuple: we want to do this in YbExecInsertAct, so since we materialize tuple there, no need to do so after BR trigger - change foreach_delete_current to list_delete_cell (see upstream PG 1cff1b95ab6ddae32faa3efe0d95a820dbfdc164). - bunch of management of estate->es_result_relation_info: PG 15 -> PG 11 means losing upstream PG a04daa97a4339c38e304cd6164d37da540d665a8, so resultRelInfo is passed around through estate. - yb_insert_on_conflict regress test: update output for Partitioned tables cannot have BEFORE / FOR EACH ROW triggers. - yb_insert_on_conflict_multiplan regress test: update output to ERROR: trigger "parttrigtrig_odd2" for relation "parttrigtab_odd" already exists (Likely the ordering of which error hits first changed.) Test Plan: On Almalinux 8: #!/usr/bin/env bash set -euo pipefail ./yb_build.sh fastdebug --gcc11 find java/yb-pgsql/src/test/java/org/yb/pgsql -name 'TestPgRegressInsertOnConflict*' \ | grep -oE 'TestPgRegress\w+' \ | while read -r testname; do ./yb_build.sh fastdebug --gcc11 --java-test "$testname" --sj done Co-authored-by: Karthik Ramanathan <[email protected]> Original commits: - a120c2588cab65ebe497e0b8c64bd826b54a99e1 / D39229 - 429547f83f9f62b40a4ee6b291d7f28b3bd63511 / D40759 Reviewers: kramanathan Reviewed By: kramanathan Subscribers: amartsinchyk Differential Revision: https://phorge.dev.yugabyte.com/D40839