Fix Iceberg merge, update, delete, for tables with equality deletes #24062

Open
wants to merge 3 commits into
base: master

dain
Member

@dain dain commented Nov 7, 2024

Description

  • Rewrite the creation of the merge row id to avoid a duplicate key exception.
  • Simplify by having the data source produce the row position for the row id, and then wrapping this block into a row id in IcebergPageSource.
  • Remove the unused update row id. This should have been removed during the conversion to merge row ids.
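
As a rough illustration of the second bullet, the sketch below models a data source that emits only row positions, with a page-source-level step that wraps each position into a full row id. It is not the code in this PR: the class `MergeRowIdSketch`, the method `wrapPositions`, and the choice of row id fields (file path, row position, partition spec id, partition data) are assumptions made for the example, and plain Java collections stand in for Trino blocks.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeRowIdSketch {
    // Hypothetical stand-in for a merge row id; the field set is an assumption.
    static final class MergeRowId {
        final String filePath;
        final long rowPosition;
        final int partitionSpecId;
        final String partitionData;

        MergeRowId(String filePath, long rowPosition, int partitionSpecId, String partitionData) {
            this.filePath = filePath;
            this.rowPosition = rowPosition;
            this.partitionSpecId = partitionSpecId;
            this.partitionData = partitionData;
        }
    }

    // The data source only supplies row positions. Values that are constant
    // for the whole file are attached here, once, when wrapping.
    static List<MergeRowId> wrapPositions(long[] rowPositions, String filePath, int partitionSpecId, String partitionData) {
        List<MergeRowId> rowIds = new ArrayList<>(rowPositions.length);
        for (long position : rowPositions) {
            rowIds.add(new MergeRowId(filePath, position, partitionSpecId, partitionData));
        }
        return rowIds;
    }

    public static void main(String[] args) {
        // Hypothetical file name and partition values, for illustration only.
        List<MergeRowId> rowIds = wrapPositions(new long[] {0L, 5L, 9L}, "data-file.parquet", 0, "{}");
        for (MergeRowId rowId : rowIds) {
            System.out.println(rowId.filePath + " @ " + rowId.rowPosition);
        }
    }
}
```

With this shape, the reader never needs to know about the row id's nested fields, so there is no per-field bookkeeping that could collide with the table's own columns.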

Fixes #15952
Supersedes #16216

Release notes

(x) Release notes are required, with the following suggested text:

## Section
* Fix Iceberg merge, update, and delete for tables with equality deletes. ({issue}`15952`)

@cla-bot cla-bot bot added the cla-signed label Nov 7, 2024
@github-actions github-actions bot added the iceberg Iceberg connector label Nov 7, 2024
@dain dain force-pushed the update-after-equality-delete branch 2 times, most recently from b87430a to a080a7a Compare November 7, 2024 21:54
Update row id was replaced with merge row id
Rewrite the creation of merge row id to avoid duplicate key exception.
This also simplifies and consolidates the merge row id code.
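
For context on the duplicate key exception mentioned above, here is a minimal sketch of the general failure mode: a strict map keyed by field id rejects a second entry with the same id. Whether this matches the exact code path in Trino is an assumption; the class `DuplicateFieldIdSketch` and the method `indexByFieldId` are hypothetical, and the exception message only imitates the wording in the linked issue title.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateFieldIdSketch {
    // Builds a field id -> column index map and fails on a repeated id,
    // similar in spirit to a strict immutable map builder.
    static Map<Integer, Integer> indexByFieldId(List<Integer> fieldIds) {
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < fieldIds.size(); i++) {
            Integer fieldId = fieldIds.get(i);
            Integer previous = result.putIfAbsent(fieldId, i);
            if (previous != null) {
                throw new IllegalArgumentException("Multiple entries with same key: " + fieldId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Distinct ids are fine.
        System.out.println(indexByFieldId(List.of(1, 2, 3)));
        // Assumed scenario: the same field id is requested twice, for example
        // once by the query and once by a nested row id field.
        try {
            indexByFieldId(List.of(1, 2, 3, 3));
        }
        catch (IllegalArgumentException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```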
@dain dain force-pushed the update-after-equality-delete branch from a080a7a to dddde02 Compare November 7, 2024 22:54
@@ -42,8 +42,7 @@ public class IcebergColumnHandle
 private static final int INSTANCE_SIZE = instanceSize(IcebergColumnHandle.class);

 // Iceberg reserved row ids begin at INTEGER.MAX_VALUE and count down. Starting with MIN_VALUE here to avoid conflicts.
-public static final int TRINO_UPDATE_ROW_ID = Integer.MIN_VALUE;
-public static final int TRINO_MERGE_ROW_ID = Integer.MIN_VALUE + 1;
+public static final int TRINO_MERGE_ROW_ID = Integer.MIN_VALUE;
 public static final String TRINO_ROW_ID_NAME = "$row_id";

 public static final int TRINO_MERGE_PARTITION_SPEC_ID = Integer.MIN_VALUE + 2;
Contributor

@findinpath findinpath Nov 8, 2024

We can likely adjust the values of the other ID fields (decrease each by 1) now that Integer.MIN_VALUE + 1 is no longer used.

@@ -72,20 +72,13 @@ public IcebergPageSource(

if (expectedColumn.isMergeRowIdColumn()) {
this.rowIdColumnIndex = i;

Map<Integer, Integer> fieldIdToColumnIndex = mapFieldIdsToIndex(requiredColumns);
Contributor

The mapFieldIdsToIndex function is no longer used.

Labels
cla-signed iceberg Iceberg connector
Development

Successfully merging this pull request may close these issues.

UPDATE failed in Iceberg: Multiple entries with same key: 3=$row_id.file_record_count
2 participants