
Iceberg: support row-level delete and update #8565

Closed

Conversation

jackye1995 (Member) commented:

This PR adds support for writing Iceberg position deletes. Similar to #8534, I first present our working internal implementation backported to Trino; some parts might not work because of internal differences, but once we agree on the general approach I will make the fixes and add unit tests.

Also, there is a missing piece that has to be added after #8534 is merged, so that IcebergPageSource has the ability to retain the row position channel and pass it to the updatable page source.

A few key points:

  1. We chose to support row-level delete through the position delete spec instead of equality deletes, based on the general guidance from Iceberg that position deletes are preferred when possible. The delete write mechanism in Trino also fits the position delete spec well: since data is scanned and filtered first anyway, recording equality values wastes computation compared to recording positions.
  2. Trino's delete row ID column has the Trino type ROW(string file_path, long pos, row(table schema)), which matches Iceberg's position delete file schema; see the sketch after this list.
  3. Update and delete share exactly the same row ID type and the same beginXXX and finishXXX operation implementations. The only difference is that update writes new data files after writing the delete files, because an update in Iceberg is modeled as a delete plus an insert.
  4. In the page source provider, if the operation is detected to be a DELETE or UPDATE (the columns contain the row ID column), it automatically reads all the table columns (except the identity partition columns), because the entire row is part of the position delete schema anyway.
  5. I directly reused the current Iceberg sink implementation to write back the position delete rows and the updated rows. It's probably not the most optimal approach, but it is simple enough for a first iteration.
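
For illustration, here is a minimal sketch (not code from this PR) of how the row ID type described in point 2 could be assembled with Trino's SPI type factories; the id and name fields are a hypothetical table schema standing in for the real one:

```java
import static io.trino.spi.type.BigintType.BIGINT;
import static io.trino.spi.type.IntegerType.INTEGER;
import static io.trino.spi.type.VarcharType.VARCHAR;

import io.trino.spi.type.RowType;
import java.util.List;

public class RowIdTypeSketch
{
    public static void main(String[] args)
    {
        // Hypothetical table schema; the real code nests the actual table columns.
        RowType tableRow = RowType.from(List.of(
                RowType.field("id", INTEGER),
                RowType.field("name", VARCHAR)));

        // ROW(file_path varchar, pos bigint, row ROW(<table schema>)),
        // mirroring Iceberg's position delete file schema.
        RowType rowIdType = RowType.from(List.of(
                RowType.field("file_path", VARCHAR),
                RowType.field("pos", BIGINT),
                RowType.field("row", tableRow)));

        System.out.println(rowIdType.getDisplayName());
    }
}
```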

This is a bare-minimum backport. I left some inline TODOs, and there are many optimizations we can make after the base version is checked in; I tried to keep this as simple as possible to avoid too many disagreements around optimization-related changes. Please let me know if this looks good or not, thanks!

@phd3 @electrum @findepi @losipiuk @caneGuy @rdblue @hashhar

cla-bot added the cla-signed label on Jul 15, 2021
findepi (Member) commented on Jul 29, 2021:

@jackye1995 can you please add a product test that asserts compatibility between Trino and Spark?
See

private static void verifySelectForTrinoAndHive(String select, String whereClause, QueryAssert.Row... rows)

for how we test Trino/Hive compatibility for ORC ACID tables. This approach has proven useful and has helped find some bugs too.
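
For context, a minimal sketch of what such a Trino/Spark compatibility product test could look like, assuming the onTrino()/onSpark() query executors from the product-test harness; the catalog, table, and data are made up for illustration:

```java
import static io.trino.tempto.assertions.QueryAssert.Row.row;
import static io.trino.tempto.assertions.QueryAssert.assertThat;
import static io.trino.tests.product.utils.QueryExecutors.onSpark;
import static io.trino.tests.product.utils.QueryExecutors.onTrino;

import org.testng.annotations.Test;

public class TestIcebergSparkRowLevelDeletes
{
    @Test
    public void testSparkReadsTrinoPositionDelete()
    {
        onTrino().executeQuery("CREATE TABLE iceberg.default.test_delete (id INTEGER, name VARCHAR)");
        onTrino().executeQuery("INSERT INTO iceberg.default.test_delete VALUES (1, 'a'), (2, 'b')");
        onTrino().executeQuery("DELETE FROM iceberg.default.test_delete WHERE id = 1");

        // Spark should not see the row deleted via Trino's position delete file.
        assertThat(onSpark().executeQuery("SELECT id, name FROM default.test_delete"))
                .containsOnly(row(2, "b"));

        onTrino().executeQuery("DROP TABLE iceberg.default.test_delete");
    }
}
```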

lhofhansl (Member) commented on Aug 30, 2021:

If I wanted to try this out, I'd need to create an Iceberg table adhering to the Iceberg Format Specification V2, since you are proposing to use delete snapshots, right?

And should we bump Iceberg to 0.12 (that version has the final V2 spec)?

findepi (Member) left a review with the following inline comments:


@@ -112,7 +112,7 @@ private RowBlock(int startOffset, int positionCount, @Nullable boolean[] rowIsNull
     }

     @Override
-    protected Block[] getRawFieldBlocks()
+    public Block[] getRawFieldBlocks()

Wonder why this is needed, and whether this is actually used correctly.


public static IcebergColumnHandle createUpdateRowIdColumnHandle(Schema tableSchema, TypeManager typeManager)
{
return create(required(ROW_ID_COLUMN_INDEX, ROW_ID_COLUMN_NAME, DeleteSchemaUtil.posDeleteSchema(tableSchema).asStruct()), typeManager);

Is it used for deletes only, or for updates as well?

Comment on lines +263 to +264
serializeToBytes(table.schema()),
serializeToBytes(table.spec()),

I think there already was an idea to add schema to IcebergTableHandle and it was rejected (?) for some reason.

@phd3 do you remember?

}
else {
Schema posDeleteSchema = DeleteSchemaUtil.posDeleteSchema(table.getSchema());
ConnectorPageSink posDeleteSink = new IcebergPageSink(

Naming suggestion: positionalDeletesSink

private final List<IcebergColumnHandle> allTableColumns;
private final List<IcebergColumnHandle> updateColumns;
private final ConnectorPageSource source;
private final ConnectorPageSink posDeleteSink;

Naming suggestion: positionalDeletesSink

FileContent.POSITION_DELETES,
maxOpenPartitions);

ConnectorPageSink updateRowSink = new IcebergPageSink(

Naming suggestion: updatedDataSink

private final List<IcebergColumnHandle> updateColumns;
private final ConnectorPageSource source;
private final ConnectorPageSink posDeleteSink;
private final ConnectorPageSink updateRowSink;

Naming suggestion: updatedDataSink

}

Block[] updatedRows = new Block[allTableColumns.size()];
Block[] oldRows = ((RowBlock) rowIdBlock.getRawFieldBlocks()[2]).getRawFieldBlocks();

Cast to RowBlock isn't entirely correct.
See #9354 and perhaps we should use ColumnarRow here.
cc @djsstarburst
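
A minimal sketch of the ColumnarRow alternative suggested here, assuming field 2 of the row ID block is the nested ROW(<table schema>) field:

```java
import io.trino.spi.block.Block;
import io.trino.spi.block.ColumnarRow;

final class ColumnarRowSketch
{
    private ColumnarRowSketch() {}

    // Access a column of the nested old-row struct without casting to RowBlock,
    // so wrapped blocks (dictionary, run-length) are handled too.
    static Block oldRowField(Block rowIdBlock, int fieldIndex)
    {
        ColumnarRow rowId = ColumnarRow.toColumnarRow(rowIdBlock);
        ColumnarRow oldRows = ColumnarRow.toColumnarRow(rowId.getField(2));
        return oldRows.getField(fieldIndex);
    }
}
```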

resultBlocks[i] = RowBlock.fromFieldBlocks(pageSize, Optional.empty(), rowIdComponentBlocks);
}
else {
resultBlocks[i] = sourcePage.getBlock(allTableColumns.indexOf(columnHandle));

indexOf use here looks quadratic, and we seem to be doing this for every page.
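
A minimal sketch of one way to avoid that: precompute a column-to-channel map once (e.g. in the page source constructor) so the per-page lookup is O(1); the generic helper below is illustrative, not code from this PR:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ChannelIndexSketch
{
    private ChannelIndexSketch() {}

    // Build handle -> channel once; callers then use
    // sourcePage.getBlock(channelIndex.get(columnHandle)) instead of indexOf.
    static <T> Map<T, Integer> channelIndex(List<T> columns)
    {
        Map<T, Integer> index = new HashMap<>();
        for (int channel = 0; channel < columns.size(); channel++) {
            index.put(columns.get(channel), channel);
        }
        return index;
    }
}
```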

jackye1995 (Member, Author) commented:

Closed in favor of #10075.

jackye1995 closed this on Nov 26, 2021