Avoid reading Iceberg delete files when not needed #13395

Merged (2 commits) on Aug 8, 2022

Conversation

alexjo2144 (Member)

Description

Parquet only.

Skip reading the delete files associated with a data file when the deletes are
not relevant. This can happen when the statistics from the data file already
show that the split can be skipped, and also when the row positions read by
the split are known and can be used to filter out positional delete files.
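
As a sketch of the second case (all names here are hypothetical, not the PR's actual identifiers): a positional delete file can be skipped when the positions it references cannot overlap the row positions the split reads.

// Illustrative helper only; the bounds would come from the delete file's
// metadata and the split's first/last row positions.
static boolean deleteFileMayApply(long deleteFirstPos, long deleteLastPos, long splitFirstPos, long splitLastPos)
{
    // A delete file whose referenced positions fall entirely outside the
    // split's row-position range cannot change the split's output.
    return deleteLastPos >= splitFirstPos && deleteFirstPos <= splitLastPos;
}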

Is this change a fix, improvement, new feature, refactoring, or other?

Performance improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Iceberg connector

How would you describe this change to a non-technical end user or system administrator?

Reduces I/O by skipping Iceberg delete files that cannot affect a query's results.

Related issues, pull requests, and links

#13219

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

ReaderPageSource dataPageSource = readerPageSourceWithRowPositions.getReaderPageSource();

// If the data reader is already finished (e.g. the split was pruned), no
// rows survive, so the delete files never need to be read.
if (dataPageSource.get().isFinished()) {
    return new EmptyPageSource();
}
Member:

Why? add a comment

alexjo2144 (Author):

Switched the approach here to just wrap the DeleteFilter reading in a Supplier. I think that reads better
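
A minimal sketch of that approach, assuming Guava's Suppliers.memoize; RowPredicate, readDeletes, and deleteFiles are illustrative names rather than the PR's actual identifiers:

import com.google.common.base.Suppliers;
import java.util.function.Supplier;

// No delete file is opened here; the memoized Supplier defers the read to
// the first get() call and caches the result afterwards.
Supplier<RowPredicate> deletePredicate = Suppliers.memoize(() -> readDeletes(deleteFiles));

With this shape, a split that never calls get() (for example, one skipped via statistics) never pays the cost of reading its delete files.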

Map<Integer, byte[]> lowerBounds = deleteFile.lowerBounds() == null ?
        ImmutableMap.of() :
        deleteFile.lowerBounds().entrySet().stream().collect(toImmutableMap(Map.Entry::getKey, entry -> entry.getValue().array()));
Map<Integer, byte[]> upperBounds = deleteFile.upperBounds() == null ?
        ImmutableMap.of() :
        deleteFile.upperBounds().entrySet().stream().collect(toImmutableMap(Map.Entry::getKey, entry -> entry.getValue().array()));
Member:

On .array(): do we need to make a defensive copy of these?

alexjo2144 (Author):

Probably. Added a call to clone
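
A sketch of the resulting copy, assuming the bounds are Iceberg ByteBuffer values as in the excerpt above: ByteBuffer.array() exposes the buffer's live backing array, so clone() prevents callers from sharing mutable state.

deleteFile.upperBounds().entrySet().stream()
        .collect(toImmutableMap(Map.Entry::getKey, entry -> entry.getValue().array().clone()));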


alexjo2144 commented Aug 2, 2022

Applied comments in fixup commit, thanks @findepi

findepi force-pushed the iceberg/filter-delete-files branch from b676542 to 716b527 (August 2, 2022 17:59)

findepi commented Aug 2, 2022

squashed


findepi commented Aug 3, 2022

@alexjo2144 can you please rebase?
