-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read delta lake table history when initial transaction logs are deleted #18845
Conversation
7f6e106
to
8f8bcb2
Compare
@@ -94,6 +94,45 @@ public static TransactionLogTail loadNewTail( | |||
return new TransactionLogTail(entriesBuilder.build(), version); | |||
} | |||
|
|||
// Load a section of the Transaction Log JSON entries. Optionally from a given end version (inclusive) through an start version (inclusive) | |||
public static TransactionLogTail loadNewTailBackward( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any specific reason for reading backward, because we try to reverse them later in getPageSource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the transaction log we intend to access through loadNewTailBackward
will start from the final checkpoint file version and continue all the way back to version 0. This will remain true even if the log retention period is exceeded, leading to the removal of some of the initial transaction logs (0, 1, etc.). In this way, we can still retrieve the complete history of the table till the last checkpoint version.
...rino-delta-lake/src/test/java/io/trino/plugin/deltalake/BaseDeltaLakeConnectorSmokeTest.java
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeHistoryTable.java
Outdated
Show resolved
Hide resolved
...rino-delta-lake/src/test/java/io/trino/plugin/deltalake/BaseDeltaLakeConnectorSmokeTest.java
Outdated
Show resolved
Hide resolved
...rino-delta-lake/src/test/java/io/trino/plugin/deltalake/BaseDeltaLakeConnectorSmokeTest.java
Show resolved
Hide resolved
8f8bcb2
to
049175b
Compare
Thanks @findinpath for the review. I have addressed comments. |
if (lastCheckpointVersion.isPresent() && endVersionInclusive.isPresent()) { | ||
middleVersion = Optional.of(Math.min(lastCheckpointVersion.get(), endVersionInclusive.get())); | ||
} | ||
commitInfoEntries.addAll(TransactionLogTail.loadNewTailBackward(fileSystem, tableLocation, startVersionExclusive, middleVersion, true).getFileEntries().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to do this from the middle backwards and then from the middle forward separately? Rather than from endVersion backwards in one pass.
It's not obvious to me this way that you're not potentially going to have duplicates.
049175b
to
c91fde5
Compare
Thanks @alexjo2144 for the review. AC. |
c91fde5
to
27afeaf
Compare
(resolved conflicts) |
27afeaf
to
1b629c8
Compare
(resolved conflicts) |
1b629c8
to
d046069
Compare
(fixed CI failure) |
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeHistoryTable.java
Outdated
Show resolved
Hide resolved
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeHistoryTable.java
Outdated
Show resolved
Hide resolved
d046069
to
1922381
Compare
Thanks @alexjo2144 for the review. Addressed comments. |
Failing for ADLS and GCS. S3 is passing. |
|
Description
This PR handles the case when transaction logs are deleted because the retention time of the transaction log is over.
This issue will occur when initial transaction logs get deleted because of log retention interval is exceeded while writing a new checkpoint file.
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
(X) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: