-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core, API: Support scanning from refs #5364
Conversation
f6aa023
to
9393ade
Compare
94fe662
to
78f7071
Compare
78f7071
to
b381e9d
Compare
cd024a1
to
d8c9628
Compare
920b5d0
to
f3df820
Compare
9dfc330
to
dd0062d
Compare
Discussed offline with @rdblue , I got an understanding of how history entries are different than the snapshot ancestor lineage, posting the information here so folks referring to the PR have context: "History is the state of main at any given time. That can be changed, much like master in git can be changed. The current main can change when rollback is performed. The snapshot which gets resolved for a given time T should remain constant regardless of what is the current main ancestor lineage. History entries capture all the changes on the main table state and are what enable this ability to time travel to a snapshot which are outside the current table ancestors which is ultimately desired behavior. It also doesn't make sense to treat main as a separate case, and have a different semantic for non-main branches when it comes to time travel. Maybe we could maintain separate history metadata for branches, but most likely it would need to be opt-in by users for specific branches and would need to not add too much metadata to manage. So for now, it makes sense just to do time travel to a branch tip or time travel to a given tag. Updated the PR @rdblue @namrathamyske @jackye1995 |
d4c0ea4
to
88b4958
Compare
88b4958
to
1990c42
Compare
…tag, and scanning from branch tips
1990c42
to
c9db6b0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me. However I am a bit concerned about the fact that we cannot do time travel in branches, I see that as a pretty important feature for many use cases. I agree with the argument of the difference of tracing ancestor vs history entry, but in that case it seems like we need to update the spec to accommodate that change and record history entries in a way that allows time travel by branch.
@amogh-jahagirdar I think after this PR we should raise another PR against spec to discuss what is the best way to solve the time travel by branch use case.
@@ -39,13 +39,25 @@ public interface TableScan extends Scan<TableScan, FileScanTask, CombinedScanTas | |||
*/ | |||
TableScan useSnapshot(long snapshotId); | |||
|
|||
/** | |||
* Create a new {@link TableScan} from this scan's configuration that will use the given | |||
* reference. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use the snapshot ID of the given reference?
* | ||
* @param timestampMillis a timestamp in milliseconds. | ||
* @return a new scan based on this with the current snapshot at the given time | ||
* @throws IllegalArgumentException if the snapshot cannot be found | ||
* @throws IllegalArgumentException if the snapshot cannot be found or time travel is attempted on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the doc change here is no longer needed?
@@ -85,7 +85,7 @@ public TableScan appendsAfter(long fromSnapshotId) { | |||
@Override | |||
public TableScan useSnapshot(long scanSnapshotId) { | |||
Preconditions.checkArgument( | |||
snapshotId() == null, "Cannot override snapshot, already set to id=%s", snapshotId()); | |||
snapshotId() == null, "Cannot override snapshot, already set snapshot id=%s", snapshotId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "set to snapshot id=%s"?
@@ -46,6 +46,14 @@ public TableScan asOfTime(long timestampMillis) { | |||
timestampMillis, context().fromSnapshotId(), context().toSnapshotId())); | |||
} | |||
|
|||
@Override | |||
public TableScan useRef(String ref) { | |||
throw new UnsupportedOperationException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine for now, but we may have a case where we want incremental to be able to specify a branch. It doesn't matter currently because the snapshot IDs are always explicit.
Looks great. Thanks, @amogh-jahagirdar! I'll merge when tests are passing. I had to fix a conflict in revapi. |
Thanks @rdblue @yzhaoqin @namrathamyske @hililiwei for the review! |
This change adds support for scanning from refs, where the scan can either be a time travel from a tag or a time travel from the tip of a branch.