[HUDI-7645] Optimize BQ sync tool for MDT #11065
@wombatu-kun : Please take a look at the comment and let me know your thoughts.
Stream<HoodieBaseFile> allLatestBaseFiles;
if (useFileListingFromMetadata) {
  LOG.info("Fetching all base files from MDT.");
  allLatestBaseFiles = fsView.getLatestBaseFiles();
It looks like fsView.getLatestBaseFiles() only returns file groups that are already loaded in the view, so some partitions may not be loaded at all. Can you check whether this is the intended behavior?
Yes, you are right: fsView.getLatestBaseFiles() only returns file groups that are already loaded in the view. But I don't see any other way to load all latest files in one call to HoodieMetadataFileSystemView/HoodieTableMetadata. It would be great if you or @nsivabalan (as the reporter of this task) could give me some advice.
I've found fsView.loadAllPartitions(), which loads all partitions in one call. Now all file groups are loaded in the view before the latest base files are fetched with fsView.getLatestBaseFiles().
Makes sense. LGTM
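For readers following this thread, here is a minimal, self-contained sketch of the behavior discussed above. It is a toy model, not Hudi code: LazyView, its in-memory "storage" map, and the sample partition and file names are all hypothetical. Only the method names getLatestBaseFiles() and loadAllPartitions() mirror the ones mentioned in the comments, and the real signatures may differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Toy stand-in for a lazily populated file-system view (hypothetical, NOT Hudi's implementation).
public class LazyViewSketch {

    static class LazyView {
        // What exists in "storage": partition path -> latest base file names.
        private final Map<String, List<String>> storage;
        // What the view has loaded so far.
        private final Map<String, List<String>> loaded = new LinkedHashMap<>();

        LazyView(Map<String, List<String>> storage) {
            this.storage = storage;
        }

        // Per-partition access loads that partition on demand.
        Stream<String> getLatestBaseFiles(String partition) {
            return loaded
                .computeIfAbsent(partition, p -> storage.getOrDefault(p, Collections.emptyList()))
                .stream();
        }

        // The no-argument variant only sees partitions that are already loaded.
        Stream<String> getLatestBaseFiles() {
            return loaded.values().stream().flatMap(List::stream);
        }

        // Loads every partition known to storage in one call.
        void loadAllPartitions() {
            storage.forEach(loaded::putIfAbsent);
        }
    }

    // Hypothetical sample data: two partitions, three base files in total.
    static Map<String, List<String>> sampleStorage() {
        Map<String, List<String>> storage = new LinkedHashMap<>();
        storage.put("2024/01/01", Arrays.asList("a.parquet", "b.parquet"));
        storage.put("2024/01/02", Arrays.asList("c.parquet"));
        return storage;
    }

    public static void main(String[] args) {
        LazyView view = new LazyView(sampleStorage());
        // Nothing has been loaded yet, so the view reports no files.
        System.out.println("before load: " + view.getLatestBaseFiles().count());
        view.loadAllPartitions();
        // After loading all partitions, every base file is visible.
        System.out.println("after load:  " + view.getLatestBaseFiles().count());
    }
}
```

In this toy model the first count is 0 and the second is 3, which is the gap the review comment pointed out and the loadAllPartitions() call closes.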
d417a41 to 9464340
Change Logs
In BQ sync, we currently poll the file-system view for the latest files sequentially, one partition at a time.
When MDT is enabled, we can load all partitions in one call instead.
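The difference in call pattern can be illustrated with a small sketch. Everything here is hypothetical: CountingBackend and its listPartition/listAll methods are stand-ins that only count invocations, and they say nothing about how the metadata table serves a bulk request internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Counts listing calls for two access patterns (hypothetical backend, not Hudi's API).
public class ListingCallsSketch {

    static class CountingBackend {
        private final Map<String, List<String>> filesByPartition;
        int calls = 0;

        CountingBackend(Map<String, List<String>> filesByPartition) {
            this.filesByPartition = filesByPartition;
        }

        // One call per partition.
        List<String> listPartition(String partition) {
            calls++;
            return filesByPartition.getOrDefault(partition, Collections.emptyList());
        }

        // One call for all partitions.
        Map<String, List<String>> listAll() {
            calls++;
            return filesByPartition;
        }
    }

    // Sequential pattern: ask for each partition's files in turn.
    static List<String> sequential(CountingBackend backend, List<String> partitions) {
        List<String> files = new ArrayList<>();
        for (String partition : partitions) {
            files.addAll(backend.listPartition(partition));
        }
        return files;
    }

    // Bulk pattern: fetch everything once, then flatten.
    static List<String> bulk(CountingBackend backend) {
        List<String> files = new ArrayList<>();
        backend.listAll().values().forEach(files::addAll);
        return files;
    }

    // Hypothetical sample data: three partitions with one file each.
    static Map<String, List<String>> sampleData() {
        Map<String, List<String>> data = new LinkedHashMap<>();
        data.put("p1", Arrays.asList("f1.parquet"));
        data.put("p2", Arrays.asList("f2.parquet"));
        data.put("p3", Arrays.asList("f3.parquet"));
        return data;
    }

    public static void main(String[] args) {
        CountingBackend a = new CountingBackend(sampleData());
        sequential(a, new ArrayList<>(sampleData().keySet()));
        CountingBackend b = new CountingBackend(sampleData());
        bulk(b);
        System.out.println("sequential calls: " + a.calls + ", bulk calls: " + b.calls);
    }
}
```

Both patterns return the same files; the sequential one issues one call per partition, while the bulk one issues a single call regardless of partition count.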
Impact
none
Risk level (write none, low, medium or high below)
none
Documentation Update
none