[Improvement]: Table partition files list performance issue #2635
Comments
I propose to align
That would fix the performance issue because we wouldn't have to iterate over all the entries to count files. For large tables whose partitions each contain around 1k files, the number of items to process would drop from millions (file entries) to thousands (partitions). However, the downside is that we would have to drop the commit time and the storage size at the partition level, which are calculated from the entries. @majin1102 @zhoujinsong @baiyangtx WDYT?
@link3280
Cool! Then we could still keep the partition storage size.
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.
apache/iceberg#8502 was released in Iceberg 1.5, but we currently use Iceberg 1.4.3, so maybe this depends on #3084.
This issue has been unblocked as #3084 has been merged.
Search before asking
What would you like to be improved?
Currently, the table partition files API can hang for a very long time if the table has a lot of files (e.g. over 100K). The root cause is that AMS reads all file entries to calculate partitions, instead of filtering the entries by partition.
This may be due to a limitation: the Iceberg Java API cannot read the partition metadata table directly. Hopefully we can find a workaround or push the Iceberg community to solve this problem.
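To make the cost difference concrete, here is a rough toy model of the two strategies. It is only a sketch: `PartitionFileListing`, `FileEntry`, `EntryGroup`, and the method names are all hypothetical and are not AMS or Iceberg classes. It also assumes that entries are grouped so that whole groups can be skipped by partition before their entries are read, which may not match how the real metadata is laid out.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: none of these types are AMS or Iceberg classes. */
final class PartitionFileListing {

    /** Hypothetical stand-in for one file entry. */
    static final class FileEntry {
        final String partition;
        final String path;

        FileEntry(String partition, String path) {
            this.partition = partition;
            this.path = path;
        }
    }

    /** Hypothetical group of entries that all belong to the same partition. */
    static final class EntryGroup {
        final String partition;
        final List<FileEntry> entries;

        EntryGroup(String partition, List<FileEntry> entries) {
            this.partition = partition;
            this.entries = entries;
        }
    }

    /** Entries touched by the most recent listing call, kept for comparison. */
    static long entriesRead = 0;

    /** Reads every entry in the table and keeps the ones in the requested partition. */
    static List<FileEntry> listByFullScan(List<EntryGroup> table, String partition) {
        entriesRead = 0;
        List<FileEntry> result = new ArrayList<>();
        for (EntryGroup group : table) {
            for (FileEntry entry : group.entries) {
                entriesRead++;
                if (entry.partition.equals(partition)) {
                    result.add(entry);
                }
            }
        }
        return result;
    }

    /** Skips whole groups by partition first, so only matching entries are read. */
    static List<FileEntry> listWithPartitionFilter(List<EntryGroup> table, String partition) {
        entriesRead = 0;
        List<FileEntry> result = new ArrayList<>();
        for (EntryGroup group : table) {
            if (!group.partition.equals(partition)) {
                continue; // group is pruned without reading its entries
            }
            for (FileEntry entry : group.entries) {
                entriesRead++;
                result.add(entry);
            }
        }
        return result;
    }

    /** Builds a synthetic table with the given number of partitions and files per partition. */
    static List<EntryGroup> sampleTable(int partitions, int filesPerPartition) {
        List<EntryGroup> table = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            String partition = "p=" + p;
            List<FileEntry> entries = new ArrayList<>();
            for (int f = 0; f < filesPerPartition; f++) {
                entries.add(new FileEntry(partition, partition + "/file-" + f + ".parquet"));
            }
            table.add(new EntryGroup(partition, entries));
        }
        return table;
    }

    public static void main(String[] args) {
        List<EntryGroup> table = sampleTable(100, 1000); // 100K entries in total
        listByFullScan(table, "p=7");
        System.out.println("full scan read " + entriesRead + " entries");
        listWithPartitionFilter(table, "p=7");
        System.out.println("filtered listing read " + entriesRead + " entries");
    }
}
```

In this synthetic table of 100 partitions with 1,000 files each, the full scan touches all 100,000 entries to list one partition, while the filtered listing touches only the 1,000 entries of that partition. Both return the same files.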
How should we improve?
No response
Are you willing to submit PR?
Subtasks
No response
Code of Conduct