-
Notifications
You must be signed in to change notification settings - Fork 387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#4089] fix(hive catalog): the problem of slow acquisition of hive table list #4469
Conversation
if (!listAllTables) { | ||
String filter = | ||
String.format( | ||
"%stable_type = \"ICEBERG\"", hive_metastoreConstants.HIVE_FILTER_FIELD_PARAMS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filtering out only the Iceberg table seems to be insufficient, and we currently also do not support view type.
So I think you should use show managed type table when listAllTables=false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do not filter the view, what impact will it have on Gravitino?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do not filter the view, what impact will it have on Gravitino?
I have tested it.The view table ,don‘t have any impact ,so i think we can do like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add some comments ,please review again
Hi, @mygrsun could you plz resolve the comment and CI issue? Thanks |
Now,the ci issue is a troublesome problem. https://github.com/apache/gravitino/actions/runs/10452698362/job/28942732918?pr=4469 This hive bug is only happen in Derby. In our environment , the metastore storage is mysql,so we don't encounter this problem. |
@mygrsun Is there any progress? May I take on this? |
thanks,you can do it |
a8e877d
to
ebc4b17
Compare
I have verified locally that through this PR, the time consumption of listing 1000 tables can be reduced from 2043ms to 14ms. It's ready for review now. @jerryshao |
}) | ||
.map(tb -> NameIdentifier.of(namespace, tb.getTableName())) | ||
.toArray(NameIdentifier[]::new)); | ||
if (!listAllTables) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the source codes of Paimon and tested locally, I updated the PR and use table param table_type=paimon
to filter out Paimon tables.
clientPool.run( | ||
c -> | ||
c.listTableNamesByFilter( | ||
schemaIdent.name(), icebergAndPaimonFilter, (short) -1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the meaning of (short) -1
here? Can you define a constant and add a comment on it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
…ble list (#4469) ### What changes were proposed in this pull request? the problem of slow acquisition of hive table list. Using listTableNamesByFilter replace the getTableObjectsByName method. ### Why are the changes needed? I found that list-table will takes 300s when a schema has 5000 tables . Fix: #4089 ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? Manual testing --------- Co-authored-by: ericqin <[email protected]> Co-authored-by: mchades <[email protected]>
…ive table list (apache#4469) ### What changes were proposed in this pull request? the problem of slow acquisition of hive table list. Using listTableNamesByFilter replace the getTableObjectsByName method. ### Why are the changes needed? I found that list-table will takes 300s when a schema has 5000 tables . Fix: apache#4089 ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? Manual testing --------- Co-authored-by: ericqin <[email protected]> Co-authored-by: mchades <[email protected]>
What changes were proposed in this pull request?
the problem of slow acquisition of hive table list.
Using listTableNamesByFilter replace the getTableObjectsByName method.
Why are the changes needed?
I found that list-table will takes 300s when a schema has 5000 tables .
Fix: #4089
Does this PR introduce any user-facing change?
no
How was this patch tested?
Manual testing