[Bug report] list-table api is very slow when table quantity is very large #4089

Closed
mygrsun opened this issue Jul 5, 2024 · 9 comments · Fixed by #4469
Labels
0.7.0 Release v0.7.0 bug Something isn't working

Comments

@mygrsun
Contributor

mygrsun commented Jul 5, 2024

Version

main branch

Describe what's wrong

In my tests, list-table takes 300 s when a schema has 5000 tables.
I analyzed the code and added some logs, and found that the cause is the call to the getTableObjectsByName interface.
list-table uses getTableObjectsByName, and this metastore interface is very slow.
image

Error message and/or stacktrace

I added logs at 3 positions.
image

The result is:

image
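
Since the screenshots are not reproduced here, the following is a minimal sketch of how timing logs at a few positions could be collected. `PhaseTimer` and the phase names are hypothetical and are not taken from the Gravitino code; the timed work in `main` is placeholder data standing in for the metastore calls.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper that records elapsed wall-clock time per named phase. */
final class PhaseTimer {
  // LinkedHashMap keeps phases in the order they were first timed.
  private final Map<String, Long> elapsedMillis = new LinkedHashMap<>();

  /** Runs the work, records how long it took under the given phase name, and returns its result. */
  <T> T time(String phase, Supplier<T> work) {
    long start = System.nanoTime();
    try {
      return work.get();
    } finally {
      long millis = (System.nanoTime() - start) / 1_000_000;
      elapsedMillis.merge(phase, millis, Long::sum);
    }
  }

  Map<String, Long> elapsedMillis() {
    return elapsedMillis;
  }

  public static void main(String[] args) {
    PhaseTimer timer = new PhaseTimer();
    // Placeholder work; in a real test these would wrap the metastore calls.
    List<String> names = timer.time("list table names", () -> List.of("t1", "t2"));
    timer.time("fetch table objects", () -> names.size());
    timer.time("filter tables", () -> names.isEmpty());
    timer.elapsedMillis().forEach((phase, ms) -> System.out.println(phase + ": " + ms + " ms"));
  }
}
```

Wrapping each step separately makes it easier to see which call dominates the total time.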

How to reproduce

Add 5000 tables to one schema.

Additional context

No response

@mygrsun mygrsun added the bug Something isn't working label Jul 5, 2024
@mygrsun
Contributor Author

mygrsun commented Jul 5, 2024

I found that getTableObjectsByName is used mainly to filter managed (inner) and external (outer) tables, and to filter out Iceberg tables.
What is the impact if I don't filter managed and external tables? What additional types of tables would be returned?
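
To illustrate why this kind of filtering needs full table objects, here is a minimal sketch. `TableInfo` is a hypothetical stand-in for a metastore table object, and the type names `MANAGED_TABLE` / `EXTERNAL_TABLE` and the `table_type=ICEBERG` parameter are assumptions about how such tables are marked. It is not the actual Gravitino code.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class FullObjectFilterSketch {

  /** Hypothetical stand-in for a metastore table object; not a Hive or Gravitino class. */
  static final class TableInfo {
    final String name;
    final String tableType;
    final Map<String, String> parameters;

    TableInfo(String name, String tableType, Map<String, String> parameters) {
      this.name = name;
      this.tableType = tableType;
      this.parameters = parameters;
    }
  }

  // Assumed type names for managed ("inner") and external ("outer") tables.
  private static final Set<String> SUPPORTED_TYPES = Set.of("MANAGED_TABLE", "EXTERNAL_TABLE");

  /** Keeps managed/external tables and drops those marked as Iceberg in their parameters. */
  static List<String> hiveTableNames(List<TableInfo> tables) {
    return tables.stream()
        .filter(t -> SUPPORTED_TYPES.contains(t.tableType))
        // Assumption: Iceberg tables carry the parameter table_type=ICEBERG.
        .filter(t -> !"ICEBERG".equalsIgnoreCase(t.parameters.get("table_type")))
        .map(t -> t.name)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<TableInfo> tables = List.of(
        new TableInfo("orders", "MANAGED_TABLE", Map.of()),
        new TableInfo("events", "EXTERNAL_TABLE", Map.of("table_type", "ICEBERG")),
        new TableInfo("orders_view", "VIRTUAL_VIEW", Map.of()),
        new TableInfo("logs", "EXTERNAL_TABLE", Map.of()));
    System.out.println(hiveTableNames(tables)); // prints [orders, logs]
  }
}
```

Both checks read fields of the table object, so every table's full metadata has to be fetched before a single name can be returned.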

@mygrsun
Contributor Author

mygrsun commented Jul 5, 2024

> Can you provide the direct query time for HMS without Gravitino?

We have tested it. When I execute "show tables" in Hive beeline and Spark, it is very fast.
I guess HiveServer2 doesn't use this getTableObjectsByName interface, because 'show tables' just returns table names.

@mchades
Contributor

mchades commented Jul 5, 2024

time1 and time2 do not seem to appear in the picture?

@mygrsun
Contributor Author

mygrsun commented Jul 5, 2024

> time1 and time2 do not seem to appear in the picture?

Sorry, I will send you a new one.

@mygrsun
Contributor Author

mygrsun commented Jul 5, 2024

> time1 and time2 do not seem to appear in the picture?

image

@mygrsun
Contributor Author

mygrsun commented Jul 8, 2024

image

I have tried the listTableNamesByFilter interface to filter Iceberg tables. It is a feasible approach.
However, I did not handle filtering of managed and external tables, because I don't know the purpose of filtering them.

So, please check this approach.
If it is acceptable, I can submit a PR.
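
As an illustration of the idea (the screenshot is not reproduced here), one possible shape is to list all table names, list the Iceberg table names with a parameter filter, and subtract. In this sketch `TableNameSource` is a hypothetical stand-in for the two name-only metastore calls, and the filter string, the `table_type` key, and the use of `-1` as "no limit" are assumptions that should be checked against the Hive version in use. It is not necessarily what the merged fix does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class NameFilterSketch {

  /** Hypothetical stand-in mirroring two name-only metastore calls. */
  interface TableNameSource {
    List<String> getAllTables(String dbName);

    List<String> listTableNamesByFilter(String dbName, String filter, short maxTables);
  }

  // Prefix Hive uses for filtering on table parameters (hive_metastoreConstants.HIVE_FILTER_FIELD_PARAMS).
  static final String PARAMS_PREFIX = "hive_filter_field_params__";

  /** Assumed filter matching tables whose table_type parameter is ICEBERG. */
  static String icebergFilter() {
    return PARAMS_PREFIX + "table_type like \"ICEBERG\"";
  }

  /** Returns all table names except those the filter reports as Iceberg, using names only. */
  static List<String> nonIcebergTableNames(TableNameSource source, String dbName) {
    // Assumption: a negative maxTables means "no limit".
    Set<String> iceberg =
        new HashSet<>(source.listTableNamesByFilter(dbName, icebergFilter(), (short) -1));
    List<String> result = new ArrayList<>();
    for (String name : source.getAllTables(dbName)) {
      if (!iceberg.contains(name)) {
        result.add(name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Fake source with fixed answers, standing in for a real metastore client.
    TableNameSource fake = new TableNameSource() {
      @Override
      public List<String> getAllTables(String dbName) {
        return List.of("orders", "events", "logs");
      }

      @Override
      public List<String> listTableNamesByFilter(String dbName, String filter, short maxTables) {
        return List.of("events");
      }
    };
    System.out.println(nonIcebergTableNames(fake, "sales")); // prints [orders, logs]
  }
}
```

Neither call returns full table objects, which is where the potential saving comes from. Filtering by managed or external table type is not covered by this sketch.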

@mchades
Contributor

mchades commented Jul 8, 2024

> image
> I have tried the listTableNamesByFilter interface to filter Iceberg tables. It is a feasible approach. However, I did not handle filtering of managed and external tables, because I don't know the purpose of filtering them.
>
> So, please check this approach. If it is acceptable, I can submit a PR.

Great! I think we can go this way. WDYT? @jerryshao @FANNG1

@FANNG1
Contributor

FANNG1 commented Jul 8, 2024

> Great! I think we can go this way. WDYT? @jerryshao @FANNG1

I think it's OK, because this method seems extensible and works for more than just filtering Iceberg tables.

@mchades
Contributor

mchades commented Jul 15, 2024

Hi @mygrsun, is there any progress? Can I assign this issue to you?

@jerryshao jerryshao added the 0.6.0 label Aug 1, 2024
mygrsun pushed a commit to mygrsun/gravitino that referenced this issue Aug 9, 2024
mygrsun pushed a commit to mygrsun/gravitino that referenced this issue Aug 19, 2024
@jerryshao jerryshao removed the 0.6.0 label Aug 19, 2024
mygrsun pushed a commit to mygrsun/gravitino that referenced this issue Aug 19, 2024
@jerryshao jerryshao added the 0.7.0 Release v0.7.0 label Nov 4, 2024
github-actions bot pushed a commit that referenced this issue Nov 4, 2024
…ble list (#4469)

### What changes were proposed in this pull request?

Fix the slow retrieval of the Hive table list.
Use listTableNamesByFilter in place of the getTableObjectsByName method.


### Why are the changes needed?

I found that list-table takes 300 s when a schema has 5000 tables.

Fix: #4089 

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

Manual testing

---------

Co-authored-by: ericqin <[email protected]>
Co-authored-by: mchades <[email protected]>
jerryshao pushed a commit that referenced this issue Nov 4, 2024
…ble list (#5439)

Co-authored-by: mygrsun <[email protected]>
Co-authored-by: ericqin <[email protected]>
Co-authored-by: mchades <[email protected]>
mplmoknijb pushed a commit to mplmoknijb/gravitino that referenced this issue Nov 6, 2024
…ive table list (apache#4469)

4 participants