Support for storage table management of materialized views #21797
Comments
Yes. The fact that the storage exists is a necessity, but what the storage is, is abstracted away.
That would break the abstraction, tying us to the current implementation. The necessary storage optimizations should be (and probably are) carried out automatically by MV refreshes.
Maybe tying it closely to storage table semantics breaks the abstraction, but that is NOT necessary. The abstraction can stay at the level of it being a materialized view rather than a view. Materializations inherently need maintenance, and the relevant tasks should be possible in Trino. Currently, users are forced to use the tooling of the underlying Iceberg system to perform this maintenance. That is a bad user experience and also a more brittle approach. The refresh does NOT currently perform the necessary maintenance, and users are asking for more control.
Maybe keeping optimization separate from refresh would be handy, as users might want only metadata optimization instead of refreshing the MV.
Currently, the refresh writes the data anew (like CTAS). In Hive/Iceberg/Delta, CTAS is supposed to write data in a form that is decent for querying. If that is not the case, all tables written by Trino are underperforming, and we need to fix the CTAS/INSERT flow in the Iceberg connector.
Sure. If we can improve writing and inserting, that would be good. At the same time, we should allow maintenance on materialized views rather than force users into hacks with other tools on the underlying storage to work around problems.
Here are some points for consideration:
As we are adding support for incremental refresh, we may have to optimize the data files in the MV too. As @rstyp mentioned, metadata compaction may also help with planning times as the metadata grows. Currently, no table maintenance operations can be performed on the underlying Iceberg tables of an MV.
Isn't metadata compaction implicit in Iceberg?
Materialized views create storage tables transparently in Trino. However, these tables are completely hidden in Trino and cannot be accessed directly. As a result, they cannot be managed within Trino itself.
Use cases for this management include:
Potential approaches: