We use Glue to store metadata of hundreds of thousands of tables, spanning dozens of schemas (many of which are not Iceberg tables).
It would be a great feature to be able to initialize an IcebergCatalogProvider for querying via DataFusion using only specific schemas, mostly to avoid traversing everything.
It looks like this can be done with a small modification to the `IcebergCatalogProvider::try_new` function (see here).
This way, the user can choose to initialize the catalog on one or more specific schemas:
    /// attempts to create a schema provider for each namespace, and
    /// collects these providers into a `HashMap`.
    pub async fn try_new(client: Arc<dyn Catalog>, schemas: Option<Vec<String>>) -> Result<Self> {
        // TODO:
        // Schemas and providers should be cached and evicted based on time
        // As of right now; schemas might become stale.
        let schema_names: Vec<_> = if let Some(schemas) = schemas {
            schemas
        } else {
            client
                .list_namespaces(None)
                .await?
                .iter()
                .flat_map(|ns| ns.as_ref().clone())
                .collect()
        };
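To make the selection logic easier to reason about in isolation, here is a minimal, standard-library-only sketch of the same branch. The names `resolve_schema_names`, `Namespace` and `list_all` are hypothetical stand-ins and not part of iceberg-rust. The closure stands in for `client.list_namespaces(None)`, and the sketch assumes each namespace flattens to its name parts the way the snippet above does.

```rust
/// Hypothetical stand-in for a catalog namespace: a list of name parts,
/// mirroring what the snippet above flattens with `flat_map`.
type Namespace = Vec<String>;

/// Picks the schema names to initialize.
///
/// If the caller passed an explicit list, it is used as-is and the
/// (potentially expensive) listing closure is never invoked. Otherwise
/// every namespace returned by `list_all` is flattened into one list.
pub fn resolve_schema_names<F>(
    requested: Option<Vec<String>>,
    list_all: F,
) -> Result<Vec<String>, String>
where
    F: FnOnce() -> Result<Vec<Namespace>, String>,
{
    match requested {
        // Explicit selection: skip the catalog traversal entirely.
        Some(schemas) => Ok(schemas),
        // No selection: fall back to listing and flattening everything.
        None => Ok(list_all()?.into_iter().flatten().collect()),
    }
}
```

The point of passing the listing step as a closure is that the full traversal only happens on the `None` path, which is the behavior the proposed `schemas` parameter is meant to give.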
What do you think?
Thanks @a-agmon for raising this. We currently fetch all schemas at once to keep the implementation simple, since DataFusion's catalog API is not async at the moment. Rather than adding more parameters to catalog construction, I think a better approach is to put a cache between the async Iceberg catalog and the DataFusion catalog, which would help both performance and functionality.
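As a rough illustration of the caching idea, the sketch below shows a time-based cache that a synchronous lookup could read from while an async task fills it. It is only a sketch under stated assumptions: `TtlCache` and its methods are hypothetical names, it uses only the standard library, and it does not reflect any existing iceberg-rust or DataFusion API. Timestamps are passed in explicitly so expiry is easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical time-based cache: entries older than `ttl` count as missing.
pub struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<String, (Instant, V)>,
}

impl<V: Clone> TtlCache<V> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Stores a value (e.g. one fetched from the async catalog), stamped with `now`.
    pub fn insert(&mut self, key: &str, value: V, now: Instant) {
        self.entries.insert(key.to_string(), (now, value));
    }

    /// Synchronous lookup. Evicts the entry and returns `None` once it has expired.
    pub fn get(&mut self, key: &str, now: Instant) -> Option<V> {
        let expired = match self.entries.get(key) {
            Some((stored_at, _)) => now.duration_since(*stored_at) >= self.ttl,
            None => return None,
        };
        if expired {
            // Stale entry: drop it so the caller knows to refresh from the catalog.
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key).map(|(_, value)| value.clone())
    }
}
```

A `None` from `get` would be the signal to go back to the async catalog and re-insert, which also addresses the staleness noted in the TODO comment of the snippet in the issue.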