-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEAT] Daft Catalog API #3036
[FEAT] Daft Catalog API #3036
Conversation
CodSpeed Performance ReportMerging #3036 will degrade performances by 45.6%Comparing Summary
Benchmarks breakdown
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3036 +/- ##
==========================================
- Coverage 77.50% 75.89% -1.61%
==========================================
Files 666 675 +9
Lines 81333 83898 +2565
==========================================
+ Hits 63036 63677 +641
- Misses 18297 20221 +1924
|
src/daft-catalog/src/lib.rs
Outdated
named_data_catalogs: HashMap<String, Arc<dyn DataCatalog>>, | ||
view_catalog: HashMap<String, LogicalPlanBuilder>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using interior mutability (e.g., Mutex) for named_data_catalogs
and view_catalog
. This would allow modifications through shared references, enabling use of Arc<DaftMetaCatalog>
without &mut self in register methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great points.... I was trying to find a way around this also specifically because I wanted to enable lazy materialization of PyIceberg objects inside of the GlueCatalog (if we receive information from Glue that a given table is an "iceberg"
table, then we can lazily initialize the PyIceberg Glue client to help us interface with it)
I'm not actively working on this PR yet until perhaps next week, but will work in interior mutability here which is needed to make these happen.
nit(opinion): |
e6d0350
to
f63513e
Compare
I ended up removing the glue implementation to simplify the PR, and immediately unlock registration of PyIceberg catalogs as in #3901 I also made this easily extendible from Python via simple |
a2e4847
to
85b6a9b
Compare
Add DataCatalogTable trait Add pyiceberg nit Wire up register_table and read_table Implement pyiceberg catalog Implement python catalog Integrate with iceberg tests Fix merge conflicts
51ba4c1
to
e5e8ed6
Compare
@universalmind303 this is ready for review now. Specifically I'm interested in how this can be integrated with our |
Adds a
DaftCatalog
API to help cement Daft's catalog data access patterns. Here is the intended UX:Other APIs which will be nice as follow-ons:
~/.daft.yaml
config filedaft.read_table("iceberg_table", options=daft.catalog.IcebergReadOptions(...))
)daft.catalog.register_aws_glue()
daft.catalog.register_hive_metastore()
df.write_table("table_name", mode="overwrite|append", create_if_missing=True)
df.upsert("table_name", match_columns={...}, update_columns={...}, insert_columns={...})
daft.catalog.create_table_parquet(...)
daft.catalog.create_table_iceberg(...)
daft.catalog.create_table_deltalake(...)
daft.catalog.list_tables(...)