feat(datasets): support setting Ibis table schemas #833
Comments
I think so, because https://github.com/ibis-project/ibis/blob/main/ibis/backends/sql/__init__.py#L47 for example (called in the […])
From @mark-druffel:
I'd be inclined to match the Ibis name, but create a section for […]
Actually, looking at the code, it's more like […]. So this would be very trivial to do. It just seems a bit dumb that you'd pass the schema name twice (to […]). @mark-druffel curious to know what you think of this approach? This is what it would look like in your example:

```yaml
bronze_x:
  type: ibis.TableDataset
  filepath: x.csv
  file_format: csv
  table_name: x
  connection:  # Nit: Moved the keys under `connection`; assume that was just an oversight in your example
    backend: duckdb
    database: data.duckdb
  database: bronze  # Won't bother with `schema`, to be consistent with Ibis
```
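For illustration, a rough sketch (my own, not the actual dataset code) of what loading under that entry would boil down to on the Ibis side; the dataset already holds the connection, so the change is essentially forwarding one more argument:

```python
import ibis

# Connection built from the `connection` block above.
con = ibis.duckdb.connect(database="data.duckdb")

# The proposed top-level `database: bronze` would map to the table lookup;
# Ibis uses `database` for what SQL usually calls a schema.
table = con.table("x", database="bronze")
```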
@deepyaman Staying consistent with ibis definitely makes sense. I do think the differing […]
@deepyaman I've been looking back through this given my pyspark example over Slack. I'm still thrown by the duplicative use of […]. Given that, I think using your prior suggestion of […]
I'm updating TableDataset and doing some testing on my side, but would love to open a PR if the change makes sense to others?
That makes sense and would be much appreciated! Appreciate any help in making the dataset as useful as possible for people such as yourself. :) I'm sorry for not getting back to you on the Slack thread (persistent reference: https://kedro.hall.community/loading-ibis-tabledataset-from-unity-catalog-fails-with-cannot-be-found-error-0m1YHF44RYjD), but would the update you're making address this? I know you mentioned some nuances with DuckDB, which I haven't parsed myself.
@deepyaman no worries, you've been so helpful with every question I've had. It's so appreciated! To answer your direct question, yes. My main goal of the PR would be to make working with pyspark in Databricks easier.

Additional Context

Basically, using the […]

Using table method

We're using the pyspark backend with Unity Catalog (UC), so we need to be able to call […]
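To illustrate the pattern being described, here is a sketch with placeholder catalog/schema/table names, assuming an Ibis version whose pyspark backend accepts a catalog-qualified `database` argument:

```python
import ibis
from pyspark.sql import SparkSession

# On Databricks a session already exists; getOrCreate picks it up.
spark = SparkSession.builder.getOrCreate()
con = ibis.pyspark.connect(session=spark)

# With Unity Catalog, the table has to be qualified at the call site;
# "my_catalog" and "bronze" are placeholder names.
t = con.table("x", database=("my_catalog", "bronze"))
```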
Using do_connect method
If I use the […]
Databricks documentation says: […]

Given that, it seems the only way to update Unity Catalog's default […]
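For reference, a sketch of the do_connect route this is getting at: set the session's current catalog and schema on the Spark side before handing the session to Ibis (names are placeholders, and whether this is sufficient on a given Databricks setup is exactly the open question above):

```python
import ibis
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Point the session at a Unity Catalog catalog and schema up front;
# "my_catalog" and "bronze" are placeholder names.
spark.sql("USE CATALOG my_catalog")
spark.sql("USE SCHEMA bronze")

con = ibis.pyspark.connect(session=spark)
t = con.table("x")  # resolved against the session's current catalog/schema
```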
@deepyaman I added `table_args` to TableDataset and got it working, but I'm having second thoughts. I think it's confusing where things go between `table_args` and `save_args`. I'm wondering, though: will TableDataset keep […]
Description
I want to set the database schema for the Ibis table referenced by the ibis.TableDataset.

Context
From @mark-druffel on Slack:
I can reproduce this error with vanilla ibis:
Found a related question on the Ibis GitHub; it sounds like DuckDB can't set the schema globally, so it has to be done in the table functions. Wondering if this would require a change to ibis.TableDataset, and if so, would this pattern work the same with other backends?
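A minimal sketch of that behaviour with vanilla Ibis and DuckDB (file, table, and schema names here are illustrative): there is no connection-level setting for a default schema, so the schema has to be supplied to each table call.

```python
import ibis

con = ibis.duckdb.connect(database="data.duckdb")

# This fails if `x` lives in the `bronze` schema rather than the default one:
#   con.table("x")  -> table-not-found error
# Passing the schema (which Ibis calls `database`) per call works:
x = con.table("x", database="bronze")
```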
Possible Implementation
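A rough sketch of the kind of change being discussed, assuming a `database` option on the dataset that gets forwarded to the backend's `table` and `create_table` calls (the class below is a simplified stand-in, not the actual kedro-datasets implementation):

```python
import ibis
import ibis.expr.types as ir


class TableDataset:  # simplified stand-in for kedro_datasets.ibis.TableDataset
    def __init__(self, *, table_name, connection, database=None, save_args=None):
        self._table_name = table_name
        self._connection_config = connection
        self._database = database  # "schema" in SQL terms; `database` in Ibis terms
        self._save_args = save_args or {}

    @property
    def connection(self):
        # The real dataset caches connections; kept minimal here.
        config = dict(self._connection_config)
        backend = getattr(ibis, config.pop("backend"))
        return backend.connect(**config)

    def load(self) -> ir.Table:
        # Forward the configured schema to the table lookup.
        return self.connection.table(self._table_name, database=self._database)

    def save(self, data: ir.Table) -> None:
        # create_table also accepts a `database` argument on Ibis backends.
        self.connection.create_table(
            self._table_name, data, database=self._database, **self._save_args
        )
```

This would keep the catalog entry's top-level `database` key consistent with Ibis terminology, as suggested in the comments above.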