Add Backend.upsert
#5391
Comments
cc @prmoore77 - someone mentioned in passing that you might have some use cases for upsert support.
This might be useful, but let's wait until someone requests it.
This would be a really nice feature to have. It would basically mean the only SQL library my team needs is ibis, rather than the current awkward mix of sqlalchemy + dataset (for upserting) + ibis.
Reopening for triage, thanks for the input @mw3i!
I have a data web app that has historically only been used for analysis using ibis, but I am about to need to add UPDATE features to it. I would love to stay inside the ibis ecosystem. @mw3i do you have any other tips or resources I should look at as I go down this path?
hey @jcrist - yes, Upsert, or
I think this, plus some related work ensuring ibis DDL operations work with PyArrow tables and batches, would be great and would help close the loop: then you can run queries on one backend and use the results to populate tables in another, e.g. PySpark -> Postgres, or Trino -> Clickhouse as we've heard recently.
Sorry for the late reply. Since ibis doesn't have upsert, our team uses the dataset library for inserts/updates/upserts. We do all our normal code in ibis, and then updates/upserts are done with:

```python
import dataset

params = f"{dialect}://{user}:{password}@{url}:{port}/{database}"  # <-- sqlalchemy database uri string

with dataset.connect(params) as dbx:
    dbx[tablename].upsert_many(
        list_of_dictionaries,  # <-- each dict is a db entry
        column_to_use_as_identifier,
    )
```

It's not ideal, but it lets us use ibis for everything except db writes (and ibis is just too good of a library to give up).
Thank you @mw3i, that looks great! Really, if we implement #9186 and #9185, it would be easy to pawn this responsibility off to users to handle themselves with an external library. Same thing for other feature requests that come in: the easier we make it to interoperate with other libs, the less pressure there is for us to implement things ourselves.
Fair point about interoperability (anything to reduce internal developer workload is a plus), though the dataset library in particular might be idiosyncratic (idk how many other libraries expect data to be formatted that way).

Given that ibis leverages sqlalchemy already, it might not be too hard to maintain just two functions (insert and update, where update has an 'upsert' argument) that implement sqlalchemy inserts and upserts in the background, without needing to write specific functions for each backend (apologies if I'm being naive here). I've seen some relatively straightforward gists that implement pseudo-upserts in sqlalchemy; idk how easily something like that could fit into the framework ibis uses.
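As a rough illustration of the "just two functions" idea, here is a minimal sketch using the stdlib sqlite3 module instead of sqlalchemy (table and helper names are made up for the example). SQLite (>= 3.24), DuckDB, and Postgres all accept the same `ON CONFLICT ... DO UPDATE` syntax, which is what makes a generic helper plausible:

```python
import sqlite3


def insert(conn, table, rows, key=None, upsert=False):
    """Insert dict rows; with upsert=True, update rows whose `key` already exists."""
    cols = list(rows[0])
    placeholders = ", ".join("?" for _ in cols)
    sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
    if upsert:
        # `excluded` refers to the row that failed to insert due to the conflict.
        updates = ", ".join(f"{c} = excluded.{c}" for c in cols if c != key)
        sql += f" ON CONFLICT({key}) DO UPDATE SET {updates}"
    conn.executemany(sql, [tuple(r[c] for c in cols) for r in rows])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
insert(conn, "t", [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}])
# id=2 conflicts and is updated; id=3 is new and is inserted.
insert(conn, "t", [{"id": 2, "val": "B"}, {"id": 3, "val": "c"}], key="id", upsert=True)
print(conn.execute("SELECT * FROM t ORDER BY id").fetchall())
# -> [(1, 'a'), (2, 'B'), (3, 'c')]
```

A sqlalchemy-based version would look much the same, with the dialect-specific `insert()` construct supplying the `on_conflict_do_update` clause.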
Can we implement Backend.upsert in Snowflake, please? I need it for my project. I'm stuck on how to insert, or update if the record exists. Please advise on alternatives.
@ramil-hx as a workaround, this is what I currently use to upsert records into a duckdb database. To work with snowflake's dialect, you probably need to modify this slightly, both in regards to the SQL and to check how they handle primary keys.

```python
from ibis.backends.sql import SQLBackend
from ibis.expr import types as ir


def upsert(db: SQLBackend, table_name: str, new_data: ir.Table) -> ir.Table:
    """Upsert new data into a table using duckdb's INSERT OR REPLACE INTO tbl.

    This assumes that the destination table has a primary key constraint.
    Otherwise, the new rows will just be appended.

    Not all fields need to be present in the new data.
    Missing fields will be set to NULL.

    Returns the appended table.
    """
    existing_schema = db.table(table_name).schema()
    new_data = new_data.cast(existing_schema)
    new_data = new_data.select(*existing_schema)
    new_sql = db.compile(new_data)
    sql = f"INSERT OR REPLACE INTO {table_name} BY NAME\n{new_sql};"
    db.raw_sql(sql)
    return db.table(table_name)
```
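For readers without duckdb at hand: SQLite supports a similar `INSERT OR REPLACE` statement (minus duckdb's `BY NAME` modifier), so the replace-on-primary-key-conflict semantics the docstring describes can be sketched with the stdlib:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'a1', 'b1'), (2, 'a2', 'b2')")
# The whole row for id=1 is replaced; the omitted column b falls back to
# NULL, matching the "missing fields will be set to NULL" caveat above.
conn.execute("INSERT OR REPLACE INTO t (id, a) VALUES (1, 'A1')")
print(conn.execute("SELECT * FROM t ORDER BY id").fetchall())
# -> [(1, 'A1', None), (2, 'a2', 'b2')]
```

Note that OR REPLACE deletes the conflicting row and inserts the new one, which is why columns absent from the new data end up NULL rather than keeping their old values.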
Hi @NickCrews (small world, eh!), have you ever used your workaround when the new data is an in-memory table? In my use case the generated sql string is:

```sql
INSERT INTO target_table SELECT * FROM "ibis_pandas_memtable_yynerbs36bezdf3us5awk7grji" AS "t0" ON CONFLICT DO NOTHING;
```

Slightly different from yours, as I need to run it both on duckdb and postgres, and it needs to do nothing rather than update; anyway, I totally see why the db engine complains! I am wondering if you managed to make it work in these conditions too.
Lol hi! I assume the problem is that the memtable actually lives in a different backend (the default duckdb one?) than the one you are upserting into. Not at a computer to test, but if you start with a table that exists in the target backend (e.g. by first writing the memtable into an intermediate table there), it should work.
Forgot to get back to you: you were right! My backend is postgres, and creating an intermediate table as you suggested made it work.
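For reference, the `ON CONFLICT DO NOTHING` variant in the generated statement above can be demonstrated with the stdlib sqlite3 module, since SQLite (>= 3.24) accepts the same clause as postgres and duckdb (the table name here just mirrors the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_table (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO target_table VALUES (1, 'old')")
# The conflicting row (id=1) is silently skipped; the non-conflicting
# row (id=2) is inserted. Existing data is never modified.
conn.execute(
    "INSERT INTO target_table VALUES (1, 'new'), (2, 'new') "
    "ON CONFLICT DO NOTHING"
)
print(conn.execute("SELECT * FROM target_table ORDER BY id").fetchall())
# -> [(1, 'old'), (2, 'new')]
```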
Currently ibis supports `create_table` for creating (or replacing) a table, and `insert` for inserting new data into a table. Another common operation is `upsert`, which is supported by some backends. This is like `insert`, but also supports updating existing rows.

Given the use cases ibis is targeted at, is there a use case for this operation? Should we add `Backend.upsert` support?
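To make the insert-vs-upsert distinction concrete, here is a purely hypothetical sketch on a toy in-memory table (the class and method names are made up for illustration, not an existing ibis API):

```python
class ToyTable:
    """A toy table keyed by primary key, to contrast insert with upsert."""

    def __init__(self):
        self.rows = {}  # primary key -> row dict

    def insert(self, rows):
        # Plain insert: a duplicate key is an error.
        for r in rows:
            if r["id"] in self.rows:
                raise ValueError(f"duplicate key {r['id']}")
            self.rows[r["id"]] = r

    def upsert(self, rows):
        # Like insert, but an existing key updates the row in place.
        for r in rows:
            self.rows[r["id"]] = r


t = ToyTable()
t.insert([{"id": 1, "val": "a"}])
t.upsert([{"id": 1, "val": "A"}, {"id": 2, "val": "b"}])
print(sorted(t.rows))    # -> [1, 2]
print(t.rows[1]["val"])  # -> 'A'
```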