
Add Backend.upsert #5391

Open
jcrist opened this issue Jan 31, 2023 · 15 comments
Labels
ddl Issues related to creating or altering data definitions feature Features or general enhancements

Comments

jcrist (Member) commented Jan 31, 2023

Currently ibis supports create_table for creating (or replacing) a table, and insert for inserting new data into a table. Another common operation is upsert, which is supported by some backends. This is like insert, but also supports updating existing rows.

  • For the data-analysis DB tasks that ibis targets, is there a use case for this operation?
  • What backends support this operation?
  • If we add support, what options should Backend.upsert support?
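For readers new to the term, the semantics being requested can be illustrated without any database at all. A minimal sketch using a plain dict keyed by primary key (the function and names here are illustrative only, not part of any proposed API):

```python
# Upsert semantics in miniature: insert the row if its key is new,
# otherwise update the existing row in place.

def upsert_rows(table: dict, rows: list[dict], key: str) -> dict:
    """Insert each row if its key is absent, else merge it into the existing row."""
    for row in rows:
        existing = table.get(row[key])
        if existing is None:
            table[row[key]] = dict(row)  # plain insert
        else:
            existing.update(row)         # update the existing row
    return table

table = {1: {"id": 1, "name": "a", "score": 10}}
upsert_rows(
    table,
    [{"id": 1, "score": 99}, {"id": 2, "name": "b", "score": 5}],
    key="id",
)
# table[1] keeps name "a" but gets score 99; a new row 2 is inserted
```

Real backends differ on exactly this point: some (MERGE, ON CONFLICT DO UPDATE) merge columns as above, while others replace the whole conflicting row.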
@jcrist jcrist added feature Features or general enhancements ddl Issues related to creating or altering data definitions labels Jan 31, 2023
jcrist (Member, Author) commented Jan 31, 2023

cc @prmoore77 - someone mentioned in passing that you might have some use cases for upsert support.

cpcloud (Member) commented Apr 22, 2023

This might be useful, but let's wait until someone requests it.

@cpcloud cpcloud closed this as completed Apr 22, 2023
mw3i commented Dec 11, 2023

This would be a really nice feature to have. It would basically mean the only SQL library my team needs is ibis, rather than the current awkward mix of sqlalchemy + dataset (for upserting) + ibis.

lostmygithubaccount (Member) commented:

Reopening for triage, thanks for the input @mw3i!

NickCrews (Contributor) commented:

I have a data web app that has historically only been used for analysis using ibis, but I am about to need to add UPDATE features into it. I would love to stay inside the ibis ecosystem.

@mw3i do you have any other tips or resources I should look at as I go down this path?

prmoore77 commented:

> cc @prmoore77 - someone mentioned in passing that you might have some use cases for upsert support.

hey @jcrist - yes, upsert (known as MERGE in quite a few databases) is a very useful ETL tool. Having that functionality in backends which support the MERGE DML statement would be very useful, and would likely get more folks to use Ibis for their ETL needs. Thanks!

lostmygithubaccount (Member) commented:

I think this, plus some related work ensuring Ibis DDL operations work with PyArrow tables and batches, would be great and would help close the loop. Then you could run queries on one backend and use the results to populate tables in another, e.g. PySpark -> Postgres or Trino -> ClickHouse, as we've heard requested recently.

mw3i commented May 13, 2024

> I have a data web app that has historically only been used for analysis using ibis, but I am about to need to add UPDATE features into it. I would love to stay inside the ibis ecosystem.
>
> @mw3i do you have any other tips or resources I should look at as I go down this path?

Sorry for the late reply. Since ibis doesn't have upsert, our team uses the dataset library for inserts/updates/upserts. We do all our normal code in ibis, and then updates/upserts are done with:

import dataset

params = f"{dialect}://{user}:{password}@{url}:{port}/{database}" # <-- sqlalchemy database uri string
with dataset.connect(params) as dbx:
    dbx[tablename].upsert_many(
        list_of_dictionaries, # <-- each dict is a db entry
        column_to_use_as_identifier,
    )

It's not ideal but it lets us use ibis for everything except db writes (and ibis is just too good of a library to give up)

NickCrews (Contributor) commented:

Thank you @mw3i, that looks great! Really, if we implement #9186 and #9185, it would be easy to basically hand this responsibility off to users to do themselves with an external library. The same goes for other feature requests that come in: the easier we make it to interoperate with other libs, the less pressure there is for us to implement things ourselves.

mw3i commented May 13, 2024

Fair point about interoperability (anything that reduces internal developer workload is a plus), though the dataset library in particular might be idiosyncratic (I don't know how many other libraries expect data to be formatted that way).

Given that ibis already leverages sqlalchemy, it might not be too hard to maintain just two functions (insert and update, where update has an 'upsert' argument) that implement sqlalchemy inserts and upserts in the background, without needing backend-specific functions (though apologies if I'm being naive here).

I've seen some relatively straightforward gists that implement pseudo-upserts in sqlalchemy. I don't know how easily something like that could fit into the framework ibis uses.
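The two-function shape mw3i suggests can be sketched without sqlalchemy at all, since the standard `ON CONFLICT ... DO UPDATE` clause does the heavy lifting. A minimal sketch using stdlib sqlite3 as a stand-in backend (the helper name and schema are made up for illustration; SQLite >= 3.24 and Postgres accept this clause, other dialects would need their own SQL):

```python
import sqlite3

def insert_rows(conn, table, rows, key, upsert=False):
    """Insert dict rows; with upsert=True, update rows whose key already exists."""
    cols = list(rows[0])
    placeholders = ", ".join("?" for _ in cols)
    sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
    if upsert:
        # "excluded" refers to the row that failed to insert.
        updates = ", ".join(f"{c} = excluded.{c}" for c in cols if c != key)
        sql += f" ON CONFLICT({key}) DO UPDATE SET {updates}"
    conn.executemany(sql, [tuple(r[c] for c in cols) for r in rows])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
insert_rows(conn, "t", [{"id": 1, "name": "a"}], key="id")
insert_rows(conn, "t", [{"id": 1, "name": "b"}, {"id": 2, "name": "c"}],
            key="id", upsert=True)
# t now holds (1, "b") and (2, "c")
```

sqlalchemy's dialect modules expose the same idea as `insert(...).on_conflict_do_update(...)` for Postgres and SQLite, which is roughly what those gists wrap.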

ramil-hx commented Dec 26, 2024

Can we implement Backend.upsert for Snowflake, please? I need it for my project, and I'm stuck on how to insert a record, or update it if it already exists. Please advise on alternatives.

NickCrews (Contributor) commented:

@ramil-hx as a workaround, this is what I currently use to upsert records into a duckdb database. To work with Snowflake's dialect you would probably need to modify it slightly, both with regard to the SQL and to how Snowflake handles primary keys.

from ibis.backends.sql import SQLBackend
from ibis.expr import types as ir

def upsert(db: SQLBackend, table_name: str, new_data: ir.Table) -> ir.Table:
    """Upsert new data into a table using duckdb's INSERT OR REPLACE INTO tbl.

    This assumes that the destination table has a primary key constraint.
    Otherwise, the new rows will just be appended.

    Not all fields need to be present in the new data.
    Missing fields will be set to NULL.

    Returns the appended table.
    """
    existing_schema = db.table(table_name).schema()
    new_data = new_data.cast(existing_schema)
    new_data = new_data.select(*existing_schema)
    new_sql = db.compile(new_data)
    sql = f"INSERT OR REPLACE INTO {table_name} BY NAME \n{new_sql};"
    db.raw_sql(sql)
    return db.table(table_name)
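One caveat worth checking before porting this helper: `INSERT OR REPLACE` replaces the *entire* conflicting row rather than merging columns, which is exactly why the helper casts the new data to the full destination schema first. SQLite supports the same statement, so the row-level behavior can be verified without duckdb (table and columns below are illustrative only):

```python
import sqlite3

# INSERT OR REPLACE deletes the conflicting row and inserts the new one.
# Columns absent from the new data come back NULL, not preserved.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, score INTEGER)")
conn.execute("INSERT INTO t VALUES (1, 'a', 10)")

# Replace row 1 without supplying score: the old score of 10 is lost.
conn.execute("INSERT OR REPLACE INTO t (id, name) VALUES (1, 'b')")
row = conn.execute("SELECT id, name, score FROM t").fetchone()
# row == (1, 'b', None)
```

A dialect whose upsert is `ON CONFLICT DO UPDATE` or `MERGE` merges columns instead, so a Snowflake port of this helper would not have the NULLing behavior described in the docstring above.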

lmores commented Jan 10, 2025

Hi @NickCrews (small world, eh!), have you ever used your workaround when new_data is a memtable? I get: duckdb.duckdb.CatalogException: Catalog Error: Table with name ibis_pandas_memtable_yynerbs36bezdf3us5awk7grji does not exist!.

In my use case the generated sql string is:

INSERT INTO target_table SELECT * FROM "ibis_pandas_memtable_yynerbs36bezdf3us5awk7grji" AS "t0" ON CONFLICT DO NOTHING;

It's slightly different from yours, since I need to run it on both duckdb and postgres and it needs to do nothing rather than update; anyway, I totally see why the db engine complains! I'm wondering if you managed to make it work under these conditions too.

NickCrews (Contributor) commented:

Lol hi!

I assume the problem is that the memtable actually lives in a different backend (the default duckdb one?) than the one you are upserting into. I'm not at a computer to test, but if you start with new_data = backend.create_table("remember_to_cleanup", new_data), does that get the new data into the backend where you can reach it?

lmores commented Jan 17, 2025

Forgot to get back to you: you were right! My backend is postgres, creating an intermediate table as you suggested made it work.
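The fix that worked here — materializing the in-memory data as a staging table inside the *target* backend before running the conflict-handling INSERT — can be sketched backend-agnostically. A minimal sketch with stdlib sqlite3 standing in for the target database (names are made up for illustration; the `WHERE true` is needed in SQLite to disambiguate `INSERT ... SELECT ... ON CONFLICT`):

```python
import sqlite3

# Pattern: stage in-memory rows as a temp table inside the target
# database, run the conflict-handling INSERT from that table, then
# drop the staging table. This avoids referencing a table that only
# exists in another backend (the memtable error above).

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO target VALUES (1, 'old')")

new_rows = [(1, "new"), (2, "fresh")]
conn.execute("CREATE TEMP TABLE staging (id INTEGER, v TEXT)")
conn.executemany("INSERT INTO staging VALUES (?, ?)", new_rows)
conn.execute(
    "INSERT INTO target "
    "SELECT id, v FROM staging WHERE true "  # lmores' variant: skip existing rows
    "ON CONFLICT(id) DO NOTHING"
)
conn.execute("DROP TABLE staging")
# target keeps (1, 'old') and gains (2, 'fresh')
```

In the ibis version of this pattern, the staging table is the `backend.create_table(...)` call from the previous comment, and dropping it afterwards is the "remember to clean up" step.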
