Add support for INSERT IGNORE / ON CONFLICT DO NOTHING #16949

shravan2x · 2019-08-05T09:42:02Z

I propose adding an API to indicate to EF Core that an insert (via Add or AddRange and variants) may fail. This is useful when adding data to a table that may already exist.

For instance, I have a program that collects analytics data from a source and inserts it all into a table. Some of these rows may be duplicates of previously inserted rows and so will fail unique constraints. These errors can be safely ignored since the rows already exist in the table.

Using individual INSERT IGNORE ... statements results in horrible performance (~15x slower than AddRangeAsync).

The text was updated successfully, but these errors were encountered:

ErikEJ · 2019-08-05T11:38:05Z

What Database provider?

ajcvickers · 2019-08-05T14:14:29Z

EF Team Triage: This issue is not something that our team is planning to address in the EF6.x code base. This does not mean that we would not consider a community contribution to address this issue.

Moving forwards, our team will be fixing bugs, implementing small improvements, and accepting community contributions to the EF6.x code base. Larger feature work and innovation will happen in the EF Core code base (https://github.com/aspnet/EntityFramework).

Closing an issue in the EF6.x project does not exclude us addressing it in EF Core. In fact, a number of popular feature requests for EF have already been implemented in EF Core (alternate keys, batching in SaveChanges, etc.).

BTW this is a canned response and may have info or details that do not directly apply to this particular issue. While we'd like to spend the time to uniquely address every incoming issue, we get a lot traffic on the EF projects and that is not practical. To ensure we maximize the time we have to work on fixing bugs, implementing new features, etc. we use canned responses for common triage decisions.

shravan2x · 2019-08-05T20:04:05Z

@ErikEJ MySql.Data and MySqlConnector (I tested on both).

@ajcvickers Isn't this already the EF Core repo? I'm confused what you mean by your comment.

ajcvickers · 2019-08-05T20:15:50Z

@shravan2x It's a boilerplate message--see the note at the end of it. The point is we're not going to do it for EF6.

shravan2x · 2019-08-05T20:24:47Z

@ajcvickers Oh I see. I don't know which repo hosts the EF6 code base, but since this one is already the EF Core code base, can we leave the issue open?

EDIT: Hopefully I understood what you meant :)

ajcvickers · 2019-08-05T20:27:08Z

@shravan2x Are you asking for this to be implemented in EF Core or EF6?

shravan2x · 2019-08-05T20:28:43Z

@ajcvickers EF Core. Sorry, my original question was mis-worded. I'll fix that.

ajcvickers · 2019-08-05T20:48:53Z

@shravan2x We will discuss in triage.

ajcvickers · 2019-08-12T18:59:50Z

Notes from triage: it's not 100% clear that this is really useful in the context of an OR/M, but moving it to the backlog to consider at a later time.

LeaFrock · 2019-10-28T07:51:08Z

Notes from triage: it's not 100% clear that this is really useful in the context of an OR/M, but moving it to the backlog to consider at a later time.

At least for MySql, it's very useful.
I‘ve seen a lot of questions like INSERT-IGNORE-using-EF-Core about how to insert datas into a table , which already has datas and an unique index/key, without any exceptions(if the data exists, just skip it). MySql supports it by 'insert ignore into', while SqlServer is still unknown to me for the answer.
If EF Core provides the API, we'll get the best practice.

mqudsi · 2020-04-16T16:56:49Z

@ajcvickers can you explain what your doubts about its suitability for an ORM are? I'm not sure I see the difference between this or other SQL translations.

When attempting to insert data that may already be present in the database without insert or ignore, you basically need to either prepopulate some sort of bloom filter or hash set beforehand and pray you don't run into any coherence issues or else use fine-grained transactions and throw-and-catch lots of exceptions.

While the SQL specifics are certainly provider-dependent (just like everything else), its certainly not a concept specific to a single SQL dialect. INSERT IGNORE is very closely linked with upsert, which is quickly becoming "a thing" in the rdbms world. MySQL hase INSERT IGNORE and REPLACE INTO, SQLite has INSERT OR IGNORE for primary key conflicts and ON CONFLICT [IGNORE|REPLACE], PostgreSQL has ON CONFLICT ____ DO [NOTHING|UPDATE ...], SQL Server has IF NOT EXISTS(..) for old versions and now MERGE ....

Please correct me if I'm wrong, but we have a general concept (need to insert content that may or may not already exist in the database and either update/abort/ignore if it already does) that is available in the most popular database options supported/targeted by EF Core, but requires different syntax for each implementation, which to me makes it definitely ORM-worthy.

The only issue is that it does not map cleanly to LINQ because LINQ expressions don't really have any concept of constraints. But for the most part, updates and inserts are orthogonal to querying and I imagine this could be implemented without touching the IQueryable interface at all; if you take the most simple example of either completely replacing or completely ignoring (i.e. disregarding the more complex options afforded by the PostgreSQL and SQL Server languages) then in its most basic form this could simply be an optional parameter to the DbSet<>.Add() method specifying the conflict resolution action with ABORT or equivalent (which is basically the current behavior) being the default and the only required version to support.

ajcvickers · 2020-04-17T16:05:53Z

@mqudsi I don't remember the full extent of the discussion with the team. Nevertheless, this is something we have on the backlog to consider for a future release.

tombohub · 2021-09-02T03:24:07Z

Does anybody have a workaround?

shravan2x · 2021-09-02T03:44:53Z

@tombohub I use this package which works fine - https://github.com/artiomchi/FlexLabs.Upsert

roji · 2021-09-02T08:00:29Z

This is probably related to upsert support (#4526) - the two should probably be at least considered together.

On PostgreSQL, INSERT has an ON CONFLICT clause. ON CONFLICT DO NOTHING is the same as INSERT IGNORE, whereas ON CONFLICT DO UPDATE is upsert.

A tricky aspect here is the interaction with change tracking: if an insert is ignored, we end up with a discrepancy (since the existing row in the database may have other values as what we're trying to insert). This may make sense only with non-tracking inserts (Do not track after SaveChanges() #9118).
PostgreSQL allows defining ON CONFLICT actions per constraint (or not), so that primary key violations may be ignored/upserted while other unique index violations may still throw. MySQL INSERT IGNORE doesn't seem to support this - not sure if that's a problem.

mqudsi · 2021-09-02T20:45:17Z

An issue/problem is that arbitrary columns may be passed to PostgreSQL ON CONFLICT but it'll fail at runtime if the columns aren't covered by a unique index/constraint. MySQL's INSERT IGNORE is much simpler (and therefore more ORM-friendly).

One would be forgiven for thinking that this particular sub-feature could be elided from any ignore/upsert implementation, but such an abstraction would be severely crippled because any unique constraints that include a nullable column would never trigger (because NULL != NULL in SQL). PostgreSQL users could get around this with a functional index/constraint that coalesces the nullable column with a reserved invalid value (i.e. default(Guid)), but I don't think MySQL can do that without needing to also use generated columns (where supported) - but a WHERE clause would have to be provided.

(With postgres, you could also use (one or more) partial index(es) but that would require the ORM ignore/upsert API to additionally support/require a WHERE clause specifying the same predicate as the partial index so it would apply - and you'd still have to specifically name the columns in question since, IIRC, postgres index inference only searches for matching non-partial indices if no columns are specifically named.)

Otherwise, multiple indices covering all permutations of one nullable column + all other columns would be required.

roji · 2021-09-03T10:44:22Z

An issue/problem is that arbitrary columns may be passed to PostgreSQL ON CONFLICT but it'll fail at runtime if the columns aren't covered by a unique index/constraint. MySQL's INSERT IGNORE is much simpler (and therefore more ORM-friendly).

Interesting - what specific issues do you see with PG's arbitrary column support, and how do you see INSERT IGNORE this as more ORM-friendly? At the very least, the PG provider could generate ON CONFLICT IGNORE without any columns, providing (I think) the same behavior as MySQL, no? In that sense PG seems to be a superset of the MySQL functionality?

One would be forgiven for thinking that this particular sub-feature could be elided from any ignore/upsert implementation, but such an abstraction would be severely crippled because any unique constraints that include a nullable column would never trigger (because NULL != NULL in SQL).

I don't think this is really worse than other places where different database behavior leaks - EF Core doesn't really pretend to provide a uniform abstraction over all databases (e.g. string comparison is case-insensitive in some databases, insensitive in others). In other words, it may make sense to provide a general relational "add or ignore" mechanism, which would be implemented by ON CONFLICT on PG and INSERT IGNORE in MySQL, even if there are some behavioral differences. But I admit I haven't gone into the fine details here.

But your points (and suggested workarounds) are interesting... Note that PG also has exclusion constraints, which could be used with the IS DISTINCT FROM to disallow multiple nulls, if that's desired (just as standard SQL DISTINCT does consider NULLs equal). This would obviously be a PG-specific thing.

mqudsi · 2022-03-10T23:33:39Z

Not directly related to the issue but with regards to how it would be implemented and my earlier comments about the difficulties in using a unique index covering potentially null fields to handle insert ignore conflicts: PostgreSQL 15 is going to ship w/ support for UNIQUE [NULLS NOT DISTINCT], which is kind of a big deal. I guess this is a more ergonomic and directly usable alternative to the exclusion constraints Shay mentioned. I've also heard we can expect to see other providers also adding support for this?

More here: commit and discussion

roji · 2022-03-11T11:14:47Z

@mqudsi thanks for pointing that out. Do you have any indication of other databases implementing this feature? For now, I've opened npgsql/efcore.pg#2298 to track this as an Npgsql-specific feature, but we can always promote it into EF Core's relational layer if multiple other databases add support.

mqudsi · 2022-03-12T22:04:37Z

@roji Thanks for the interest in including support for this in Npgqsl!

I can't seem to find the article I was reading that mentioned support for another db, so perhaps I shouldn't have said others might be adding support.

(Technically the equivalent of a unique index that includes n non-distinct nullable columns (current pgsql approach is that either all or none of the nullable columns are treated as distinct, which helps preserve our sanity) is equivalent to 2^n unique indexes like on table(col1, col2) where col3 is null, on table(col1, col3) where col 2 is null, ... which an ORM could easily calculate, but I'm not sure it would be a smart thing to do unless the dev fully understands the implications!)

Regarding exclusion constraints - I looked rather hard but it seems that postgresql requires an actual symbolic operator rather than an equivalent expression for exclusion constraints (and there is none equivalent to IS DISTINCT FROM, excepting custom operators), but perhaps I missed something? While playing around with that, I did discover an alternative approach that might work even on other db engines: if you abuse the fact that (and I don't know if this, syntax aside, is postgres-specific) ARRAY[NULL] = ARRAY[NULL] then you can actually just create just one functional/computed index like

CREATE UNIQUE INDEX ON table( (ARRAY[notnull1, notnull2, nullable1, nullable2, nullable3]) ) 
        [ WHERE num_nulls(nullable1, nullable2, nullable3) > 0 ]

(You could optionally do USING HASH since we will never look up on its basis, depending on how often you expect a collision leading to a disk access, iiuc?)

(I never did reply to your question about why PostgreSQL's ability to name specific columns for a ON CONFLICT clause would be a problem if the shared functionality with MySQL's barebones INSERT IGNORE (an ON CONFLICT with no named columns) isn't a problem. And you're right: I just got so tangled up in my thinking of how best to abuse pgsql's features for my purposes that I completely missed the forest for the trees.)

roji · 2022-03-14T10:07:37Z

Thanks for the continued insights - this is definitely interesting and valuable.

(Technically the equivalent of a unique index that includes n non-distinct nullable columns (current pgsql approach is that either all or none of the nullable columns are treated as distinct, which helps preserve our sanity) is equivalent to 2^n - 1 unique indexes like on table(col1, col2) where col3 is null, on table(col1, col3) where col 2 is null, ... which an ORM could easily calculate, but I'm not sure it would be a smart thing to do unless the dev fully understands the implications!)

I'm not sure exactly what you mean here - can you elaborate?

Regarding exclusion constraints - I looked rather hard but it seems that postgresql requires an actual symbolic operator rather than an equivalent expression for exclusion constraints (and there is none equivalent to IS DISTINCT FROM, excepting custom operators), but perhaps I missed something?

That's interesting - and I think you're right, exclusion constraints seem to only work with basic operators, which IS DISTINCT FROM apparently is not. I think this could also mean that IS DISTINCT FROM may not (currently) be sped up with indexes, which I vaguely remember is indeed the case.

mqudsi · 2022-03-22T01:42:23Z

I'm not sure exactly what you mean here - can you elaborate?

Sure. Let's say you have multiple one-to-many mappings all for the same type, and for some reason it's going through a mapping table instead of having the principal key on each foreign entity directly: e.g. owners(user_id, document_id?, product_id?, foo_id?, ...) and you want to make sure no one document/product/foo is owned twice. With PostgreSQL 15 you could use

ALTER TABLE owners ADD CONSTRAINT one_owned_type CHECK( num_nonnulls(document_id, product_id, foo_id) = 1 );
CREATE UNIQUE INDEX owned_only_once on owners(document_id, product_id, foo_id) NULLS NOT DISTINCT;

You can accomplish the functional equivalent in current versions of PostgreSQL (or other databases) by creating multiple indexes that handle the various nullability cases:

/* some of the following index combinations will never be used because of our CHECK CONSTRAINT, but I've left them to illustrate the general approach */
CREATE UNIQUE INDEX owned_only_once1 on owners(document_id, product_id, foo_id) WHERE document_id IS NOT NULL AND product_id IS NOT NULL AND foo_id IS NOT NULL);
CREATE UNIQUE INDEX owned_only_once2 on owners(document_id, product_id) WHERE document_id IS NOT NULL AND product_id IS NOT NULL AND foo_id IS NULL;
CREATE UNIQUE INDEX owned_only_once3 on owners(document_id, foo_id) WHERE document_id IS NOT NULL AND product_id IS NULL AND foo_id IS NOT NULL;
CREATE UNIQUE INDEX owned_only_once4 on owners(document_id) WHERE document_id IS NOT NULL AND product_id IS NULL AND foo_id IS NULL;
CREATE UNIQUE INDEX owned_only_once5 on owners(product_id, foo_id) WHERE document_id IS NULL AND product_id IS NOT NULL AND foo_id IS NOT NULL;
CREATE UNIQUE INDEX owned_only_once6 on owners(product_id) WHERE document_id IS NULL AND product_id IS NOT NULL AND foo_id IS NULL;
CREATE UNIQUE INDEX owned_only_once7 on owners(foo_id) WHERE document_id IS NULL AND product_id IS NULL AND foo_id IS NOT NULL;
/* if you had non-nullable columns included in the index, this additional index would be required: */
CREATE UNIQUE INDEX owned_only_once8 on owners(/* non-nullable member cols here */) WHERE document_id IS NULL AND product_id IS NULL AND foo_id IS NULL;

Those indexes should be functionally equivalent (from the perspective of what entries are allowed or rejected) to the NULLS NOT DISTINCT index from postgres 15, just much more of a pita to maintain. I believe the pg15 impl also performs better, though the size (minus constant per-index sizes) should be the same w/ all the WHERE preconditions left in.

(I realized after writing that the example I picked is not the best since it's typically going to have the only-one-non-null constraint for sanity reasons. I couldn't come up with a better example off the top of my head, so I just kept it!)

roji · 2022-03-22T15:35:31Z

Thanks @mqudsi, makes sense.

Note SqlServerIndexConvention, which is a SQL Server-specific convention that filters out NULLs on unique non-clustered indexes.

But let's continue any conversation on this in npgsql/efcore.pg#2298, as this is quite off-topic here.

dragoon · 2022-07-12T13:06:06Z

I also think this is a very important feature that even most basic app needs.

Consider a scenario where there is a 3rd-party data provider, and I can only query last N objects, that I later transform and insert in my db. Without this functionality I need to first check which object already exists by first querying the db, excluding the existing id, and only inserting the new ones.

roji · 2022-10-19T07:05:56Z

Note #14464 which was about using the following technique to achieve "add or ignore":

INSERT INTO Users (UserName, Email)
SELECT @UserName, @Email
EXCEPT 
SELECT UserName, Email
FROM Users
WHERE Email = @Email

On databases where a specific syntax already exists for add or ignore, that's obviously best. On SQL Server - where no such thing exists - the above could be an alternative implementation. Another way to approach this would be with MERGE - which may be more efficient - though MERGE would probably have concurrency issues (all this needs to be researched).

borisdj · 2023-05-08T10:22:42Z

To add one solution, using EFCore.BulkExtensions this can be done with

If we want Insert only new and skip existing ones in Db (Insert_if_not_Exist) then use BulkInsertOrUpdate with config PropertiesToIncludeOnUpdate = new List<string> { "" }

Disc. I'm the author, and Lib. now has Dual License - a fix for OSS sustainability.
(Community version is free and covers most users, Commercial one is for companies over 1 mil. revenue)

kitgrose · 2024-05-15T11:07:50Z

I would really like to see something like this supported in EF Core.

My specific use-case: I'm writing a (simple) data importer which periodically imports CSV data into my application database. The data being imported in each CSV is relatively small, but the operation happens often enough that the database is expected to be very large over time, and the operation may run in parallel with other imports. Since the data in the database only comes from the same source, it's never an issue if a new record can't be added (because its primary key is already in the table); in fact it's very likely since the CSV data deliberately contains an hour or so of overlapping data each import to avoid missing records added around the time of export.

On SQL Server this can be achieved with the IGNORE_DUP_KEY argument on the index, but using that throws exceptions in EF Core, presumably because of entity change tracking. I'm sure you're already familiar with this property, but I struggled to find this issue again because none of the other comments mention that specific argument by name.

roji · 2024-05-15T15:23:47Z

@kitgrose on SQL Server, this kind of thing is typically achieved via the MERGE statement, which is presumably what we'd use to implement the future insert-or-ignore API. For now, you should be able to drop down to SQL to code this specific importer.

ajcvickers closed this as completed Aug 5, 2019

ajcvickers reopened this Aug 5, 2019

ajcvickers added this to the Backlog milestone Aug 12, 2019

ajcvickers added the type-enhancement label Aug 12, 2019

ajcvickers added the customer-reported label Aug 21, 2019

ajcvickers added the area-save-changes label Aug 12, 2021

roji mentioned this issue Sep 2, 2021

Merge/Upsert/AddOrUpdate support #4526

Open

AndriySvyryd changed the title ~~Add support for INSERT IGNORE~~ Add support for INSERT IGNORE / ON CONFLICT DO NOTHING Nov 4, 2021

AndriySvyryd added the needs-design label Nov 4, 2021

AndriySvyryd mentioned this issue Nov 4, 2021

Update of an non-existing, optional entity ends in DbUpdateConcurrencyException #19508

Open

AndriySvyryd mentioned this issue Jan 14, 2022

Improve database updates #22959

Open

52 tasks

roji mentioned this issue Mar 11, 2022

Add support for specifying NULLS NOT DISTINCT for unique indexes npgsql/efcore.pg#2298

Closed

roji mentioned this issue Oct 19, 2022

Option to avoid attempting to insert duplicate rows #14464

Closed

AndriySvyryd mentioned this issue Apr 5, 2023

Improve experience for optimistic concurrency #14496

Open

12 tasks

roji mentioned this issue Jun 13, 2023

Flexible HasData seeding behaviors #31072

Closed

zoriya mentioned this issue May 15, 2024

duplicate keys zoriya/Kyoo#498

Open

roji mentioned this issue Aug 30, 2024

SqlString comparison regression bug dotnet/runtime#99507

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for INSERT IGNORE / ON CONFLICT DO NOTHING #16949

Add support for INSERT IGNORE / ON CONFLICT DO NOTHING #16949

shravan2x commented Aug 5, 2019 •

edited

Loading

ErikEJ commented Aug 5, 2019

ajcvickers commented Aug 5, 2019

shravan2x commented Aug 5, 2019

ajcvickers commented Aug 5, 2019

shravan2x commented Aug 5, 2019 •

edited

Loading

ajcvickers commented Aug 5, 2019

shravan2x commented Aug 5, 2019 •

edited

Loading

ajcvickers commented Aug 5, 2019 •

edited

Loading

ajcvickers commented Aug 12, 2019

LeaFrock commented Oct 28, 2019 •

edited

Loading

mqudsi commented Apr 16, 2020

ajcvickers commented Apr 17, 2020

tombohub commented Sep 2, 2021

shravan2x commented Sep 2, 2021

roji commented Sep 2, 2021

mqudsi commented Sep 2, 2021 •

edited

Loading

roji commented Sep 3, 2021

mqudsi commented Mar 10, 2022 •

edited

Loading

roji commented Mar 11, 2022

mqudsi commented Mar 12, 2022 •

edited

Loading

roji commented Mar 14, 2022

mqudsi commented Mar 22, 2022 •

edited

Loading

roji commented Mar 22, 2022 •

edited

Loading

dragoon commented Jul 12, 2022

roji commented Oct 19, 2022

borisdj commented May 8, 2023 •

edited

Loading

kitgrose commented May 15, 2024

roji commented May 15, 2024

Add support for INSERT IGNORE / ON CONFLICT DO NOTHING #16949

Add support for INSERT IGNORE / ON CONFLICT DO NOTHING #16949

Comments

shravan2x commented Aug 5, 2019 • edited Loading

ErikEJ commented Aug 5, 2019

ajcvickers commented Aug 5, 2019

shravan2x commented Aug 5, 2019

ajcvickers commented Aug 5, 2019

shravan2x commented Aug 5, 2019 • edited Loading

ajcvickers commented Aug 5, 2019

shravan2x commented Aug 5, 2019 • edited Loading

ajcvickers commented Aug 5, 2019 • edited Loading

ajcvickers commented Aug 12, 2019

LeaFrock commented Oct 28, 2019 • edited Loading

mqudsi commented Apr 16, 2020

ajcvickers commented Apr 17, 2020

tombohub commented Sep 2, 2021

shravan2x commented Sep 2, 2021

roji commented Sep 2, 2021

mqudsi commented Sep 2, 2021 • edited Loading

roji commented Sep 3, 2021

mqudsi commented Mar 10, 2022 • edited Loading

roji commented Mar 11, 2022

mqudsi commented Mar 12, 2022 • edited Loading

roji commented Mar 14, 2022

mqudsi commented Mar 22, 2022 • edited Loading

roji commented Mar 22, 2022 • edited Loading

dragoon commented Jul 12, 2022

roji commented Oct 19, 2022

borisdj commented May 8, 2023 • edited Loading

kitgrose commented May 15, 2024

roji commented May 15, 2024

shravan2x commented Aug 5, 2019 •

edited

Loading

shravan2x commented Aug 5, 2019 •

edited

Loading

shravan2x commented Aug 5, 2019 •

edited

Loading

ajcvickers commented Aug 5, 2019 •

edited

Loading

LeaFrock commented Oct 28, 2019 •

edited

Loading

mqudsi commented Sep 2, 2021 •

edited

Loading

mqudsi commented Mar 10, 2022 •

edited

Loading

mqudsi commented Mar 12, 2022 •

edited

Loading

mqudsi commented Mar 22, 2022 •

edited

Loading

roji commented Mar 22, 2022 •

edited

Loading

borisdj commented May 8, 2023 •

edited

Loading