Add support for INSERT IGNORE / ON CONFLICT DO NOTHING #16949
What database provider?
EF Team Triage: This issue is not something that our team is planning to address in the EF6.x code base. This does not mean that we would not consider a community contribution to address this issue. Moving forwards, our team will be fixing bugs, implementing small improvements, and accepting community contributions to the EF6.x code base. Larger feature work and innovation will happen in the EF Core code base (https://github.com/aspnet/EntityFramework). Closing an issue in the EF6.x project does not exclude us from addressing it in EF Core. In fact, a number of popular feature requests for EF have already been implemented in EF Core (alternate keys, batching in SaveChanges, etc.). BTW this is a canned response and may have info or details that do not directly apply to this particular issue. While we'd like to spend the time to uniquely address every incoming issue, we get a lot of traffic on the EF projects and that is not practical. To ensure we maximize the time we have to work on fixing bugs, implementing new features, etc., we use canned responses for common triage decisions.
@ErikEJ MySql.Data and MySqlConnector (I tested on both). @ajcvickers Isn't this already the EF Core repo? I'm confused about what you mean by your comment.
@shravan2x It's a boilerplate message; see the note at the end of it. The point is that we're not going to do it for EF6.
@ajcvickers Oh, I see. I don't know which repo hosts the EF6 code base, but since this one is already the EF Core code base, can we leave the issue open? EDIT: Hopefully I understood what you meant :)
@shravan2x Are you asking for this to be implemented in EF Core or EF6?
@ajcvickers EF Core. Sorry, my original question was misworded. I'll fix that.
@shravan2x We will discuss in triage.
Notes from triage: it's not 100% clear that this is really useful in the context of an OR/M, but we're moving it to the backlog to consider at a later time.
At least for MySql, it's very useful. |
@ajcvickers can you explain what your doubts about its suitability for an ORM are? I'm not sure I see the difference between this and other SQL translations. When attempting to insert data that may already be present in the database without insert-or-ignore, you basically need to either prepopulate some sort of Bloom filter or hash set beforehand and pray you don't run into any coherence issues, or else use fine-grained transactions and throw and catch lots of exceptions. While the SQL specifics are certainly provider-dependent (just like everything else), it's certainly not a concept specific to a single SQL dialect. Please correct me if I'm wrong, but we have a general concept (inserting content that may or may not already exist in the database, and either updating, aborting, or ignoring if it already does) that is available in the most popular databases supported/targeted by EF Core but requires different syntax for each implementation, which to me makes it definitely ORM-worthy. The only issue is that it does not map cleanly to LINQ, because LINQ expressions don't really have any concept of constraints. But for the most part, updates and inserts are orthogonal to querying, and I imagine this could be implemented without touching the
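The throw-and-catch workaround described above can be sketched with Python's built-in sqlite3 module. SQLite stands in for MySQL/PostgreSQL only so that the example runs anywhere. The `events` table and the `insert_catching_errors` helper are hypothetical. Catching `IntegrityError` also swallows unrelated violations such as NOT NULL failures, so the sketch hides those too.

```python
import sqlite3


def insert_catching_errors(conn: sqlite3.Connection, rows) -> int:
    """Hypothetical helper: insert (id, payload) rows one at a time, ignoring duplicates.

    Each row runs in its own transaction so that one failure does not undo
    the rows inserted before it. Returns the number of rows actually inserted.
    """
    inserted = 0
    for row_id, payload in rows:
        try:
            with conn:  # commits on success, rolls back and re-raises on error
                conn.execute(
                    "INSERT INTO events (id, payload) VALUES (?, ?)",
                    (row_id, payload),
                )
            inserted += 1
        except sqlite3.IntegrityError:
            pass  # key already exists (or another constraint failed): skip the row
    return inserted


# Hypothetical table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
print(insert_catching_errors(conn, [(1, "a"), (2, "b")]))  # 2
print(insert_catching_errors(conn, [(2, "b"), (3, "c")]))  # 1
```

Every row costs its own statement and transaction here, which is the overhead a database-level insert-or-ignore would avoid.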
@mqudsi I don't remember the full extent of the discussion with the team. Nevertheless, this is something we have on the backlog to consider for a future release.
Does anybody have a workaround? |
@tombohub I use this package, which works fine: https://github.com/artiomchi/FlexLabs.Upsert
This is probably related to upsert support (#4526); the two should probably at least be considered together. On PostgreSQL, INSERT has an ON CONFLICT clause: ON CONFLICT DO NOTHING is the closest equivalent of MySQL's INSERT IGNORE, whereas ON CONFLICT DO UPDATE is an upsert.
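Here is a minimal sketch of the two clauses. It uses SQLite through Python's sqlite3 module as a runnable stand-in, since SQLite (3.24 and later) accepts the same `ON CONFLICT ... DO NOTHING` / `DO UPDATE` syntax as PostgreSQL. The `users` table is hypothetical. In the upsert, `excluded` refers to the row that was proposed for insertion.

```python
import sqlite3


def demo_conflict_clauses():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table for illustration.
    conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users VALUES ('a@example.com', 'Alice')")

    # "Add or ignore": the conflicting row is silently skipped.
    conn.execute(
        "INSERT INTO users VALUES ('a@example.com', 'Alicia') "
        "ON CONFLICT (email) DO NOTHING"
    )
    after_ignore = conn.execute("SELECT name FROM users").fetchall()

    # Upsert: the conflicting row updates the existing one instead.
    conn.execute(
        "INSERT INTO users VALUES ('a@example.com', 'Alicia') "
        "ON CONFLICT (email) DO UPDATE SET name = excluded.name"
    )
    after_upsert = conn.execute("SELECT name FROM users").fetchall()
    conn.close()
    return after_ignore, after_upsert


print(demo_conflict_clauses())  # ([('Alice',)], [('Alicia',)])
```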
An issue/problem is that arbitrary columns may be passed to PostgreSQL One would be forgiven for thinking that this particular sub-feature could be elided from any ignore/upsert implementation, but such an abstraction would be severely crippled because any unique constraints that include a nullable column would never trigger (because (With postgres, you could also use (one or more) partial index(es) but that would require the ORM ignore/upsert API to additionally support/require a Otherwise, multiple indices covering all permutations of one nullable column + all other columns would be required.
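The NULL problem can be reproduced with a small sketch, again using SQLite via Python's sqlite3 as a stand-in. SQLite, like PostgreSQL by default, treats NULLs as distinct in a unique index. A row with a NULL in an indexed column therefore never conflicts, and `DO NOTHING` never fires. The `tags` table and the helper name are hypothetical.

```python
import sqlite3


def count_after_repeated_inserts(parent_id, repeats: int = 3) -> int:
    """Insert the same (owner_id, parent_id) pair several times with DO NOTHING."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: parent_id is nullable and part of the unique index.
    conn.execute("CREATE TABLE tags (owner_id INTEGER NOT NULL, parent_id INTEGER, name TEXT)")
    conn.execute("CREATE UNIQUE INDEX tags_unique ON tags (owner_id, parent_id)")
    for _ in range(repeats):
        conn.execute(
            "INSERT INTO tags (owner_id, parent_id, name) VALUES (?, ?, ?) "
            "ON CONFLICT DO NOTHING",
            (1, parent_id, "x"),
        )
    count = conn.execute("SELECT COUNT(*) FROM tags").fetchone()[0]
    conn.close()
    return count


print(count_after_repeated_inserts(parent_id=7))     # 1: duplicates are ignored
print(count_after_repeated_inserts(parent_id=None))  # 3: NULL never equals NULL, so no conflict
```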
Interesting. What specific issues do you see with PG's arbitrary column support, and why do you see INSERT IGNORE as more ORM-friendly? At the very least, the PG provider could generate ON CONFLICT DO NOTHING without a conflict target, providing (I think) the same behavior as MySQL, no? In that sense PG seems to be a superset of the MySQL functionality?
I don't think this is really worse than other places where different database behavior leaks; EF Core doesn't really pretend to provide a uniform abstraction over all databases (e.g. string comparison is case-sensitive in some databases and case-insensitive in others). In other words, it may make sense to provide a general relational "add or ignore" mechanism, implemented by ON CONFLICT on PG and INSERT IGNORE on MySQL, even if there are some behavioral differences. But I admit I haven't gone into the fine details here. Your points (and suggested workarounds) are interesting... Note that PG also has exclusion constraints, which could be used with the
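The sketch below contrasts omitting and specifying a conflict target. It runs against SQLite via Python's sqlite3, whose upsert clause behaves like PostgreSQL's in this respect. Without a target, `DO NOTHING` skips a violation of any unique constraint. `ON CONFLICT (email)` covers only that one constraint and still raises for the others. The `accounts` table is hypothetical.

```python
import sqlite3


def try_insert(conn: sqlite3.Connection, sql: str, params) -> str:
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "error"


conn = sqlite3.connect(":memory:")
# Hypothetical table with two independent unique constraints.
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, email TEXT UNIQUE, username TEXT UNIQUE)"
)
conn.execute("INSERT INTO accounts VALUES (1, 'a@example.com', 'alice')")

dup_username = (2, "b@example.com", "alice")  # clashes only on username

# No conflict target: a violation of any unique constraint is ignored (MySQL-like).
no_target = try_insert(
    conn, "INSERT INTO accounts VALUES (?, ?, ?) ON CONFLICT DO NOTHING", dup_username
)
# Conflict target (email): only email clashes are ignored; the username clash raises.
email_target = try_insert(
    conn, "INSERT INTO accounts VALUES (?, ?, ?) ON CONFLICT (email) DO NOTHING", dup_username
)
print(no_target, email_target)  # ok error
```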
Not directly related to the issue, but with regard to how it would be implemented, and to my earlier comments about the difficulty of using a unique index covering potentially null fields to handle insert-ignore conflicts: PostgreSQL 15 is going to ship with support for More here: commit and discussion
@mqudsi thanks for pointing that out. Do you have any indication of other databases implementing this feature? For now, I've opened npgsql/efcore.pg#2298 to track this as an Npgsql-specific feature, but we can always promote it into EF Core's relational layer if multiple other databases add support.
@roji Thanks for the interest in including support for this in Npgsql! I can't seem to find the article I was reading that mentioned support for another db, so perhaps I shouldn't have said others might be adding support. (Technically the equivalent of a unique index that includes n non-distinct nullable columns (current pgsql approach is that either all or none of the nullable columns are treated as distinct, which helps preserve our sanity) is equivalent to 2^n unique indexes like Regarding exclusion constraints, I looked rather hard, but it seems that PostgreSQL requires an actual symbolic operator rather than an equivalent expression for exclusion constraints (and there is none equivalent to
CREATE UNIQUE INDEX ON table( (ARRAY[notnull1, notnull2, nullable1, nullable2, nullable3]) )
[ WHERE num_nulls(nullable1, nullable2, nullable3) > 0 ]
(You could optionally do (I never did reply to your question about why PostgreSQL's ability to name specific columns for a
Thanks for the continued insights - this is definitely interesting and valuable.
I'm not sure exactly what you mean here - can you elaborate?
That's interesting, and I think you're right: exclusion constraints seem to only work with basic operators, which IS DISTINCT FROM apparently is not. I think this could also mean that IS DISTINCT FROM may not (currently) be sped up with indexes, which I vaguely remember is indeed the case.
Sure. Let's say you have multiple one-to-many mappings, all for the same type, and for some reason they go through a mapping table instead of having the principal key on each foreign entity directly, e.g.:
ALTER TABLE owners ADD CONSTRAINT one_owned_type CHECK( num_nonnulls(document_id, product_id, foo_id) = 1 );
CREATE UNIQUE INDEX owned_only_once on owners(document_id, product_id, foo_id) NULLS NOT DISTINCT;
You can accomplish the functional equivalent in current versions of PostgreSQL (or other databases) by creating multiple indexes that handle the various nullability cases:
/* some of the following index combinations will never be used because of our CHECK CONSTRAINT, but I've left them to illustrate the general approach */
CREATE UNIQUE INDEX owned_only_once1 on owners(document_id, product_id, foo_id) WHERE document_id IS NOT NULL AND product_id IS NOT NULL AND foo_id IS NOT NULL;
CREATE UNIQUE INDEX owned_only_once2 on owners(document_id, product_id) WHERE document_id IS NOT NULL AND product_id IS NOT NULL AND foo_id IS NULL;
CREATE UNIQUE INDEX owned_only_once3 on owners(document_id, foo_id) WHERE document_id IS NOT NULL AND product_id IS NULL AND foo_id IS NOT NULL;
CREATE UNIQUE INDEX owned_only_once4 on owners(document_id) WHERE document_id IS NOT NULL AND product_id IS NULL AND foo_id IS NULL;
CREATE UNIQUE INDEX owned_only_once5 on owners(product_id, foo_id) WHERE document_id IS NULL AND product_id IS NOT NULL AND foo_id IS NOT NULL;
CREATE UNIQUE INDEX owned_only_once6 on owners(product_id) WHERE document_id IS NULL AND product_id IS NOT NULL AND foo_id IS NULL;
CREATE UNIQUE INDEX owned_only_once7 on owners(foo_id) WHERE document_id IS NULL AND product_id IS NULL AND foo_id IS NOT NULL;
/* if you had non-nullable columns included in the index, this additional index would be required: */
CREATE UNIQUE INDEX owned_only_once8 on owners(/* non-nullable member cols here */) WHERE document_id IS NULL AND product_id IS NULL AND foo_id IS NULL;
Those indexes should be functionally equivalent (from the perspective of what entries are allowed or rejected) to the (I realized after writing that the example I picked is not the best, since it's typically going to have the only-one-non-null constraint for sanity reasons. I couldn't come up with a better example off the top of my head, so I just kept it!)
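The index list grows as 2^n, so generating it is mechanical. The sketch below is a hypothetical helper; `nulls_not_distinct_indexes` is not from any library. It emits one partial unique index per NULL/NOT NULL pattern, numbered in the same order as the list above. It then runs the result on SQLite via Python's sqlite3, only to check that duplicates are rejected. Like the list above, it skips the all-NULL case when there are no non-nullable columns to index.

```python
import itertools
import sqlite3


def nulls_not_distinct_indexes(table, nullable, required=()):
    """Hypothetical helper: CREATE statements emulating UNIQUE (...) NULLS NOT DISTINCT.

    One partial unique index per NULL/NOT NULL pattern of the nullable columns.
    Each covers the required columns plus the nullable columns that are NOT NULL
    in that pattern, and its WHERE clause restricts it to exactly that pattern.
    Identifiers are interpolated directly, so they must come from trusted code.
    """
    statements = []
    patterns = itertools.product((True, False), repeat=len(nullable))
    for number, pattern in enumerate(patterns, start=1):
        # pattern[k] is True when nullable[k] is NOT NULL in this case.
        cols = list(required) + [c for c, present in zip(nullable, pattern) if present]
        if not cols:
            continue  # all-NULL case with no required columns: nothing to index
        where = " AND ".join(
            f"{c} IS NOT NULL" if present else f"{c} IS NULL"
            for c, present in zip(nullable, pattern)
        )
        statements.append(
            f"CREATE UNIQUE INDEX {table}_nnd{number} ON {table}({', '.join(cols)}) WHERE {where}"
        )
    return statements


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owners (document_id INTEGER, product_id INTEGER, foo_id INTEGER)")
for statement in nulls_not_distinct_indexes("owners", ["document_id", "product_id", "foo_id"]):
    conn.execute(statement)

insert = "INSERT INTO owners VALUES (?, ?, ?) ON CONFLICT DO NOTHING"
conn.execute(insert, (1, None, None))
conn.execute(insert, (1, None, None))  # same NULL pattern and values: ignored
conn.execute(insert, (1, 2, None))     # different pattern: stored
print(conn.execute("SELECT COUNT(*) FROM owners").fetchone()[0])  # 2
```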
Thanks @mqudsi, makes sense. Note SqlServerIndexConvention, which is a SQL Server-specific convention that filters out NULLs on unique non-clustered indexes. But let's continue any conversation on this in npgsql/efcore.pg#2298, as this is quite off-topic here.
I also think this is a very important feature that even the most basic apps need. Consider a scenario where there is a third-party data provider from which I can only query the last N objects, which I later transform and insert into my database. Without this functionality, I first need to query the database to find which objects already exist, exclude the existing IDs, and insert only the new ones.
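The query-then-insert workaround described above might look like the sketch below. It uses Python's sqlite3 as a stand-in; the `items` table and `insert_new_only` helper are hypothetical. It costs an extra round trip per batch. As the docstring notes, it also leaves a window between the SELECT and the INSERT in which a concurrent writer can insert the same key.

```python
import sqlite3


def insert_new_only(conn: sqlite3.Connection, rows) -> int:
    """Hypothetical helper: client-side 'add or ignore' for (id, name) rows.

    Looks up which ids already exist, then inserts only the missing rows.
    Assumes the batch itself has no repeated ids. A concurrent writer can still
    insert the same id between the SELECT and the INSERT, so a unique
    constraint is still needed as the real guard.
    """
    if not rows:
        return 0
    ids = [row_id for row_id, _ in rows]
    placeholders = ", ".join("?" for _ in ids)
    existing = {
        found
        for (found,) in conn.execute(f"SELECT id FROM items WHERE id IN ({placeholders})", ids)
    }
    new_rows = [row for row in rows if row[0] not in existing]
    with conn:
        conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)", new_rows)
    return len(new_rows)


# Hypothetical table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
print(insert_new_only(conn, [(1, "a"), (2, "b")]))            # 2
print(insert_new_only(conn, [(2, "b"), (3, "c"), (4, "d")]))  # 2
```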
Note #14464, which was about using the following technique to achieve "add or ignore":
INSERT INTO Users (UserName, Email)
SELECT @UserName, @Email
EXCEPT
SELECT UserName, Email
FROM Users
WHERE Email = @Email
On databases where a specific syntax for add-or-ignore already exists, that's obviously best. On SQL Server, where no such thing exists, the above could be an alternative implementation. Another way to approach this would be with MERGE, which may be more efficient, though MERGE would probably have concurrency issues (all this needs to be researched).
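The #14464 statement can be tried on SQLite through Python's sqlite3, as a stand-in for SQL Server, with `@name` parameters becoming `:name`. The `Users` table is the one from the statement; `add_or_ignore` is a hypothetical helper. Two points are worth noting:

- The EXCEPT compares the whole `(UserName, Email)` pair, so a new UserName with an existing Email is still inserted.
- Without a unique constraint, two concurrent sessions could presumably both pass the EXCEPT check. This is the same kind of concurrency question raised for MERGE.

```python
import sqlite3

# The #14464 statement, with SQLite named parameters instead of SQL Server's @name.
ADD_OR_IGNORE = """
INSERT INTO Users (UserName, Email)
SELECT :user_name, :email
EXCEPT
SELECT UserName, Email
FROM Users
WHERE Email = :email
"""


def add_or_ignore(conn: sqlite3.Connection, user_name: str, email: str) -> int:
    """Hypothetical helper: 1 if the (UserName, Email) pair was inserted, 0 if it existed."""
    with conn:
        before = conn.total_changes
        conn.execute(ADD_OR_IGNORE, {"user_name": user_name, "email": email})
        return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserName TEXT, Email TEXT)")
print(add_or_ignore(conn, "alice", "a@example.com"))   # 1
print(add_or_ignore(conn, "alice", "a@example.com"))   # 0
# EXCEPT compares the whole pair, so a new UserName with the same Email is inserted.
print(add_or_ignore(conn, "alicia", "a@example.com"))  # 1
```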
To add one solution: using EFCore.BulkExtensions, this can be done with
Disclosure: I'm the author, and the library now has a dual license, a fix for OSS sustainability.
I would really like to see something like this supported in EF Core. My specific use case: I'm writing a (simple) data importer that periodically imports CSV data into my application database. The data imported from each CSV is relatively small, but the operation happens often enough that the database is expected to grow very large over time, and the operation may run in parallel with other imports. Since the data in the database only comes from the same source, it's never an issue if a new record can't be added because its primary key is already in the table. In fact, it's very likely, since each CSV import deliberately contains an hour or so of overlapping data to avoid missing records added around the time of export. On SQL Server this can be achieved with the
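For the overlapping-CSV case, here is a sketch of the import loop in which the database skips already-stored keys. It uses SQLite's PostgreSQL-style `ON CONFLICT (id) DO NOTHING` via Python's sqlite3. The `readings` table, its column layout, and the `import_csv` helper are all hypothetical. `total_changes` counts only the rows that were actually inserted, so the return value reports new records.

```python
import csv
import io
import sqlite3


def import_csv(conn: sqlite3.Connection, text: str) -> int:
    """Hypothetical helper: import (id, ts, value) rows, skipping ids already stored.

    Returns the number of rows that were actually new.
    """
    rows = [
        (int(record["id"]), record["ts"], float(record["value"]))
        for record in csv.DictReader(io.StringIO(text))
    ]
    with conn:
        before = conn.total_changes
        conn.executemany(
            "INSERT INTO readings (id, ts, value) VALUES (?, ?, ?) "
            "ON CONFLICT (id) DO NOTHING",
            rows,
        )
        return conn.total_changes - before


# Hypothetical table and data for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, ts TEXT, value REAL)")
first = "id,ts,value\n1,10:00,0.5\n2,10:05,0.7\n3,10:10,0.9\n"
second = "id,ts,value\n3,10:10,0.9\n4,10:15,1.1\n"  # overlaps on id 3
print(import_csv(conn, first))   # 3
print(import_csv(conn, second))  # 1
```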
@kitgrose on SQL Server, this kind of thing is typically achieved via the MERGE statement, which is presumably what we'd use to implement the future insert-or-ignore API. For now, you should be able to drop down to SQL to code this specific importer.
I propose adding an API to indicate to EF Core that an insert (via Add or AddRange and variants) may fail. This is useful when adding data that may already exist in a table. For instance, I have a program that collects analytics data from a source and inserts it all into a table. Some of these rows may be duplicates of previously inserted rows and so will fail unique constraints. These errors can be safely ignored, since the rows already exist in the table.
Using individual INSERT IGNORE ... statements results in horrible performance (~15x slower than AddRangeAsync).
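One way to avoid a round trip per row is to send one multi-row `INSERT ... VALUES (...), (...) ON CONFLICT DO NOTHING` statement per chunk. It is sketched here on SQLite via Python's sqlite3; the `metrics` table and `insert_ignore_batched` helper are hypothetical. This illustrates batching in general. It makes no claim about how EF Core would generate SQL, and whether it closes the gap measured above has not been tested.

```python
import sqlite3


def insert_ignore_batched(conn: sqlite3.Connection, rows, chunk_size: int = 200) -> int:
    """Hypothetical helper: insert (id, value) rows as multi-row INSERT ... ON CONFLICT DO NOTHING.

    Sends one statement per chunk instead of one per row. chunk_size keeps the
    bound-parameter count (2 per row) below SQLite's per-statement limit.
    Returns the number of rows actually inserted.
    """
    inserted = 0
    with conn:
        for start in range(0, len(rows), chunk_size):
            chunk = rows[start:start + chunk_size]
            values = ", ".join(["(?, ?)"] * len(chunk))
            params = [field for row in chunk for field in row]
            before = conn.total_changes
            conn.execute(
                f"INSERT INTO metrics (id, value) VALUES {values} ON CONFLICT DO NOTHING",
                params,
            )
            inserted += conn.total_changes - before
    return inserted


# Hypothetical table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (id INTEGER PRIMARY KEY, value REAL)")
print(insert_ignore_batched(conn, [(i, i * 0.5) for i in range(1, 501)]))    # 500
print(insert_ignore_batched(conn, [(i, i * 0.5) for i in range(400, 601)]))  # 100
```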