Bulk import for efficient importing of data from the client into the database #27333
Comments
We might need an overload with anonymous types for shadow properties. Note that this would also allow inserts for keyless entity types. So we might need a non-collection overload too.
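For illustration, an anonymous-type overload might look something like this; `ImportFrom`, the `TenantId` shadow property, and the surrounding context are assumptions, not a designed API:

```csharp
// Hypothetical: blogs is an IEnumerable<Blog>, TenantId is a shadow property
// on Blog (no CLR property), and ImportFrom accepts anonymous-type
// projections so shadow values can be supplied (all of this is assumed).
ctx.Blogs.ImportFrom(blogs.Select(b => new { b.Name, TenantId = tenantId }));
```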
Yep..
Can you elaborate?
Just some sugar: `ctx.KeylessBlogs.ImportFrom(new KeylessBlog());`
Ah I see - you're saying this would be the only way to insert keyless entity types (because we don't support them in Add/SaveChanges), so it makes sense to allow "bulk import" of a single instance... Right.
Note: we have #9118 for optionally not tracking after SaveChanges, which would unlock using SqlBulkCopy for standard SaveChanges. At that point, the advantage of a dedicated bulk import API (this issue) becomes bypassing the change tracking machinery, which may or may not be worth it.
Vote for taking a very close look at EFCore.BulkExtensions, one of the most underrated .NET OSS libraries I use. It already respects EF Core table name mappings and column/property mappings. It supports all the "batch" scenarios: "upsert", insert only (only new), and "delete if missing" (MERGE with a DELETE clause). I've needed all those scenarios. All are way faster than current EF Core for 10k+ rows (i.e. importing a full dataset daily/monthly/x). It supports pulling new identity values back into entities, but that's off by default, which I think is sensible for bulk scenarios. But if you can reuse roji's RETURNING work here, it might speed that up.
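For reference, a rough sketch of those scenarios with EFCore.BulkExtensions; the `Blog` type, the context, and the config values shown are illustrative, not prescriptive:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using EFCore.BulkExtensions;
using Microsoft.EntityFrameworkCore;

public class Blog
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class BulkExtensionsExamples
{
    // ctx is any DbContext with Blog mapped; both are placeholders for this sketch.
    public static async Task RunAsync(DbContext ctx, List<Blog> blogs)
    {
        // Upsert: insert new rows, update existing ones (matched on the key).
        await ctx.BulkInsertOrUpdateAsync(blogs);

        // Full sync: upsert plus delete target rows missing from the source
        // (MERGE with a DELETE clause).
        await ctx.BulkInsertOrUpdateOrDeleteAsync(blogs);

        // Insert only, opting in to pulling generated identity values back
        // into the entities (off by default, as noted above).
        await ctx.BulkInsertAsync(blogs, new BulkConfig { SetOutputIdentity = true });
    }
}
```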
@yzorg thanks, yeah - we're aware of EFCore.BulkExtensions. The plan specifically in this issue is only to wrap the database's native bulk import mechanism - SqlBulkCopy for SQL Server, binary COPY for PostgreSQL, etc. These lower-level APIs generally allow simply copying large quantities of data into a table in the most efficient manner possible. Various "merge" functionality such as upserting, deleting if missing, etc. is higher-level functionality which generally isn't covered by a database's bulk import mechanism, and is implemented using additional database features. For example, it's typical to first use bulk import to efficiently copy data into a temporary table (e.g. via SqlBulkCopy), and then use MERGE to perform upsert/delete-if-missing/whatever between the temp table and the target table. Such "compound" operations would be out of scope for this specific issue, but it would provide the first building block (bulk import). You could then use raw SQL to express whatever MERGE operation you want (we could possibly even provide an EF API for that, though I'm not sure it would be very valuable).
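To make that temp-table pattern concrete, here's a minimal sketch against SQL Server with Microsoft.Data.SqlClient; the dbo.Blogs(Id, Name) table, the staging table name, and the MERGE shape are all assumptions for illustration:

```csharp
using System.Data;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

public static class TempTableUpsert
{
    // Assumes a closed connection and a DataTable whose columns match dbo.Blogs(Id, Name).
    public static async Task UpsertViaTempTableAsync(SqlConnection conn, DataTable rows)
    {
        await conn.OpenAsync();

        // 1. Create a connection-scoped temp table shaped like the target.
        using (var cmd = new SqlCommand(
            "SELECT TOP 0 * INTO #BlogsStaging FROM dbo.Blogs;", conn))
        {
            await cmd.ExecuteNonQueryAsync();
        }

        // 2. Bulk import into the temp table: the efficient, low-level step.
        using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "#BlogsStaging" })
        {
            await bulk.WriteToServerAsync(rows);
        }

        // 3. MERGE from the temp table into the target: the higher-level step.
        const string merge = @"
            MERGE dbo.Blogs AS t
            USING #BlogsStaging AS s ON t.Id = s.Id
            WHEN MATCHED THEN UPDATE SET t.Name = s.Name
            WHEN NOT MATCHED THEN INSERT (Id, Name) VALUES (s.Id, s.Name);";
        using (var cmd = new SqlCommand(merge, conn))
        {
            await cmd.ExecuteNonQueryAsync();
        }
    }
}
```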
https://www.npgsql.org/doc/copy.html#binary-copy is rather crude. I need to ingest tens of thousands of records from an Excel sheet into the DB without any duplicates.
@khteh in what way is Npgsql's binary copy support crude? Deduplicating isn't a concern of bulk import in itself - that's something that needs to be done at a higher level (and therefore wouldn't be covered here in any case). More details on exactly what you need could help clarify things.
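For context, Npgsql's binary COPY (the doc linked above) looks roughly like this today; the blogs(id, name) table and the row source are assumptions, and as noted, any dedup would have to happen before or after this step:

```csharp
using System.Collections.Generic;
using Npgsql;
using NpgsqlTypes;

public static class BinaryCopyExample
{
    // Streams rows (e.g. parsed from an Excel sheet) into an assumed blogs(id, name) table.
    public static void CopyRows(string connectionString, IEnumerable<(int Id, string Name)> rows)
    {
        using var conn = new NpgsqlConnection(connectionString);
        conn.Open();

        using var writer = conn.BeginBinaryImport(
            "COPY blogs (id, name) FROM STDIN (FORMAT BINARY)");

        foreach (var (id, name) in rows)
        {
            writer.StartRow();
            writer.Write(id, NpgsqlDbType.Integer);
            writer.Write(name, NpgsqlDbType.Text);
        }

        writer.Complete(); // without Complete(), the whole import is rolled back
    }
}
```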
Would love to see this support IAsyncEnumerable as the input, in case we're pulling from another data source like Mongo into SQL.
Databases usually provide various mechanisms for efficient bulk import of data from the client (e.g. a .NET application) into the database. We already have issues tracking improving INSERT performance (#15059, #9118, #10443), but bulk import is a specialized scenario where the native database mechanisms work more efficiently (often considerably so) than multiple INSERTs. Note also that this is different from bulk copying of data across tables within the database (`INSERT ... SELECT`, tracked by #27320).

Bulk import only allows targeting a table, so a simple method on DbSet should suffice (LINQ operators don't make sense here):
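A minimal sketch of what the call sites might look like; `BloggingContext`, `Blog`, and the method names are assumptions drawn from this issue, not a committed design:

```csharp
// Hypothetical call sites for the proposed API (all names are placeholders).
public static async Task ImportExamples(
    BloggingContext ctx,
    IEnumerable<Blog> blogs,
    IAsyncEnumerable<Blog> blogStream)
{
    ctx.Blogs.ImportFrom(blogs);                  // sync, from IEnumerable<Blog>
    await ctx.Blogs.ImportFromAsync(blogs);       // async, from IEnumerable<Blog>
    await ctx.Blogs.ImportFromAsync(blogStream);  // async, from IAsyncEnumerable<Blog>
}
```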
ImportFrom should accept an IEnumerable parameter; ImportFromAsync should have overloads accepting both IEnumerable and IAsyncEnumerable parameters. The method would pull entity instances and import them into the database table via the provider-specific mechanism.
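One possible surface for those methods, sketched as extension methods purely for illustration (presumably they'd live on DbSet itself); every name and signature here is an assumption based on the paragraph above:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical API surface; bodies elided since the point is the shape.
public static class BulkImportExtensions
{
    public static void ImportFrom<TEntity>(
        this DbSet<TEntity> set, IEnumerable<TEntity> source)
        where TEntity : class
        => throw new NotImplementedException(); // would invoke e.g. SqlBulkCopy / binary COPY

    public static Task ImportFromAsync<TEntity>(
        this DbSet<TEntity> set, IEnumerable<TEntity> source,
        CancellationToken cancellationToken = default)
        where TEntity : class
        => throw new NotImplementedException();

    public static Task ImportFromAsync<TEntity>(
        this DbSet<TEntity> set, IAsyncEnumerable<TEntity> source,
        CancellationToken cancellationToken = default)
        where TEntity : class
        => throw new NotImplementedException();
}
```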
Additional notes
Database support
Community implementations