Implement bulk deletion #27550
Comments
Note: the problem here is concurrency detection, see #27562.
NOTE: See below, the concurrency check can be partially solved. If we do decide to keep concurrency checks, we could do a single query after multiple updates to get the number of rows with the IDs we wanted to delete (and expect zero to come back). However, if it's possible for keys to repeat (i.e. they aren't IDENTITY/GUID), then a concurrent insert can re-add a row we just deleted, causing a spurious concurrency exception. Note that this is a problem even if we're in a transaction, since rows newly committed from outside the transaction do become visible inside it; the only way to protect against this AFAIK is the serializable isolation level, which we definitely shouldn't use. Bottom line: unless we decide to get rid of the concurrency checks, we probably shouldn't do anything here.
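To make the spurious-failure scenario concrete, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in database. The `Foo` table, its `token` column, and the `concurrent_writer` hook are all assumptions made for illustration; the hook fakes a concurrent insert on the same connection, which simplifies what would really be a second session.

```python
import sqlite3

def make_db():
    # Hypothetical schema: a key plus a concurrency token.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Foo (id INTEGER PRIMARY KEY, token INTEGER NOT NULL)")
    conn.executemany("INSERT INTO Foo VALUES (?, ?)", [(1, 10), (2, 20)])
    return conn

def delete_then_verify(conn, rows, concurrent_writer=None):
    """Issue one DELETE per row, then a single query that expects no IDs to remain."""
    for row_id, token in rows:
        conn.execute("DELETE FROM Foo WHERE id = ? AND token = ?", (row_id, token))
    if concurrent_writer is not None:
        concurrent_writer(conn)  # simulated activity from another session
    ids = [row_id for row_id, _ in rows]
    placeholders = ", ".join("?" for _ in ids)
    (remaining,) = conn.execute(
        f"SELECT COUNT(*) FROM Foo WHERE id IN ({placeholders})", ids
    ).fetchone()
    return remaining == 0

# All tokens current: the verification query finds nothing left.
ok = delete_then_verify(make_db(), [(1, 10), (2, 20)])
# Stale token for id 2: the row survives and the check correctly fails.
stale = delete_then_verify(make_db(), [(1, 10), (2, 99)])
# Key 1 is re-inserted after our delete: the check fails although our delete worked.
spurious = delete_then_verify(
    make_db(), [(1, 10), (2, 20)],
    lambda c: c.execute("INSERT INTO Foo VALUES (1, 11)"),
)
```

The third call is the problematic case: the verification query cannot tell a row that survived our delete from one that was re-added afterwards.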
Thinking about it some more, it seems the concurrency check isn't actually a problem here. One can do: `DELETE FROM Foo WHERE id IN (1, 2, 3) RETURNING 1;` This returns a row containing 1 for each row deleted. That lets us detect that a concurrency issue occurred, but not which entry was the problematic one. The above assumes no concurrency token. When there is one, we can use row values on most databases (but not SQL Server): `DELETE FROM Foo WHERE (id, token) IN ((1, 2), (3, 4)) RETURNING 1;`
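The counting idea can be sketched as follows, again with Python's `sqlite3` as an assumed stand-in. Two deviations from the SQL above are deliberate: the sketch reads the affected-row count from `cursor.rowcount` instead of counting `RETURNING` rows, and it writes the right-hand side as `IN (VALUES ...)` because SQLite requires a subquery there for row values. The `bulk_delete` helper and the table contents are hypothetical.

```python
import sqlite3

def bulk_delete(conn, rows):
    """Delete every (id, token) pair in one statement; True if all of them matched."""
    placeholders = ", ".join("(?, ?)" for _ in rows)
    params = [value for row in rows for value in row]
    cur = conn.execute(
        f"DELETE FROM Foo WHERE (id, token) IN (VALUES {placeholders})", params
    )
    # Fewer affected rows than requested means at least one row was
    # changed or removed concurrently, but not which one.
    return cur.rowcount == len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Foo (id INTEGER PRIMARY KEY, token INTEGER NOT NULL)")
conn.executemany("INSERT INTO Foo VALUES (?, ?)", [(1, 2), (3, 4), (5, 6)])

all_matched = bulk_delete(conn, [(1, 2), (3, 4)])  # both tokens are current
stale_token = bulk_delete(conn, [(5, 99)])         # token no longer matches
```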
Design discussion:
Note to self: when implementing this, note that a single bulk delete statement is not safe to run without a transaction as long as concurrency checking is enabled, for the same reasons as in #27532 (comment). In other words, if fewer rows were deleted than expected, we want to throw and roll back, but if there's no transaction the change has already been committed and there's nothing to roll back.
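A minimal sketch of why the transaction matters, assuming Python's `sqlite3` as a stand-in and a hypothetical `ConcurrencyError` type. The connection is opened with `isolation_level=None` so that `BEGIN`, `COMMIT`, and `ROLLBACK` are issued explicitly.

```python
import sqlite3

class ConcurrencyError(Exception):
    """Hypothetical error: fewer rows were deleted than expected."""

def delete_in_transaction(conn, ids):
    placeholders = ", ".join("?" for _ in ids)
    conn.execute("BEGIN")
    try:
        cur = conn.execute(
            f"DELETE FROM Foo WHERE id IN ({placeholders})", list(ids)
        )
        if cur.rowcount != len(ids):
            raise ConcurrencyError(
                f"expected {len(ids)} rows, deleted {cur.rowcount}"
            )
        conn.execute("COMMIT")
    except Exception:
        # Only possible because the DELETE has not been committed yet.
        conn.execute("ROLLBACK")
        raise

# isolation_level=None turns off the module's implicit transaction handling.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Foo (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Foo VALUES (?)", [(1,), (2,), (3,)])

try:
    delete_in_transaction(conn, [1, 2, 99])  # id 99 does not exist
    conflict_detected = False
except ConcurrencyError:
    conflict_detected = True

rows_after_conflict = conn.execute("SELECT COUNT(*) FROM Foo").fetchone()[0]
```

Without the explicit transaction, rows 1 and 2 would already be gone by the time the count mismatch is noticed.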
The SQL Server provider implements bulk insertion, i.e. collapsing multiple insertions into a single multi-row MERGE (or INSERT) statement. We could do the same with deletion, so instead of multiple DELETE statements, we'd have:
Using an array parameter in PostgreSQL (to preserve same SQL/prepared statement):
For multiple conditions (composite keys and/or concurrency token):
Or for SQL Server, which does not support row values here:
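The original statement for this case is not preserved here. One common shape for engines without row-value `IN` is an `OR` of per-row `AND` groups; the sketch below builds such a statement and runs it against Python's `sqlite3` as a stand-in. The `build_composite_delete` helper is hypothetical and this form is an assumption, not necessarily the SQL the issue proposed.

```python
import sqlite3

def build_composite_delete(table, key_columns, rows):
    """Build a DELETE that matches each composite key via (a = ? AND b = ?) OR ..."""
    # Identifiers are interpolated directly, so they must come from trusted metadata.
    match_one = "(" + " AND ".join(f"{col} = ?" for col in key_columns) + ")"
    where = " OR ".join(match_one for _ in rows)
    params = [value for row in rows for value in row]
    return f"DELETE FROM {table} WHERE {where}", params

sql, params = build_composite_delete("Foo", ["Id1", "Id2"], [(1, 2), (3, 4)])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Foo (Id1 INTEGER, Id2 INTEGER, PRIMARY KEY (Id1, Id2))")
conn.executemany("INSERT INTO Foo VALUES (?, ?)", [(1, 2), (3, 4), (1, 4)])
deleted = conn.execute(sql, params).rowcount
remaining = conn.execute("SELECT Id1, Id2 FROM Foo").fetchall()
```

Note that the row `(1, 4)` survives: each key pair must match as a whole, which is the behavior the row-value form expresses more compactly.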
For SQL Server specifically, MERGE can also be used (but no advantage apparently):
Benchmark with one condition (non-composite key, no concurrency token):
Benchmark results with composite key
Notes/caveats:
DELETE FROM Foo WHERE (Id1, Id2) IN ((1, 2), (3, 4))
Single-condition benchmark code
Multiple-condition benchmark code