proposal: database/sql: support a way to perform bulk actions #5171
Of note, doing this well would require new database/sql API for both the driver and the front-end code. Having implemented several protocols that include a bulk copy method, this needs to be called out as different, as much of the control that is available in an INSERT statement is exposed differently than in SQL. I'm wary of any suggestions to bind to an array, as arrays are legitimate data types in several RDBMSes. I'll be implementing a general interface shortly in the rdb front end. Drop me a line if you'd like to discuss.
For what it's worth, I've just tried using the lib/pq driver's Copy functionality to do bulk loading, and although I can't comment on whether this would work for other DB drivers, it seems like a reasonable API.
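For reference, a minimal sketch of how lib/pq's COPY helper is typically used; the table and column names are only illustrative, and error handling is kept terse:

```go
package main

import (
	"database/sql"
	"log"

	"github.com/lib/pq"
)

// bulkLoad streams rows into PostgreSQL via COPY using lib/pq's CopyIn helper.
func bulkLoad(db *sql.DB, rows [][]interface{}) error {
	txn, err := db.Begin()
	if err != nil {
		return err
	}
	// CopyIn builds a "COPY films (...) FROM STDIN" statement for the driver.
	stmt, err := txn.Prepare(pq.CopyIn("films", "code", "title", "did", "date_prod", "kind"))
	if err != nil {
		txn.Rollback()
		return err
	}
	for _, r := range rows {
		// Each Exec adds one row to the COPY stream.
		if _, err := stmt.Exec(r...); err != nil {
			txn.Rollback()
			return err
		}
	}
	// A final Exec with no arguments flushes the buffered data.
	if _, err := stmt.Exec(); err != nil {
		txn.Rollback()
		return err
	}
	if err := stmt.Close(); err != nil {
		txn.Rollback()
		return err
	}
	return txn.Commit()
}

func main() {
	db, err := sql.Open("postgres", "dbname=films sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows := [][]interface{}{
		{"B6717", "Tampopo", 110, "1985-02-10", "Comedy"},
		{"HG120", "The Dinner Game", 140, nil, "Comedy"},
	}
	if err := bulkLoad(db, rows); err != nil {
		log.Fatal(err)
	}
}
```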
The standard syntax for multi-value INSERT is (from the PostgreSQL documentation):

INSERT INTO films (code, title, did, date_prod, kind) VALUES
    ('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
    ('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');

What about adding a Value type that can be passed as the args argument to Exec or Query? As an example:

type Value []interface{} // defined in the sql package

var batch []Value
for i := 0; i < N; i++ {
    batch = append(batch, Value{1, 1.3, "x"})
}
db.Exec("INSERT INTO films (code, title, did, date_prod, kind) VALUES ?", batch)

This would not require any changes to the existing interface.
This will require holding data for the whole batch in memory. Also, this proposal doesn't allow batching updates where the database supports them. FYI, there is an implementation of batch insert in github.com/lib/pq based on COPY.
On Mon, Mar 7, 2016 at 4:17 PM, kostya-sh [email protected] wrote:

> FYI there is implementation of batch insert in github.com/lib/pq based on COPY.

I will be happy with just the support for multi-value INSERT, since it is standard. In Go it can be defined, e.g.:

type Tuple []interface{}
type Values []Tuple

COPY is not standard, and github.com/lib/pq seems (just looking at the code) to hold the whole batch in memory.
Yes, though this would require supporting multiple return values.

COPY is not standard indeed, but it is fast and the driver doesn't hold the whole batch in memory. Have a look at the implementation at https://github.com/lib/pq/blob/master/copy.go. I agree it would be nice to have a generic batch API, but it is quite difficult to design a single API that lets drivers choose the optimal method to implement batched operations. I think using the driver library directly is quite a good compromise.

BTW, in PostgreSQL it is also possible to use the following SQL for bulk insert:

I haven't used it though, and I don't know if the Go driver supports arrays.
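The array-based SQL hinted at here isn't shown, but it is presumably something along the lines of unnest over per-column arrays. A hedged sketch using lib/pq's pq.Array; the statement and column set are assumptions, not the commenter's exact SQL:

```go
package bulk

import (
	"database/sql"

	"github.com/lib/pq"
)

// insertWithArrays performs one INSERT whose parameters are whole columns,
// unnested into rows on the server. This is PostgreSQL-specific.
func insertWithArrays(db *sql.DB, codes, titles []string, dids []int64, kinds []string) error {
	_, err := db.Exec(
		`INSERT INTO films (code, title, did, kind)
		 SELECT * FROM unnest($1::text[], $2::text[], $3::int[], $4::text[])`,
		pq.Array(codes), pq.Array(titles), pq.Array(dids), pq.Array(kinds),
	)
	return err
}
```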
On Mon, Mar 7, 2016 at 5:47 PM, kostya-sh [email protected] wrote:

This is what I was speaking about. And it is not PostgreSQL-specific, but standard. In Go, it can be defined, e.g.:

type Tuple []interface{} // since Row and Value are already defined
type Values []Tuple

This has the advantage that a Values value can be specified as a parameter.
@perillo I tried your method:

type Value []interface{} // defined in the sql package

var batch []Value
for i := 0; i < N; i++ {
    batch = append(batch, Value{1, 1.3, "x"})
}
db.Exec("INSERT INTO films (code, title, did, date_prod, kind) VALUES ?", batch)

but it results in an error.
Totally beyond the scope of golang. You could always manually open a transaction, process all your inserts individually (don't worry, there's connection pooling), and commit the transaction. Avoiding the overhead of the implicit transaction on each iteration will be a huge win. For accessing the bulk features of the various RDBMSes, like bcp in SQL Server for example, you can always save a CSV to disk and use exec to run the batch.
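A minimal sketch of the explicit-transaction approach (prepare once, execute per row, commit at the end); the films table is reused here just for illustration:

```go
package bulk

import "database/sql"

// insertInTx wraps all inserts in one explicit transaction, so each row
// does not pay for its own implicit transaction.
func insertInTx(db *sql.DB, rows [][]interface{}) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	stmt, err := tx.Prepare(
		"INSERT INTO films (code, title, did, date_prod, kind) VALUES ($1, $2, $3, $4, $5)")
	if err != nil {
		tx.Rollback()
		return err
	}
	defer stmt.Close()

	for _, r := range rows {
		if _, err := stmt.Exec(r...); err != nil {
			tx.Rollback()
			return err
		}
	}
	return tx.Commit()
}
```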
@pimbrouwers I disagree. I think it would be great to create a standard bulk data interface. Yes, opening a transaction will increase the speed of many inserts, but bulk interfaces are also useful. I personally hate relying on native commands like bcp.
Unless the standard interface has platform-specific adapters, it won't work, because there isn't an adhered-to standard for bulk actions; SQL Server, for example, uses bcp.

As far as relying on the presence of bulk tools, I see nothing wrong with that. You physically cannot install SQL Server without bcp, which SQL Server relies upon internally for so many things. So the presence of SQL Server means bcp is also present.
Yes, like query protocols, each would need a driver. I run my app on Linux and MS SQL Server on Windows. The app won't have bcp installed. Often they are on different boxes, or the app runs in a bare container, or the database is hosted as a service.
I think a way to batch values before sending them is needed. It would save network round trips. For reference, JDBC provides three functions: addBatch, clearBatch and executeBatch.

Statements could implement two methods to mirror this functionality (a hypothetical sketch follows below). Then calling Exec/ExecContext would send the batched statements to the server.
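The two methods are not spelled out above, so the following is only a guess at the shape of such an API, mirroring JDBC's addBatch/clearBatch/executeBatch; all identifiers here are made up and none exist in database/sql:

```go
// Package batch sketches a hypothetical batching API; nothing here is
// part of database/sql today.
package batch

import (
	"context"
	"database/sql"
)

// Stmt is what a batching-aware prepared statement might look like.
type Stmt interface {
	// AddBatch records one parameter set without sending it to the server.
	AddBatch(args ...interface{}) error
	// ClearBatch discards all parameter sets recorded so far.
	ClearBatch() error
	// ExecContext sends every recorded parameter set, ideally in a
	// single round trip, and reports the combined result.
	ExecContext(ctx context.Context) (sql.Result, error)
}
```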
Related to golang#5171 Change-Id: I46a6d12b46d3802a338e5733ca81e8a0fb2ae125
I crafted a proposal in my own fork. I'm open to comments. If we get to something definitive, I'll write the necessary unit tests and submit a CL. I'm also thinking of preparing a reference implementation on a fork of one of the drivers. First let's talk about whether the proposed API is useful. I would like to note that I'm a PostgreSQL user, so that is the perspective I'm coming from.

Driver interface

In order to truly benefit from increased performance and fewer (or a single) round trips, a bulk action should happen asynchronously. If that is not the case, a prepared statement would suffice anyway. In order to properly describe and work with the asynchronous relationship of data, result and error, I would like to propose the use of channels. The driver shall return a write-only channel.
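A rough sketch of what a channel-based driver interface along these lines might look like, assuming the driver hands back a write-only channel for parameter sets and a separate channel for the result; every identifier below is hypothetical and not taken from the actual proposal:

```go
// Package bulkdriver sketches a hypothetical, channel-based bulk interface
// for database/sql/driver; nothing here exists in the standard library.
package bulkdriver

import "database/sql/driver"

// BulkResult reports the outcome of one bulk operation after the
// data channel has been closed by the caller.
type BulkResult struct {
	Rows int64 // rows written
	Err  error // first error encountered, if any
}

// Bulker could be implemented by driver connections that support
// streaming bulk actions such as PostgreSQL COPY.
type Bulker interface {
	// Bulk starts the bulk action described by query and returns a
	// write-only channel for the caller to stream parameter sets into.
	// Closing the data channel ends the operation; the result channel
	// then delivers exactly one BulkResult.
	Bulk(query string, columns []string) (chan<- []driver.Value, <-chan BulkResult, error)
}
```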
To be discussed:
sql package interface

Here I went with an interface type, passed to

If the
The rest of the
Cool, thank you @muhlemmer for the proposal ideas. This issue is very old and I didn't want to edit the original issue, but given your elaborate post in #5171 (comment), perhaps let's reuse this issue as is. I'm kindly putting it on the radar of the @golang/proposal-review team so they can skip to the linked comment.