Improve BaseBuilder::updateBatch() SQL #6373
Having some trouble with SQLite. What version are we running?
SQLite older than 3.33.0 will use the old SQL.
1e78564 to dd42a1a
Just SQLite3.
SQLite version must be determined by the driver. If the driver is version 3.33 or newer, then we use the new method; otherwise the override method falls back to the old method. I'm not sure if you can update the driver to a newer one; it probably depends on the PHP version.
protected function _updateBatch(string $table, array $values, string $index): string
{
    if ((float) $this->db->getVersion() >= 3.33) {
        return parent::_updateBatch($table, $values, $index);
    }
    // ... the old SQL generation for older SQLite follows (not shown in this comment)
}
I tested casting to float. It basically seems to drop everything from the second decimal point onward.
It seems the behavior is not documented. Unless it is documented, we should not rely on the current behavior. Or you can use
Thanks, didn't know about that function.
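To illustrate the concern about the float cast, here is a minimal sketch. It assumes PHP's built-in version_compare() as the alternative; the thread as captured does not show which function was suggested. The helper name supportsNewUpdate and the sample version strings are hypothetical.

```php
<?php

// Hypothetical helper: true when the SQLite version is 3.33.0 or newer.
function supportsNewUpdate(string $version): bool
{
    // version_compare() compares segment by segment, so "3.4.0" sorts before "3.33.0".
    return version_compare($version, '3.33.0', '>=');
}

// The float cast keeps only the leading "major.minor" part and compares it numerically.
$floatCheck = static fn (string $version): bool => (float) $version >= 3.33;

var_dump($floatCheck('3.4.0'));         // the cast yields 3.4, which is numerically above 3.33
var_dump(supportsNewUpdate('3.4.0'));   // segment-wise, 4 is below 33
var_dump(supportsNewUpdate('3.33.0'));
var_dump(supportsNewUpdate('3.32.9'));
```

The float check treats a version such as 3.4.0 as newer than 3.33, while the segment-wise comparison does not, which is one reason not to rely on the cast.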
8cd7717 to e231617
e231617 to 869064c
Looks good and this is more elegant!
But I want some other reviews.
I benchmarked with CSV upload and updateBatch(). 149,970 records
#6407 has a slightly better implementation, as most of the query is cached and doesn't need to be rebuilt for each batch run. Also, while performance is important, there are other considerations in the design of the new query. It allows updating on multiple keys, where the current implementation only allows one. The majority of the tables I work with have composite keys, so this is very helpful to me. Another thing is that you can use a query as a data source. Although it's not implemented here, you could also join a table to the dataset. I have made quite a few changes in my all-inclusive PR #6407; I could update this PR with some of them.
Another thing you can do with this PR is update a field that isn't in the dataset, or filter the dataset by a field without using that field in the update:
UPDATE `db_update_batch` AS t
INNER JOIN (
SELECT 1 `primary_key`, 1 `second_key`, 'Name 1-1' `name`, '2022-08-24 12:56:11' `created_date` UNION ALL
SELECT 1 `primary_key`, 2 `second_key`, 'Name 1-2' `name`, '2022-08-26 09:22:36' `created_date` UNION ALL
SELECT 2 `primary_key`, 1 `second_key`, 'Name 2-1' `name`, '2022-08-26 09:31:57' `created_date`
) u
ON t.`primary_key` = u.`primary_key` AND t.`second_key` = u.`second_key` AND DATE(u.`created_date`) = CURDATE()
SET
t.`name` = u.`name`,
t.`last_update` = CURRENT_TIMESTAMP()
This only updates records from the dataset that were created today. This query is possible with this PR using RawSql.
This one only updates records where the table data doesn't match the update data. Only when they differ is the record updated and last_update changed.
UPDATE `db_update_batch` AS t
INNER JOIN (
SELECT 1 `primary_key`, 'Cat 3' `category`, 'Name 1-1' `name` UNION ALL
SELECT 1 `primary_key`, 'Cat 5' `category`, 'Name 1-2' `name` UNION ALL
SELECT 2 `primary_key`, 'Cat 5' `category`, 'Name 2-1' `name`
) u
ON t.`primary_key` = u.`primary_key` AND (t.`category` != u.`category` OR t.`name` != u.`name`)
SET
t.`name` = u.`name`,
t.`last_update` = CURRENT_TIMESTAMP()
This gives you an accurate last_update because there truly was an update.
@sclubricants Thank you for your work. #6407 does not help me because it is too big to understand.
I like these changes. However, the most important element is missing. Users won't use the new features if they don't know about them.
We need to explain and demonstrate the new capabilities with examples in the user guide. We also need to indicate the situations in which the new features will not work (SQLite < 3.33.0).
As for other PRs - let's stick to one rule: 1 PR = 1 enhancement. This way, it's easier for everyone to follow the main purpose and agree/disagree on the feature.
@michalsn
The generated SQL statements will be completely changed. So at least we should add it in the changelog.
Oh... okay, sorry - my mistake. I was browsing the code from the linked PR and mixed things up. Please, just add the changelog, and we will be ready. After merging, we can go forward with your next proposals/ideas. Much appreciated.
@sclubricants Can you fix the PHPCPD error?
@kenjis It's not as bad as it looks. A single statement can take 11 lines.
$data = implode(
    " UNION ALL\n",
    array_map(
        // one SELECT per row; each column is rendered as "<value> <column name>"
        static fn ($value) => 'SELECT ' . implode(', ', array_map(
            static fn ($key, $index) => $index . ' ' . $key,
            $keys,
            $value
        )),
        $values
    )
) . "\n";
The top 6 lines will go away once things get refactored. I wasn't getting this error until I added these.
// this is a workaround until the rest of the platform is refactored
if ($index !== '') {
    $this->QBOptions['constraints'] = [$index];
}
$keys = array_keys(current($values));
Can we suppress these somehow?
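As a standalone illustration of what the implode()/array_map() statement above produces, here is a minimal sketch. The function name buildUnionData and the sample data are hypothetical, and the sketch assumes the column names and values have already been escaped and quoted by the caller.

```php
<?php

// Hypothetical standalone version of the snippet above.
// $keys:   already-quoted column names, in the same order as each row's values.
// $values: list of rows; every value must already be escaped/quoted for SQL.
function buildUnionData(array $keys, array $values): string
{
    return implode(
        " UNION ALL\n",
        array_map(
            // One SELECT per row, each column rendered as "<value> <column>".
            static fn (array $row) => 'SELECT ' . implode(', ', array_map(
                static fn ($column, $value) => $value . ' ' . $column,
                $keys,
                $row
            )),
            $values
        )
    ) . "\n";
}

$rows = [
    ['`primary_key`' => '1', '`name`' => "'Name 1'"],
    ['`primary_key`' => '2', '`name`' => "'Name 2'"],
];
$keys = array_keys(current($rows));

echo buildUnionData($keys, $rows);
```

Each row becomes one SELECT, and the rows are joined with UNION ALL so the database can treat the dataset as a derived table.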
Add
I wish it had a way to suppress certain lines instead of excluding the whole file.
It seems you used Could you rebase?
I usually do rebase but I did it through the web interface and it created a merge.
This will be a place to store different options that can be used on queries. I thought about using $QBData but felt this could be confused with the data in QBSet.
This method removes the key fields from the update fields. It also handles using RawSql.
This is a bit more refined than the previous version. This version caches most of the query after the first run of the batch. It is also ready to accept various options that can be fed to it, perhaps in a later PR. For instance, the fields to update can be extracted from the field set, as is currently implemented, or they could be explicitly set beforehand. Likewise, the data is set from the QBSet data but could potentially be fed from a query instead.
entry was in the wrong place
Co-authored-by: kenjis
c9999cf to c4cbaa7
Thank you for rebasing.
return 'UPDATE ' . $this->compileIgnore('update') . $table . ' SET ' . substr($cases, 0, -2) . $this->compileWhereHaving('QBWhere');
return sprintf($sql, $data);
sprintf() was introduced here.
This PR redesigns the SQL queries generated in _updateBatch(). This implementation produces smaller, more elegant queries. It selects the data into a pseudo in-memory table, which gives more flexibility in what can be done with the dataset.
Because it treats the dataset as a table, a query or a table can be substituted for the dataset. This will be helpful in future ambitions.
Here is a sample of the difference in MySQL:
Old way - 982 Characters - This gets really hard to read when 100 rows long and 15 columns wide:
New way - 791 Characters - This is easy to read no matter the number of rows:
The size difference varies depending on the query. A long or multiple primary key significantly lengthens the old method. This then is compounded by the length of the dataset.
On MySQL I ran some tests on performance:
There doesn't seem to be a huge gain in performance but no loss either. Smaller queries should perform much better when using a remote database where network speed becomes a factor.
Future ambitions include being able to update from a query or another table. Also, the current implementation limits the constraint to a single field. This makes it useless if you have composite primary keys. I work with tables all the time that have as many as six fields composing the primary key. This PR already has the use of multiple constraints built in. The rest of it will need to be implemented later. I'd like to have it working with the onConstraint() method created in the upsert PR.
All the drivers are updated to use this type of query. Oracle is using MERGE instead of UPDATE. It accomplishes the same thing and works in the same way as the other queries.
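To make the multiple-constraint idea concrete, here is a minimal sketch of how a join condition over several key fields could be assembled. This is not the PR's actual code: the function name buildJoinCondition and the aliases t and u are assumptions, chosen to match the sample queries in the conversation, and the column names are assumed to be quoted already.

```php
<?php

// Hypothetical helper: builds the ON clause for a composite key.
// $constraints: already-quoted column names that make up the key.
function buildJoinCondition(array $constraints, string $tableAlias = 't', string $dataAlias = 'u'): string
{
    return implode(
        ' AND ',
        array_map(
            static fn (string $column) => $tableAlias . '.' . $column . ' = ' . $dataAlias . '.' . $column,
            $constraints
        )
    );
}

echo buildJoinCondition(['`primary_key`', '`second_key`']);
```

With a single constraint the result reduces to one equality, so the same code path can serve both single-field and composite keys.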
The BaseBuilder method was switched from the MySQL version to one used by SQLite, MSSQL, and Postgre. MySQL and Oracle have their own methods in Builder. SQLite has a method in Builder for versions < 3.33.0.
Checklist: