Improve `BaseBuilder::updateBatch()` SQL #6373

sclubricants · 2022-08-12T23:22:12Z

This PR redesigns the SQL queries generated in _batchUpdate().

This implementation provides more elegant and smaller queries. This method selects the data into a psudo in memory table which provides more flexibility and availability to do more things with.

Because it treats the dataset as a table, a query or a table can be substituted for the dataset. This will be helpful in future ambitions.

Here is a sample of the difference in MySQL:

Old way - 982 Characters - This gets really hard to read when 100 rows long and 15 columns wide:

UPDATE `db_update_batch` SET `column1` = CASE 
WHEN `primary_key` = 1 THEN 'test data string T'
WHEN `primary_key` = 2 THEN 'test data string T'
WHEN `primary_key` = 3 THEN 'test data string T'
ELSE `column1` END, `column2` = CASE 
WHEN `primary_key` = 1 THEN 'test data string T'
WHEN `primary_key` = 2 THEN 'test data string T'
WHEN `primary_key` = 3 THEN 'test data string T'
ELSE `column2` END, `column3` = CASE 
WHEN `primary_key` = 1 THEN 'test data string T'
WHEN `primary_key` = 2 THEN 'test data string T'
WHEN `primary_key` = 3 THEN 'test data string T'
ELSE `column3` END, `column4` = CASE 
WHEN `primary_key` = 1 THEN 'test data string T'
WHEN `primary_key` = 2 THEN 'test data string T'
WHEN `primary_key` = 3 THEN 'test data string T'
ELSE `column4` END, `column5` = CASE 
WHEN `primary_key` = 1 THEN 'test data string T'
WHEN `primary_key` = 2 THEN 'test data string T'
WHEN `primary_key` = 3 THEN 'test data string T'
ELSE `column5` END
WHERE `primary_key` IN(1,2,3)

New way - 791 Characters - This is easy to read no matter the number of rows:

UPDATE `db_update_batch` AS t
INNER JOIN (
SELECT 1 `primary_key`, 'test data string T' `column1`, 'test data string T' `column2`, 'test data string T' `column3`, 'test data string T' `column4`, 'test data string T' `column5` UNION ALL
SELECT 2 `primary_key`, 'test data string T' `column1`, 'test data string T' `column2`, 'test data string T' `column3`, 'test data string T' `column4`, 'test data string T' `column5` UNION ALL
SELECT 3 `primary_key`, 'test data string T' `column1`, 'test data string T' `column2`, 'test data string T' `column3`, 'test data string T' `column4`, 'test data string T' `column5`
) u
ON t.`primary_key` = u.`primary_key`
SET 
t.`column1` = u.`column1`,
t.`column2` = u.`column2`,
t.`column3` = u.`column3`,
t.`column4` = u.`column4`,
t.`column5` = u.`column5`

The size difference varies depending on the query. A long or multiple primary key significantly lengthens the old method. This then is compounded by the length of the dataset.

On MySQL I ran some tests on performance:

With 100 records with 55 character primary_key and 15 columns - Query strlen():
Old Method: 157,605
New Method: 54,821

50,000 records update - 15 Fields 255 characters - Four passes each:
Old Method: 29,368 ms 84.68 mb
New Method: 28,374 ms 83.25 mb

Query execution of batch of 100 Fields 55 Characters - 2 passes:
Old Method: 0.0195 s
New Method: 0.0095 s

There doesn't seem to be a huge gain in performance but no loss either. Smaller queries should perform much better when using a remote database where network speed becomes a factor.

Future ambitions include being able to update from a query or another table. Also, the current implementation limits the use of a single field as a constraint. This makes it useless if you have composite primary keys. I work with tables all the time that have as many as six fields composing the primary key. This PR has already built in the use of multiple constraints. The rest of it will need to be implemented later. I'd like to have it working with the onConstraint() method created in the upsert PR.

All the drivers are updated to use this type of query. Oracle is using MERGE instead of UPDATE. It accomplishes the same thing and works in the same way as the other queries.

The BaseBuilder method was switched from MySQL to a method used by SQLite, MSSQL, Postgre. MySql and Oracle have methods in Builder. SQLite has method in Builder for < version 3.33.0.

Checklist:

Securely signed commits
Component(s) with PHPDoc blocks, only if necessary or adds value
Unit testing, with >80% coverage
User guide updated
Conforms to style guide

sclubricants · 2022-08-12T23:50:02Z

Having some trouble with SQLite. What version are we running?

The UPDATE-FROM idea is an extension to SQL that allows an UPDATE statement to be driven by other tables in the database. The "target" table is the specific table that is being updated. With UPDATE-FROM you can join the target table against other tables in the database in order to help compute which rows need updating and what the new values should be on those rows. UPDATE-FROM is supported beginning in SQLite version 3.33.0 (2020-08-14).
https://www.sqlite.org/lang_update.html

SQLite older than 3.33.0 will use the old SQL.

kenjis · 2022-08-13T01:18:37Z

What version are we running?

Just SQLite3.

SQLite3 via the SQLite3 driver
https://codeigniter4.github.io/CodeIgniter4/intro/requirements.html

sclubricants · 2022-08-13T01:28:35Z

SQLite version must be determined by the driver.

If driver is 3.33 or newer version than we use new method, otherwise the overide method is the old method.

I'm not sure if you can update the driver to the newer one.. probably depends on version of php.

    protected function _updateBatch(string $table, array $values, string $index): string
    {
        if ((float) $this->db->getVersion() >= 3.33) {
            return parent::_updateBatch($table, $values, $index);
        }

kenjis · 2022-08-13T01:39:22Z

$this->db->getVersion() returns versionString like 3.5.9. It is not a valid number string.
https://www.php.net/manual/en/sqlite3.version.php
I don't know (float) works always as you expect.

sclubricants · 2022-08-13T01:46:17Z

I tested casting to float. It basically seems to knock everything off from the second decimal.

$this->db->getVersion() could return an int.. would have been more useful.

https://www.php.net/manual/en/sqlite3.version.php

kenjis · 2022-08-13T01:52:46Z

I tested casting to float. It basically seems to knock everything off from the second decimal.

It seems the behavior is not documented.

Unless it is documented, we should not rely on the current behavior.
It is better to use version_compare().
https://www.php.net/manual/en/function.version-compare.php

Or you can use SQLite3::version() directly.

sclubricants · 2022-08-13T01:59:10Z

Thanks, didn't know about that function.

kenjis

Looks good and this is more elegant!
But I want some other reviews.

kenjis · 2022-08-26T13:09:54Z

I benchmarked with CSV upload and updateBatch().
Update existing records with exactly the same data with updateBatch().
Although not a very rigorous benchmark, I did see an improvement in performance.
And I confirmed all the records are exactly the same before and after the updateBatch().

149,970 records
develop: 319 MB, 157 seconds
this PR: 319 MB, 130 seconds

sclubricants · 2022-08-26T16:04:18Z

#6407 has a slightly better implementation as most of the query is cached and doesn't need rebuilt for each batch run.

Also, while performance is important, there are other considerations in the design of the new query. It allows multiple keys to update on where the current implementation only allows one. The majority of the tables I work with have composite keys so this is very helpful to me. Another thing is that you can use a query as a datasource. Although its not implemented here you could also join a table to the dataset.

I have made quite a few changes in my all inclusive PR #6407

I could update this PR with some of them.

sclubricants · 2022-08-26T16:36:27Z

Another thing you can do with this PR is update a field that isn't in the dataset or filter dataset by field and not use it in update:

UPDATE `db_update_batch` AS t
INNER JOIN (
SELECT 1 `primary_key`, 1 `second_key`, 'Name 1-1' `name`, '2022-08-24 12:56:11' `created_date` UNION ALL
SELECT 1 `primary_key`, 2 `second_key`, 'Name 1-2' `name`, '2022-08-26 09:22:36' `created_date`  UNION ALL
SELECT 2 `primary_key`, 1 `second_key`, 'Name 2-1' `name` , '2022-08-26 09:31:57' `created_date`
) u
ON t.`primary_key` = u.`primary_key` AND t.`second_key` = u.`second_key` AND DATE(u.`created_date`) = CURDATE()
SET 
t.`name` = u.`name`,
t.`last_update` = CURRENT_TIMESTAMP()

This only updates records from the dataset that were created today. This query is possible with this PR using RawSql.

sclubricants · 2022-08-26T16:48:44Z

This one will only update records where the table data doesn't match the update data. Then only if they don't match will update the record and change the last_update.

UPDATE `db_update_batch` AS t
INNER JOIN (
SELECT 1 `primary_key`, 'Cat 3' `category`, 'Name 1-1' `name` UNION ALL
SELECT 1 `primary_key`, 'Cat 5' `category`, 'Name 1-2' `name` UNION ALL
SELECT 2 `primary_key`, 'Cat 5' `category`, 'Name 2-1' `name`
) u
ON t.`primary_key` = u.`primary_key` AND (t.`category` != u.`category` OR t.`name` != u.`name`)
SET 
t.`name` = u.`name`,
t.`last_update` = CURRENT_TIMESTAMP()

This gives you an accurate last_update because there truly was an update.

kenjis · 2022-08-26T21:54:48Z

@sclubricants Thank you for your work.
I would like to get your work into the framework.
But your PRs are relatively too big, and so the most important thing I would like to say is that make your PR small.
The larger it is, the less likely it is to be merged. And the more it will be closed.

#6407 does not help me. Because it is too big to understand.
You need to run git commit more often. One commit, one intent.

michalsn

I like these changes. However, the most important element is missing. Users won't use the new features if they don't know about them.

We need to explain and demonstrate the new capabilities with examples in the user guide. We also need to indicate in which situations new features will not work (SQLite < 3.33.0)

As for other PRs - let's stick to one rule: 1 PR = 1 enhancement. This way, it's easier for everyone to follow the main purpose and agree/disagree on the feature.

sclubricants · 2022-08-30T00:07:22Z

We need to explain and demonstrate the new capabilities with examples in the user guide. We also need to indicate in which situations new features will not work (SQLite < 3.33.0)

@michalsn
Well actually all the new capabilities come with my other PR. With this PR there really is no difference to the developer. This just changes the query structure. All the cool new functions require some major changes.. which I have worked out mostly. I suggested creating a new branch to build it out but no luck. #6407

kenjis · 2022-08-30T01:51:26Z

The generated SQL statements will be completely changed. So at least we should add it in the changelog.
And the SQL statements will be greatly improved, so I think we can add it as an enhancement in the changelog.

michalsn · 2022-08-30T08:09:59Z

Oh... okay, sorry - my mistake. I was browsing the code from the linked PR and mixed things up.

Please, just add the changelog, and we will be ready. After merging, we can go forward with your next proposals/ideas.

Much appreciate.

kenjis · 2022-09-06T03:52:21Z

Run phpcpd --exclude system/Test --exclude system/ThirdParty --exclude system/Database/SQLSRV/Builder.php --exclude system/Database/SQLSRV/Forge.php -- app/ public/ system/
phpcpd 6.0.3 by Sebastian Bergmann.

Found 1 clones with 54 duplicated lines in 3 files:

  - /home/runner/work/CodeIgniter4/CodeIgniter4/system/Database/BaseBuilder.php:2254-2281 (27 lines)
    /home/runner/work/CodeIgniter4/CodeIgniter4/system/Database/MySQLi/Builder.php:62-89
    /home/runner/work/CodeIgniter4/CodeIgniter4/system/Database/OCI8/Builder.php:235-262

0.07% duplicated lines out of 82900 total lines of code.
Average size of duplication is 54 lines, largest clone has 27 of lines

system/Database/BaseBuilder.php

kenjis · 2022-09-06T04:10:51Z

@sclubricants Can you fix PHPCPD error?

sclubricants · 2022-09-06T23:51:17Z

@sclubricants Can you fix PHPCPD error?

@kenjis Its not as bad as it looks. A single statement can take 11 lines.

            $data = implode(
                " UNION ALL\n",
                array_map(
                    static fn ($value) => 'SELECT ' . implode(', ', array_map(
                        static fn ($key, $index) => $index . ' ' . $key,
                        $keys,
                        $value
                    )),
                    $values
                )
            ) . "\n";

The top 6 lines will go away once things get refactored. I wasn't getting this error until I added these.

        // this is a work around until the rest of the platform is refactored
        if ($index !== '') {
            $this->QBOptions['constraints'] = [$index];
        }
        $keys = array_keys(current($values));

Can we suppress these somehow?

kenjis · 2022-09-07T00:36:00Z

Add --exclude option.

CodeIgniter4/.github/workflows/test-phpcpd.yml

Line 42 in 31ae195

    
           run: phpcpd --exclude system/Test --exclude system/ThirdParty --exclude system/Database/SQLSRV/Builder.php --exclude system/Database/SQLSRV/Forge.php -- app/ public/ system/

sclubricants · 2022-09-07T18:06:25Z

I wish it had a way to suppress certain lines instead of excluding the whole file.

sebastianbergmann/phpcpd#99

kenjis · 2022-09-09T03:30:07Z

It seems you used git merge, but git rebase is recommended if you can.

Could you rebase?
See https://github.com/codeigniter4/CodeIgniter4/blob/develop/contributing/workflow.md#updating-your-branch
This PR goes into 4.3, so the command would be git rebase upstream/4.3.

sclubricants · 2022-09-09T16:00:51Z

It seems you used git merge, but git rebase is recommended if you can.

I usually do rebase but I did it through the web interface and it created a merge.

This will be a place to store different options that can be used on queries. I thought about using $QBData but felt this could be confused with the data in QBSet.

This method removes the key fields from the update fields. It also handles using RawSql.

This is a bit more refined then the previous version. This version caches most of the query after the first run of the batch. It also is ready to accept various options that can be fed to it.. perhaps in a later PR. For instance the fields updated can be extracted from the field set as is currently implemented or they could be explicitly set before hand. Likewise the data is set by QBSet data but could potentially be fed a query instead.

entry was in the wrong place

Co-authored-by: kenjis <[email protected]>

kenjis · 2022-09-10T23:15:55Z

Thank you for rebasing.

kenjis · 2022-09-20T01:08:59Z

system/Database/BaseBuilder.php


-        return 'UPDATE ' . $this->compileIgnore('update') . $table . ' SET ' . substr($cases, 0, -2) . $this->compileWhereHaving('QBWhere');
+        return sprintf($sql, $data);


sprintf() was introduced here.

kenjis added 4.3 refactor Pull requests that refactor code database Issues or pull requests that affect the database layer labels Aug 12, 2022

sclubricants force-pushed the BatchUpdateSQL branch from 1e78564 to dd42a1a Compare August 13, 2022 00:18

sclubricants force-pushed the BatchUpdateSQL branch from 8cd7717 to e231617 Compare August 13, 2022 02:01

sclubricants mentioned this pull request Aug 15, 2022

Add Upsert/UpsertBatch to builder #6367

Closed

5 tasks

sclubricants force-pushed the BatchUpdateSQL branch from e231617 to 869064c Compare August 19, 2022 19:42

sclubricants mentioned this pull request Aug 22, 2022

Upsert + Refactor BaseBuilder *Batch() Methods #6407

Closed

5 tasks

kenjis requested a review from michalsn August 26, 2022 06:59

kenjis approved these changes Aug 26, 2022

View reviewed changes

kenjis changed the title ~~Redesign BatchUpdate() SQL~~ Redesign BaseBuilder::updateBatch() SQL Aug 26, 2022

kenjis changed the title ~~Redesign BaseBuilder::updateBatch() SQL~~ Improve BaseBuilder::updateBatch() SQL Aug 26, 2022

michalsn requested changes Aug 28, 2022

View reviewed changes

kenjis added the docs needed Pull requests needing documentation write-ups and/or revisions. label Aug 28, 2022

michalsn approved these changes Aug 30, 2022

View reviewed changes

kenjis reviewed Sep 6, 2022

View reviewed changes

system/Database/BaseBuilder.php Outdated Show resolved Hide resolved

kenjis removed the docs needed Pull requests needing documentation write-ups and/or revisions. label Sep 6, 2022

sclubricants and others added 13 commits September 9, 2022 09:04

Redesign BatchUpdate() SQL

cc0290a

Add $QBOptions property

edcae4b

This will be a place to store different options that can be used on queries. I thought about using $QBData but felt this could be confused with the data in QBSet.

Added method updateFields()

d5863e9

This method removes the key fields from the update fields. It also handles using RawSql.

Updated MySql Forge::_updateBatch()

a9024de

Fix UpdaeTest

57c6b23

Update Oracle Forge::_updateBatch()

103a573

Fix PHPStan errors

07659b3

Add change log and documentation

a689024

Fix change log

b3b7e4c

entry was in the wrong place

Update system/Database/BaseBuilder.php

975a1f7

Co-authored-by: kenjis <[email protected]>

Exclude from PHPCPD

5217a1c

Update test-phpcpd.yml

c4cbaa7

sclubricants force-pushed the BatchUpdateSQL branch from c9999cf to c4cbaa7 Compare September 9, 2022 16:05

kenjis approved these changes Sep 10, 2022

View reviewed changes

kenjis merged commit e249460 into codeigniter4:4.3 Sep 10, 2022

sclubricants deleted the BatchUpdateSQL branch September 11, 2022 19:10

kenjis reviewed Sep 20, 2022

View reviewed changes

kenjis mentioned this pull request Sep 20, 2022

Bug: BaseBuilder::updateBatch() may cause ArgumentCountError #6555

Closed

sclubricants mentioned this pull request Oct 6, 2022

RawSql not working in UpdateBatch #6554

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve `BaseBuilder::updateBatch()` SQL #6373

Improve `BaseBuilder::updateBatch()` SQL #6373

sclubricants commented Aug 12, 2022 •

edited

Loading

sclubricants commented Aug 12, 2022 •

edited

Loading

kenjis commented Aug 13, 2022

sclubricants commented Aug 13, 2022

kenjis commented Aug 13, 2022

sclubricants commented Aug 13, 2022

kenjis commented Aug 13, 2022

sclubricants commented Aug 13, 2022

kenjis left a comment

kenjis commented Aug 26, 2022

sclubricants commented Aug 26, 2022

sclubricants commented Aug 26, 2022

sclubricants commented Aug 26, 2022

kenjis commented Aug 26, 2022

michalsn left a comment

sclubricants commented Aug 30, 2022

kenjis commented Aug 30, 2022

michalsn commented Aug 30, 2022

kenjis commented Sep 6, 2022

kenjis commented Sep 6, 2022

sclubricants commented Sep 6, 2022 •

edited

Loading

kenjis commented Sep 7, 2022

sclubricants commented Sep 7, 2022

kenjis commented Sep 9, 2022

sclubricants commented Sep 9, 2022

kenjis commented Sep 10, 2022

kenjis Sep 20, 2022


		return 'UPDATE ' . $this->compileIgnore('update') . $table . ' SET ' . substr($cases, 0, -2) . $this->compileWhereHaving('QBWhere');
		return sprintf($sql, $data);

Improve BaseBuilder::updateBatch() SQL #6373

Improve BaseBuilder::updateBatch() SQL #6373

Conversation

sclubricants commented Aug 12, 2022 • edited Loading

sclubricants commented Aug 12, 2022 • edited Loading

kenjis commented Aug 13, 2022

sclubricants commented Aug 13, 2022

kenjis commented Aug 13, 2022

sclubricants commented Aug 13, 2022

kenjis commented Aug 13, 2022

sclubricants commented Aug 13, 2022

kenjis left a comment

Choose a reason for hiding this comment

kenjis commented Aug 26, 2022

sclubricants commented Aug 26, 2022

sclubricants commented Aug 26, 2022

sclubricants commented Aug 26, 2022

kenjis commented Aug 26, 2022

michalsn left a comment

Choose a reason for hiding this comment

sclubricants commented Aug 30, 2022

kenjis commented Aug 30, 2022

michalsn commented Aug 30, 2022

kenjis commented Sep 6, 2022

kenjis commented Sep 6, 2022

sclubricants commented Sep 6, 2022 • edited Loading

kenjis commented Sep 7, 2022

sclubricants commented Sep 7, 2022

kenjis commented Sep 9, 2022

sclubricants commented Sep 9, 2022

kenjis commented Sep 10, 2022

kenjis Sep 20, 2022

Choose a reason for hiding this comment

Improve `BaseBuilder::updateBatch()` SQL #6373

Improve `BaseBuilder::updateBatch()` SQL #6373

sclubricants commented Aug 12, 2022 •

edited

Loading

sclubricants commented Aug 12, 2022 •

edited

Loading

sclubricants commented Sep 6, 2022 •

edited

Loading