Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: [Postgre] QueryBuilder::updateBatch() does not work <column_name>::<type> #8426

Closed
wants to merge 9 commits into from

Conversation

kenjis
Copy link
Member

@kenjis kenjis commented Jan 17, 2024

Description
Fixes #7387
See https://forum.codeigniter.com/showthread.php?tid=88871

This PR introduces a new syntax to specify column name and its type, <column_name>::<type>.
E.g., updated_at::TIMESTAMP

$data = [
    [
        'name'                  => 'Derek Jones',
        'country'               => 'Greece',
        'updated_at::TIMESTAMP' => '2023-12-02 18:47:52',
    ],
    [
        'name'                  => 'Ahmadinejad',
        'country'               => 'Greece',
        'updated_at::TIMESTAMP' => '2023-12-02 18:47:52',
    ],
];
$builder->updateBatch($data, 'name');
/*
 * Produces:
 * UPDATE "db_user"
 * SET
 * "country" = _u."country",
 * "updated_at" = _u."updated_at"
 * FROM (
 * SELECT 'Greece' "country", 'Derek Jones' "name", '2023-12-02 18:47:52'::TIMESTAMP "updated_at" UNION ALL
 * SELECT 'Greece' "country", 'Ahmadinejad' "name", '2023-12-02 18:47:52'::TIMESTAMP "updated_at"
 * ) _u
 * WHERE "db_user"."name" = _u."name"
 */

Checklist:

  • Securely signed commits
  • Component(s) with PHPDoc blocks, only if necessary or adds value
  • Unit testing, with >80% coverage
  • User guide updated
  • Conforms to style guide

@kenjis kenjis added bug Verified issues on the current code behavior or pull requests that will fix them database Issues or pull requests that affect the database layer labels Jan 17, 2024
@kenjis
Copy link
Member Author

kenjis commented Jan 17, 2024

Run vendor/bin/rector process system --dry-run --no-progress-bar

Error: ] Could not process                                                      
         "/home/runner/work/CodeIgniter4/CodeIgniter4/vendor/rector/rector/vendo
         r/symplify/easy-parallel/src/ValueObject/ParallelProcess.php" file, due
         to:                                                                    
         "Child process timed out after 120 seconds". On line: 108              

Error: ] Could not process some files, due to:                                  
         "Reached system errors count limit of 50, exiting...".                 

Error: Process completed with exit code 1.

https://github.com/codeigniter4/CodeIgniter4/actions/runs/7551028040/job/20557620126?pr=8426

@kenjis kenjis force-pushed the fix-postgre-updateBatch branch from 177d52e to 425bb79 Compare January 17, 2024 04:35
@kenjis kenjis changed the title fix: [Postgres] QueryBuilder::updateBatch() does not work fix: [Postgre] QueryBuilder::updateBatch() does not work Jan 17, 2024
@samsonasik
Copy link
Member

@kenjis I am looking of way to improve the performance of Rector, for now, it may require update the parallel config for job size:

-$rectorConfig->parallel(120, 8, 15);
+$rectorConfig->parallel(120, 8, 10);

could you try update it?

@kenjis
Copy link
Member Author

kenjis commented Jan 17, 2024

Okay.

1 file with changes
===================

1) system/Database/BaseBuilder.php:1856

    ---------- begin diff ----------
@@ @@

                 $escapedValue = $escape ? $this->db->escape($rowValue) : $rowValue;

-                if ($keyType !== null) {
-                    $clean[] = new SqlValue($escapedValue, $keyType);
-                } else {
-                    $clean[] = $escapedValue;
-                }
+                $clean[] = $keyType !== null ? new SqlValue($escapedValue, $keyType) : $escapedValue;
             }

             $row = $clean;
    ----------- end diff -----------

Applied rules:
 * SimplifyIfElseToTernaryRector


 [OK] 1 file would have changed (dry-run) by Rector                             

Error: Process completed with exit code 2.

https://github.com/codeigniter4/CodeIgniter4/actions/runs/7554007555/job/20565980505?pr=8426

@kenjis
Copy link
Member Author

kenjis commented Jan 17, 2024

Oh no, I got the error on my MBA.

$ vendor/bin/rector process system/Database/BaseBuilder.php
 0/1 [░░░░░░░░░░░░░░░░░░░░░░░░░░░░]   0%
                                                                                                                        
 [ERROR] Could not process                                                                                              
         "/Users/kenji/work/codeigniter/official/CodeIgniter4/vendor/rector/rector/vendor/symplify/easy-parallel/src/Val
         ueObject/ParallelProcess.php" file, due to:                                                                    
         "Child process timed out after 120 seconds". On line: 108                                                      
                                                                                                                        

                                                                                                                        
 [ERROR] Could not process some files, due to:                                                                          
         "Reached system errors count limit of 50, exiting...". 

@samsonasik
Copy link
Member

Hm.., timeout may need to be adjusted

-$rectorConfig->parallel(120, 8, 10);
+$rectorConfig->parallel(240, 8, 10);

to make work on various resource, could you try?

@kenjis
Copy link
Member Author

kenjis commented Jan 17, 2024

I reran vendor/bin/rector process system/Database/BaseBuilder.php and passed.

@kenjis kenjis force-pushed the fix-postgre-updateBatch branch from 8f7f0c0 to 1f9b0cf Compare January 17, 2024 10:59
Copy link
Member

@michalsn michalsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this problem only occur in Postgre?

@michalsn michalsn added the docs needed Pull requests needing documentation write-ups and/or revisions. label Jan 17, 2024
@kenjis kenjis force-pushed the fix-postgre-updateBatch branch from 1f9b0cf to 91d8cd8 Compare January 18, 2024 00:42
@kenjis kenjis removed the docs needed Pull requests needing documentation write-ups and/or revisions. label Jan 18, 2024
@kenjis kenjis force-pushed the fix-postgre-updateBatch branch 2 times, most recently from ed399eb to 2743be9 Compare January 18, 2024 03:12
@kenjis
Copy link
Member Author

kenjis commented Jan 18, 2024

Does this problem only occur in Postgre?

Almost yes. At least, there is no bug report.

I found the case when using binary data on MySQL #8282 (comment)
but binary data is special.

@kenjis kenjis marked this pull request as draft January 18, 2024 03:18
@kenjis kenjis force-pushed the fix-postgre-updateBatch branch from 2743be9 to 7ae60bd Compare January 18, 2024 04:16
@kenjis kenjis marked this pull request as ready for review January 18, 2024 04:17
Copy link
Member

@michalsn michalsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. This is probably the best we can do right now.

@kenjis kenjis force-pushed the fix-postgre-updateBatch branch 2 times, most recently from c9977fe to 28c24a4 Compare January 18, 2024 11:32
@kenjis
Copy link
Member Author

kenjis commented Jan 19, 2024

@codeigniter4/database-team Is the new syntax <column_name>::<type> okay?

@sclubricants
Copy link
Member

It seems "CREATE CAST" might be the answer.

https://www.postgresql.org/docs/current/sql-createcast.html

Most Db implicitly do this but it looks like Postgre makes it configurable.

CREATE CAST (varchar AS integer) WITH INOUT AS IMPLICIT;

$this->updateFields($keys, false, $constraints)->QBOptions['updateFields'] ??
[];

$alias = $this->QBOptions['alias'] ?? '_u';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alias name is not getting quoted like:

"alias"."column"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems $this->QBOptions['alias'] is quoted.

$this->QBOptions['alias'] = $this->db->protectIdentifiers($alias);

@sclubricants
Copy link
Member

sclubricants commented Jan 19, 2024

            $data = [
                [
                    'name'                  => 'Derek Jones',
                    'country'               => 'Greece',
                    'updated_at::TIMESTAMP' => '2023-12-02 18:47:52',
                ],
            ];

I really don' t think this is the best way to implement this. The reason is that you are altering the key name in the array and I feel this might cause problems further down the line. Your operation is on changing the expression of the value, not the key.

Here is a good use of RawSql. You could create a specific class to postgres and have it return RawSql.

            $data = [
                [
                    'name'                  => 'Derek Jones',
                    'country'               => 'Greece',
                    'updated_at' => new RawSql("'2023-12-02 18:47:52'::TIMESTAMP"),
                ],
            ];

All the sql rendering methods already support RawSql. It might be nice if we had a RawSql interface instead. Then we could have different classes under one type all with the __toString() method.

Where PostgresCastValue impliments RawSqlInterface

            $data = [
                [
                    'name'                  => 'Derek Jones',
                    'country'               => 'Greece',
                    'updated_at' => new PostgresCastValue('2023-12-02 18:47:52', 'TIMESTAMP'),
                ],
            ];

then all sql rendering methods look for a RawSqlInterface object and render it without quotes like RawSql is.

@kenjis
Copy link
Member Author

kenjis commented Jan 20, 2024

I tested on PostgreSQL 13, Docker image postgres:13-alpine.

@sclubricants
Copy link
Member

It seems that the issue is creating the virtual table via select union all. Postgre is not applying implicit casting like other dbms. Everything gets selected into table as int or text..

One option would be to lookup the target table and cast all variables to their type for batch operations. I would assume this affects upserts as well.

This might be resolved by create cast

https://www.postgresql.org/docs/current/sql-createcast.html

@sclubricants
Copy link
Member

Or add a method to specify column types?

$builder->setType(['updated_at' => 'TIMESTAMP'])->updateBatch($data, 'name');

I hope there is another solution but this seems the best so far.

@michalsn
Copy link
Member

As mentioned, RawSql is not portable and will break the code when we change the handler. First, we look for a non-invasive solution.

new SqlValue('2023-12-02 18:47:52', 'TIMESTAMP'),

This seems reasonable, although I would consider a different class name, such as SqlCast, which would better describe what we are doing.

The alternative - a new method for Query Builder, isn't bad from the user's point of view. The only problem I have with it is that this method would only be useful for one database handler... which doesn't harmonize with the rules we adopted for our Query Builder (no specific implementations for a single driver).

@kenjis
Copy link
Member Author

kenjis commented Jan 20, 2024

Okay, there are four options.

(1) Key as <column_name>::<type>

$data = [
    [
        'name'                  => 'Derek Jones',
        'country'               => 'Greece',
        'updated_at::TIMESTAMP' => '2023-12-02 18:47:52',
    ],
];
$builder->updateBatch($data, 'name');

(2) Array as [value, type]

$data = [
    [
         'name'       => 'Derek Jones',
         'country'    => 'Greece',
         'updated_at' => ['2023-12-02 18:47:52', 'TIMESTAMP'],
     ],
];
$builder->updateBatch($data, 'name');

(3) New class that represents value and type

$data = [
    [
         'name'       => 'Derek Jones',
         'country'    => 'Greece',
         'updated_at' => new SqlCast('2023-12-02 18:47:52', 'TIMESTAMP'),
     ],
];
$builder->updateBatch($data, 'name');

(4) New method to specify column types

$data = [
    [
         'name'       => 'Derek Jones',
         'country'    => 'Greece',
         'updated_at' => '2023-12-02 18:47:52',
     ],
];
$builder->setType(['updated_at' => 'TIMESTAMP'])->updateBatch($data, 'name');

@michalsn
Copy link
Member

  1. seems not very handy, we have to modify the key name
  2. seems to be a little too loose, the array can be anything
  3. probably my favorite solution
  4. I'm not a fan of adding an extra method, especially since it would only be used by Postgre

The question is, how common a problem is casting for Postgre? I'm just worried about the scenario where people start asking to implement broader support for it in methods other than updateBatch. That could be a problem with option (3) for sure.

@sclubricants
Copy link
Member

sclubricants commented Jan 20, 2024

I was thinking of something along the libes of (4) but without a method. Instead the database could automatically lookup the keys and return their types. Then each could be casted to its type. This would only be for postgre.

@sclubricants
Copy link
Member

@kenjis could you try explicitly casting all values as ::text and see if the error still occurs?

@kenjis kenjis force-pushed the fix-postgre-updateBatch branch from 28c24a4 to fce14e9 Compare January 20, 2024 22:48
@kenjis
Copy link
Member Author

kenjis commented Jan 20, 2024

@sclubricants Do you mean this?

UPDATE "db_user"
SET
"country" = _u."country",
"updated_at" = _u."updated_at"
FROM (
SELECT 'Greece'::TEXT "country", 'Derek Jones'::TEXT "name", '2023-12-02 18:47:52'::TEXT "updated_at" UNION ALL
SELECT 'Greece'::TEXT "country", 'Ahmadinejad'::TEXT "name", '2023-12-02 18:47:52'::TEXT "updated_at"
) _u
WHERE "db_user"."name" = _u."name"

ErrorException: pg_query(): Query failed: ERROR: column "updated_at" is of type timestamp without time zone but expression is of type text
LINE 4: "updated_at" = _u."updated_at"
^
HINT: You will need to rewrite or cast the expression. in /Users/kenji/work/codeigniter/official/CodeIgniter4/system/Database/Postgre/Connection.php:187

@sclubricants
Copy link
Member

Is the only one we are having problems with timestamp? What about just date?

@sclubricants
Copy link
Member

sclubricants commented Jan 20, 2024

SELECT column_name, data_type FROM information_schema.columns WHERE table_schema = 'public' AND table_name = 'bike_details';

Get datatypes and save to $QBOptions. Then when building sql cast to datatype.

This might be a quick and easy patch until we find something better. This wont change the api.

Id like to talk to some postgre gooroos to see what other solutions there might be.

@kenjis
Copy link
Member Author

kenjis commented Jan 21, 2024

Is the only one we are having problems with timestamp? What about just date?

UPDATE "db_type_test"
SET
"type_bigint" = _u."type_bigint",
"type_date" = _u."type_date",
"type_text" = _u."type_text"
FROM (
SELECT 9999999 "type_bigint", '2024-01-01'::TEXT "type_date", 'updated' "type_text", 'test1' "type_varchar" UNION ALL
SELECT 9999999 "type_bigint", '2024-01-01'::TEXT "type_date", 'updated' "type_text", 'test2' "type_varchar"
) _u
WHERE "db_type_test"."type_varchar" = _u."type_varchar"

ErrorException: pg_query(): Query failed: ERROR: column "type_date" is of type date but expression is of type text
LINE 4: "type_date" = _u."type_date",
^
HINT: You will need to rewrite or cast the expression. in /Users/kenji/work/codeigniter/official/CodeIgniter4/system/Database/Postgre/Connection.php:187

@sclubricants
Copy link
Member

sclubricants commented Jan 21, 2024

You should be able to do the casts in the set part rather than the select part.

I wander if the first part of set was cast as text would it work. If not then cast the right side qppropriately

..sorry hard to spell out from mobile..

@kenjis
Copy link
Member Author

kenjis commented Jan 21, 2024

Oh, thanks.
The following query is valid.

UPDATE "db_type_test"
SET
    "type_bigint" = _u."type_bigint",
    "type_date" = _u."type_date"::DATE,
    "type_text" = _u."type_text"
FROM (
         SELECT 9999999 "type_bigint", '2024-01-01' "type_date", 'updated' "type_text", 'test1' "type_varchar" UNION ALL
         SELECT 9999999 "type_bigint", '2024-01-01' "type_date", 'updated' "type_text", 'test2' "type_varchar"
     ) _u
WHERE "db_type_test"."type_varchar" = _u."type_varchar"

@kenjis kenjis changed the title fix: [Postgre] QueryBuilder::updateBatch() does not work fix: [Postgre] QueryBuilder::updateBatch() does not work <column_name>::<type> Jan 21, 2024
@kenjis
Copy link
Member Author

kenjis commented Jan 21, 2024

I was thinking of something along the libes of (4) but without a method. Instead the database could automatically lookup the keys and return their types. Then each could be casted to its type. This would only be for postgre.

I sent another PR #8439 with that way.

@kenjis kenjis marked this pull request as draft January 21, 2024 23:40
@kenjis kenjis closed this Jan 23, 2024
@kenjis kenjis deleted the fix-postgre-updateBatch branch January 23, 2024 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Verified issues on the current code behavior or pull requests that will fix them database Issues or pull requests that affect the database layer
Projects
None yet
4 participants