Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return numeric-string instead of just string to represent row count #5317

Closed
wants to merge 1 commit into from

Conversation

simPod
Copy link
Contributor

@simPod simPod commented Mar 10, 2022

Q A
Type improvement
Fixed issues Follows up #5269

Summary

The returned string representing row count is always a number so it makes sense to constrain it a little bit more.

@morozov
Copy link
Member

morozov commented Mar 10, 2022

Looks good in general. I believe, the same idea applies to the QueryBuilder and the Driver\Connection interface and its implementations as well:

  1. * @return Result|int|string
  2. * @return int The number of affected rows.
  3. public function exec(string $sql): int;

@simPod
Copy link
Contributor Author

simPod commented Mar 10, 2022 via email

@simPod
Copy link
Contributor Author

simPod commented Mar 10, 2022

@morozov I believe changing

- public function exec(string $sql): int; 
+ public function exec(string $sql): int|string; 

results in hard BC break since I'll have to remove the type hint (union types not supported here yet)

- public function exec(string $sql): int; 
+ public function exec(string $sql) 

Should I proceed?

@morozov
Copy link
Member

morozov commented Mar 10, 2022

No, but it should be possible to add a @return or @psalm-return annotation that would override the native return type.

@simPod
Copy link
Contributor Author

simPod commented Mar 10, 2022 via email

@morozov
Copy link
Member

morozov commented Mar 11, 2022

Ok so I keep :int but add @return int|string?

Yes, keep :int and add whatever (@return or @psalm-return) that will make static analyzers understand that it's in fact int|numeric-string.

Off-topic: note that it's not just int but in fact 0|positive-int (see phpstan/phpstan-src#692 for example). While it's not currently necessary, it's not clear how to express a positive-negative-integer-string.

@simPod
Copy link
Contributor Author

simPod commented Mar 23, 2022

@morozov what about PHPDoc tag @return with type int|string is not subtype of native type int. phpstan errors?

@morozov
Copy link
Member

morozov commented Mar 23, 2022

[…] what about PHPDoc tag @return with type int|string is not subtype of native type int. phpstan errors?

Would it help to remove the conflicting @return annotations and only leave the @psalm-return ones which don't seem to cause any issues?

@Seldaek
Copy link
Member

Seldaek commented Mar 23, 2022

Would it help to remove the conflicting @return annotations and only leave the @psalm-return ones which don't seem to cause any issues?

IMO this would not be good. If the PHP return type is : int then PHP will do type coercion so in the example of executeUpdate if the inner executeStatement returns a numeric-string because it overflowed PHP_INT_MAX, then it doesn't matter what your phpdoc tags say, it will try to return an int and it will fail (see https://3v4l.org/YaedP)

The only correct way here that I see is to remove the int return type from exec&executeUpdate IMO, and then use @return int|string everywhere for dumb parsers plus ideally @psalm-return int<0, max>|numeric-string for static analyzers.

Removing the type hint should not be a huge deal, classes extending it can still specify it without breaking LSP (at least in the 3.3 branch as this was allowed in PHP 7.2) so it's not a BC break I would say.

@morozov
Copy link
Member

morozov commented Mar 23, 2022

Note that the DBAL 3.x codebase doesn't use declare(strict_types=1) in most of its codebase. So right now the int return type also serves as an implicit cast of the strings potentially returned by the underlying drivers. I don't know if any driver returns the number of affected rows lower than PHP_INT_MAX as a string, but if they do, it might be a bigger BC break. I.e. the consuming code may expect an int but will now get a string regardless of the actual value.

Another thought is that it was never reported that a number of affected rows larger than PHP_INT_MAX couldn't be properly reported. Even on a 32-bit platform, it would have to be over 2 billion rows, which is quite a lot.

It is much more likely that changing the signature will affect the API consumers who don't experience this issue than it will fix the issue for those who experience it.

I am fine with introducing this minor BC break in a minor release but we have to be conscious about the real side effects it can lead to.

@Seldaek
Copy link
Member

Seldaek commented Mar 24, 2022

Note that the DBAL 3.x codebase doesn't use declare(strict_types=1) in most of its codebase. So right now the int return type also serves as an implicit cast of the strings potentially returned by the underlying drivers.

If you look at my example (https://3v4l.org/YaedP) again, it also does not use strict types, and while it also surprised me, it seems like PHP fails the string->int cast if the numeric string is beyond PHP_MAX_INT. So technically you already have the issue with the current code.

But as you say, in practice even on 32bit doing an operation on 2billion rows is pretty damn rare, and 32bit systems themselves are getting rarer, so IMO it's fine to use a plain int return type here, and coerce whatever numeric string we get back to an int. I find this completely reasonable, but I do wonder why not apply the same logic to the other statements then?

If adding return types would be considered a BC break, we can still do return (int) $whatever; to explicitly cast.

I think this would be way less surprising for everyone, and lets you use a much leaner @return int<0, max> return type.

The bigint -> string type thing is a bit of a pain already, but I can understand the rationale there (tho I wish we could forget about 32bit), and having tables with ids going above 2billion is definitely not that far fetched. But again for affected rows a 32bit integer really sounds like it is good enough.

@Ocramius
Copy link
Member

But again for affected rows a 32bit integer really sounds like it is good enough.

Agreed, plus the only 32bit system I had to recently work with is a raspberry pi 😁

@morozov
Copy link
Member

morozov commented Mar 24, 2022

Note that the DBAL 3.x codebase doesn't use declare(strict_types=1) in most of its codebase.

Actually, all the classes implementing Driver\Result::rowCount(): int have declare(strict_types=1) in their files, which means that all drivers return an int for the values not greater than PHP_INT_MAX.

The only correct way here that I see is to remove the int return type from exec&executeUpdate IMO, and then use @return int|string everywhere for dumb parsers plus ideally @psalm-return int<0, max>|numeric-string for static analyzers.

Agree.

Removing the type hint should not be a huge deal, classes extending it can still specify it without breaking LSP (at least in the 3.3 branch as this was allowed in PHP 7.2) so it's not a BC break I would say.

Agree. It won't be a BC break.

@Seldaek
Copy link
Member

Seldaek commented Mar 25, 2022

Actually, all the classes implementing Driver\Result::rowCount(): int have declare(strict_types=1) in their files, which means that all drivers return an int for the values not greater than PHP_INT_MAX.

I am a bit confused, if that is the case already then why not keep the int return types on Connection and simply drop the string types?

If Driver\Result::rowCount(): int have declare(strict_types=1) already, it means that for int<0,PHP_INT_MAX> they will return an int, and if the php drivers return a string for values above then rowCount will throw a TypeError.

So having an int|string return type on Connection makes no sense as it will never return string.

As per

dbal/src/Connection.php

Lines 1157 to 1160 in 83f779b

return $result->rowCount();
}
return $connection->exec($sql);
the other way executeStatement can return is via DriverConnection::exec but that also has an int return type, so it is currently impossible for executeStatement to return a string.

I opened #5334 to show more clearly what I suggested doing above in #5317 (comment)

IMO this is the best outcome as it completely removes the string type, keeping things simpler for the API consumers, and it is anyway reflecting the current reality of the code better.

@morozov
Copy link
Member

morozov commented Mar 25, 2022

I am a bit confused, if that is the case already then why not keep the int return types on Connection and simply drop the string types?

Because the underlying driver can return a string.

@Seldaek
Copy link
Member

Seldaek commented Mar 25, 2022

Yes there is one driver (MySQLi) which can return a string as far as I have seen after looking at all drivers docs this morning. And it will do so only in the case that you have a query affecting over 2billion rows (super rare) AND only on 32bit platforms (super rare). I don't find it justified to add a string return type just for that one edge case, adding burden to every API consumer, but that's obviously just my perspective.

Also note that in the current state, this PR does not fix it as https://github.com/doctrine/dbal/blob/3.3.x/src/Driver/Mysqli/Result.php#L164-L171 and other Result classes still have the int return type, which will turn any overflowing int into a TypeError. So you will right now never get a string back, and the type def is too broad. You also never got anyone reporting such a TypeError I suppose, because as I mentioned above this is a very low probability edge case.

I would appeal to pragmatism here as forcing everyone to deal with int|string when reality is 99.9999999% of users will ever see an int, seems absurd.

P.S: Also another minor aspect I noticed this morning is that strictly speaking some drivers (SQLSrv and Mysqli as per https://github.com/doctrine/dbal/pull/5334/files) do return -1 in some error conditions, which does not fit the int<0, max> type. I don't think this can happen because if it errors most likely Doctrine won't create a Result, but for safety I added a max(0, $...) in that PR, ensuring the returned values match the documented types.

@morozov
Copy link
Member

morozov commented Mar 25, 2022

I realize that this is such an insignificant and at the same time such a complex problem that I'm not willing to spend time on solving it.

It is insignificant because the edge case is indeed rare. It is complex because different underlying drivers are handling it differently. I.e. SQL Server seems to be unable to return more than PHP_INT_MAX rows (not sure about handling the updates). Oracle seems to be returning uint as int which might lead to a negative number at the end.

I don't believe we have the integration tests that validate the semantics of these values in all cases (e.g. select, insert, update where all matching rows indeed were updated or some of them already had the value set by the update).

With that said, whoever has the commit permissions for this repo has my full blessing in doing whatever they deem reasonable.

I would appeal to pragmatism here as forcing everyone to deal with int|string when reality is 99.9999999% of users will ever see an int, seems absurd.

This appeal should be targeted at the maintainers of the underlying drivers first of all. BTW @kamil-tekiela contributed some changes that allowed to simplify the DBAL code for mysqli recently.

@Seldaek
Copy link
Member

Seldaek commented Mar 25, 2022

OK, then if you rather not solve anything, I would suggest leaving the php types as they are now, broken or not that they may be in edge cases, and switch to a simple @return int instead of int|string. That would not require any code change. It also reflects reality because drivers will return an int or throw a TypeError right now (as of 3.3.x code state I mean) in pathologic cases.

Removing the int<0, max> part makes it simpler due to drivers returning -1 and the fact that these driver methods are not properly typed in psalm/phpstan yet either..

If you agree I am happy to prepare a new PR (or @simPod updates this one here?). It'd be risk free in terms of code, but improves the type situation for API consumers which I believe was @simPod's intent as well.

@github-actions
Copy link

There hasn't been any activity on this pull request in the past 90 days, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 7 days.
If you want to continue working on it, please leave a comment.

@github-actions github-actions bot added the Stale label Jun 24, 2022
@github-actions
Copy link

github-actions bot commented Jul 1, 2022

This pull request was closed due to inactivity.

@github-actions github-actions bot closed this Jul 1, 2022
@simPod
Copy link
Contributor Author

simPod commented Jul 1, 2022

@doctrine what do you think about @Seldaek' comment

@Seldaek
Copy link
Member

Seldaek commented Jul 1, 2022

Yeah this should be reopened IMO. @morozov if you can let us know what you think about #5317 (comment) I'll happily send a PR.

@greg0ire greg0ire reopened this Jul 1, 2022
@morozov
Copy link
Member

morozov commented Jul 1, 2022

I'll be fine with whatever changes that somebody else from the maintainers approves. After skimming over the thread, I don't want to work on this change.

@github-actions github-actions bot removed the Stale label Jul 2, 2022
@derrabus derrabus changed the base branch from 3.3.x to 3.4.x August 30, 2022 19:16
@morozov
Copy link
Member

morozov commented Sep 18, 2022

Closing due to the lack of progress.

@morozov morozov closed this Sep 18, 2022
@simPod
Copy link
Contributor Author

simPod commented Sep 18, 2022

What is your feedback on #5334 then?

@morozov
Copy link
Member

morozov commented Sep 18, 2022

I don't have any feedback on in, it's closed.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants