Implement array shapes for preg_match() $matches by-ref parameter #2589

Merged
merged 55 commits into from
Jun 21, 2024

Conversation

staabm
Contributor

@staabm staabm commented Aug 26, 2023

I tried solving the shapes with the Hoa Regex AST, but I couldn't find information about capturing groups on the AST nodes.
I don't see a way to identify where capturing groups are based on the AST.

Therefore I tried the approach suggested in phpstan/phpstan#9502 (comment)

@staabm
Contributor Author

staabm commented Aug 26, 2023

@mvorisek please provide feedback, and if you could come up with a few more test cases, that would be awesome.

What's the expected result for phpstan/phpstan#9502?

@staabm
Contributor Author

staabm commented Aug 27, 2023

memo to me: I need to fix PREG_UNMATCHED_AS_NULL on PHP <= 7.3
https://3v4l.org/3k0mr
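
For context, a minimal sketch of the version difference this memo most likely refers to (the linked snippet is not reproduced here, and the pattern and subject below are made up). According to the PHP 7.4 migration notes, PREG_UNMATCHED_AS_NULL only sets trailing unmatched groups to null from PHP 7.4 on; earlier versions drop them.

```php
<?php
// Illustrative pattern: group 2 is optional and does not participate in the match.
preg_match('/(a)(b)?/', 'a', $withFlag, PREG_UNMATCHED_AS_NULL);
preg_match('/(a)(b)?/', 'a', $withoutFlag);

// PHP >= 7.4: $withFlag has a stable size, the trailing group is present as null.
// PHP 7.2/7.3: the trailing unmatched group is missing from $withFlag.
var_dump($withFlag);

// Without the flag, a trailing unmatched group is dropped on every PHP version.
var_dump($withoutFlag);
```

An inferred array shape therefore has to treat trailing groups as optional keys on the older versions.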

@PrinsFrank
Contributor

@staabm It would be really great to have this implemented now that reportPossiblyNonexistentConstantArrayOffset is available! Can I help here?

@staabm
Contributor Author

staabm commented May 14, 2024

I have this PR on my todo list and will rebase it in the near future.

You can help by verifying the existing test expectations of this PR, or by helping to explore missing tests/cases.

@staabm staabm force-pushed the regexshapes branch 2 times, most recently from c08a944 to 0ce2624 Compare May 15, 2024 17:20
@staabm staabm changed the base branch from 1.10.x to 1.11.x May 15, 2024 17:20
@staabm staabm force-pushed the regexshapes branch 3 times, most recently from 39d2b3b to 7ed47c4 Compare May 18, 2024 09:51
@staabm staabm marked this pull request as ready for review May 18, 2024 09:56
@phpstan-bot
Collaborator

This pull request has been marked as ready for review.

@staabm
Contributor Author

staabm commented May 18, 2024

In the end I combined both approaches: regex-based retrieval of the capturing groups, and the AST to identify how many non-optional groups the regex contains.

Since the type inference now requires a regex that Hoa\Regex can parse, I commented out some type assertions which currently don't work because the regex parser cannot parse the pattern. I think these can be fixed in the future by applying fixes to Grammar.pp.

I am neither a regex nor a grammar expert, so I left these things for other people who are more confident in this topic.
Overall this means we already get pretty good type narrowing based on a constant regex.
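
One way the regex-based half could work is sketched below. This is an assumption for illustration, not code taken from this PR: `captureGroupKeys()` is a hypothetical helper that appends an empty alternative to the pattern so it always matches the empty string, then reads the group keys from `$matches`. It assumes PHP >= 7.4 and a pattern body without `~` or branch-reset groups.

```php
<?php
/**
 * Hypothetical helper: lists the keys $matches can contain for a pattern body.
 * Assumes PHP >= 7.4, a body without "~" and without branch-reset groups.
 *
 * @return list<int|string>
 */
function captureGroupKeys(string $patternBody): array
{
    // The empty alternative always matches '', so preg_match() succeeds
    // even when no group of the original pattern participates.
    $probe = '~(?:' . $patternBody . ')|~';

    // With PREG_UNMATCHED_AS_NULL every group shows up as a key (value null).
    if (preg_match($probe, '', $matches, PREG_UNMATCHED_AS_NULL) !== 1) {
        return [];
    }

    return array_keys($matches);
}

var_dump(captureGroupKeys('\w-(?P<num>\d+)-(\w)'));
```

This only yields the keys. Which of them are optional still has to come from elsewhere, which is the part the comment above assigns to the AST.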


In a future PR I would improve the types to be even more precise, e.g.
'/Price: (£|€)/i' -> constant string £|€ instead of just string
'/Price: (\d)/i' -> numeric-string instead of string
'/Price: (\s){2,10}/i' -> non-falsy-string instead of string
...
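
To make the proposed narrowing concrete, here is a small runtime sketch with invented subjects. It only checks that the captured values fall into the narrower categories named above; it says nothing about what PHPStan currently infers.

```php
<?php
// Subjects are invented for illustration.
preg_match('/Price: (£|€)/i', 'price: €', $currency);
preg_match('/Price: (\d)/i', 'Price: 5', $digit);
preg_match('/Price: (\s){2,10}/i', "Price:   \t", $space);

// Group 1 can only be one of the two alternatives.
var_dump(in_array($currency[1], ['£', '€'], true));

// A single digit is always a numeric string.
var_dump(is_numeric($digit[1]));

// A whitespace character is neither '' nor '0', so the capture is non-falsy.
var_dump((bool) $space[1]);
```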

src/Type/Php/RegexShapeMatcher.php Outdated

function doNamedSubpattern(string $s): void {
    if (preg_match('/\w-(?P<num>\d+)-(\w)/', $s, $matches)) {
        // could be assertType('array{0: string, num: string, 1: string, 2: string, 3: string}', $matches);
Contributor Author

Here we can see one example of a regex which is not parsable with the current Hoa grammar:

Fatal error: Uncaught Hoa\Compiler\Llk\Parser::parse(): (0) Unexpected token "-" (range) at line 1 and column 4:
/\w-(?P<num>\d+)-(\w)/
   ↑
in /Users/staabm/workspace/phpstan-src/vendor/hoa/compiler/Llk/Parser.php at line 1.
  thrown in /Users/staabm/workspace/phpstan-src/vendor/hoa/compiler/Llk/Parser.php on line 1

Contributor

There are two issues:

a) (?P<xxx>... is equivalent to the supported (?<xxx>... - spec: https://www.pcre.org/original/doc/html/pcrepattern.html (search for "?P<"), grammar: https://github.com/hoaproject/Regex/blob/master/Source/Grammar.pp#L73
b) - (range) - it is a reserved character only inside a [ character class; otherwise it is a regular character and very often used
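
Both points can be checked against PCRE itself with a short sketch. The subject string is made up, and the pattern is the one from the failing assertion above.

```php
<?php
$subject = 'a-12-b'; // made-up subject

// (?P<num>...) and (?<num>...) are two spellings of the same named group in PCRE.
preg_match('/\w-(?P<num>\d+)-(\w)/', $subject, $pythonStyle);
preg_match('/\w-(?<num>\d+)-(\w)/', $subject, $perlStyle);
var_dump($pythonStyle === $perlStyle);

// Outside a character class "-" is a literal; inside [...] it forms a range.
var_dump(preg_match('/^a-c$/', 'a-c'));  // literal hyphen between "a" and "c"
var_dump(preg_match('/^[a-c]$/', 'b'));  // range a..c
```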

Contributor Author

@staabm staabm May 18, 2024

Thanks. Please provide code suggestions for fixes.

The example works in plain PHP, so we either need to fix the grammar or leave the typing for these cases behind.

Contributor

Would you mind forking hoaproject/Regex and saving a PHP test file there that I can run with all the issues you found?

Contributor Author

@staabm staabm May 18, 2024

Look into this PR and find all the commented-out assertions. These are the cases which Hoa cannot parse.

I have spent the last 3 days of my free time on your feature request. Take your time to do the tests yourself. Thanks

Contributor Author

Ohh, I see. I got the impression you were working on all these cases.

Contributor

I can look into more cases later. This one was the most important. What about adding proper Hoa Regex unit testing to phpstan-src? We can assert the parsed regexes using the function at https://github.com/mvorisek/Hoa-Regex/blob/0c8a5cbcf696df8f65168704af14ad5ebd48c641/test.php#L13-L25.

Contributor Author

@staabm staabm May 19, 2024

Sure, why not. Please send a PR to my branch/fork

Contributor

I am not super familiar with the phpstan-src codebase - where, and on which class, can I base this unit test?

Contributor Author

@staabm staabm May 19, 2024

I would create a new file in tests/PHPStan/Type. We can move it somewhere else later if this doesn't work for Ondrej.

Comment on lines +7 to +8
-%token anchor \\(bBAZzG)|\^|\$
+%token anchor \\([bBAZzG])|\^|\$
Contributor Author

@staabm staabm May 18, 2024

unreleased hoa/regex upstream fix
hoaproject/Regex@5f670af

Comment on lines 16 to 17
- ( range() | literal() )+
+ ( <class_> | range() | literal() )+ <range>?
Contributor Author

@staabm staabm May 18, 2024

unreleased hoa/regex upstream fix
hoaproject/Regex@ce7fd7b
hoaproject/Regex@e770ada

Contributor

@mvorisek mvorisek May 18, 2024

2nd patch was reverted, see #2589 (comment)

Comment on lines +25 to +26
-capturing:
+#capturing:
Contributor Author

@Seldaek
Contributor

Seldaek commented May 18, 2024

This looks amazing, thanks @staabm.

Now I'm just wondering how I can get it to recognize Composer\Pcre\Preg::match and co as well to get the same quality of type output there :s

If you can think of some way to make this more reusable by third party libs wrapping preg functions it'd be great.

@staabm
Contributor Author

staabm commented May 18, 2024

> Now I'm just wondering how I can get it to recognize Composer\Pcre\Preg::match and co as well to get the same quality of type output there :s
>
> If you can think of some way to make this more reusable by third party libs wrapping preg functions it'd be great.

At first I want to see what @ondrejmirtes thinks about all this. He has not been involved yet.

After that: I can think of different ways to make this reusable. For wrapper libs which are 100% API-compatible with the native preg_* functions we could e.g.

  • rewrite the AST at analysis time
  • create separate extensions which re-use the underlying implementation
  • make it configurable via NEON config
  • introduce a new phpdoc type
  • ...

@Seldaek
Contributor

Seldaek commented May 18, 2024

OK cool, thanks. Yes, composer/pcre is almost one-to-one, except it throws instead of returning false. For the other API there, which returns object results, it's a bit more complex, but anyway I see your point and I'll let you wrap things up here first. Just wanted to let you know in case it can influence design decisions along the way, but it's not urgent at all.
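
To illustrate the kind of wrapper under discussion, here is a minimal sketch. `ExamplePreg` is a hypothetical class written for this example and is not the composer/pcre implementation. It keeps the preg_match() signature but throws instead of returning false, and it assumes PHP >= 8.0 for preg_last_error_msg().

```php
<?php
// Hypothetical wrapper for illustration only, NOT the composer/pcre code.
final class ExamplePreg
{
    /**
     * @param array<int|string, string|null>|null $matches
     * @return 0|1
     */
    public static function match(string $pattern, string $subject, ?array &$matches = null, int $flags = 0): int
    {
        // @ suppresses the warning so the failure surfaces as an exception instead.
        $result = @preg_match($pattern, $subject, $matches, $flags);
        if ($result === false) {
            throw new RuntimeException('preg_match() failed: ' . preg_last_error_msg());
        }

        return $result;
    }
}

// Usage: $matches is filled by reference exactly as with preg_match().
if (ExamplePreg::match('/(\d+)/', 'abc 42', $matches) === 1) {
    var_dump($matches[1]);
}
```

Because the by-reference `$matches` parameter behaves like the native one, a static analysis extension for such a wrapper could in principle reuse the same shape inference.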

@staabm
Contributor Author

staabm commented Jun 21, 2024

I am not sure why we have these remaining 7.4/7.2 errors. Are you fine with merging anyway?

@staabm
Contributor Author

staabm commented Jun 21, 2024

> I am not sure why we have these remaining 7.4/7.2 errors. Are you fine with merging anyway?

I made the expectations more PHP-version-specific so we can move on and get a green build.

We see some PHPStan extension builds failing because of the problem mentioned in #2589 (comment)

@ondrejmirtes
Member

What about the errors here? https://github.com/phpstan/phpstan-src/actions/runs/9610380632

Like: https://github.com/phpstan/phpstan-symfony/blob/bca27f1701fc1a297749e6c2a1e3da4462c1a6af/src/Type/Symfony/ParameterDynamicReturnTypeExtension.php#L185-L188

I know you mentioned that the type-specifying extension is not able to handle === 1.

I don't know whether you solved that, based on the results probably not. My opinion is that we could hardcode this specifically in TypeSpecifier, so that extension is called correctly with the right context.

@staabm
Contributor Author

staabm commented Jun 21, 2024

> I don't know whether you solved that, based on the results probably not. My opinion is that we could hardcode this specifically in TypeSpecifier, so that extension is called correctly with the right context.

If you have an idea what this needs to look like, please go ahead and implement it.

The only idea I have atm would be to add an AST visitor which detects an if (preg_match(...) === 1) and sets an attribute which can later be detected by the TypeSpecifier.

Hopefully you have something better in mind :)
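
For readers following along, the two call styles at issue look like this. The functions are hypothetical examples, not code from the linked extension. Both behave identically at runtime, but the second wraps the call in an identical comparison, which is the case the discussion says was not yet narrowed.

```php
<?php
function firstNumber(string $s): ?string
{
    // Truthy check: preg_match() returned 1, so $matches is populated.
    if (preg_match('/(\d+)/', $s, $matches)) {
        return $matches[1];
    }

    return null;
}

function firstNumberStrict(string $s): ?string
{
    // Identical comparison: same runtime meaning, but a different AST shape
    // (an Identical node around the function call).
    if (preg_match('/(\d+)/', $s, $matches) === 1) {
        return $matches[1];
    }

    return null;
}
```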

@mvorisek
Contributor

What exactly is the problem? Why can't we check whether the context/preg_match result is surely non-falsy?

@ondrejmirtes
Member

@staabm I meant this: 328fcb9

@staabm
Contributor Author

staabm commented Jun 21, 2024

I see, thanks. Should we have the same for ==?

@ondrejmirtes
Member

@staabm Yeah, it's probably harmless, but do that in a next PR, not here :)

@ondrejmirtes ondrejmirtes changed the title from "Implement array shapes for preg_match() $matches" to "Implement array shapes for preg_match() $matches by-ref parameter" Jun 21, 2024
@ondrejmirtes ondrejmirtes merged commit 721a0a6 into phpstan:1.11.x Jun 21, 2024
447 of 452 checks passed
@ondrejmirtes
Member

Thank you!

@ondrejmirtes
Member

An extension can be as simple as this phpstan/phpstan-nette@3e68a5d

@staabm
Contributor Author

staabm commented Jun 21, 2024

The extension for composer/pcre is being worked on in composer/pcre#24

@westonruter

I just noticed that this doesn't seem to play well with strict rules.

This example without strict rules works: https://phpstan.org/r/6187bcca-8fc8-459b-9790-21a82e62b345

But when I enable strict rules, I need to cast the return value of preg_match() to a bool and this seems to block the types from passing through. For example: https://phpstan.org/r/f368fc07-6438-43f1-b80b-93dee22535f4

Same thing happens when I cast to an int: https://phpstan.org/r/0d011355-4446-4bd8-9b8d-0c44907f9a07
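
The playground snippets linked above are not reproduced here. As an assumed approximation of the report, the casts might look like the hypothetical functions below. At runtime they behave like the uncast call; the report is that the array shape of `$matches` is no longer inferred inside the branch.

```php
<?php
// Hypothetical functions approximating the reported patterns.
function digitsViaBoolCast(string $s): ?string
{
    // The cast yields true exactly when preg_match() returned 1.
    if ((bool) preg_match('/(\d+)/', $s, $matches)) {
        return $matches[1];
    }

    return null;
}

function digitsViaIntCast(string $s): ?string
{
    // (int) maps false (error) to 0, so only a real match passes.
    if ((int) preg_match('/(\d+)/', $s, $matches) === 1) {
        return $matches[1];
    }

    return null;
}
```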

@ondrejmirtes
Member

@westonruter Please open a new bug report about this.

@westonruter

Done: phpstan/phpstan#11262

@staabm
Contributor Author

staabm commented Jul 5, 2024

extensions implemented for PHP-CS-Fixer Preg::match in PHP-CS-Fixer/PHP-CS-Fixer#8103

@staabm
Contributor Author

staabm commented Jul 11, 2024

Initial support for composer/pcre 2.x landed in composer/pcre#25
Coverage for more methods is being worked on in separate PRs


8 participants