Support # comments in regex #3735

Merged: 3 commits, merged on Dec 17, 2024
20 changes: 13 additions & 7 deletions src/Type/Regex/RegexGroupParser.php
@@ -23,6 +23,7 @@
 use function count;
 use function in_array;
 use function is_int;
+use function preg_replace;
 use function rtrim;
 use function sscanf;
 use function str_contains;
@@ -64,20 +65,25 @@ public function parseGroups(string $regex): ?array
 			return null;
 		}
 
-		$rawRegex = $this->regexExpressionHelper->removeDelimitersAndModifiers($regex);
-		try {
-			$ast = self::$parser->parse($rawRegex);
-		} catch (Exception) {
-			return null;
-		}
-
 		$modifiers = $this->regexExpressionHelper->getPatternModifiers($regex) ?? '';
 		foreach (self::NOT_SUPPORTED_MODIFIERS as $notSupportedModifier) {
 			if (str_contains($modifiers, $notSupportedModifier)) {
 				return null;
 			}
 		}
 
+		if (str_contains($modifiers, 'x')) {
+			// in freespacing mode the # character starts a comment and runs until the end of the line
+			$regex = preg_replace('/[^?]#.*/', '', $regex) ?? '';
Review comment by @Seldaek (Contributor), Dec 17, 2024:

This will also get rid of the character before # due to [^?]. I realize in most sane cases this will be a space and thus it's harmless, but IMO this should rather be a lookbehind like (?<!\?). I assume the goal was to exclude (?#...) but those are already ignored/handled by the lexer https://github.com/phpstan/phpstan-src/blob/2.0.x/resources/RegexGrammar.pp#L80-L82

Reply by @staabm (Contributor Author), Dec 17, 2024:

Will look into it, thanks for the heads-up.

(I had to use this additional char before the # because otherwise we would destroy comments in (?# ..) notation.) Since we replace before lexing, we need to make sure we don't turn those comments into something that is no longer a comment.

Reply by @staabm (Contributor Author):

Fix in #3739.

+		}
+
+		$rawRegex = $this->regexExpressionHelper->removeDelimitersAndModifiers($regex);
+		try {
+			$ast = self::$parser->parse($rawRegex);
+		} catch (Exception) {
+			return null;
+		}
+
 		$captureOnlyNamed = false;
 		if ($this->phpVersion->supportsPregCaptureOnlyNamedGroups()) {
 			$captureOnlyNamed = str_contains($modifiers, 'n');
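The review thread above turns on what [^?]#.* actually removes. The following PHP sketch applies three comment finders to a made-up pattern: a naive /#.*/, the /[^?]#.*/ merged in this PR, and the lookbehind /(?<!\?)#.*/ suggested in review. The helper stripComments() and the sample pattern are hypothetical illustrations, not PHPStan code; like the merged change, the sketch only runs preg_replace() on the raw pattern string and does not involve PHPStan's lexer.

```php
<?php
// Sketch: compare three ways of stripping freespacing (x-mode) '#' comments
// from a pattern string before parsing. stripComments() and $pattern are
// hypothetical; only the finder regexes come from the PR and its review.

/** Removes everything the $finder regex matches from $pattern. */
function stripComments(string $finder, string $pattern): string
{
    return preg_replace($finder, '', $pattern) ?? '';
}

// An inline (?#...) comment, plus a trailing x-mode comment that directly
// follows a literal "x" with no space in between.
$pattern = <<<'RE'
/
(\d+)(?# inline)
x# trailing comment
/x
RE;

// 1. Naive: every '#' starts a comment, so the (?# ...) group is cut open.
$naive = stripComments('/#.*/', $pattern);

// 2. As merged: (?# ...) survives, but [^?] also consumes the literal "x".
$merged = stripComments('/[^?]#.*/', $pattern);

// 3. As suggested in review: the lookbehind keeps the preceding character.
$lookbehind = stripComments('/(?<!\?)#.*/', $pattern);

var_dump($naive, $merged, $lookbehind);

// The lookbehind result still requires the literal "x"; the merged one does not.
preg_match($lookbehind, '42x', $m1); // ['42x', '42']
preg_match($merged, '42x', $m2);     // ['42', '42']
```

With the merged finder the stripped regex silently loses the "x" atom, while the lookbehind variant keeps it and still leaves the (?# inline) comment intact. The naive result is shown only as a string, since "(?" followed by a newline is not a valid construct to compile.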
32 changes: 32 additions & 0 deletions tests/PHPStan/Analyser/nsrt/bug-12242.php
@@ -0,0 +1,32 @@
+<?php // lint >= 7.4
+
+namespace Bug12242;
+
+use function PHPStan\Testing\assertType;
+
+function foo(string $str): void
+{
+	$regexp = '/
+		# (
+		([\d,]*)
+		# )
+	/x';
+	if (preg_match($regexp, $str, $match)) {
+		assertType('array{string, string}', $match);
+	}
+}
+
+function bar(string $str): void
+{
+	$regexp = '/^
+		(\w+) # column type [1]
+		[\(] # (
+		?([\d,]*) # size or size, precision [2]
+		[\)] # )
+		?\s* # whitespace
+		(\w*) # extra description (UNSIGNED, CHARACTER SET, ...) [3]
+	$/x';
+	if (preg_match($regexp, $str, $matches)) {
+		assertType('array{string, non-empty-string, string, string}', $matches);
+	}
+}
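To see the array shapes that the two assertType() calls describe, the test patterns can be run through preg_match() at runtime. This is a sketch: the sample inputs ('10,2', 'decimal(10,2) UNSIGNED', 'int') are made up, and the indentation inside the pattern strings does not matter because of the x modifier.

```php
<?php
// Sketch: run the two patterns from bug-12242.php at runtime and inspect the
// resulting match arrays. The input strings are hypothetical samples.

// From foo(): "(" and ")" appear only inside x-mode comments,
// so there is exactly one capturing group.
$fooRegex = '/
	# (
	([\d,]*)
	# )
/x';

// From bar(): three capturing groups; the text after each # is a comment.
$barRegex = '/^
	(\w+) # column type [1]
	[\(] # (
	?([\d,]*) # size or size, precision [2]
	[\)] # )
	?\s* # whitespace
	(\w*) # extra description (UNSIGNED, CHARACTER SET, ...) [3]
$/x';

preg_match($fooRegex, '10,2', $fooMatch);
preg_match($barRegex, 'decimal(10,2) UNSIGNED', $barMatch);
preg_match($barRegex, 'int', $barShort);

var_dump($fooMatch, $barMatch, $barShort);
```

The foo() pattern yields two elements, because the parentheses after # are comment text rather than a group. The bar() pattern yields four; group 1 can never be empty because of \w+, which is why the test expects non-empty-string in that position, while groups 2 and 3 may match the empty string (as with the 'int' input).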