
PHP PCRE2 not matching curly brackets (exact number of) - {} #1704

Closed
ghost opened this issue Jan 20, 2022 · 5 comments


ghost commented Jan 20, 2022

Bug Description

The PCRE2 exact-count quantifier ({n}) isn't matching correctly. See the reproduction below.

Reproduction steps

Given the DNS name hnl-noc-fw1.pr.nws.noaa, the following PCRE2 pattern does not match, even though the pattern is legitimate and matches outside of the website when used with actual products, so it is not a syntax problem.

^((?#Location)([Pp]([Rr][Hh])|([Pp][Gg]))|([Hh][Nn][Ll])|([Gg][Uu][Mm])|([Ll][Ii][Hh])|([Ii][Tt][Oo])|([Mm][Aa][Zz])|([Dd][Mm][Zz])(?# or Org)|([Ii][Tt][Cc]))-(?#Type and OS)[Nn][MmOo](?# Portfolio)[AaOoCcSsDd]-(?#Function or Type)[[:alnum:]]+\.[Pp][Rr]\.[Nn][Ww][Ss]\.[Nn][Oo][Aa]{2}$

If I change that final {2} to {1,3}, it matches just fine, so the problem is the exact-count case {2}.

^((?#Location)([Pp]([Rr][Hh])|([Pp][Gg]))|([Hh][Nn][Ll])|([Gg][Uu][Mm])|([Ll][Ii][Hh])|([Ii][Tt][Oo])|([Mm][Aa][Zz])|([Dd][Mm][Zz])(?# or Org)|([Ii][Tt][Cc]))-(?#Type and OS)[Nn][MmOo](?# Portfolio)[AaOoCcSsDd]-(?#Function or Type)[[:alnum:]]+\.[Pp][Rr]\.[Nn][Ww][Ss]\.[Nn][Oo][Aa]{1,3}$

THIS LAST ONE WORKS
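Note that {2} quantifies only the atom immediately before it, [Aa], so the tail [Nn][Oo][Aa]{2} should match exactly "noaa", which is how the host name ends. A minimal PHP sketch (the helper name tailMatches is hypothetical) isolates that tail to show what each quantifier accepts:

```php
<?php
// Hypothetical helper: tests only the "noaa" tail so the effect of the
// quantifier can be seen without the rest of the host-name pattern.
function tailMatches(string $quantifier, string $tail): bool
{
    // The quantifier applies to the single atom before it, [Aa],
    // not to the whole [Nn][Oo][Aa] sequence.
    $pattern = '/^[Nn][Oo][Aa]' . $quantifier . '$/';
    return preg_match($pattern, $tail) === 1;
}

foreach (['noa', 'noaa', 'noaaa'] as $tail) {
    printf(
        '%-6s {2}: %-4s {1,3}: %s' . PHP_EOL,
        $tail,
        tailMatches('{2}', $tail) ? 'yes' : 'no',
        tailMatches('{1,3}', $tail) ? 'yes' : 'no'
    );
}
```

Only "noaa" satisfies {2}, while all three tails satisfy {1,3}. Since the host name ends in "noaa", both full patterns should accept it, so a difference between them points at the engine rather than at the pattern.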

Expected Outcome

The first one should work.

Browser

Edge

OS

Windows Server 2019

ghost added the bug label Jan 20, 2022
@working-name
Collaborator

Hi @peter-thoenen,

I ran this on PHP 8.0 and PHP 8.1 with PCRE2 10.36, and the behavior matches the site's in both cases ({1,3} vs {2}). Can you check whether that's the case on your end as well?

If so, then it's not a site issue but a regex issue. Please hop on #regex on libera.chat and share your link; one of the volunteers might be able to suggest alternatives to your approach.
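As one possible alternative (a sketch of our own, not taken from this thread, and not checked against the specific PCRE2 versions discussed here), the /i flag can take over the case folding that every [Xx] pair does by hand, and the location codes can become a single non-capturing alternation. The constant HOSTNAME_PATTERN and the function isNoaaHostname are hypothetical names:

```php
<?php
// Hypothetical alternative, not from the thread: /i handles case, so each
// [Xx] pair becomes a plain letter, and the location codes become one
// non-capturing alternation. The original capture groups are not kept.
const HOSTNAME_PATTERN = '/^(?:prh|pg|hnl|gum|lih|ito|maz|dmz|itc)-n[mo][aocsd]-[[:alnum:]]+\.pr\.nws\.noaa$/i';

function isNoaaHostname(string $host): bool
{
    return preg_match(HOSTNAME_PATTERN, $host) === 1;
}

$hosts = ['hnl-noc-fw1.pr.nws.noaa', 'PRH-NMA-Core2.PR.NWS.NOAA', 'hnl-noc-fw1.pr.nws.noa'];
foreach ($hosts as $host) {
    printf('%-26s %s' . PHP_EOL, $host, isNoaaHostname($host) ? 'match' : 'no match');
}
```

For ASCII host names this accepts the same strings as the original pattern, but code that reads numbered capture groups from the original would need adjusting.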

Author

ghost commented Jan 22, 2022

No, I think you misunderstood. I know that it works on the site in question; what I'm saying is that it doesn't work on your regex101 site. Select PHP->PCRE2 and you will see that the first one doesn't work.

@firasdib
Owner

It doesn't work in PHP either, which indicates this is not an implementation error on my end, but rather the way PCRE2 matches your expression. We can see it fails in PCRE2, but works in PCRE, which might indicate a bug within the PCRE2 library.

firasdib added upstream and removed bug labels Jan 23, 2022
Author

ghost commented Jan 23, 2022

Negative. When I use it as written (with {2}) in Tenable Security Center, which uses PHP 7.4 [1][2], it works just fine; it only fails on your site. Maybe it's a bug in PHP 8, but I thought you said above that you tested that; I have no way to test PHP 8. It's definitely legitimate PCRE2 syntax:

http://www.pcre.org/current/doc/html/pcre2syntax.html#TOC1

As you said, it could be an issue with the library on your site, but I would find that odd, given that I assume you and Tenable are both using the standard PHP library, unless the problem was introduced in PHP 8.x and that is why their 7.4 build works.

[1] 7.4 >= 7.3 hence PCRE2

[2] The release notes state that an upgrade to PHP 7.4 was included in TSC 5.19 (from the vendor).
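Footnote [1] infers the library from the PHP version (PHP moved from PCRE to PCRE2 in 7.3). A PHP build can also report the library directly through the built-in PCRE_VERSION constant, which is useful because a build may use the bundled PCRE2 or a system copy. A small sketch, with hypothetical helper names:

```php
<?php
// PCRE_VERSION is a built-in PHP constant such as "10.39 2021-10-29".
// PCRE2 releases are numbered 10.x; the older PCRE library used 8.x.
function bundledPcreVersion(): string
{
    // Keep only the version number, dropping the release date.
    return strtok(PCRE_VERSION, ' ');
}

function usesPcre2(): bool
{
    return version_compare(bundledPcreVersion(), '10.0', '>=');
}

printf(
    'PHP %s reports PCRE %s (%s)' . PHP_EOL,
    PHP_VERSION,
    bundledPcreVersion(),
    usesPcre2() ? 'PCRE2' : 'legacy PCRE'
);
```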

Owner

firasdib commented Jan 23, 2022

Many assumptions in one response :-)

I am using PCRE, not PHP. PHP has its own implementation of PCRE, as does regex101.

You can test PHP code online, e.g. on https://sandbox.onlinephpfunctions.com/ (the first result on Google).

Here is the code generated by the website:

```php
<?php
$re = '/^((?#Location)([Pp]([Rr][Hh])|([Pp][Gg]))|([Hh][Nn][Ll])|([Gg][Uu][Mm])|([Ll][Ii][Hh])|([Ii][Tt][Oo])|([Mm][Aa][Zz])|([Dd][Mm][Zz])(?# or Org)|([Ii][Tt][Cc]))-(?#Type and OS)[Nn][MmOo](?# Portfolio)[AaOoCcSsDd]-(?#Function or Type)[[:alnum:]]+\.[Pp][Rr]\.[Nn][Ww][Ss]\.[Nn][Oo][Aa]{2}$/m';
$str = 'hnl-noc-fw1.pr.nws.noaa';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

// Print the entire match result
var_dump($matches);
```

My results:

  • PHP 8: no match
  • PHP 7.4.13: no match
  • PHP 7.4.7: no match
  • PHP 7.4.0: match
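To tie each result above to a specific library build, a small script can print the built-in PHP_VERSION and PCRE_VERSION constants next to the outcome for each quantifier. This is a sketch: the function name tryQuantifier is hypothetical, and the script avoids PHP 8-only syntax so it can also run on 7.4:

```php
<?php
// Hypothetical reproduction script: run it under each PHP version so every
// result is recorded next to the PCRE library version that produced it.
$prefix = '^((?#Location)([Pp]([Rr][Hh])|([Pp][Gg]))|([Hh][Nn][Ll])|([Gg][Uu][Mm])|([Ll][Ii][Hh])|([Ii][Tt][Oo])|([Mm][Aa][Zz])|([Dd][Mm][Zz])(?# or Org)|([Ii][Tt][Cc]))-(?#Type and OS)[Nn][MmOo](?# Portfolio)[AaOoCcSsDd]-(?#Function or Type)[[:alnum:]]+\.[Pp][Rr]\.[Nn][Ww][Ss]\.[Nn][Oo][Aa]';
$subject = 'hnl-noc-fw1.pr.nws.noaa';

// Returns 1 on a match, 0 on no match, or false if preg_match fails.
// No return type is declared so the script also runs on PHP 7.4.
function tryQuantifier(string $prefix, string $quantifier, string $subject)
{
    return preg_match('/' . $prefix . $quantifier . '$/m', $subject);
}

printf('PHP %s, PCRE %s' . PHP_EOL, PHP_VERSION, PCRE_VERSION);
foreach (['{2}', '{1,3}'] as $quantifier) {
    $result = tryQuantifier($prefix, $quantifier, $subject);
    $label = $result === false ? 'error' : ($result === 1 ? 'match' : 'no match');
    printf('%-6s %s' . PHP_EOL, $quantifier, $label);
}
```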

Thus we can conclude that a change in PHP's implementation after 7.4.0 caused the regex to stop matching. If we try it in Perl v5.24.2, using the following code:

```perl
use strict;

my $str = 'hnl-noc-fw1.pr.nws.noaa';
my $regex = qr/^((?#Location)([Pp]([Rr][Hh])|([Pp][Gg]))|([Hh][Nn][Ll])|([Gg][Uu][Mm])|([Ll][Ii][Hh])|([Ii][Tt][Oo])|([Mm][Aa][Zz])|([Dd][Mm][Zz])(?# or Org)|([Ii][Tt][Cc]))-(?#Type and OS)[Nn][MmOo](?# Portfolio)[AaOoCcSsDd]-(?#Function or Type)[[:alnum:]]+\.[Pp][Rr]\.[Nn][Ww][Ss]\.[Nn][Oo][Aa]{2}$/mp;

if ( $str =~ /$regex/g ) {
  print "Whole match is ${^MATCH} and its start/end positions can be obtained via \$-[0] and \$+[0]\n";
  # print "Capture Group 1 is $1 and its start/end positions can be obtained via \$-[1] and \$+[1]\n";
  # print "Capture Group 2 is $2 ... and so on\n";
}

# ${^POSTMATCH} and ${^PREMATCH} are also available with the use of '/p'
# Named capture groups can be called via $+{name}
```

We see that it does indeed match, which suggests that the problem lies in the PCRE2 library.

According to the PHP changelog, PHP updated to PCRE2 10.34 in release 7.4.6, a couple of versions behind what regex101 uses.

Thus I would urge you to open an issue with the PCRE2 maintainer so this can be resolved. Thank you.
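Before opening an upstream issue, it usually helps to shrink the reproduction. One hedged example of a reduction step is to remove the (?#...) comments, which PCRE2 ignores, and re-test both variants. The function name stripPcreComments is hypothetical, and no claim is made here about which variant matches on any given PCRE2 version:

```php
<?php
// Hypothetical reduction step before reporting upstream: strip the (?#...)
// comments, which should not change what the pattern matches, and re-test.
// A PCRE2 comment runs to the next ")", so [^)]* is enough for this pattern;
// this naive approach would also mangle "(?#" inside a character class.
function stripPcreComments(string $pattern): string
{
    return preg_replace('/\(\?#[^)]*\)/', '', $pattern);
}

$original = '/^((?#Location)([Pp]([Rr][Hh])|([Pp][Gg]))|([Hh][Nn][Ll])|([Gg][Uu][Mm])|([Ll][Ii][Hh])|([Ii][Tt][Oo])|([Mm][Aa][Zz])|([Dd][Mm][Zz])(?# or Org)|([Ii][Tt][Cc]))-(?#Type and OS)[Nn][MmOo](?# Portfolio)[AaOoCcSsDd]-(?#Function or Type)[[:alnum:]]+\.[Pp][Rr]\.[Nn][Ww][Ss]\.[Nn][Oo][Aa]{2}$/';
$subject = 'hnl-noc-fw1.pr.nws.noaa';
$variants = [
    'with comments' => $original,
    'without comments' => stripPcreComments($original),
];

foreach ($variants as $label => $pattern) {
    printf('%-17s %s' . PHP_EOL, $label, preg_match($pattern, $subject) === 1 ? 'match' : 'no match');
}
```

Each further reduction (dropping unused alternatives, replacing [Xx] pairs) can be checked the same way, and the smallest failing pattern can then be pasted into pcre2test for the report.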

Edit: output from pcre2test below.

```
PCRE2 version 10.39 2021-10-29
  re> /^((?#Location)([Pp]([Rr][Hh])|([Pp][Gg]))|([Hh][Nn][Ll])|([Gg][Uu][Mm])|([Ll][Ii][Hh])|([Ii][Tt][Oo])|([Mm][Aa][Zz])|([Dd][Mm][Zz])(?# or Org)|([Ii][Tt][Cc]))-(?#Type and OS)[Nn][MmOo](?# Portfolio)[AaOoCcSsDd]-(?#Function or Type)[[:alnum:]]+\.[Pp][Rr]\.[Nn][Ww][Ss]\.[Nn][Oo][Aa]{2}$/gm
data> hnl-noc-fw1.pr.nws.noaa
No match
data>
```
