"regular expression too large" for seemingly simple regex #119
This is, I'm afraid, expected behaviour. When a group has a fixed iteration limit, it is replicated in the generated code, so the larger the upper limit, the longer the generated code. Use the pcre2test "memory" option to see memory usage, for example `/abcd/memory` as a pattern, or `pcre2test -s memory`. Using a set of alternatives makes the basic group longer, so more memory is needed. In the 8-bit library, 16-bit values are used for lengths within the compiled pattern, which limits its size to around 64K. However, you can configure PCRE2 with `--with-link-size=3` (or 4) to use larger internal links, in which case the compiled pattern can be much bigger. There are no plans to change any of this, so I'm going to close this issue.
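To make the replication point concrete, here is a minimal Python sketch. The helper `expand_bounded_repeat` is hypothetical: it rewrites `(?:X){lo,hi}` as explicit copies of `X` at the string level, which is only a rough analogy for what PCRE2 does in its compiled code. Python's `re` module (not PCRE2) is used solely to confirm, on a small case, that the rewrite matches the same strings as the quantifier. The expanded text grows linearly with the upper bound, so a large bound multiplied by a long group body adds up quickly.

```python
import re


def expand_bounded_repeat(group: str, lo: int, hi: int) -> str:
    """Rewrite (?:group){lo,hi} as explicit copies of the group.

    Hypothetical helper: a string-level analogy for how a bounded repeat is
    replicated in compiled code, not PCRE2's actual code generator.
    """
    if not 0 <= lo <= hi:
        raise ValueError("need 0 <= lo <= hi")
    body = f"(?:{group})"
    required = body * lo
    # Nest the optional copies from the inside out: (?:X(?:X(?:X)?)?)?
    optional = ""
    for _ in range(hi - lo):
        optional = f"(?:{body}{optional})?"
    return required + optional


group = "[a-f]|[0-9]"  # hypothetical group body, not the one from the report

# On a small case the expansion accepts exactly what the quantifier accepts.
expanded = re.compile(expand_bounded_repeat(group, 1, 3))
original = re.compile(f"(?:{group}){{1,3}}")
for s in ["", "a", "a1", "a1f", "a1f2", "z"]:
    assert bool(expanded.fullmatch(s)) == bool(original.fullmatch(s))

# The expanded text grows linearly with the upper bound.
for hi in (4, 64, 1025):
    print(hi, len(expand_bounded_repeat(group, 1, hi)))
# prints: 4 75, then 64 1275, then 1025 20495
```

Only the small expansion is compiled here; a nesting depth of 1025 would likely exceed Python's own parser recursion limits, which is a separate concern from PCRE2's size limit.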
Hi Philip, thanks for the quick response. My curiosity has been satisfied :-)
Hi Philip, we got the same error message, but our regex string really is very large: `PCRE2 compilation failed at offset 376104: code 120 msg regular expression is too large`. How can we adjust the compile options to support such a long regular expression? Thanks. Edit: our expression contains a large group of keywords, for example:
Please read the comments above. The
Yes, we configured this option with the maximum value, 4. That is not the solution in our case. More info: in `PCRE2 compilation failed at offset 376104: code 120 msg regular expression is too large`, 376104 is exactly the length of our regex string. Also, this exception happens only occasionally, not always.
It must be a truly large regex if link size 4 cannot handle it. I'm afraid that is the absolute limit. However, 376104 is nowhere near the limit for a 32-bit number, so I'm wondering if something else is going on here. You would get that error with the default link size of 2. Are you sure you are linking with a version of PCRE2 that is compiled with `--with-link-size=4`? Or could it be accidentally linking with a system PCRE2 that has the default?
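The arithmetic behind that reply can be checked in a few lines. This Python sketch assumes that a link of N bytes can hold offsets up to 2^(8N) - 1, which is only an approximation of PCRE2's real limits (the reply above speaks of "around 64K" for the default). It also uses the reported offset 376104 as a rough stand-in for the compiled size, which is not the same thing as the length of the pattern string. Both function names are hypothetical.

```python
def max_offset(link_size: int) -> int:
    """Largest unsigned value that fits in `link_size` bytes (approximate limit)."""
    return 2 ** (8 * link_size) - 1


def smallest_link_size(compiled_units: int):
    """Return the smallest link size (2, 3 or 4) whose offsets could span
    `compiled_units`, or None if even 4 bytes are not enough.

    Assumption: treats each limit as exactly the largest unsigned N-byte value.
    """
    for size in (2, 3, 4):
        if compiled_units <= max_offset(size):
            return size
    return None


for size in (2, 3, 4):
    print(size, max_offset(size))
# prints: 2 65535, then 3 16777215, then 4 4294967295

# Using the reported offset as a rough proxy for the compiled size:
print(smallest_link_size(376104))  # prints 3
```

Under these assumptions, a size in the 376104 range overflows the default link size of 2 but fits comfortably with 3 or 4, which is why the error points towards a library built with the default.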
In pcre2.cmake we added the setting `set(PCRE2_LINK_SIZE 4)`, so it is confirmed that the link size is 4. Without this setting, our large regex string fails to compile every time.
The question wasn't how you compiled PCRE2, but whether you are using the PCRE2 you compiled or accidentally the system PCRE2. By the way, the PCRE2 you compiled includes a pcre2test tool; you can try your regex there.
OK, we will give it a try. Thanks, everybody, for your help.
The following regular expression raises an error (`Failed: error 120 at offset 27: regular expression is too large`):

I know the character classes can be combined into one, e.g. like so:
This regular expression compiles just fine, however, as do a lot of "simplified" versions of the first pattern:
As far as I can tell the documentation only states that the numbers inside a curly-braced repetition must be less than 65536, which 1025 does not even come close to.
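The repeat count of 1025 is indeed well within the documented bound, so the limit being hit is the size of the compiled pattern, not the count itself. The maintainer's reply above notes that alternatives make the basic group longer. The minimal Python sketch below illustrates why merging the classes helps, using hypothetical stand-in patterns because the ones from the report are not shown. It uses Python's `re` (not PCRE2) to check that an alternation of classes and a single merged class accept exactly the same characters, and compares their textual lengths. Textual length is only a proxy for compiled size.

```python
import re
import string

# Hypothetical stand-ins: the patterns from the original report are not shown.
alternation = "(?:[a-f]|[A-F]|[0-9])"
merged = "[a-fA-F0-9]"

# Both forms should accept exactly the same single characters.
candidates = string.printable
accepted_alt = {c for c in candidates if re.fullmatch(alternation, c)}
accepted_merged = {c for c in candidates if re.fullmatch(merged, c)}
assert accepted_alt == accepted_merged

# The merged class is one item rather than three alternatives, so each
# replicated copy of the repeated group is smaller.
print(len(alternation), len(merged))  # prints 21 11
```

Multiplied across 1025 replicated copies, a per-copy saving like this can decide whether the compiled pattern fits under the default limit.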
I am guessing there is some complexity introduced by the combination of alternatives using `|` and character classes including character ranges, but I wanted to check anyway: is this the intended behavior?

I can reproduce this behavior both on my local machine, using pcre2test (a build of the master branch with default settings on Linux), and on https://regex101.com/ with the PCRE2 engine.