PARTIAL_HARD with MATCH_INVALID_UTF does not give partial match on incomplete multibyte #239

jagprog5 · 2023-04-20T19:53:32Z

Suppose I create a match pattern from the bytes 0xE6, 0xBC, 0xA2. Together these three bytes encode a single complete UTF-8 character.

I then compile the pattern with PARTIAL_HARD and MATCH_INVALID_UTF, and use the compiled pattern to match against the subject string consisting of only the first byte of the pattern: 0xE6.

I expect the match to give a partial match consisting of the entire subject string. Instead, it gives no match. Is this correct behavior?
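To make the situation concrete: 0xE6 is a UTF-8 lead byte that announces a three-byte character, so a subject holding only that byte ends in the middle of a character. The following is a minimal sketch, not part of PCRE2, of how a caller might detect such a truncated tail itself. The helper names `utf8_announced_len` and `utf8_incomplete_tail` are hypothetical, and the check looks only at sequence lengths, not at full UTF-8 validity.

```c
#include <stddef.h>

/* Number of bytes a UTF-8 lead byte announces, or 0 if the byte cannot
   start a sequence (continuation byte or invalid lead byte). */
size_t utf8_announced_len(unsigned char lead)
{
    if (lead < 0x80) return 1;                    /* ASCII */
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;   /* 0xE6 falls here */
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

/* Number of bytes at the end of buf that begin a multibyte character
   whose remaining bytes are missing; 0 if the buffer does not end in a
   truncated character. Hypothetical helper: lengths only, no validation
   of the continuation bytes' values beyond their 10xxxxxx tag. */
size_t utf8_incomplete_tail(const unsigned char *buf, size_t len)
{
    size_t back = 0;

    /* Walk backwards over at most four bytes looking for a lead byte. */
    while (back < len && back < 4) {
        unsigned char b = buf[len - 1 - back];
        back++;
        if ((b & 0xC0) != 0x80) {                 /* not a continuation byte */
            size_t need = utf8_announced_len(b);
            return (need > back) ? back : 0;
        }
    }
    return 0;
}
```

For the subject in this issue, `utf8_incomplete_tail` reports one pending byte for `{0xE6}`, two for `{0xE6, 0xBC}`, and zero once all three bytes are present.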

carenas · 2023-04-20T21:17:18Z

MATCH_INVALID_UTF means (ironically) that anything in the subject that is not perfectly valid UTF is ignored, which is why you can't get a match, partial or otherwise, on an incomplete UTF subject.

If you are not in UTF mode (which means using neither PCRE2_UTF nor PCRE2_MATCH_INVALID_UTF), you can get the partial match:

$ pcre2test
PCRE2 version 10.42 2022-12-11
  re> /e6 bc a2/hex
data> \xe6\=ph
Partial match: \xe6

Note that the hex modifier in pcre2test is used here only to avoid the ambiguity of writing \x escapes in the pattern. It is a pcre2test feature, so don't expect it to work in your own pattern string.
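As a small illustration of that ambiguity in C source, the two literals below look similar but hand very different text to a regex compiler. This is only a sketch; the variable names are hypothetical and nothing here calls PCRE2.

```c
/* Three raw bytes 0xE6 0xBC 0xA2: the C compiler resolves the \x escapes,
   so the pattern text itself contains the bytes. */
const char pattern_raw_bytes[] = "\xe6\xbc\xa2";

/* Twelve ASCII characters: the backslashes survive into the pattern text,
   and the regex compiler then interprets the \x escapes itself. */
const char pattern_regex_escapes[] = "\\xe6\\xbc\\xa2";
```

The first form corresponds to what the hex modifier produces in pcre2test: a pattern made of the literal bytes.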

jagprog5 closed this as completed Apr 20, 2023

SolitaryGrass mentioned this issue May 31, 2023

internal_dfa_match, a stack overflow occurred due to recursive calls. #258

Closed
