pcre2test: avoid printing invalid utf trail in partial match
When match_invalid_utf is enabled, invalid UTF-8 data can't match,
but it was mistakenly printed as part of a partial match even
though the ovector correctly excluded it, as shown by:

  PCRE2 version 10.34 2019-11-21
    re> /(?<=..)X/match_invalid_utf,allvector
  data> XX\x80\=ph,ovector=1
  Partial match: \x{80}
  ** ovector[1] is not equal to the subject length: 2 != 3
   0: 2 2

Fix the logic to print the empty match that was actually returned
instead. As a side effect, this avoids a buffer overread when trying
to decode UTF-8 that is missing code units.

Fixes: #235
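
The length calculation behind the fix can be sketched roughly as follows. This is a minimal illustration, not the pcre2test code itself: the helper names `old_print_len` and `new_print_len` are hypothetical, and plain `size_t` is assumed as a stand-in for the library's size type.

```c
#include <stddef.h>

/* Hypothetical helper: number of code units printed BEFORE the fix.
 * It runs from the match start to the end of the subject, so any
 * trailing invalid UTF-8 after ovector[1] is included. */
size_t old_print_len(size_t ulen, const size_t *ovector)
{
  return ulen - ovector[0];
}

/* Hypothetical helper: number of code units printed AFTER the fix.
 * It covers only the span the matcher actually reported. */
size_t new_print_len(const size_t *ovector)
{
  return ovector[1] - ovector[0];
}
```

With the values from the example above (subject length 3, ovector `2 2`), the old calculation yields 1 code unit (the stray `\x80`), while the new one yields 0, which corresponds to the empty partial match.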
carenas committed Apr 20, 2023
1 parent 9323329 commit 65c89ff
Showing 4 changed files with 17 additions and 1 deletion.
4 changes: 4 additions & 0 deletions .gitignore
@@ -66,6 +66,10 @@ testtemp2
 testtemp2grep
 testtry
 testtrygrep
+testSinput
+testbtables
+testsaved1
+testsaved2
 
 m4/libtool.m4
 m4/ltoptions.m4
2 changes: 1 addition & 1 deletion src/pcre2test.c
@@ -8064,7 +8064,7 @@ for (gmatched = 0;; gmatched++)
 rubriclength += 15;
 
 PCHARS(backlength, pp, leftchar, ovector[0] - leftchar, utf, outfile);
-PCHARSV(pp, ovector[0], ulen - ovector[0], utf, outfile);
+PCHARSV(pp, ovector[0], ovector[1] - ovector[0], utf, outfile);
 
 if ((pat_patctl.control & CTL_JITVERIFY) != 0 && jit_was_used)
 fprintf(outfile, " (JIT)");
4 changes: 4 additions & 0 deletions testdata/testinput10
@@ -506,6 +506,10 @@
 \= Expect no match
 ab\x80cdef\=ph
 
+/(?<=..)X/match_invalid_utf
+XX\x80\=ph
+XX\xef\=ph
+
 /ab$/match_invalid_utf
 ab\x80cdeab
 \= Expect no match
8 changes: 8 additions & 0 deletions testdata/testoutput10
@@ -1646,6 +1646,14 @@ Partial match: ab
 ab\x80cdef\=ph
 No match
 
+/(?<=..)X/match_invalid_utf
+XX\x80\=ph
+Partial match:
+** ovector[1] is not equal to the subject length: 2 != 3
+XX\xef\=ph
+Partial match:
+** ovector[1] is not equal to the subject length: 2 != 3
+
 /ab$/match_invalid_utf
 ab\x80cdeab
 0: ab
