Add serialization functions and tests with updated pcre2test. Fix PCRE2_INFO_SIZE issues.
PhilipHazel committed Jan 23, 2015
1 parent d4daaf9 commit 5438fc8
Showing 40 changed files with 3,145 additions and 976 deletions.
28 changes: 21 additions & 7 deletions ChangeLog
@@ -1,11 +1,11 @@
 Change Log for PCRE2
 --------------------
 
-Version 10.10 13-January-2015
------------------------------
+Version 10.10 xx-xxx-2015
+-------------------------
 
-1. When a pattern is compiled, it remembers the highest back reference so that
-when matching, if the ovector is too small, extra memory can be obtained to
+1. When a pattern is compiled, it remembers the highest back reference so that
+when matching, if the ovector is too small, extra memory can be obtained to
 use instead. A conditional subpattern whose condition is a check on a capture
 having happened, such as, for example in the pattern /^(?:(a)|b)(?(1)A|B)/, is
 another kind of back reference, but it was not setting the highest
@@ -16,8 +16,21 @@ bug was that the condition was always treated as FALSE when the capture could
 not be consulted, leading to a incorrect behaviour by pcre2_match(). This bug
 has been fixed.
 
+2. Functions for serialization and deserialization of sets of compiled patterns
+have been added.
+
+3. The value that is returned by PCRE2_INFO_SIZE has been corrected to remove
+excess code units at the end of the data block that may occasionally occur if
+the code for calculating the size over-estimates. This change stops the
+serialization code copying uninitialized data, to which valgrind objects. The
+documentation of PCRE2_INFO_SIZE was incorrect in stating that the size did not
+include the general overhead. This has been corrected.
+
+4. All code units in every slot in the table of group names are now set, again
+in order to avoid accessing uninitialized data when serializing.
+
 
-Version 10.00 05-January-2015
+Version 10.00 05-January-2015
 -----------------------------
 
 Version 10.00 is the first release of PCRE2, a revised API for the PCRE
@@ -30,8 +43,9 @@ logged. In addition to the API changes, the following changes were made. They
 are either new functionality, or bug fixes and other noticeable changes of
 behaviour that were implemented after the code had been forked.
 
-1. Unicode support is now enabled by default, but it can optionally be
-disabled.
+1. Including Unicode support at build time is now enabled by default, but it
+can optionally be disabled. It is not enabled by default at run time (no
+change).
 
 2. The test program, now called pcre2test, was re-specified and almost
 completely re-written. Its input is not compatible with input for pcretest.
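ChangeLog item 4 concerns leaving no unset code units in the group-name table, so that serializing a compiled pattern never copies uninitialized memory. The C sketch below illustrates the idea for the 8-bit library only. It assumes the entry layout that the PCRE2 documentation gives for PCRE2_INFO_NAMETABLE (two code units holding the group number, most significant first, then the zero-terminated name, in slots of PCRE2_INFO_NAMEENTRYSIZE code units). The helper `fill_name_slot()` is hypothetical and is not PCRE2's internal code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not PCRE2 source: fill one slot of an 8-bit
   group-name table. Assumed layout (from the PCRE2_INFO_NAMETABLE
   documentation): two code units holding the group number, most
   significant first, then the name and a terminating zero. Every slot
   is entry_size code units long (cf. PCRE2_INFO_NAMEENTRYSIZE). */
int fill_name_slot(uint8_t *slot, size_t entry_size, uint16_t group,
                   const char *name)
{
  size_t len = strlen(name);

  /* Room is needed for the number, the name and its terminator. */
  if (entry_size < len + 3) return -1;

  /* Set every code unit in the slot, so that the padding after a short
     name is zero rather than whatever the allocator left there. */
  memset(slot, 0, entry_size);

  slot[0] = (uint8_t)(group >> 8);
  slot[1] = (uint8_t)(group & 0xff);
  memcpy(slot + 2, name, len);   /* terminator is already zero */
  return 0;
}
```

Clearing the whole slot first costs one `memset()` per name, but it makes the padding after short names deterministic, which is what a byte-for-byte copy such as serialization needs.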
13 changes: 13 additions & 0 deletions Makefile.am
@@ -54,6 +54,10 @@ dist_html_DATA = \
 doc/html/pcre2_match_data_create_from_pattern.html \
 doc/html/pcre2_match_data_free.html \
 doc/html/pcre2_pattern_info.html \
+doc/html/pcre2_serialize_decode.html \
+doc/html/pcre2_serialize_encode.html \
+doc/html/pcre2_serialize_free.html \
+doc/html/pcre2_serialize_get_number_of_codes.html \
 doc/html/pcre2_set_bsr.html \
 doc/html/pcre2_set_callout.html \
 doc/html/pcre2_set_character_tables.html \
@@ -89,6 +93,7 @@ dist_html_DATA = \
 doc/html/pcre2perform.html \
 doc/html/pcre2posix.html \
 doc/html/pcre2sample.html \
+doc/html/pcre2serialize.html \
 doc/html/pcre2stack.html \
 doc/html/pcre2syntax.html \
 doc/html/pcre2test.html \
@@ -127,6 +132,10 @@ dist_man_MANS = \
 doc/pcre2_match_data_create_from_pattern.3 \
 doc/pcre2_match_data_free.3 \
 doc/pcre2_pattern_info.3 \
+doc/pcre2_serialize_decode.3 \
+doc/pcre2_serialize_encode.3 \
+doc/pcre2_serialize_free.3 \
+doc/pcre2_serialize_get_number_of_codes.3 \
 doc/pcre2_set_bsr.3 \
 doc/pcre2_set_callout.3 \
 doc/pcre2_set_character_tables.3 \
@@ -162,6 +171,7 @@ dist_man_MANS = \
 doc/pcre2perform.3 \
 doc/pcre2posix.3 \
 doc/pcre2sample.3 \
+doc/pcre2serialize.3 \
 doc/pcre2stack.3 \
 doc/pcre2syntax.3 \
 doc/pcre2test.1 \
@@ -316,6 +326,7 @@ COMMON_SOURCES = \
 src/pcre2_newline.c \
 src/pcre2_ord2utf.c \
 src/pcre2_pattern_info.c \
+src/pcre2_serialize.c \
 src/pcre2_string_utils.c \
 src/pcre2_study.c \
 src/pcre2_substitute.c \
@@ -573,6 +584,7 @@ EXTRA_DIST += \
 testdata/testinput16 \
 testdata/testinput17 \
 testdata/testinput18 \
+testdata/testinput19 \
 testdata/testinputEBC \
 testdata/testoutput1 \
 testdata/testoutput2 \
@@ -598,6 +610,7 @@ EXTRA_DIST += \
 testdata/testoutput16 \
 testdata/testoutput17 \
 testdata/testoutput18 \
+testdata/testoutput19 \
 testdata/testoutputEBC \
 perltest.sh
 
3 changes: 2 additions & 1 deletion NON-AUTOTOOLS-BUILD
@@ -108,6 +108,7 @@ can skip ahead to the CMake section.
 pcre2_newline.c
 pcre2_ord2utf.c
 pcre2_pattern_info.c
+pcre2_serialize.c
 pcre2_string_utils.c
 pcre2_study.c
 pcre2_substitute.c
@@ -391,4 +392,4 @@ The site currently has ports for PCRE1 releases, but PCRE2 should follow in due
 course.
 
 =============================
-Last Updated: 05 January 2015
+Last Updated: 19 January 2015
103 changes: 56 additions & 47 deletions README
@@ -527,11 +527,10 @@ Testing PCRE2
 ------------
 
 To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
-There is another script called RunGrepTest that tests the options of the
-pcre2grep command. When JIT support is enabled, a third test program called
-pcre2_jit_test is built. Both the scripts and all the program tests are run if
-you obey "make check". For other environments, see the instructions in
-NON-AUTOTOOLS-BUILD.
+There is another script called RunGrepTest that tests the pcre2grep command.
+When JIT support is enabled, a third test program called pcre2_jit_test is
+built. Both the scripts and all the program tests are run if you obey "make
+check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD.
 
 The RunTest script runs the pcre2test test program (which is documented in its
 own man page) on each of the relevant testinput files in the testdata
@@ -544,9 +543,9 @@ Some tests are relevant only when certain build-time options were selected. For
 example, the tests for UTF-8/16/32 features are run only when Unicode support
 is available. RunTest outputs a comment when it skips a test.
 
-Many of the tests that are not skipped are run twice if JIT support is
-available. On the second run, JIT compilation is forced. This testing can be
-suppressed by putting "nojit" on the RunTest command line.
+Many (but not all) of the tests that are not skipped are run twice if JIT
+support is available. On the second run, JIT compilation is forced. This
+testing can be suppressed by putting "nojit" on the RunTest command line.
 
 The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
 libraries that are enabled. If you want to run just one set of tests, call
@@ -570,33 +569,38 @@ in numerical order.
 You can also call RunTest with the single argument "list" to cause it to output
 a list of tests.
 
-The first two tests can always be run, as they expect only plain text strings
-(not UTF) and make no use of Unicode properties. The first test file can be fed
+The test sequence starts with "test 0", which is a special test that has no
+input file, and whose output is not checked. This is because it will be
+different on different hardware and with different configurations. The test
+exists in order to exercise some of pcre2test's code that would not otherwise
+be run.
+
+Tests 1 and 2 can always be run, as they expect only plain text strings (not
+UTF) and make no use of Unicode properties. The first test file can be fed
 directly into the perltest.sh script to check that Perl gives the same results.
 The only difference you should see is in the first few lines, where the Perl
 version is given instead of the PCRE2 version. The second set of tests check
 auxiliary functions, error detection, and run-time flags that are specific to
-PCRE2, as well as the POSIX wrapper API. It also uses the debugging flags to
-check some of the internals of pcre2_compile().
+PCRE2. It also uses the debugging flags to check some of the internals of
+pcre2_compile().
 
 If you build PCRE2 with a locale setting that is not the standard C locale, the
 character tables may be different (see next paragraph). In some cases, this may
 cause failures in the second set of tests. For example, in a locale where the
 isprint() function yields TRUE for characters in the range 128-255, the use of
 [:isascii:] inside a character class defines a different set of characters, and
 this shows up in this test as a difference in the compiled code, which is being
-listed for checking. Where the comparison test output contains [\x00-\x7f] the
-test will contain [\x00-\xff], and similarly in some other cases. This is not a
-bug in PCRE2.
-
-The third set of tests checks pcre2_maketables(), the facility for building a
-set of character tables for a specific locale and using them instead of the
-default tables. The script uses the "locale" command to check for the
-availability of the "fr_FR", "french", or "fr" locale, and uses the first one
-that it finds. If the "locale" command fails, or if its output doesn't include
-"fr_FR", "french", or "fr" in the list of available locales, the third test
-cannot be run, and a comment is output to say why. If running this test
-produces an error like this
+listed for checking. For example, where the comparison test output contains
+[\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other
+cases. This is not a bug in PCRE2.
+
+Test 3 checks pcre2_maketables(), the facility for building a set of character
+tables for a specific locale and using them instead of the default tables. The
+script uses the "locale" command to check for the availability of the "fr_FR",
+"french", or "fr" locale, and uses the first one that it finds. If the "locale"
+command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
+the list of available locales, the third test cannot be run, and a comment is
+output to say why. If running this test produces an error like this:
 
 ** Failed to set locale "fr_FR"
 
@@ -606,33 +610,37 @@ alternative output files for the third test, because three different versions
 of the French locale have been encountered. The test passes if its output
 matches any one of them.
 
-The fourth and fifth tests check UTF and Unicode property support, the fourth
-being compatible with the perltest.sh script, and the fifth checking
-PCRE2-specific things.
+Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
+with the perltest.sh script, and test 5 checking PCRE2-specific things.
+
+Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
+non-UTF mode and UTF-mode with Unicode property support, respectively.
+
+Test 8 checks some internal offsets and code size features; it is run only when
+the default "link size" of 2 is set (in other cases the sizes change) and when
+Unicode support is enabled.
+
+Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
+16-bit and 32-bit modes. These are tests that generate different output in
+8-bit mode. Each pair are for general cases and Unicode support, respectively.
+Test 13 checks the handling of non-UTF characters greater than 255 by
+pcre2_dfa_match() in 16-bit and 32-bit modes.
 
-The sixth and seventh tests check the pcre2_dfa_match() alternative matching
-function, in non-UTF mode and UTF-mode with Unicode property support,
-respectively.
+Test 14 contains a number of tests that must not be run with JIT. They check,
+among other non-JIT things, the match-limiting features of the intepretive
+matcher.
 
-The eighth test checks some internal offsets and code size features; it is
-run only when the default "link size" of 2 is set (in other cases the sizes
-change) and when Unicode support is enabled.
+Test 15 is run only when JIT support is not available. It checks that an
+attempt to use JIT has the expected behaviour.
 
-The ninth and tenth tests are run only in 8-bit mode, and the eleventh and
-twelfth tests are run only in 16-bit and 32-bit modes. These are tests that
-generate different output in 8-bit mode. Each pair are for general cases and
-Unicode support, respectively. The thirteenth test checks the handling of
-non-UTF characters greater than 255 by pcre2_dfa_match() in 16-bit and 32-bit
-modes.
+Test 16 is run only when JIT support is available. It checks JIT complete and
+partial modes, match-limiting under JIT, and other JIT-specific features.
 
-The fourteenth test is run only when JIT support is not available, and the
-fifteenth test is run only when JIT support is available. They test some
-JIT-specific features such as information output from pcre2test about JIT
-compilation.
+Tests 17 and 18 are run only in 8-bit mode. They check the POSIX interface to
+the 8-bit library, without and with Unicode support, respectively.
 
-The sixteenth and seventeenth tests are run only in 8-bit mode. They check the
-POSIX interface to the 8-bit library, without and with Unicode support,
-respectively.
+Test 19 checks the serialization functions by writing a set of compiled
+patterns to a file, and then reloading and checking them.
 
 
 Character tables
@@ -718,6 +726,7 @@ The distribution should contain the files listed below.
 src/pcre2_newline.c )
 src/pcre2_ord2utf.c )
 src/pcre2_pattern_info.c )
+src/pcre2_serialize.c )
 src/pcre2_string_utils.c )
 src/pcre2_study.c )
 src/pcre2_substitute.c )
@@ -816,4 +825,4 @@ The distribution should contain the files listed below.
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 05 January 2015
+Last updated: 20 January 2015
17 changes: 15 additions & 2 deletions RunTest
@@ -65,6 +65,7 @@ title15="Test 15: JIT-specific features when JIT is not available"
 title16="Test 16: JIT-specific features when JIT is available"
 title17="Test 17: Tests of the POSIX interface, excluding UTF/UCP"
 title18="Test 18: Tests of the POSIX interface with UTF/UCP"
+title19="Test 19: Serialization tests"
 maxtest=18
 
 if [ $# -eq 1 -a "$1" = "list" ]; then
@@ -87,6 +88,7 @@ if [ $# -eq 1 -a "$1" = "list" ]; then
 echo $title16
 echo $title17
 echo $title18
+echo $title19
 exit 0
 fi
 
@@ -207,6 +209,7 @@ do15=no
 do16=no
 do17=no
 do18=no
+do19=no
 
 while [ $# -gt 0 ] ; do
 case $1 in
@@ -229,6 +232,7 @@ while [ $# -gt 0 ] ; do
 16) do16=yes;;
 17) do17=yes;;
 18) do18=yes;;
+19) do19=yes;;
 -8) arg8=yes;;
 -16) arg16=yes;;
 -32) arg32=yes;;
@@ -364,7 +368,7 @@ if [ $do0 = no -a $do1 = no -a $do2 = no -a $do3 = no -a \
 $do4 = no -a $do5 = no -a $do6 = no -a $do7 = no -a \
 $do8 = no -a $do9 = no -a $do10 = no -a $do11 = no -a \
 $do12 = no -a $do13 = no -a $do14 = no -a $do15 = no -a \
-$do16 = no -a $do17 = no -a $do18 = no \
+$do16 = no -a $do17 = no -a $do18 = no -a $do19 = no \
 ]; then
 do0=yes
 do1=yes
@@ -385,6 +389,7 @@ if [ $do0 = no -a $do1 = no -a $do2 = no -a $do3 = no -a \
 do16=yes
 do17=yes
 do18=yes
+do19=yes
 fi
 
 # Handle any explicit skips at this stage, so that an argument list may consist
@@ -721,10 +726,18 @@ for bmode in "$test8" "$test16" "$test32"; do
 fi
 fi
 
+# Serialization tests
+
+if [ $do19 = yes ] ; then
+echo $title19
+$sim $valgrind ./pcre2test -q $bmode $testdata/testinput19 testtry
+checkresult $? 19 ""
+fi
+
 # End of loop for 8/16/32-bit tests
 done
 
 # Clean up local working files
-rm -f testSinput test3input test3output test3outputA test3outputB teststdout testtry
+rm -f testSinput test3input testsaved1 testsaved2 test3output test3outputA test3outputB teststdout testtry
 
 # End
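The README added in this commit says test 19 checks the serialization functions by writing a set of compiled patterns to a file and then reloading them. The C sketch below shows only the file half of such a round trip. It assumes the serialized bytes have already been produced by `pcre2_serialize_encode()` and will later be handed to `pcre2_serialize_decode()`. The helpers `save_bytes()` and `load_bytes()` are hypothetical and are not taken from pcre2test.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of PCRE2): write a serialized block to a
   file. Binary mode matters because the data may hold any byte value.
   Returns 0 on success, -1 on failure. */
int save_bytes(const char *path, const uint8_t *bytes, size_t length)
{
  FILE *f = fopen(path, "wb");
  if (f == NULL) return -1;
  size_t written = fwrite(bytes, 1, length, f);
  /* fclose() can also report a failed flush, so check both results. */
  if (fclose(f) != 0 || written != length) return -1;
  return 0;
}

/* Hypothetical helper (not part of PCRE2): read a whole file into a
   malloc'd buffer and store its size in *length. Returns NULL on
   failure; the caller must free() the result. */
uint8_t *load_bytes(const char *path, size_t *length)
{
  FILE *f = fopen(path, "rb");
  if (f == NULL) return NULL;
  if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
  long size = ftell(f);
  if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

  /* Allocate at least one byte so an empty file is not mistaken for an
     allocation failure. */
  uint8_t *buffer = malloc(size > 0 ? (size_t)size : 1);
  if (buffer == NULL) { fclose(f); return NULL; }
  if (fread(buffer, 1, (size_t)size, f) != (size_t)size)
  {
    free(buffer);
    fclose(f);
    return NULL;
  }
  fclose(f);
  *length = (size_t)size;
  return buffer;
}
```

With the real library, one plausible flow is to pass the bytes and size from `pcre2_serialize_encode()` to `save_bytes()` and then release them with `pcre2_serialize_free()`; later, read the file with `load_bytes()`, use `pcre2_serialize_get_number_of_codes()` to size the array of `pcre2_code` pointers, and call `pcre2_serialize_decode()`.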
3 changes: 2 additions & 1 deletion doc/html/NON-AUTOTOOLS-BUILD.txt
@@ -108,6 +108,7 @@ can skip ahead to the CMake section.
 pcre2_newline.c
 pcre2_ord2utf.c
 pcre2_pattern_info.c
+pcre2_serialize.c
 pcre2_string_utils.c
 pcre2_study.c
 pcre2_substitute.c
@@ -391,4 +392,4 @@ The site currently has ports for PCRE1 releases, but PCRE2 should follow in due
 course.
 
 =============================
-Last Updated: 05 January 2015
+Last Updated: 19 January 2015