Implement PCRE2_EXTRA_CASELESS_RESTRICT and related features
PhilipHazel committed Jan 29, 2023
1 parent fcceddc commit c13d54f
Showing 14 changed files with 764 additions and 117 deletions.
7 changes: 7 additions & 0 deletions ChangeLog
@@ -36,6 +36,13 @@ configure.ac and CMakeLists.txt.
8. Fixed a bug in pcre2test when a ridiculously large string repeat required a
stupid amount of memory. It now gives a clean realloc() failure error.

+9. Updates to restrict the interaction between ASCII and non-ASCII characters
+for caseless matching and items like \d:
+
+(a) Added PCRE2_EXTRA_CASELESS_RESTRICT to lock out mixing of ASCII and
+non-ASCII when matching caselessly. This is also /r in pcre2test and
+(?r) within patterns.
+

Version 10.42 11-December-2022
------------------------------
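The effect of item 9(a) can be pictured with a toy Python model. This is not PCRE2 code, and `caseless_equal` is a hypothetical name. Unicode makes the Kelvin sign (U+212A) case-equivalent to ASCII K/k and the long s (U+017F) case-equivalent to ASCII S/s; with the restriction on, such ASCII/non-ASCII pairs should stop matching caselessly, while purely non-ASCII pairs such as U+00E5 and the Angstrom sign (U+212B) are unaffected. Python's `str.casefold()` stands in here for PCRE2's own generated case tables.

```python
def caseless_equal(a: str, b: str, restrict: bool = False) -> bool:
    """Toy caseless comparison of two single characters (not PCRE2 code).

    str.casefold() is used as a stand-in for Unicode case folding.
    """
    if restrict and (ord(a) < 128) != (ord(b) < 128):
        # Under the restriction an ASCII character never matches a
        # non-ASCII one caselessly, whatever Unicode says.
        return False
    return a.casefold() == b.casefold()


KELVIN_SIGN = "\u212a"    # case-equivalent to ASCII K/k
LONG_S = "\u017f"         # case-equivalent to ASCII S/s
ANGSTROM_SIGN = "\u212b"  # case-equivalent to non-ASCII U+00C5/U+00E5

print(caseless_equal("k", KELVIN_SIGN))                        # True
print(caseless_equal("k", KELVIN_SIGN, restrict=True))         # False
print(caseless_equal("S", LONG_S, restrict=True))              # False
print(caseless_equal("K", "k", restrict=True))                 # True
print(caseless_equal("\u00e5", ANGSTROM_SIGN, restrict=True))  # True
```

The restriction is a boundary check applied before any case comparison, so ASCII-only and non-ASCII-only equivalences behave exactly as before.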
10 changes: 7 additions & 3 deletions HACKING
@@ -1,4 +1,4 @@
-Technical Notes about PCRE2
+Technical notes about PCRE2
---------------------------

These are very rough technical notes that record potentially useful information
@@ -248,7 +248,6 @@ by a length and an offset into the pattern to specify the name.
The following have one data item that follows in the next vector element:

META_BIGVALUE Next is a literal >= META_END
-META_OPTIONS (?i) and friends (data is new option bits)
META_POSIX POSIX class item (data identifies the class)
META_POSIX_NEG negative POSIX class item (ditto)

@@ -298,6 +297,11 @@ META_MINMAX {n,m} repeat
META_MINMAX_PLUS {n,m}+ repeat
META_MINMAX_QUERY {n,m}? repeat

+This one is followed by two elements, giving the new option settings for the
+main and extra options, respectively.
+
+META_OPTIONS (?i) and friends
+
This one is followed by three elements. The first is 0 for '>' and 1 for '>=';
the next two are the major and minor numbers:

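The new HACKING text says that META_OPTIONS is now followed by two vector elements, the main options and then the extra options, where before it carried one data item. A toy Python sketch of that layout follows. The `META_OPTIONS` value and the helper names are hypothetical (the real META codes live in the compiler source and are not reproduced here); the two option bits are the public PCRE2_CASELESS value and the PCRE2_EXTRA_CASELESS_RESTRICT value from the pcre2.h.in hunk in this commit.

```python
from typing import List, Tuple

# Hypothetical code value; the real META_* constants are not reproduced here.
META_OPTIONS = 0x80250000

PCRE2_CASELESS = 0x00000008                 # main compile option bit
PCRE2_EXTRA_CASELESS_RESTRICT = 0x00000080  # extra option bit


def emit_options(parsed: List[int], main: int, extra: int) -> None:
    """Append an option-setting item: the META code, then main, then extra."""
    parsed.extend([META_OPTIONS, main, extra])


def read_options(parsed: List[int], i: int) -> Tuple[int, int, int]:
    """Decode the item at index i; return (main, extra, index of next item)."""
    if parsed[i] != META_OPTIONS:
        raise ValueError("not a META_OPTIONS item")
    return parsed[i + 1], parsed[i + 2], i + 3


# Starting from no other options, a group like (?ir) would leave both the
# caseless bit and the caseless-restrict bit set.
parsed: List[int] = []
emit_options(parsed, PCRE2_CASELESS, PCRE2_EXTRA_CASELESS_RESTRICT)
main, extra, nxt = read_options(parsed, 0)
print(hex(main), hex(extra), nxt)  # 0x8 0x80 3
```

Keeping the extra options in their own element means an inline setting such as (?r) can change an extra-option bit without widening the main option word.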
@@ -827,4 +831,4 @@ not a real opcode, but is used to check at compile time that tables indexed by
opcode are the correct length, in order to catch updating errors.

Philip Hazel
-April 2022
+January 2023
12 changes: 12 additions & 0 deletions maint/GenerateUcd.py
@@ -109,6 +109,8 @@
# 10-January-2022: Addition of general Boolean property support
# 12-January-2022: Merge scriptx and bidiclass fields
# 14-January-2022: Enlarge Boolean property offset to 12 bits
+# 28-January-2023: Remove ASCII "other case" from non-ASCII character that
+# are present in caseless sets.
#
# ----------------------------------------------------------------------------
#
@@ -710,6 +712,16 @@ def write_bitsets(list, item_size):

# End of block of code for creating offsets for caseless matching sets.

+# Scan the caseless sets, and for any non-ASCII character that has an ASCII
+# character as its "base" other case, remove the other case. This makes it
+# easier to handle those characters when the PCRE2 option for not mixing ASCII
+# and non-ASCII is enabled. In principle one should perhaps scan for a
+# non-ASCII alternative, but in practice these don't exist.
+
+for s in caseless_sets:
+  for x in s:
+    if x > 127 and x + other_case[x] < 128:
+      other_case[x] = 0

# Combine all the tables

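To see what the loop added to maint/GenerateUcd.py does, it can be run on toy tables. In this sketch `other_case` is a dict of signed offsets (code point plus offset gives the "base" other case, with 0 meaning no other case) and `caseless_sets` lists groups of mutually case-equivalent characters; only six code points are included, with offsets chosen to follow Unicode case folding, whereas the real script builds full tables.

```python
# Toy tables: offsets to the "base" other case for a handful of code points.
other_case = {
    0x004B: 0x006B - 0x004B,  # K -> k
    0x006B: 0x004B - 0x006B,  # k -> K
    0x212A: 0x006B - 0x212A,  # KELVIN SIGN -> k
    0x00C5: 0x00E5 - 0x00C5,  # A with ring above -> a with ring above
    0x00E5: 0x00C5 - 0x00E5,  # a with ring above -> A with ring above
    0x212B: 0x00E5 - 0x212B,  # ANGSTROM SIGN -> a with ring above
}
caseless_sets = [[0x004B, 0x006B, 0x212A], [0x00C5, 0x00E5, 0x212B]]

# Same logic as the loop added by the commit.
for s in caseless_sets:
    for x in s:
        if x > 127 and x + other_case[x] < 128:
            other_case[x] = 0

for cp in sorted(other_case):
    print(f"U+{cp:04X} -> U+{cp + other_case[cp]:04X}")
```

After the loop the Kelvin sign maps to itself (offset 0), so its link to ASCII k survives only through its caseless set, while the Angstrom sign keeps U+00E5 because that other case is non-ASCII.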
2 changes: 1 addition & 1 deletion maint/ucptest.c
@@ -471,7 +471,7 @@ switch(bidi)
printf("U+%04X %s %s: %s, %s, %s", c, bidiclass, typename, fulltypename,
scriptname, graphbreak);

-if (is_just_one && othercase != c)
+if (is_just_one && (othercase != c || caseset != 0))
{
printf(", U+%04X", othercase);
if (caseset != 0)
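Because GenerateUcd.py now gives some non-ASCII characters an other-case offset of zero, the old ucptest test `othercase != c` would presumably skip the block for them, and their caseless set would no longer be printed. The truth-table sketch below compares the two conditions in Python; the function names are hypothetical and the nonzero caseset value is an arbitrary stand-in index.

```python
def prints_other_case_old(c: int, othercase: int, caseset: int) -> bool:
    """Condition before the commit: only when the other case differs."""
    return othercase != c


def prints_other_case_new(c: int, othercase: int, caseset: int) -> bool:
    """Condition after the commit: also when c belongs to a caseless set."""
    return othercase != c or caseset != 0


# (code point, other case, caseset); the nonzero caseset is a stand-in index.
cases = [
    (0x0041, 0x0061, 0),  # 'A': ordinary two-case letter
    (0x0031, 0x0031, 0),  # '1': no other case, no caseless set
    (0x212A, 0x212A, 1),  # KELVIN SIGN after its offset was zeroed
]
for c, oc, cs in cases:
    print(f"U+{c:04X}", prints_other_case_old(c, oc, cs),
          prints_other_case_new(c, oc, cs))
```

Only the last row differs between the two conditions, which is the case the table change in GenerateUcd.py creates.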
3 changes: 2 additions & 1 deletion src/pcre2.h.in
@@ -5,7 +5,7 @@
/* This is the public header file for the PCRE library, second API, to be
#included by applications that call PCRE2 functions.

-Copyright (c) 2016-2021 University of Cambridge
+Copyright (c) 2016-2023 University of Cambridge

-----------------------------------------------------------------------------
Redistribution and use in source and binary forms, with or without
@@ -153,6 +153,7 @@ D is inspected during pcre2_dfa_match() execution
#define PCRE2_EXTRA_ESCAPED_CR_IS_LF 0x00000010u /* C */
#define PCRE2_EXTRA_ALT_BSUX 0x00000020u /* C */
#define PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK 0x00000040u /* C */
+#define PCRE2_EXTRA_CASELESS_RESTRICT 0x00000080u /* C */

/* These are for pcre2_jit_compile(). */

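The new extra option is a single bit in the 32-bit extra-options word, next to the existing PCRE2_EXTRA_* bits. In C that word would be passed to pcre2_set_compile_extra_options() on a compile context; the sketch below stays in Python for consistency and uses only the four constant values shown in the hunk, to check that the new bit overlaps none of its neighbours and to show how it is combined, tested, and cleared.

```python
# Extra-option bits with the values from the pcre2.h.in hunk.
PCRE2_EXTRA_ESCAPED_CR_IS_LF = 0x00000010
PCRE2_EXTRA_ALT_BSUX = 0x00000020
PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK = 0x00000040
PCRE2_EXTRA_CASELESS_RESTRICT = 0x00000080


def is_single_bit(value: int) -> bool:
    """True if value has exactly one bit set."""
    return value != 0 and (value & (value - 1)) == 0


existing = (PCRE2_EXTRA_ESCAPED_CR_IS_LF, PCRE2_EXTRA_ALT_BSUX,
            PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK)

# The new flag is one bit that none of the neighbouring flags already use.
assert is_single_bit(PCRE2_EXTRA_CASELESS_RESTRICT)
assert all((PCRE2_EXTRA_CASELESS_RESTRICT & bit) == 0 for bit in existing)

# Options are combined with bitwise OR into one option word.
extra = PCRE2_EXTRA_ALT_BSUX | PCRE2_EXTRA_CASELESS_RESTRICT
print(hex(extra))                                   # 0xa0
print(bool(extra & PCRE2_EXTRA_CASELESS_RESTRICT))  # True

# Clearing the bit leaves the other option untouched.
extra &= ~PCRE2_EXTRA_CASELESS_RESTRICT
print(hex(extra))                                   # 0x20
```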