add pcre2_get_match_data_heapframes_size() (#191)
Since PCRE2 10.41, the match data contains a pointer to a vector of
frames that is allocated on the heap and used by pcre2_match() for
non-JIT matches.

The size of that vector is not visible from outside, though, so the
memory it uses stays locked away until the match_data itself is freed.

Add an API that returns that value, so an application can decide, based
on the memory pressure it experiences, whether to keep reusing that
match_data.

While at it, update the documentation of other related functions for
clarity.
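
The reuse decision described above could look roughly like the following. This is only a sketch under stated assumptions: `heapframes_over_budget()`, `recycle_if_needed()`, the byte budget, and the `DEMO_WITH_PCRE2` build switch are hypothetical names introduced for illustration and are not part of this commit. Only the PCRE2 calls inside the guarded section are real library functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy: treat a match data block as too expensive to keep
   once its retained heapframes vector exceeds a caller-chosen budget. */
bool heapframes_over_budget(size_t heapframes_bytes, size_t budget_bytes)
{
  assert(budget_bytes > 0);  /* a zero budget would recycle after every match */
  return heapframes_bytes > budget_bytes;
}

#ifdef DEMO_WITH_PCRE2  /* hypothetical switch: define when building against PCRE2 >= 10.43 */
#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>

/* Returns md unchanged while it is within budget. Otherwise frees it,
   which also releases the heapframes vector, and returns a fresh block
   for the same pattern (NULL if that allocation fails). */
pcre2_match_data *recycle_if_needed(pcre2_match_data *md,
                                    const pcre2_code *re,
                                    size_t budget_bytes)
{
  if (!heapframes_over_budget(pcre2_get_match_data_heapframes_size(md),
                              budget_bytes))
    return md;
  pcre2_match_data_free(md);
  return pcre2_match_data_create_from_pattern(re, NULL);
}
#endif
```

A caller would typically invoke `recycle_if_needed()` after a non-JIT `pcre2_match()` returns, since that is when the vector may have grown. The budget itself is an application decision that this commit does not prescribe.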
carenas authored Jan 17, 2023
1 parent 4d66adc commit c80c633
Showing 11 changed files with 139 additions and 10 deletions.
14 changes: 9 additions & 5 deletions ChangeLog
@@ -5,15 +5,19 @@ Change Log for PCRE2 - see also the Git log
Version 10.43 xx-xxx-202x
-------------------------

1. The test program added by change 2 of 10.42 didn't work when the default
newline setting didn't include \n as a newline. One test needed (*LF) to ensure
1. The test program added by change 2 of 10.42 didn't work when the default
newline setting didn't include \n as a newline. One test needed (*LF) to ensure
that it worked.

2. Added the new freestanding POSIX test program to the ManyConfigTests script
in the maint directory (overlooked in 2 below). Also improved the selection
facilities in that script, and added a test with JIT in a non-source directory,
2. Added the new freestanding POSIX test program to the ManyConfigTests script
in the maint directory (overlooked in 2 below). Also improved the selection
facilities in that script, and added a test with JIT in a non-source directory,
fixing an oversight that would have made such a test fail before.

3. Added pcre2_get_match_data_heapframes_size() and related pcre2test flags
to allow for finer control of the heap used when pcre2_match() without JIT is
used and the match_data might be reused.


Version 10.42 11-December-2022
------------------------------
2 changes: 2 additions & 0 deletions Makefile.am
@@ -46,6 +46,7 @@ dist_html_DATA = \
doc/html/pcre2_general_context_free.html \
doc/html/pcre2_get_error_message.html \
doc/html/pcre2_get_mark.html \
doc/html/pcre2_get_match_data_heapframes_size.html \
doc/html/pcre2_get_match_data_size.html \
doc/html/pcre2_get_ovector_count.html \
doc/html/pcre2_get_ovector_pointer.html \
@@ -142,6 +143,7 @@ dist_man_MANS = \
doc/pcre2_general_context_free.3 \
doc/pcre2_get_error_message.3 \
doc/pcre2_get_mark.3 \
doc/pcre2_get_match_data_heapframes_size.3 \
doc/pcre2_get_match_data_size.3 \
doc/pcre2_get_ovector_count.3 \
doc/pcre2_get_ovector_pointer.3 \
39 changes: 39 additions & 0 deletions doc/html/pcre2_get_match_data_heapframes_size.html
@@ -0,0 +1,39 @@
<html>
<head>
<title>pcre2_get_match_data_heapframes_size specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2_get_match_data_heapframes_size man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre2.h&#62;</b>
</P>
<P>
<b>PCRE2_SIZE pcre2_get_match_data_heapframes_size(pcre2_match_data *<i>match_data</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function returns the size, in bytes, of the heapframes data block that is owned
by its argument.
</P>
<P>
There is a complete description of the PCRE2 native API in the
<a href="pcre2api.html"><b>pcre2api</b></a>
page and a description of the POSIX API in the
<a href="pcre2posix.html"><b>pcre2posix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
2 changes: 1 addition & 1 deletion doc/html/pcre2_match_data_free.html
@@ -32,7 +32,7 @@ <h1>pcre2_match_data_free man page</h1>
</P>
<P>
If the PCRE2_COPY_MATCHED_SUBJECT was used for a successful match using this
match data block, the copy of the subject that was remembered with the block is
match data block, the copy of the subject that was referenced within the block is
also freed.
</P>
<P>
7 changes: 7 additions & 0 deletions doc/html/pcre2test.html
@@ -686,6 +686,7 @@ <h1>pcre2test man page</h1>
fullbincode show binary code with lengths
/I info show info about compiled pattern
hex unquoted characters are hexadecimal
heapframes_size show match data heapframes size
jit[=&#60;number&#62;] use JIT
jitfast use JIT fast path
jitverify verify JIT use
@@ -778,6 +779,12 @@ <h1>pcre2test man page</h1>
number of capturing parentheses in the pattern.
</P>
<P>
The <b>heapframes_size</b> modifier shows the size, in bytes, of the allocated
heapframes used by <b>pcre2_match()</b> and associated with the match_data.
The vector is reused by all matching patterns that use that `pcre2_match_data`
and will be expanded as needed.
</P>
<P>
The <b>callout_info</b> modifier requests information about all the callouts in
the pattern. A list of them is output at the end of any other information that
is requested. For each callout, either its number or string is given, followed
27 changes: 27 additions & 0 deletions doc/pcre2_get_match_data_heapframes_size.3
@@ -0,0 +1,27 @@
.TH PCRE2_GET_MATCH_DATA_HEAPFRAMES_SIZE 3 "13 January 2023" "PCRE2 10.43"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
.rs
.sp
.B #include <pcre2.h>
.PP
.nf
.B PCRE2_SIZE pcre2_get_match_data_heapframes_size(pcre2_match_data *\fImatch_data\fP);
.fi
.
.SH DESCRIPTION
.rs
.sp
This function returns the size, in bytes, of the heapframes data block that is owned
by its argument.
.P
There is a complete description of the PCRE2 native API in the
.\" HREF
\fBpcre2api\fP
.\"
page and a description of the POSIX API in the
.\" HREF
\fBpcre2posix\fP
.\"
page.
2 changes: 1 addition & 1 deletion doc/pcre2_match_data_free.3
@@ -19,7 +19,7 @@ using the memory freeing function from the general context or compiled pattern
with which it was created, or \fBfree()\fP if that was not set.
.P
If the PCRE2_COPY_MATCHED_SUBJECT was used for a successful match using this
match data block, the copy of the subject that was remembered with the block is
match data block, the copy of the subject that was referenced within the block is
also freed.
.P
There is a complete description of the PCRE2 native API in the
6 changes: 6 additions & 0 deletions doc/pcre2test.1
@@ -642,6 +642,7 @@ heavily used in the test files.
fullbincode show binary code with lengths
/I info show info about compiled pattern
hex unquoted characters are hexadecimal
heapframes_size show match data heapframes size
jit[=<number>] use JIT
jitfast use JIT fast path
jitverify verify JIT use
@@ -728,6 +729,11 @@ The \fBframesize\fP modifier shows the size, in bytes, of the storage frames
used by \fBpcre2_match()\fP for handling backtracking. The size depends on the
number of capturing parentheses in the pattern.
.P
The \fBheapframes_size\fP modifier shows the size, in bytes, of the allocated
heapframes used by \fBpcre2_match()\fP and associated with the match_data.
The vector is reused by all matching patterns that use that `pcre2_match_data`
and will be expanded as needed.
.P
The \fBcallout_info\fP modifier requests information about all the callouts in
the pattern. A list of them is output at the end of any other information that
is requested. For each callout, either its number or string is given, followed
3 changes: 3 additions & 0 deletions src/pcre2.h.in
@@ -687,6 +687,8 @@ PCRE2_EXP_DECL PCRE2_SPTR PCRE2_CALL_CONVENTION \
pcre2_get_mark(pcre2_match_data *); \
PCRE2_EXP_DECL PCRE2_SIZE PCRE2_CALL_CONVENTION \
pcre2_get_match_data_size(pcre2_match_data *); \
PCRE2_EXP_DECL PCRE2_SIZE PCRE2_CALL_CONVENTION \
pcre2_get_match_data_heapframes_size(pcre2_match_data *); \
PCRE2_EXP_DECL uint32_t PCRE2_CALL_CONVENTION \
pcre2_get_ovector_count(pcre2_match_data *); \
PCRE2_EXP_DECL PCRE2_SIZE *PCRE2_CALL_CONVENTION \
@@ -851,6 +853,7 @@ pcre2_compile are called by application code. */
#define pcre2_general_context_free PCRE2_SUFFIX(pcre2_general_context_free_)
#define pcre2_get_error_message PCRE2_SUFFIX(pcre2_get_error_message_)
#define pcre2_get_mark PCRE2_SUFFIX(pcre2_get_mark_)
#define pcre2_get_match_data_heapframes_size PCRE2_SUFFIX(pcre2_get_match_data_heapframes_size_)
#define pcre2_get_match_data_size PCRE2_SUFFIX(pcre2_get_match_data_size_)
#define pcre2_get_ovector_pointer PCRE2_SUFFIX(pcre2_get_ovector_pointer_)
#define pcre2_get_ovector_count PCRE2_SUFFIX(pcre2_get_ovector_count_)
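
The `#define` added in this hunk maps the generic function name onto a width-specific one by token pasting, which is why pcre2test.c can call the `_8`, `_16` and `_32` variants directly. The following is a minimal stand-alone sketch of that mechanism. Every `demo_`/`DEMO_` name is hypothetical and only mirrors the shape of the header's macros. It is not the header's actual text.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the width-specific library functions. */
size_t demo_size_8(void)  { return 8; }
size_t demo_size_16(void) { return 16; }
size_t demo_size_32(void) { return 32; }

/* Plays the role of PCRE2_CODE_UNIT_WIDTH, chosen by the application. */
#define DEMO_CODE_UNIT_WIDTH 16

/* Two macro levels are needed so that DEMO_CODE_UNIT_WIDTH is expanded
   to its value before the ## operator pastes the tokens together. */
#define DEMO_JOIN(a, b) a ## b
#define DEMO_GLUE(a, b) DEMO_JOIN(a, b)
#define DEMO_SUFFIX(a)  DEMO_GLUE(a, DEMO_CODE_UNIT_WIDTH)

/* Generic name, in the same shape as the #define added above. */
#define demo_size DEMO_SUFFIX(demo_size_)

/* Here demo_size() expands to demo_size_16(). */
size_t demo_generic_call(void) { return demo_size(); }
```

With the width set to 16 as above, the generic call resolves to the 16-bit stand-in. Changing `DEMO_CODE_UNIT_WIDTH` to 8 or 32 would select the corresponding function instead.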
12 changes: 12 additions & 0 deletions src/pcre2_match_data.c
@@ -170,4 +170,16 @@ return offsetof(pcre2_match_data, ovector) +
2 * (match_data->oveccount) * sizeof(PCRE2_SIZE);
}



/*************************************************
* Get heapframes size *
*************************************************/

PCRE2_EXP_DEFN PCRE2_SIZE PCRE2_CALL_CONVENTION
pcre2_get_match_data_heapframes_size(pcre2_match_data *match_data)
{
return match_data->heapframes_size;
}

/* End of pcre2_match_data.c */
35 changes: 32 additions & 3 deletions src/pcre2test.c
@@ -524,6 +524,7 @@ so many of them that they are split into two fields. */
#define CTL2_NULL_REPLACEMENT 0x00002000u
#define CTL2_FRAMESIZE 0x00004000u

#define CTL2_HEAPFRAMES_SIZE 0x20000000u /* Informational */
#define CTL2_NL_SET 0x40000000u /* Informational */
#define CTL2_BSR_SET 0x80000000u /* Informational */

@@ -682,6 +683,7 @@ static modstruct modlist[] = {
{ "getall", MOD_DAT, MOD_CTL, CTL_GETALL, DO(control) },
{ "global", MOD_PNDP, MOD_CTL, CTL_GLOBAL, PO(control) },
{ "heap_limit", MOD_CTM, MOD_INT, 0, MO(heap_limit) },
{ "heapframes_size", MOD_PAT, MOD_CTL, CTL2_HEAPFRAMES_SIZE, PO(control2) },
{ "hex", MOD_PAT, MOD_CTL, CTL_HEXPAT, PO(control) },
{ "info", MOD_PAT, MOD_CTL, CTL_INFO, PO(control) },
{ "jit", MOD_PAT, MOD_IND, 7, PO(jit) },
@@ -786,8 +788,8 @@ static modstruct modlist[] = {
CTL_JITVERIFY|CTL_MEMORY|CTL_PUSH|CTL_PUSHCOPY| \
CTL_PUSHTABLESCOPY|CTL_USE_LENGTH)

#define PUSH_SUPPORTED_COMPILE_CONTROLS2 (CTL2_BSR_SET|CTL2_FRAMESIZE| \
CTL2_NL_SET)
#define PUSH_SUPPORTED_COMPILE_CONTROLS2 (CTL2_BSR_SET| \
CTL2_HEAPFRAMES_SIZE|CTL2_FRAMESIZE|CTL2_NL_SET)

/* Controls that apply only at compile time with 'push'. */

@@ -4130,7 +4132,7 @@ Returns: nothing
static void
show_controls(uint32_t controls, uint32_t controls2, const char *before)
{
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s",
before,
((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "",
((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "",
Expand All @@ -4153,6 +4155,7 @@ fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s
((controls & CTL_FULLBINCODE) != 0)? " fullbincode" : "",
((controls & CTL_GETALL) != 0)? " getall" : "",
((controls & CTL_GLOBAL) != 0)? " global" : "",
((controls2 & CTL2_HEAPFRAMES_SIZE) != 0)? " heapframes_size" : "",
((controls & CTL_HEXPAT) != 0)? " hex" : "",
((controls & CTL_INFO) != 0)? " info" : "",
((controls & CTL_JITFAST) != 0)? " jitfast" : "",
@@ -4357,6 +4360,31 @@ fprintf(outfile, "Frame size for pcre2_match(): %" SIZ_FORM "\n", frame_size);



/*************************************************
* Show heapframes size info for a match_data *
*************************************************/

static void
show_heapframes_size(void)
{
PCRE2_SIZE heapframes_size = 0;
#ifdef SUPPORT_PCRE2_8
if (code_unit_size == 1)
heapframes_size = pcre2_get_match_data_heapframes_size_8(match_data8);
#endif
#ifdef SUPPORT_PCRE2_16
if (code_unit_size == 2)
heapframes_size = pcre2_get_match_data_heapframes_size_16(match_data16);
#endif
#ifdef SUPPORT_PCRE2_32
if (code_unit_size == 4)
heapframes_size = pcre2_get_match_data_heapframes_size_32(match_data32);
#endif
fprintf(outfile, "Heapframes size in match_data: %" SIZ_FORM "\n", heapframes_size);
}



/*************************************************
* Get and output an error message *
*************************************************/
@@ -5971,6 +5999,7 @@ if ((pat_patctl.control2 & CTL2_NL_SET) != 0)

if ((pat_patctl.control & CTL_MEMORY) != 0) show_memory_info();
if ((pat_patctl.control2 & CTL2_FRAMESIZE) != 0) show_framesize();
if ((pat_patctl.control2 & CTL2_HEAPFRAMES_SIZE) != 0) show_heapframes_size();
if ((pat_patctl.control & CTL_ANYINFO) != 0)
{
int rc = show_pattern_info();
