pcre2_compile: make errorcode parameter optional #102

carenas · 2022-04-18T22:20:05Z

Not sure why the interface was designed this way, but it would seem that it was evolved from the one available in the original PCRE and that was using instead a text buffer to hold the error message directly.

While the use of both and error code and an error offset is useful and might be encouraged, shouldn't be required IMHO so this change tries to loosen that rule by allowing a NULL error, and showing how it will be used with some code from our own codebase.

With the risk of starting a bike shedding discussion, I would assume that making the offset ALSO optional might be useful, but had punted that change for the future once it would be discussed better.

There is also the posibility of an ABI breaking change that would either consolidate this 2 values (ex: using the offset with negative values of a ssize_t) for errors like is done in other functions, or creating an opaque parameter to replace this two but it is obviously not something to consider until pcre3

When the pcre2_compile interface was introduced, code was added to check that BOTH the errorcode and erroroffset were not NULL, but usually only one or the other is needed, in adition to a NULL return value to detect failures. Make the minimum change possible to allow passing a NULL and simplify some of the calls used in pcre2_jit_test that will benefit from this change. A more complex change that would also avoid the multiple places where checks for it were added has been punted from this change.

carenas · 2022-04-18T22:28:10Z

src/pcre2_compile.c

-if (errorptr == NULL || erroroffset == NULL) return NULL;
-*errorptr = ERR0;
+if (erroroffset == NULL) return NULL;
+if (errorptr) *errorptr = ERR0;


noticed this (setting the error to ERR0 unless there was an error) is not documented, could it be dropped? (meaning errorptr content is indeterminate unless return is NULL).

forcefully setting the error offset to 0 is a little unusual, and might be ok to leave as an undocumented "feature" for historical reasons, but reminds me of another undocumented feature I recently found while migrating a codebase from PCRE to PCRE2, where the code was relying in the error offset from pcreexec() being returned with sub[0] and that was no longer valid in the equivalent pcre2 call.

FWIW I think the addition of pcre2_get_startchar() make the overall API cleaner so not complaining about that one, but surelly this one does allow for some overly clever ways to initialize variables as shown by my test code in #51 (comment)

zherczeg · 2022-04-19T05:08:01Z

As for me these are compiler things, so I would not mind if both would be optional (negligible perf cost). But it is also true that only a few applications would benefit of this (testing programs, where a test case is expected to be pass or fuzzers), and even on that cases it is good if the error is displayed in some way.

PhilipHazel · 2022-04-19T17:02:32Z

There are lots of places where *errorptr=something occurs, though I suppose, if errorptr is passed in as NULL, it could be set to point to some dummy local variable. I suppose it is valid to allow callers not to be interested in what went wrong. Or indeed where the error was (the offset). Incidentally, I would always write "if (errorptr != NULL)" rather than "if (errorptr)" just in case NULL is not zero (I know it usually is), but this also makes it clear that errorptr is not a Boolean value.

carenas · 2022-04-19T20:10:03Z

Agree with Zoltan that a call where the error parameters are ignored is not very useful, so maybe "encouraging" their use by preventing them to be NULL is the right approach and all that is remaining is to document that failure with something like the following (untested)?:

diff --git a/doc/pcre2_compile.3 b/doc/pcre2_compile.3
index 58a60c1d..337b0bdb 100644
--- a/doc/pcre2_compile.3
+++ b/doc/pcre2_compile.3
@@ -83,6 +83,17 @@ function.
 The yield of this function is a pointer to a private data structure that
 contains the compiled pattern, or NULL if an error was detected.
 .P
+In that last case, a textual representation of the problem could be
+obtained using the value returned in the \fIerrorcode\fP parameter
+through the HREF \fBpcre2_get_error_message\fP function.
+.P
+The exact offset (in code points) where the error was encountered should
+be also available from the value returned in the \fIerroroffset\fP parameter.
+.P
+If no error was found, the values of both parameters is undeterminated, and
+if any of those 2 parameters was NULL the compilation won't be attempted
+and NULL will be returned from this function.
+.P
 There is a complete description of the PCRE2 native API, with more detail on
 each option, in the
 .\" HREF

PhilipHazel · 2022-04-22T09:41:59Z

I have updated the pcre2_compile.3 man page (and matching HTML) with some wording equivalent to your suggestion (for which thanks). The pcre2api page already had the information. I have added the fact that both values are set 0 when there is no error. There doesn't seem any reason not to document that fact.

carenas · 2022-04-22T16:31:02Z

I have added the fact that both values are set 0 when there is no error.

Except they are not (apologies since my original reporting was confusing), the errorcode is set to ERR0 which is a somehow arbitrary value of 100.

IMHO it be better left as undocumented, but eitherway the last version of documentation should be updated.

There doesn't seem any reason not to document that fact.

Leaving them undocumented (as as originally spelled in the pcre2api page, sorry for missing that) has the advantage that we can change it later as needed without having to worry about backward compatibility.

It might be also useful when the values used have odd values (like in the case of the error value which could be only interpreted externally as PCRE2_ERROR_END_BACKSLASH - 1). I don't think adding an additional define that could be used to represent 100 would be of significant value either.

Documenting the error offset as set to 0 might be ok, but was thinking it might be more useful as the size of the pattern read instead, so the next call to pcre2_compile that reuses the pattern could be done with it instead of maybe using PCRE2_ZERO_TERMINATED twice.

Obviously that would be an unusual optimization and maybe worth leaving undocumented as well.

PhilipHazel · 2022-04-22T16:56:29Z

Oops. Silly me. Apologies for my incompetence and thanks for checking. I have amended the docs to say that the value in errorcode will get the message "no error" if passed to pcre2_get_error_message(), and erroroffset is zero. I cannot now remember why the errorcodes start at 100. There must have been some reason (possibly I was going to use lower values for the UTF errors, but then went negative instead).

…

On Fri, 22 Apr 2022 at 17:31, Carlo Marcelo Arenas Belón < ***@***.***> wrote: I have added the fact that both values are set 0 when there is no error. Except they are not (apologies since my original reporting was confusing), the errorcode is set to ERR0 which is a somehow arbitrary value of 100. IMHO it be better left as undocumented, but eitherway the last version of documentation should be updated. There doesn't seem any reason not to document that fact. Leaving them undocumented (as as originally spelled in the pcre2api page, sorry for missing that) has the advantage that we can change it later as needed without having to worry about backward compatibility. It might be also useful when the values used have odd values (like in the case of the error value which could be only interpreted externally as PCRE2_ERROR_END_BACKSLASH - 1. I don't think adding an additional define that could be used to represent 100 would be of significant value either. Documenting the error offset as set to 0 might be ok, but was thinking it might be more useful as the size of the pattern read instead, so the next call to pcre2_match could be done with it instead of maybe using PCRE2_ZERO_TERMINATED twice. Obviously that would be an unusual optimization and maybe worth leaving undocumented as well. — Reply to this email directly, view it on GitHub <#102 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AG4QUAB44I2EGYRQXOVOOGLVGLH5DANCNFSM5TXAFAWQ> . You are receiving this because you commented.Message ID: ***@***.***>

carenas · 2022-04-22T18:06:08Z

maybe worth reconsidering for the next iteration of the API, but since this is now the PCRE2 project better close for now

carenas commented Apr 18, 2022

View reviewed changes

carenas closed this Apr 22, 2022

SolitaryGrass mentioned this pull request May 31, 2023

internal_dfa_match, a stack overflow occurred due to recursive calls. #258

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pcre2_compile: make errorcode parameter optional #102

pcre2_compile: make errorcode parameter optional #102

carenas commented Apr 18, 2022

carenas Apr 18, 2022 •

edited

Loading

zherczeg commented Apr 19, 2022

PhilipHazel commented Apr 19, 2022

carenas commented Apr 19, 2022 •

edited

Loading

PhilipHazel commented Apr 22, 2022

carenas commented Apr 22, 2022 •

edited

Loading

PhilipHazel commented Apr 22, 2022 via email

carenas commented Apr 22, 2022

pcre2_compile: make errorcode parameter optional #102

pcre2_compile: make errorcode parameter optional #102

Conversation

carenas commented Apr 18, 2022

carenas Apr 18, 2022 • edited Loading

Choose a reason for hiding this comment

zherczeg commented Apr 19, 2022

PhilipHazel commented Apr 19, 2022

carenas commented Apr 19, 2022 • edited Loading

PhilipHazel commented Apr 22, 2022

carenas commented Apr 22, 2022 • edited Loading

PhilipHazel commented Apr 22, 2022 via email

carenas commented Apr 22, 2022

carenas Apr 18, 2022 •

edited

Loading

carenas commented Apr 19, 2022 •

edited

Loading

carenas commented Apr 22, 2022 •

edited

Loading