Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pcre2_compile: make errorcode parameter optional #102

Closed
wants to merge 1 commit into from

Conversation

carenas
Copy link
Contributor

@carenas carenas commented Apr 18, 2022

Not sure why the interface was designed this way, but it would seem that it was evolved from the one available in the original PCRE and that was using instead a text buffer to hold the error message directly.

While the use of both and error code and an error offset is useful and might be encouraged, shouldn't be required IMHO so this change tries to loosen that rule by allowing a NULL error, and showing how it will be used with some code from our own codebase.

With the risk of starting a bike shedding discussion, I would assume that making the offset ALSO optional might be useful, but had punted that change for the future once it would be discussed better.

There is also the posibility of an ABI breaking change that would either consolidate this 2 values (ex: using the offset with negative values of a ssize_t) for errors like is done in other functions, or creating an opaque parameter to replace this two but it is obviously not something to consider until pcre3

When the pcre2_compile interface was introduced, code was added to check
that BOTH the errorcode and erroroffset were not NULL, but usually only
one or the other is needed, in adition to a NULL return value to detect
failures.

Make the minimum change possible to allow passing a NULL and simplify
some of the calls used in pcre2_jit_test that will benefit from this
change.

A more complex change that would also avoid the multiple places where
checks for it were added has been punted from this change.
if (errorptr == NULL || erroroffset == NULL) return NULL;
*errorptr = ERR0;
if (erroroffset == NULL) return NULL;
if (errorptr) *errorptr = ERR0;
Copy link
Contributor Author

@carenas carenas Apr 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noticed this (setting the error to ERR0 unless there was an error) is not documented, could it be dropped? (meaning errorptr content is indeterminate unless return is NULL).

forcefully setting the error offset to 0 is a little unusual, and might be ok to leave as an undocumented "feature" for historical reasons, but reminds me of another undocumented feature I recently found while migrating a codebase from PCRE to PCRE2, where the code was relying in the error offset from pcreexec() being returned with sub[0] and that was no longer valid in the equivalent pcre2 call.

FWIW I think the addition of pcre2_get_startchar() make the overall API cleaner so not complaining about that one, but surelly this one does allow for some overly clever ways to initialize variables as shown by my test code in #51 (comment)

@zherczeg
Copy link
Collaborator

As for me these are compiler things, so I would not mind if both would be optional (negligible perf cost). But it is also true that only a few applications would benefit of this (testing programs, where a test case is expected to be pass or fuzzers), and even on that cases it is good if the error is displayed in some way.

@PhilipHazel
Copy link
Collaborator

There are lots of places where *errorptr=something occurs, though I suppose, if errorptr is passed in as NULL, it could be set to point to some dummy local variable. I suppose it is valid to allow callers not to be interested in what went wrong. Or indeed where the error was (the offset). Incidentally, I would always write "if (errorptr != NULL)" rather than "if (errorptr)" just in case NULL is not zero (I know it usually is), but this also makes it clear that errorptr is not a Boolean value.

@carenas
Copy link
Contributor Author

carenas commented Apr 19, 2022

Agree with Zoltan that a call where the error parameters are ignored is not very useful, so maybe "encouraging" their use by preventing them to be NULL is the right approach and all that is remaining is to document that failure with something like the following (untested)?:

diff --git a/doc/pcre2_compile.3 b/doc/pcre2_compile.3
index 58a60c1d..337b0bdb 100644
--- a/doc/pcre2_compile.3
+++ b/doc/pcre2_compile.3
@@ -83,6 +83,17 @@ function.
 The yield of this function is a pointer to a private data structure that
 contains the compiled pattern, or NULL if an error was detected.
 .P
+In that last case, a textual representation of the problem could be
+obtained using the value returned in the \fIerrorcode\fP parameter
+through the HREF \fBpcre2_get_error_message\fP function.
+.P
+The exact offset (in code points) where the error was encountered should
+be also available from the value returned in the \fIerroroffset\fP parameter.
+.P
+If no error was found, the values of both parameters is undeterminated, and
+if any of those 2 parameters was NULL the compilation won't be attempted
+and NULL will be returned from this function.
+.P
 There is a complete description of the PCRE2 native API, with more detail on
 each option, in the
 .\" HREF

@PhilipHazel
Copy link
Collaborator

I have updated the pcre2_compile.3 man page (and matching HTML) with some wording equivalent to your suggestion (for which thanks). The pcre2api page already had the information. I have added the fact that both values are set 0 when there is no error. There doesn't seem any reason not to document that fact.

@carenas
Copy link
Contributor Author

carenas commented Apr 22, 2022

I have added the fact that both values are set 0 when there is no error.

Except they are not (apologies since my original reporting was confusing), the errorcode is set to ERR0 which is a somehow arbitrary value of 100.

IMHO it be better left as undocumented, but eitherway the last version of documentation should be updated.

There doesn't seem any reason not to document that fact.

Leaving them undocumented (as as originally spelled in the pcre2api page, sorry for missing that) has the advantage that we can change it later as needed without having to worry about backward compatibility.

It might be also useful when the values used have odd values (like in the case of the error value which could be only interpreted externally as PCRE2_ERROR_END_BACKSLASH - 1). I don't think adding an additional define that could be used to represent 100 would be of significant value either.

Documenting the error offset as set to 0 might be ok, but was thinking it might be more useful as the size of the pattern read instead, so the next call to pcre2_compile that reuses the pattern could be done with it instead of maybe using PCRE2_ZERO_TERMINATED twice.

Obviously that would be an unusual optimization and maybe worth leaving undocumented as well.

@PhilipHazel
Copy link
Collaborator

PhilipHazel commented Apr 22, 2022 via email

@carenas
Copy link
Contributor Author

carenas commented Apr 22, 2022

maybe worth reconsidering for the next iteration of the API, but since this is now the PCRE2 project better close for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants