-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pcre2_compile: make errorcode parameter optional #102
Conversation
When the pcre2_compile interface was introduced, code was added to check that BOTH the errorcode and erroroffset were not NULL, but usually only one or the other is needed, in adition to a NULL return value to detect failures. Make the minimum change possible to allow passing a NULL and simplify some of the calls used in pcre2_jit_test that will benefit from this change. A more complex change that would also avoid the multiple places where checks for it were added has been punted from this change.
if (errorptr == NULL || erroroffset == NULL) return NULL; | ||
*errorptr = ERR0; | ||
if (erroroffset == NULL) return NULL; | ||
if (errorptr) *errorptr = ERR0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noticed this (setting the error to ERR0 unless there was an error) is not documented, could it be dropped? (meaning errorptr content is indeterminate unless return is NULL).
forcefully setting the error offset to 0 is a little unusual, and might be ok to leave as an undocumented "feature" for historical reasons, but reminds me of another undocumented feature I recently found while migrating a codebase from PCRE to PCRE2, where the code was relying in the error offset from pcreexec() being returned with sub[0] and that was no longer valid in the equivalent pcre2 call.
FWIW I think the addition of pcre2_get_startchar() make the overall API cleaner so not complaining about that one, but surelly this one does allow for some overly clever ways to initialize variables as shown by my test code in #51 (comment)
As for me these are compiler things, so I would not mind if both would be optional (negligible perf cost). But it is also true that only a few applications would benefit of this (testing programs, where a test case is expected to be pass or fuzzers), and even on that cases it is good if the error is displayed in some way. |
There are lots of places where *errorptr=something occurs, though I suppose, if errorptr is passed in as NULL, it could be set to point to some dummy local variable. I suppose it is valid to allow callers not to be interested in what went wrong. Or indeed where the error was (the offset). Incidentally, I would always write "if (errorptr != NULL)" rather than "if (errorptr)" just in case NULL is not zero (I know it usually is), but this also makes it clear that errorptr is not a Boolean value. |
Agree with Zoltan that a call where the error parameters are ignored is not very useful, so maybe "encouraging" their use by preventing them to be NULL is the right approach and all that is remaining is to document that failure with something like the following (untested)?:
|
I have updated the pcre2_compile.3 man page (and matching HTML) with some wording equivalent to your suggestion (for which thanks). The pcre2api page already had the information. I have added the fact that both values are set 0 when there is no error. There doesn't seem any reason not to document that fact. |
Except they are not (apologies since my original reporting was confusing), the errorcode is set to ERR0 which is a somehow arbitrary value of 100. IMHO it be better left as undocumented, but eitherway the last version of documentation should be updated.
Leaving them undocumented (as as originally spelled in the pcre2api page, sorry for missing that) has the advantage that we can change it later as needed without having to worry about backward compatibility. It might be also useful when the values used have odd values (like in the case of the error value which could be only interpreted externally as PCRE2_ERROR_END_BACKSLASH - 1). I don't think adding an additional define that could be used to represent 100 would be of significant value either. Documenting the error offset as set to 0 might be ok, but was thinking it might be more useful as the size of the pattern read instead, so the next call to pcre2_compile that reuses the pattern could be done with it instead of maybe using PCRE2_ZERO_TERMINATED twice. Obviously that would be an unusual optimization and maybe worth leaving undocumented as well. |
Oops. Silly me. Apologies for my incompetence and thanks for checking. I
have amended the docs to say that the value in errorcode will get the
message "no error" if passed to pcre2_get_error_message(), and erroroffset
is zero. I cannot now remember why the errorcodes start at 100. There must
have been some reason (possibly I was going to use lower values for the UTF
errors, but then went negative instead).
…On Fri, 22 Apr 2022 at 17:31, Carlo Marcelo Arenas Belón < ***@***.***> wrote:
I have added the fact that both values are set 0 when there is no error.
Except they are not (apologies since my original reporting was confusing),
the errorcode is set to ERR0 which is a somehow arbitrary value of 100.
IMHO it be better left as undocumented, but eitherway the last version of
documentation should be updated.
There doesn't seem any reason not to document that fact.
Leaving them undocumented (as as originally spelled in the pcre2api page,
sorry for missing that) has the advantage that we can change it later as
needed without having to worry about backward compatibility.
It might be also useful when the values used have odd values (like in the
case of the error value which could be only interpreted externally as
PCRE2_ERROR_END_BACKSLASH - 1. I don't think adding an additional define
that could be used to represent 100 would be of significant value either.
Documenting the error offset as set to 0 might be ok, but was thinking it
might be more useful as the size of the pattern read instead, so the next
call to pcre2_match could be done with it instead of maybe using
PCRE2_ZERO_TERMINATED twice.
Obviously that would be an unusual optimization and maybe worth leaving
undocumented as well.
—
Reply to this email directly, view it on GitHub
<#102 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AG4QUAB44I2EGYRQXOVOOGLVGLH5DANCNFSM5TXAFAWQ>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
maybe worth reconsidering for the next iteration of the API, but since this is now the PCRE2 project better close for now |
Not sure why the interface was designed this way, but it would seem that it was evolved from the one available in the original PCRE and that was using instead a text buffer to hold the error message directly.
While the use of both and error code and an error offset is useful and might be encouraged, shouldn't be required IMHO so this change tries to loosen that rule by allowing a NULL error, and showing how it will be used with some code from our own codebase.
With the risk of starting a bike shedding discussion, I would assume that making the offset ALSO optional might be useful, but had punted that change for the future once it would be discussed better.
There is also the posibility of an ABI breaking change that would either consolidate this 2 values (ex: using the offset with negative values of a ssize_t) for errors like is done in other functions, or creating an opaque parameter to replace this two but it is obviously not something to consider until pcre3