Cannot build with PCRE2_CODE_UNIT_WIDTH=0 #218
I'm afraid you have misunderstood; perhaps the documentation isn't clear (I have made a note to look at this point in due course). When building PCRE2 you must set PCRE2_CODE_UNIT_WIDTH to 8, 16, or 32. Each setting builds a different library. If you have an application that needs to use more than one PCRE2 library (e.g. 8-bit and 16-bit), then you can set PCRE2_CODE_UNIT_WIDTH to zero in your application, before including pcre2.h. The application must then refer to functions by their full names, e.g. pcre2_compile_8() instead of the generic pcre2_compile(), and must be linked with all relevant PCRE2 libraries. The bottom line is that there is no PCRE2 library that handles more than one code unit width. (One day there might be; I have had some discussions with a user about this.)
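Here is a minimal sketch of that arrangement. It assumes a toy "library" with a single string-length function. Macro expansion stands in for compiling the same source once per code unit width. The application code then calls the width-suffixed names directly, the same way a PCRE2_CODE_UNIT_WIDTH=0 application calls pcre2_compile_8() and pcre2_compile_16(). DEFINE_STR_LENGTH, str_length_8(), str_length_16(), and total_length() are hypothetical names and are not part of PCRE2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one library source compiled once per
   code unit width: each expansion yields a separate, width-specific
   function, much as each build of PCRE2 yields a separate library. */
#define DEFINE_STR_LENGTH(W)                        \
    size_t str_length_##W(const uint##W##_t *s)     \
    {                                               \
        size_t n = 0;                               \
        while (s[n] != 0)                           \
            n++;                                    \
        return n;                                   \
    }

/* Plays the role of the 8-bit library. */
DEFINE_STR_LENGTH(8)
/* Plays the role of the 16-bit library. */
DEFINE_STR_LENGTH(16)

/* Multi-width application code: there is no generic str_length(),
   so every call names its width explicitly. */
static size_t total_length(const uint8_t *narrow, const uint16_t *wide)
{
    return str_length_8(narrow) + str_length_16(wide);
}
```

With real PCRE2, the two definitions would come from two separately built libraries (libpcre2-8 and libpcre2-16), and the application would be linked against both.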
I see, thanks. So for now we'll need to build both pcre2-8 and pcre2-16 separately as libraries and link both. In all fairness, maybe the .c source files could give an error as well if
I wasn't confused about how
@PhilipHazel: What sort of discussions have you had, and is there someplace they can be found online? Thanks.
It was a series of private emails, I'm afraid. The other party is otherwise occupied at the moment, so the "project" (if that's not too grand a word) is currently on hold for a couple of months. The discussions were very preliminary, but the rough idea was to have another library where the compile and match functions have an additional parameter specifying the code unit width. The compiled code would always look like the current 32-bit code. Obviously, checking the width every time the code needs to load a character will slow things down. Converting the strings before starting (which of course can be done externally now) is not right, because some subject strings are very, very long. PCRE2 already loads characters using macros (e.g. GETCHAR, see pcre2_intmodedep.h). The first step will be to have different versions for compile and match. A major decision will be whether to add an argument to the compile/match functions, or to retain compatibility and have an "extra options" setting.
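To make that per-character cost concrete, here is a minimal sketch. It assumes a subject whose code unit width (8, 16, or 32) is only known at run time. The names load_code_unit() and count_code_unit() are hypothetical and are not PCRE2 functions or macros. PCRE2's own GETCHAR macro, in pcre2_intmodedep.h, is fixed for each library build. The sketch works on raw code units only and does no UTF-8 or UTF-16 decoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: load code unit i from a subject whose width is a
   run-time parameter. Returns UINT32_MAX for an unsupported width. */
static uint32_t load_code_unit(const void *subject, size_t i, int width)
{
    switch (width) {
    case 8:  return ((const uint8_t *)subject)[i];
    case 16: return ((const uint16_t *)subject)[i];
    case 32: return ((const uint32_t *)subject)[i];
    default: return UINT32_MAX;
    }
}

/* Hypothetical consumer: count how many of the first len code units
   equal c. The width switch above is re-evaluated for every unit. */
static size_t count_code_unit(const void *subject, size_t len, int width,
                              uint32_t c)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (load_code_unit(subject, i, width) == c)
            n++;
    return n;
}
```

The switch in load_code_unit() runs once per code unit, which is the kind of width check described above. Building a separate library per width avoids it by fixing the width when the library is compiled.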
That seems like a very specialized set of use cases. Thanks for the quick feedback.
Managed to solve this in the end by compiling the sources twice, once with 8-bit support and once with 16-bit support, putting the results into separate object files, and then linking everything together. So it wasn't too complicated to compile in a similar way to how we did it with PCRE1, and now we can easily use both 8-bit and 16-bit functions. I can close this now, unless you'd like to keep it around as a reminder to look at the documentation as you mentioned. As for a library with multiple encoding support, I agree it sounds quite specialised. In our case, we have separate code paths for 8-bit and 16-bit, so we use our own logic for deciding which one to use. I think it would be a shame to break the API again for a feature like this, which seems like it would be used in fairly rare cases, but I'm not sure how common it is for other applications to want to use PCRE with multiple string encodings.
Not sure if you've updated the documentation yet, but I just fell into the same trap, which means it's still ambiguous, even though I read about it in several places after searching all file types for PCRE2_CODE_UNIT_WIDTH. The documentation led me to believe that PCRE2 always contains 8-, 16- and 32-bit versions of all functions, that PCRE2_CODE_UNIT_WIDTH just adds convenience macros for your "favorite" bit width, and that PCRE2_CODE_UNIT_WIDTH=0 just omits those convenience macros. And even though I now compile with PCRE2_CODE_UNIT_WIDTH=16, it still has the pcre2_..._8 types.
I think I did some work on the documentation, which is why I closed the issue. I've just had another check, and it seems to be quite clear. For example, in the "pcre2" man page it says "The source code for PCRE2 can be compiled to support strings of 8-bit, 16-bit, or 32-bit code units, which means that up to three separate libraries may be installed, one for each code unit size." And in "pcre2build" it says "By default, a library called libpcre2-8 is built, containing functions that take string arguments contained in arrays of bytes, interpreted either as single-byte characters, or UTF-8 strings. You can also build two other libraries, called libpcre2-16 and libpcre2-32, which process strings that are contained in arrays of 16-bit and 32-bit code units, respectively." If you still think there is a lack of clarity about the current state of the documentation in HEAD, please say so, and I will try to improve it.
@PhilipHazel: That documentation seems quite clear; however, I believe the issue is with respect to Maybe have a summary somewhere of what acceptable values for
and of course state it defaults to
Yes, it was the scattered documentation of PCRE2_CODE_UNIT_WIDTH that confused me. Also, regarding the last sentence of my previous comment: if building with PCRE2_CODE_UNIT_WIDTH=16 only generates libpcre2-16, then why do I still have the pcre2_..._8 types? Meanwhile, I noticed that only the declarations are there, not the function definitions. Still a bit weird, and it adds to the confusion.
Anybody building using Autotools or CMake doesn't need to know about PCRE2_CODE_UNIT_WIDTH, because those tools handle it for you. For those who build "by hand", the NON-AUTOTOOLS-BUILD file has, I hope, reasonable instructions.

Users of PCRE2 do, of course, need to know about it. I would hope that users would look at the "pcre2api" man page, where it says "Many applications use only one code unit width. For their convenience, macros are defined whose names are the generic forms such as pcre2_compile() and PCRE2_SPTR. These macros use the value of the macro PCRE2_CODE_UNIT_WIDTH to generate the appropriate width-specific function and macro names. PCRE2_CODE_UNIT_WIDTH is not defined by default. An application must define it to be 8, 16, or 32 before including pcre2.h in order to make use of the generic names." Note that, contrary to @Uzume's comment above, there is no default value for PCRE2_CODE_UNIT_WIDTH.

Interesting that it's taken 8 years for this issue to be raised. The fact that there is only one pcre2.h (rather than pcre2-8.h, etc.) makes it easier to maintain, and at the time of creating PCRE2 from PCRE1 it was the straightforward thing to do. I am, incidentally, in the process of arranging for a compiler error if an attempt is made to compile any of the library modules with an invalid value for PCRE2_CODE_UNIT_WIDTH.
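The generic-name mechanism quoted from pcre2api can be sketched as follows. This sketch uses a hypothetical setting, APP_CODE_UNIT_WIDTH, and hypothetical functions unit_size_8/16/32() in place of PCRE2's real names. The width-specific functions are always present, whatever the setting, which mirrors why the _8 declarations stay visible in the single pcre2.h. A generic name is defined only for 8, 16, or 32, and any other nonzero value is a compile-time error. A library module, flagged here by the hypothetical APP_BUILDING_LIBRARY, rejects 0 as well. Unlike PCRE2_CODE_UNIT_WIDTH, which has no default, the sketch defaults to 16 only so that it compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical setting; defaulted here only so the sketch is standalone. */
#ifndef APP_CODE_UNIT_WIDTH
#define APP_CODE_UNIT_WIDTH 16
#endif

/* Library modules must be built for exactly one concrete width. */
#if defined(APP_BUILDING_LIBRARY) && APP_CODE_UNIT_WIDTH == 0
#error "library modules must be compiled with a width of 8, 16, or 32"
#endif

/* Width-specific names always exist. In a real header these would be
   declarations only; they are defined here so the sketch runs. */
size_t unit_size_8(void)  { return sizeof(uint8_t); }
size_t unit_size_16(void) { return sizeof(uint16_t); }
size_t unit_size_32(void) { return sizeof(uint32_t); }

/* Two-level paste, so APP_CODE_UNIT_WIDTH is expanded before ## applies. */
#define APP_JOIN(a, b) a##b
#define APP_GLUE(a, b) APP_JOIN(a, b)

#if APP_CODE_UNIT_WIDTH == 8 || APP_CODE_UNIT_WIDTH == 16 || APP_CODE_UNIT_WIDTH == 32
/* Generic convenience name, e.g. unit_size() -> unit_size_16(). */
#define unit_size APP_GLUE(unit_size_, APP_CODE_UNIT_WIDTH)
#elif APP_CODE_UNIT_WIDTH != 0
#error "APP_CODE_UNIT_WIDTH must be 8, 16, 32, or 0"
#endif
```

With APP_CODE_UNIT_WIDTH set to 0, unit_size() is simply not defined. Code must then call unit_size_8(), unit_size_16(), or unit_size_32() by their full names.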
@PhilipHazel: The default of
Hi, I'm porting a code base which used to use PCRE1. It used to compile in the pcre_* files along with the pcre16_* files, as it can use both string encodings. The files are unified in PCRE2, so pcre2 says "Use 8, 16, or 32; or 0 for a multi-width application." However, compiling with PCRE2_CODE_UNIT_WIDTH=0 causes compilation errors:
I've followed the NON-AUTOTOOLS-BUILD guide, using the generic config.h along with:
    -DHAVE_CONFIG_H
    -DPCRE2_STATIC
    -DPCRE2_CODE_UNIT_WIDTH=0
    -DSUPPORT_PCRE2_8
    -DSUPPORT_PCRE2_16
    -DSUPPORT_UNICODE
Not sure if I've missed something.