PCRE2_UTF flag detects U+202F as Mongolian #118
Comments
It appears that Mongolian exists in the list of script extensions for U+202F. Here is output from the ucptest program:
$ ./ucptest 202f
Perl also recognizes U+202F as Mongolian. The Unicode file ScriptExtensions.txt, from which PCRE2 gets its data, contains this:
202F ; Latn Mong # Zs NARROW NO-BREAK SPACE
So it looks like this is deliberate on the part of Unicode. I am therefore closing this as invalid.
Thank you for your reply.
Note that \p{Mong} works like \p{scx:Mong}, that is, it checks both the script and the script extensions. If you want to test just the script, use \p{sc:Mong}.
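To make the difference concrete, here is a minimal Python sketch that models the two checks with a tiny hand-written table. It does not call PCRE2, and it is not how PCRE2 stores its data. The helper names `matches_sc` and `matches_scx` and both dictionaries are hypothetical. The U+202F extensions copy the ScriptExtensions.txt line quoted above, and its Script value (Common, `Zyyy`) is taken from Scripts.txt.

```python
# Hypothetical two-entry table, for illustration only.
# U+202F: Script=Common (Zyyy); extensions from the ScriptExtensions.txt line
#         quoted in this thread ("202F ; Latn Mong").
# U+1820: MONGOLIAN LETTER A, Script=Mongolian (Mong).
SCRIPT = {0x202F: "Zyyy", 0x1820: "Mong"}
SCRIPT_EXTENSIONS = {0x202F: {"Latn", "Mong"}}


def matches_sc(ch, script):
    # Model of \p{sc:Xxx}: compare against the Script property only.
    return SCRIPT[ord(ch)] == script


def matches_scx(ch, script):
    # Model of \p{scx:Xxx} and plain \p{Xxx} as described above:
    # accept if either the Script property or the script extensions match.
    cp = ord(ch)
    return SCRIPT[cp] == script or script in SCRIPT_EXTENSIONS.get(cp, set())


if __name__ == "__main__":
    for ch in ("\u202f", "\u1820"):
        print(f"U+{ord(ch):04X}",
              "sc:Mong =", matches_sc(ch, "Mong"),
              "scx:Mong =", matches_scx(ch, "Mong"))
```

In this model, U+202F passes the `scx` check for `Mong` but fails the `sc` check, which mirrors the behaviour reported in this issue, while U+1820 passes both.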
Thank you for your advice. I'll use \p{sc:Xxx} as appropriate.
If pcre2_compile() is called with the PCRE2_UTF option, U+202F (NARROW NO-BREAK SPACE) is detected as Mongolian.
The same behaviour occurs with pcre2grep and the -u option.
The contents of sample.text are as below.
The command is as below.
pcre2grep -u '\p{Mongolian}' sample.text
The output is as below.