Not all escape character sequences are highlighted #4

Open
geekley opened this issue Apr 13, 2022 · 0 comments
Reporting upstream from: MrOrz/vscode-gettext#25


Please make escape sequences in strings other than \" be highlighted as constant.character.escape.po as well.

https://github.com/textmate/gettext.tmbundle/blob/master/Syntaxes/Gettext.tmLanguage#L108
https://github.com/textmate/gettext.tmbundle/blob/master/Syntaxes/Gettext.tmLanguage#L159
https://github.com/textmate/gettext.tmbundle/blob/master/Syntaxes/Gettext.tmLanguage#L210

At least the most important one (\n) should be highlighted, but since the spec defines PO strings as following the C syntax, ideally every C escape sequence would be covered.
Escape sequences can be misleading, as the "False" example below shows, so it's important that they're highlighted accurately.

PO spec from: https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html

Each of untranslated-string and translated-string respects the C syntax for a character string, including the surrounding quotes and embedded backslashed escape sequences.

As listed by: https://en.cppreference.com/w/c/language/escape
also https://docs.microsoft.com/en-us/cpp/c-language/escape-sequences?view=msvc-170#escape-sequences-1
literal escapes: \' \" \? \\ \a \b \f \n \r \t \v
octal regex: \\[0-7]{1,3} e.g.: \0 \77 \123
hex regex: \\x[0-9A-Fa-f]+ e.g.: \x0 \xabc; note that \x0False is read as '\x0Fa' + "lse", because the hex form consumes every hex digit that follows it
unicode regex (since C99): \\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8} e.g.: \u000a \U0000000D
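
To sanity-check the patterns above, here is a minimal Python sketch that combines them into one regex and lists the escapes it finds in a string. The names `C_ESCAPE` and `find_escapes` are made up for this illustration, and this is Python's `re` syntax rather than the grammar file's, so treat it as a rough check of the patterns, not as a patch for the bundle.

```python
import re

# Hypothetical combined pattern for the C escape forms listed above.
# Alternatives are tried left to right after the leading backslash.
C_ESCAPE = re.compile(
    r"""\\(?:
        ['"?\\abfnrtv]        # literal escapes: \' \" \? \\ \a \b \f \n \r \t \v
      | [0-7]{1,3}            # octal: one to three octal digits
      | x[0-9A-Fa-f]+         # hex: greedy, takes every hex digit that follows
      | u[0-9A-Fa-f]{4}       # universal character name, exactly 4 hex digits
      | U[0-9A-Fa-f]{8}       # universal character name, exactly 8 hex digits
    )""",
    re.VERBOSE,
)


def find_escapes(text):
    """Return every escape sequence matched in text, in order of appearance."""
    return [m.group(0) for m in C_ESCAPE.finditer(text)]


if __name__ == "__main__":
    # Raw strings keep the backslashes literal, as they would appear in a .po file.
    print(find_escapes(r"\x0False"))
    print(find_escapes(r"line one\nline two\t\"quoted\""))
```

The first call shows the misleading case from the hex line: the match covers `\x0Fa`, leaving only `lse` as plain text.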

Also, a single \ can appear at the end of a line to form a line-continuation, even inside string literals.
https://en.cppreference.com/w/c/language/translation_phases

Whenever backslash appears at the end of a line (immediately followed by the newline character), both backslash and newline are deleted, combining two physical source lines into one logical source line. This is a single-pass operation: a line ending in two backslashes followed by an empty line does not combine three lines [sic] into one.

https://docs.microsoft.com/en-us/cpp/c-language/c-string-literals?view=msvc-170#remarks

When a backslash appears at the end of a line, it is always interpreted as a line-continuation character.
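
As a rough illustration of the single-pass rule quoted above, the following Python sketch models C's line splicing on a plain string. The function name `splice_lines` is hypothetical, and this only mirrors the C behaviour described in the quotes; it says nothing about what gettext tools actually do with a trailing backslash.

```python
def splice_lines(source):
    """Delete each backslash that is immediately followed by a newline,
    together with that newline, in a single left-to-right pass."""
    # str.replace scans once and does not rescan the text it has already
    # produced, which matches the "single-pass operation" wording.
    return source.replace("\\\n", "")


if __name__ == "__main__":
    # Two physical lines become one logical line.
    print(repr(splice_lines('"first part \\\nsecond part"')))

    # A line ending in two backslashes followed by an empty line:
    # only one backslash-newline pair is removed, so the remaining
    # backslash and newline are NOT spliced again.
    print(repr(splice_lines("a\\\\\n\nb")))
```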


Btw, I haven't tested any of these in actual gettext tools; I'm just making assumptions based on the spec. I can't confirm whether they actually work (e.g. \ at end of line, the unicode \u and \U sequences, or anything else), so testing to confirm would be better.
