[BUG] Missing escape of char literal in interpolation #861
Comments
Hmm... Maybe I should be using |
Looks like it's not disabling string quoting inside the capture. You get the same result with just this:
|
Documentation might help to build an intuition on how this should work. |
Character literals such as |
It looks like this behavior was added in #464 and #481. The block of code that does it is here: https://github.com/hsutter/cppfront/blame/main/source/lex.h#L430-L436 I'm not sure why we're removing those When I comment out that block, all regression tests pass unchanged, and the code reported in this issue is fixed. |
It was to fix #461 |
Here's the code from #461, showing the state before #461 was fixed:
My question is, should those last two have to escape inside the captures? Because what's in the captures should be an expression, and In other words, the capture source expression is not part of the string itself (and therefore shouldn't be escaped) -- its evaluation is part of the string (after being passed through So I would instead expect this behavior:
Before I make that change, does that make sense to this group? |
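To make the distinction concrete, here is a minimal hand-written C++ sketch of how a string with one capture might lower. `to_string_sketch` is a hypothetical stand-in for `cpp2::to_string`, and the Cpp2 source string named in the comments is an illustrative assumption, not code from this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for cpp2::to_string (assumption: the real helper
// supports many more types; only char is needed for this sketch).
inline std::string to_string_sketch(char c) { return std::string(1, c); }

// Assumed Cpp2 source: "x=('\n')$!"
// The pieces outside the capture are string text; the capture is ordinary
// expression source, so its char literal keeps its backslash.
inline std::string lowered_keeping_escape() {
    return "x=" + to_string_sketch('\n') + "!";
}

// Same source if the lexer strips the backslash inside the capture:
// the expression silently becomes the letter 'n'.
inline std::string lowered_dropping_escape() {
    return "x=" + to_string_sketch('n') + "!";
}

// Small self-check of the two lowerings.
inline void check_lowerings() {
    assert(lowered_keeping_escape() == "x=\n!");
    assert(lowered_dropping_escape() == "x=n!");
}
```

Under this reading, only the text outside the capture follows string-literal escaping rules; the capture is compiled like any other expression.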
The comments look wrong to me.
If this is correct, it's the behavior I expect. |
Ah, I think you're right about those last two, thanks. In particular, these two should be the same and print
And these two should be the same and print
Right? Let me try a few other cases...
Do all these look correct now? |
Yes. |
I'm a bit unsure about how unquoted double quotes should interact with interpolation captures.
Should this be interpreted as the string |
OK, I'm just going to fix this immediate Issue. However, I would still like to get to where Of course, you can see above that my wish is to lex |
Right you are. I think you just identified the same issue in a racing comment while I was writing the above. 👍 |
Is that an intentional change from Cpp1? https://en.cppreference.com/w/cpp/language/string_literal
https://www.godbolt.org/z/j1v6qrK81
|
Yes. I've considered allowing that, but haven't gone there yet. If Cpp2 did allow merging consecutive string literal tokens, though, it would be in the grammar and not in the preprocessor. Updated to add: I see in the quote above it says "phase 6 (after the preprocessor)" but I don't think that parenthetical is quite right, because it's not until phase 7 that "each preprocessing token is converted into a token." So up to the end of phase 6 we still have only preprocessing tokens. |
If cppfront doesn't support that, then should there be a "how to fix it" message? |
I read "preprocessing tokens" as the output of the preprocessor. Phase 4 is the preprocessor, phase 5 is determining the common encoding for adjacent string literals, phase 6 is concatenating those adjacent string literals, and phase 7 is lexing, so I think "after the preprocessor" is accurate. The handling is covered in |
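As a small Cpp1 illustration of the ordering described above (macro expansion in phase 4, concatenation of adjacent string literals in phase 6), the following sketch may help. The macro name `GREETING` and the variable `joined` are made up for this example.

```cpp
#include <string_view>

// Expanded by the preprocessor in phase 4.
#define GREETING "Hello, "

// After expansion the two literals are adjacent, and phase 6 joins them
// into a single string literal before phase 7 produces tokens.
constexpr std::string_view joined = GREETING "world";

static_assert(joined == "Hello, world");
// One array of 12 characters plus the terminating NUL, not two arrays.
static_assert(sizeof(GREETING "world") == 13);
```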
Is the problem with
Please, resist. I really liked where this was going. How about alternatives? |
No, I don't think so. The problem is that the first non-escaped |
Can we get that behavior by changing the grammar of string-literal and extending [lex.phases]?
//G string-literal:
//G     encoding-prefix? '"' s-char-seq? '"'
//G     encoding-prefix? 'R"' d-char-seq? '(' s-char-seq? ')' d-char-seq? '"'
//G
//G s-char-seq:
//G     interpolation? s-char
//G     interpolation? s-char-seq s-char
//G
//G d-char-seq:
//G     d-char
//G
//G interpolation:
//G     '(' expression ')' '$'
The idea is for |
That would still change the meaning of these well-formed lines of Cpp1 code.
|
I'm fine with that, for now. |
OK, let's not quickly switch from suffix I'm not too concerned about having a string literal with interpolation change meaning from Cpp1... they already do that for every case (e.g., I still want to keep lexing simple (modulo the backtrack we need for interpolation right now). I'll think about it some more... |
I'm not too concerned about a single string either, but eliminating the need to escape the quotes can turn multiple strings into a single fixed string, which is much more concerning.
|
For now the status quo is:
I think that part is fine then until we get more new information. For consecutive string literals: I see their absence as a good simplification in general. But it's true that sometimes it's useful to break a string literal across source lines, and interpolation does make for longer (if fewer) string literals. My thought there was to recognize it grammatically but require
and the usual fix is to write That's been my thought anyway. How does that strike you? |
This is a runtime concat. To preserve the current behavior, you would need to convert it to this compile-time concat.
Something to watch out for however is that cpp1 supports putting a literal operator on the end of the final string literal, since the strings are combined before tokenization:
|
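A short Cpp1 sketch of the two points above, using made-up strings: adjacent literals merge at compile time, a suffix on the final literal applies to the merged result, and `operator+` on `std::string` is the run-time alternative. The names `merged` and `joined_at_runtime` are assumptions for illustration.

```cpp
#include <string>
#include <string_view>

using namespace std::string_view_literals;

// Compile time: the adjacent literals are combined first, so the `sv`
// suffix on the last one sees all six characters.
constexpr std::string_view merged = "abc" "def"sv;
static_assert(merged.size() == 6);
static_assert(merged == "abcdef"sv);

// Run time: an explicit concatenation builds the same text, but it
// allocates a std::string and is not a constant expression before C++20.
inline std::string joined_at_runtime() {
    return std::string("abc") + "def";
}
```

Any rule that turns consecutive literals into an explicit `+` would need to account for both the suffix case and the loss of compile-time evaluation.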
An interpolation with |
Title: Missing escape of char literal in UFCS in interpolation.
Minimal reproducer (https://cpp2.godbolt.org/z/r1f73EP85):
Commands:
cppfront main.cpp2
clang++18 -std=c++23 -stdlib=libc++ -lc++abi -pedantic-errors -Wall -Wextra -Wconversion -Werror=unused-result -Werror=unused-value -Werror=unused-parameter -I . main.cpp
Expected result:
static_cast<void>(cpp2::to_string((static_cast<void>(CPP2_UFCS(split, std::move(s), '\n')), 0)));
Actual result and error:
static_cast<void>(cpp2::to_string((static_cast<void>(CPP2_UFCS(split, std::move(s), 'n')), 0)));
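The difference between the two lowerings is a single backslash, but it changes the delimiter from a newline to the letter n. A minimal sketch of the effect follows; `count_pieces` and `sample` are hypothetical names, and this is not the `split` used in the reproducer.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts the pieces produced by splitting `text`
// at every occurrence of `delim`.
constexpr std::size_t count_pieces(std::string_view text, char delim) {
    std::size_t pieces = 1;
    for (char c : text) {
        if (c == delim) ++pieces;
    }
    return pieces;
}

constexpr std::string_view sample = "line one\nline two";

// Splitting on the newline, as in the expected lowering.
static_assert(count_pieces(sample, '\n') == 2);
// Splitting on the letter n, as in the actual lowering.
static_assert(count_pieces(sample, 'n') == 4);
```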
Cpp2 lowered to Cpp1: