-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String literals #199
String literals #199
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial thoughts here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good. One thing I would note is where you put a multi-line example in the middle of a sentence, that's really hard to read -- we should probably avoid stuff like that in the actual design, but I'm trying not to nit too much on proposals.
proposals/p0199.md
Outdated
| `\n` | U+000A LINE FEED | | ||
| `\v` | U+000B LINE TABULATION | | ||
| `\f` | U+000C FORM FEED | | ||
| `\r` | U+000D CARRIAGE RETURN | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does anyone actually care about bell, backspace, line tabulation, form feed? It's some weird historical thing that we could encode as explicit names or numbers instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I've used both bell and backspace in simple terminal applications. I've never used, nor even seen, \v
or \f
. I wouldn't mind ditching some or all of these four. I also wouldn't mind adding \e
for escape, which is a common extension in C-family languages. I'm running a poll in #syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In response to the poll and the observation that those escapes were dropped in both Swift and Rust, I've dropped them here too.
`\N{unicode character name}` syntax. We could add such an escape sequence, but | ||
this proposal does not include one. Future proposals considering adding such | ||
support should pay attention to work done by C++'s Unicode study group in this | ||
area. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add this as soon as C++ does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. Do you think this document should call that out, or should we leave this for someone to propose for Carbon once C++ has committed to this direction?
proposals/p0199.md
Outdated
### Block string literals | ||
|
||
We could avoid including a block string literal in general, and instead | ||
construct multi-line strings by string concatenation. But doing so would be more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concatenation isn't really defined here. Do we want C++ style string concat? It seems kinda weird, compared to using +
for concat everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want C++-style string concatenation, but I think the argument here applies regardless of whether we express string concatenation via juxtaposition or via +
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to explicitly say that this commentary is intended to cover both implicit concatenation via juxtaposition or explicit concatenation with +
.
character is `T` and the last character is `e`. | ||
|
||
- The opening `"""` of a multi-line string literal can be followed by a _file | ||
type indicator_, to assist tooling in understanding the intent of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the file type indicator. Scope for syntax highlighters to clue off this and even work within strings unambiguously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks :)
proposals/p0199.md
Outdated
line. The text in between is not interpreted in any way. _N_ can be zero. | ||
|
||
A _raw block string literal_ is expressed analogously to a raw string literal, | ||
but for a block string literal. Escape sequences are ignored, but indentation is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth clarifying what "indentation" means in the context of things like:
var String: x = r"""
this
oops
is
indented
""";
Something like "indentation is inferred from the least non-zero number of whitespace characters on a line" ? (Wordsmithing is absolutley not my forte)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have this text above in the "Non-raw string literals" section:
The indentation of a block string literal is the sequence of horizontal
whitespace preceding the closing"""
. Each non-empty content line shall begin
with the indentation of the string literal. The content of the literal is formed
by removing the indentation of the closing line from each (non-empty) content
line, replacing each escape sequence with the corresponding character sequence
or code unit sequence, and concatenating the results with a line feed character
(U+000A) added between each pair of lines.
... and what I mean here is "do that but don't remove escape sequences". I've rephrased to try to make it clearer that I'm referring back to that wording.
terminate each line. This would allow uncommon forms of vertical whitespace (for | ||
example, vertical tab and form feed) to be included in raw string literals, but | ||
would create a risk that the meaning of a program would be different when the | ||
source code is checked out on an operating system that uses line feed as a line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no real comment other than this has bitten me before, so +1 to thinking more about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acknowledged.
Co-authored-by: Jon Meow <[email protected]> Co-authored-by: Chandler Carruth <[email protected]>
longer supported (following Rust and Swift).
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the ℹ️ Googlers: Go here for more info. |
Co-authored-by: josh11b <[email protected]>
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the ℹ️ Googlers: Go here for more info. |
1 similar comment
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the ℹ️ Googlers: Go here for more info. |
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the ℹ️ Googlers: Go here for more info. |
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the ℹ️ Googlers: Go here for more info. |
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the ℹ️ Googlers: Go here for more info. |
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the ℹ️ Googlers: Go here for more info. |
Co-authored-by: Geoff Romer <[email protected]>
precision. Define what a "non-empty content line" is rather than leaving the reader to guess. No change to the proposed direction is intended.
line after the opening `"""` is seen. However, this adds lexical complexity, and | ||
harms the ability to copy-paste string contents into other contexts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't find the copy-paste argument terribly persuasive, since even under the status quo proposal, multiline strings can't be copy-pasted with full fidelity unless the closing """
starts at column 0.
I'm much more persuaded by the following point: if leading whitespace followed by some marker character is removed, we will need some kind of escaping scheme in order to support string data whose lines are intended to start with that marker character, possibly preceded by whitespace. This could be fairly annoying for ordinary string literals, but it would create a particular dilemma for raw string literals: either we'd have to disable leading-whitespace removal for raw string literals (which would make leading-whitespace removal substantially less useful), or we'd have to support that escaping scheme in raw string literals (which would make raw string literals substantially less useful). Having leading whitespace removal be controlled from outside the literal content (or at least from the edge of the literal content) avoids that whole problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Posted the decision in #280
PR #280 committed the decision for #199. #199 had the proposal and no decision and modified `README.md`. #280 had a decision but no proposal, and *didn't* update `README.md`. However, in a tree with both the proposal *and* its decision, another modification is necessary. Since only one PR modified `README.md`, there wasn't a merge conflict that required a rebase. And since there was no rebase, each PR ran its own CI blissfully unaware of the other. For now, committing the results of `pre-commit run --all-files` on trunk. Later, will look into ways to make the proposal+decision workflow less likely to break trunk.
I've modified the text from the proposal slightly, focusing the overview more on a design setup, but mostly kept the details. One important thing here is I noticed that raw tab characters are disallowed -- this was a little buried before, and I've now updated the list of characters allowed in a string to exclude tabs. Additionally, I've noted `\0D` in the list of escapes as explicitly invalid. Co-authored-by: Richard Smith <[email protected]>
Proposal as accepted by core team on 2021-02-23.
PR #280 committed the decision for #199. #199 had the proposal and no decision and modified `README.md`. #280 had a decision but no proposal, and *didn't* update `README.md`. However, in a tree with both the proposal *and* its decision, another modification is necessary. Since only one PR modified `README.md`, there wasn't a merge conflict that required a rebase. And since there was no rebase, each PR ran its own CI blissfully unaware of the other. For now, committing the results of `pre-commit run --all-files` on trunk. Later, will look into ways to make the proposal+decision workflow less likely to break trunk.
I've modified the text from the proposal slightly, focusing the overview more on a design setup, but mostly kept the details. One important thing here is I noticed that raw tab characters are disallowed -- this was a little buried before, and I've now updated the list of characters allowed in a string to exclude tabs. Additionally, I've noted `\0D` in the list of escapes as explicitly invalid. Co-authored-by: Richard Smith <[email protected]>
Lexical syntax for string literals.