Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String literals #199

Merged
merged 27 commits into from
Mar 1, 2021
Merged

String literals #199

merged 27 commits into from
Mar 1, 2021

Conversation

zygoloid
Copy link
Contributor

@zygoloid zygoloid commented Nov 25, 2020

Lexical syntax for string literals.

@zygoloid zygoloid added WIP proposal A proposal labels Nov 25, 2020
@google-cla google-cla bot added the cla: yes PR meets CLA requirements according to bot. label Nov 25, 2020
@zygoloid zygoloid added proposal rfc Proposal with request-for-comment sent out and removed WIP labels Dec 9, 2020
Copy link
Contributor

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial thoughts here.

proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. One thing I would note is where you put a multi-line example in the middle of a sentence, that's really hard to read -- we should probably avoid stuff like that in the actual design, but I'm trying not to nit too much on proposals.

proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
| `\n` | U+000A LINE FEED |
| `\v` | U+000B LINE TABULATION |
| `\f` | U+000C FORM FEED |
| `\r` | U+000D CARRIAGE RETURN |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anyone actually care about bell, backspace, line tabulation, form feed? It's some weird historical thing that we could encode as explicit names or numbers instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I've used both bell and backspace in simple terminal applications. I've never used, nor even seen, \v or \f. I wouldn't mind ditching some or all of these four. I also wouldn't mind adding \e for escape, which is a common extension in C-family languages. I'm running a poll in #syntax.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In response to the poll and the observation that those escapes were dropped in both Swift and Rust, I've dropped them here too.

`\N{unicode character name}` syntax. We could add such an escape sequence, but
this proposal does not include one. Future proposals considering adding such
support should pay attention to work done by C++'s Unicode study group in this
area.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add this as soon as C++ does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. Do you think this document should call that out, or should we leave this for someone to propose for Carbon once C++ has committed to this direction?

### Block string literals

We could avoid including a block string literal in general, and instead
construct multi-line strings by string concatenation. But doing so would be more

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concatenation isn't really defined here. Do we want C++ style string concat? It seems kinda weird, compared to using + for concat everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want C++-style string concatenation, but I think the argument here applies regardless of whether we express string concatenation via juxtaposition or via +.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to explicitly say that this commentary is intended to cover both implicit concatenation via juxtaposition or explicit concatenation with +.

character is `T` and the last character is `e`.

- The opening `"""` of a multi-line string literal can be followed by a _file
type indicator_, to assist tooling in understanding the intent of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the file type indicator. Scope for syntax highlighters to clue off this and even work within strings unambiguously.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :)

line. The text in between is not interpreted in any way. _N_ can be zero.

A _raw block string literal_ is expressed analogously to a raw string literal,
but for a block string literal. Escape sequences are ignored, but indentation is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth clarifying what "indentation" means in the context of things like:

var String: x = r"""
    this
   oops
    is
    indented
""";

Something like "indentation is inferred from the least non-zero number of whitespace characters on a line" ? (Wordsmithing is absolutley not my forte)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this text above in the "Non-raw string literals" section:

The indentation of a block string literal is the sequence of horizontal
whitespace preceding the closing """. Each non-empty content line shall begin
with the indentation of the string literal. The content of the literal is formed
by removing the indentation of the closing line from each (non-empty) content
line, replacing each escape sequence with the corresponding character sequence
or code unit sequence, and concatenating the results with a line feed character
(U+000A) added between each pair of lines.

... and what I mean here is "do that but don't remove escape sequences". I've rephrased to try to make it clearer that I'm referring back to that wording.

terminate each line. This would allow uncommon forms of vertical whitespace (for
example, vertical tab and form feed) to be included in raw string literals, but
would create a risk that the meaning of a program would be different when the
source code is checked out on an operating system that uses line feed as a line
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no real comment other than this has bitten me before, so +1 to thinking more about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged.

@google-cla
Copy link

google-cla bot commented Jan 29, 2021

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@google-cla google-cla bot added cla: no PR does not meet CLA requirements according to bot. and removed cla: yes PR meets CLA requirements according to bot. labels Jan 29, 2021
proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
proposals/p0199.md Outdated Show resolved Hide resolved
@google-cla
Copy link

google-cla bot commented Feb 3, 2021

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

1 similar comment
@google-cla
Copy link

google-cla bot commented Feb 3, 2021

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@google-cla
Copy link

google-cla bot commented Feb 3, 2021

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@google-cla
Copy link

google-cla bot commented Feb 5, 2021

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@google-cla
Copy link

google-cla bot commented Feb 5, 2021

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@google-cla
Copy link

google-cla bot commented Feb 5, 2021

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

zygoloid and others added 2 commits February 12, 2021 12:22
precision. Define what a "non-empty content line" is rather than leaving
the reader to guess.

No change to the proposed direction is intended.
Comment on lines +525 to +526
line after the opening `"""` is seen. However, this adds lexical complexity, and
harms the ability to copy-paste string contents into other contexts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find the copy-paste argument terribly persuasive, since even under the status quo proposal, multiline strings can't be copy-pasted with full fidelity unless the closing """ starts at column 0.

I'm much more persuaded by the following point: if leading whitespace followed by some marker character is removed, we will need some kind of escaping scheme in order to support string data whose lines are intended to start with that marker character, possibly preceded by whitespace. This could be fairly annoying for ordinary string literals, but it would create a particular dilemma for raw string literals: either we'd have to disable leading-whitespace removal for raw string literals (which would make leading-whitespace removal substantially less useful), or we'd have to support that escaping scheme in raw string literals (which would make raw string literals substantially less useful). Having leading whitespace removal be controlled from outside the literal content (or at least from the edge of the literal content) avoids that whole problem.

@jonmeow jonmeow added proposal accepted Decision made, proposal accepted and removed proposal rfc Proposal with request-for-comment sent out needs decision labels Feb 23, 2021
Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posted the decision in #280

@zygoloid zygoloid merged commit 85dca7f into carbon-language:trunk Mar 1, 2021
mmdriley pushed a commit that referenced this pull request Mar 10, 2021
@mmdriley mmdriley mentioned this pull request Mar 10, 2021
mmdriley added a commit that referenced this pull request Mar 10, 2021
PR #280 committed the decision for #199. #199 had the proposal and no decision
and modified `README.md`. #280 had a decision but no proposal, and *didn't*
update `README.md`. However, in a tree with both the proposal *and* its
decision, another modification is necessary.

Since only one PR modified `README.md`, there wasn't a merge conflict that
required a rebase. And since there was no rebase, each PR ran its own CI
blissfully unaware of the other.

For now, committing the results of `pre-commit run --all-files` on trunk.
Later, will look into ways to make the proposal+decision workflow less likely
to break trunk.
jonmeow added a commit that referenced this pull request Aug 27, 2021
I've modified the text from the proposal slightly, focusing the overview more on a design setup, but mostly kept the details. One important thing here is I noticed that raw tab characters are disallowed -- this was a little buried before, and I've now updated the list of characters allowed in a string to exclude tabs. Additionally, I've noted `\0D` in the list of escapes as explicitly invalid.

Co-authored-by: Richard Smith <[email protected]>
@zygoloid zygoloid deleted the proposal-string-literals branch March 11, 2022 00:55
chandlerc pushed a commit that referenced this pull request Jun 28, 2022
Proposal as accepted by core team on 2021-02-23.
chandlerc pushed a commit that referenced this pull request Jun 28, 2022
chandlerc pushed a commit that referenced this pull request Jun 28, 2022
PR #280 committed the decision for #199. #199 had the proposal and no decision
and modified `README.md`. #280 had a decision but no proposal, and *didn't*
update `README.md`. However, in a tree with both the proposal *and* its
decision, another modification is necessary.

Since only one PR modified `README.md`, there wasn't a merge conflict that
required a rebase. And since there was no rebase, each PR ran its own CI
blissfully unaware of the other.

For now, committing the results of `pre-commit run --all-files` on trunk.
Later, will look into ways to make the proposal+decision workflow less likely
to break trunk.
chandlerc pushed a commit that referenced this pull request Jun 28, 2022
I've modified the text from the proposal slightly, focusing the overview more on a design setup, but mostly kept the details. One important thing here is I noticed that raw tab characters are disallowed -- this was a little buried before, and I've now updated the list of characters allowed in a string to exclude tabs. Additionally, I've noted `\0D` in the list of escapes as explicitly invalid.

Co-authored-by: Richard Smith <[email protected]>
@jonmeow jonmeow mentioned this pull request Dec 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes PR meets CLA requirements according to bot. proposal accepted Decision made, proposal accepted proposal A proposal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants