Comments #198

Merged
merged 18 commits into from
Mar 1, 2021
Changes from 6 commits
1 change: 1 addition & 0 deletions proposals/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,5 +48,6 @@ request:
- [Decision](p0143_decision.md)
- [0149 - Change documentation style guide](p0149.md)
- [Decision](p0149_decision.md)
- [0198 - Comments](p0198.md)

<!-- endproposals -->
374 changes: 374 additions & 0 deletions proposals/p0198.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,374 @@
# Comments

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

[Pull request](https://github.com/carbon-language/carbon-lang/pull/198)

<!-- toc -->

## Table of contents

- [Problem](#problem)
- [Background](#background)
    - [Line comments](#line-comments)
    - [Block comments](#block-comments)
    - [`#if 0`](#if-0)
- [Proposal](#proposal)
- [Details](#details)
    - [Overview](#overview)
    - [Text comments](#text-comments)
    - [Block comments](#block-comments-1)
        - [Block comments rationale](#block-comments-rationale)
    - [Reserved comments](#reserved-comments)
        - [Reserved comment introducers rationale](#reserved-comment-introducers-rationale)
- [Alternatives considered](#alternatives-considered)
    - [Intra-line comments](#intra-line-comments)
    - [Multi-line text comments](#multi-line-text-comments)
    - [Block comment alternatives](#block-comment-alternatives)
    - [Documentation comments](#documentation-comments)

<!-- tocstop -->

## Problem

This proposal provides a suggested concrete lexical syntax for comments.

## Background

This proposal assumes the purpose and necessity of a comment syntax is
understood and uncontroversial.

In C++, there are three different ways in which comments are expressed in
practice:

### Line comments

Single-line comments (and sometimes multiline comments) are expressed in C++
using `// ...`:

```
// The next line declares a variable.
int n; // This is a comment about 'n'.
```

(These are sometimes called "BCPL comments".)

- Can appear anywhere (at the start of a line or after tokens).
- Can contain any text (other than newline).
- End at the end of the logical line.
- Can be continued by ending the comment with `\` (or `??/` in C++14 and
earlier).
- Unambiguous with non-comment syntax.
- "Nest" in that `//` within `//` has no effect.
- Do not nest with other kinds of comment.

### Block comments

Comments within lines (or sometimes multiline comments) are expressed in C++
using `/*...*/`:

```
f(/*size*/5, /*initial value*/1);
```

- Can appear anywhere (at the start of a line or after tokens).
- Can contain any text (other than `*/`).
- End at the `*/` delimiter (which might be separated by a `\` line
continuation).
- Ambiguous with non-comment syntax: `int a=1, *b=&a, c=a/*b;` though this is
not a problem in practice.
- Do not nest -- the first `*/` ends the comment.

### `#if 0`

Blocks of code are often commented out in C++ programs using `#if 0`:

```
#if 0
int n;
#endif
```

- Can appear only at the start of a logical line.
- Can only contain sequences of preprocessing tokens (including invalid tokens
such as `'`, but not including unterminated multiline string literals).
- End at the matching `#endif` delimiter.
- Unambiguous with any other syntax.
- Nest properly, and can have other kinds of comments nested within.

## Proposal

A comment is always the only content on its line. We provide two kinds of
comment, distinguished by semantic intent:

- Text comments begin `// ` (including whitespace after the comment
introducer), and provide human-readable commentary.
- Block comments begin `//{` and run to the matching `//}` line, and are used
to comment out blocks of code.
- Other characters following `//` are reserved for future evolution.

## Details

### Overview

A _comment_ is a lexical element beginning with the characters `//` and running
to the end of the line. We have no mechanism for physical line continuation, so
a trailing `\` does not extend a comment to subsequent lines.

> _Experimental:_ There can be no text other than horizontal whitespace before
> the `//` characters introducing a comment. Either all of a line is a comment,
> or none of it.

Contributor:

It might be worth saying something about what an automatic C++-to-Carbon translator should do with comments under this proposal, given the more restrictive placement. (Or is the idea that we shouldn't expect C++ comments to get transferred to the Carbon version verbatim?)

Contributor Author:

I don't think I have a solid answer here, but I do think it's important that we translate comments in the common case.

Comments at the start of a line seem straightforward to translate, and I'd imagine we'd make a policy decision that we produce documentation whenever a comment appears at a suitable position. Trailing comments seem slightly harder, because there's the risk that they aren't directly associated with the line to the left:

    int left;  // The location of the widget relative
    int top;  // to its closest ancestor.

But I think we can get a 95+% conversion by moving these comments before their declarations.

Comments in unusual places (specifically, neither at the beginning nor at the end of a statement) seem likely to cause problems for translation irrespective of whether we permit intra-line / block / trailing comments in Carbon. I don't really have a good answer there. Tracking where they are in the C++ AST and mapping that into the Carbon AST seems like it may be infeasible in general. I expect such comments would also be a problem when migrating between versions of Carbon, although likely less so, because we'll hopefully mostly be doing localized rewrites rather than full-source-file rewrites.

I'm happy to include something along these lines in the proposal, though I wonder how much detail we should go into here versus in a proposal more targeted at migration. Maybe this is something we should be systematically asking of all proposals involving language syntax?

Contributor:

You're probably right that this is something we should be systematically asking of all proposals involving language syntax. Or to make a slightly weaker claim: this is something we should be systematically asking of proposals involving language syntax in all cases where there's reason to think translation might be problematic.

I asked in this case because the proposed Carbon comment syntax is more restrictive than the existing C++ comment syntax, which makes me wonder if there would always be a good Carbon translation for C++ comments. I think there's a chance, once we start writing a C++-to-Carbon translator, that we'll find it's easiest to allow Carbon comments in all of the places where C++ comments are allowed.

The _kind_ of a comment is determined by the character(s) after the `//`
characters as follows:

- whitespace: the comment is a [text comment](#text-comments)
- `{` or `}`: the comment is an opening or closing
[block comment](#block-comments) delimiter, respectively
- anything else: the input is invalid

For the purpose of the above rule, the end of the file is considered to be
whitespace. The `//` characters at the start of a comment, followed by the above
additional characters, form the _comment introducer_.
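
The classification rule above can be sketched as a small line classifier. This is only an illustration in Python, not part of the proposal: the function name `classify_line` and the returned labels are hypothetical, and the sketch does not enforce the indentation restrictions that the block comment section places on `//{` and `//}`.

```python
from typing import Optional

HORIZONTAL_WHITESPACE = " \t"


def classify_line(line: str) -> Optional[str]:
    """Classify one source line by the character following `//`.

    Returns None when the line does not begin with a comment, otherwise one
    of "text", "block_open", "block_close", or "invalid" (hypothetical labels).
    """
    stripped = line.lstrip(HORIZONTAL_WHITESPACE)
    if not stripped.startswith("//"):
        # Not a comment line. Trailing comments are not detected here.
        return None
    rest = stripped[2:]
    # The end of the line (or file) is treated like whitespace.
    if rest == "" or rest[0].isspace():
        return "text"
    if rest[0] == "{":
        return "block_open"
    if rest[0] == "}":
        return "block_close"
    # Every other introducer is reserved.
    return "invalid"
```

Because the decision depends only on the first character after `//`, a lexer following this rule needs a single character of lookahead to determine the kind of a comment.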

### Text comments

A _text comment_ is a comment introduced by `//` followed by whitespace. Text
comments do not result in tokens.

Example:

```carbon
// This is a comment and is ignored. \
This is not a comment.

var Int: x; // error, trailing comments not allowed
```

### Block comments

An _opening block comment line_ is a line starting with `//{`, with no
indentation. A _closing block comment line_ is a line starting with `//}`, with
no indentation. An indented comment starting with `//{` or `//}` is an error. A
line that starts with `//{` or `//}` in the middle of another lexical element
(in particular, a multi-line string literal) is an error. Any text following the
`//{` or `//}` of a block comment line is an error.

A _block comment_ is a comment that starts with an opening block comment line
and ends with a closing block comment line. Block comments nest: every line for
which the total number of preceding opening block comment lines is greater than
the total number of preceding closing block comment lines is part of a block
comment.
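
The nesting rule can be sketched as a depth counter over physical lines. The Python below is an illustrative sketch under stated assumptions, not the proposal's implementation: the name `block_comment_lines` is hypothetical, delimiter lines are reported as part of the comment, an unterminated block comment is treated as an error (the text does not say), and the interaction with multi-line string literals is ignored.

```python
from typing import List


def block_comment_lines(lines: List[str]) -> List[bool]:
    """Return, for each line, whether it lies inside a block comment."""
    depth = 0
    inside = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        # Only unindented `//{` and `//}` lines are delimiters.
        if text.startswith("//{") or text.startswith("//}"):
            if len(text) > 3:
                raise ValueError(f"line {number}: text after block comment marker")
            if text == "//{":
                depth += 1
            elif depth == 0:
                raise ValueError(f"line {number}: '//}}' without a matching '//{{'")
            else:
                depth -= 1
            inside.append(True)
        else:
            # Indented markers and all other lines are just content.
            inside.append(depth > 0)
    if depth != 0:
        # Assumption of this sketch: an unterminated block comment is an error.
        raise ValueError("unterminated block comment")
    return inside
```

Note that the scan never inspects the contents of the commented-out lines beyond their first three characters, which is the property the rationale below relies on.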

Examples (assuming as a placeholder that we use the string literal syntax from
[proposal 17](https://github.com/carbon-language/carbon-lang/pull/17)):

```carbon
//{
fn CommentedOutFunction() {
  // It's OK to include a //} in the middle of this comment; it's not a
  // comment introducer so doesn't end the block comment.

  //} is not a closing block comment line, so doesn't end the comment.

//{
  Nested comment.
//}

  // The block comment doesn't end half way through the string literal,
  // because the //} is indented.
  var String: closing_comment_marker = r"""
    //}
    """;
}
// This ends the block comment that began on the first line.
//}
```

Contributor (on the "`//}` is not a closing block comment line" example line):

because it doesn't start at the beginning of the line?

Contributor Author:

Yes.

Contributor:

Please clarify that in the text.

Contributor Author:

Resolved by removal of this section.

```carbon
// The next line is an error because the //{ is not at the start of the line.
  //{
```

#### Block comments rationale

It is important to be able to comment out a block of Carbon code confident in
the knowledge that all text between the comment markers (and exactly that text)
was in fact commented out. This leads to the following requirements:

- Block comments must nest.
- Closing comment markers in multi-line tokens such as string literals and in
any kind of nested comment do not close the outer comment.
- Opening comment markers in multi-line tokens such as string literals and in
line comments do not introduce an additional unintended level of commenting.

In addition, block comment syntax should not require lexing the contents of the
comment. Therefore we need to disallow the block comment closing syntax from
appearing in other tokens (in particular, in any form of multi-line string
literals). There are at least two reasonable ways to do this:

- Require block comment opening and closing lines to be unindented and require
such other multi-line tokens to be indented.
- Pick a syntax for block comment opening and closing lines that cannot appear
in other multi-line tokens.

Our chosen approach is the first of these options: we accept block comment
markers only if they appear, unindented, at the start of a line. All other
multi-line tokens that can contain a line resembling a block comment marker
(such as a multi-line string literal) will need to be able to be indented:

```
// OK, not a block comment line.
var String: opening_comment_marker = """
  //{
  """;

// Error, contains a block comment line that's not part of a block comment.
// This string literal must be indented.
var String: closing_comment_marker = """
//}
""";

//{
Comment ends on the next line, even though this line contains '"""'.
//}
```

### Reserved comments

Comment introducers that do not have one of the forms specified in this proposal
are invalid. That is, if an attempt is made to form a lexical element where the
next two characters are `/` and `/`, and the following character is neither
whitespace nor one of `{`, `}`, `/`, or `!`, the program is invalid.

#### Reserved comment introducers rationale

We anticipate the possibility of adding additional kinds of comment in the
future. Reserving syntactic space in comment syntax, in a way that is easy for
programs to avoid, allows us to add such additional kinds of comment as a
non-breaking change.

## Alternatives considered

### Intra-line comments

We could include a feature similar to C-style block comments, as a way to
provide comments that attach to some element of the program smaller than a line.
In C++ code, such comments are frequently used to annotate function parameter
names:

```
render(/*use_world_coords=*/true, /*draw_frame=*/false);
```

We expect this need to be addressed by a different mechanism in Carbon, such as
by way of named parameters.

We could permit trailing comments on a line that contains other content. Such
comments are most frequently used in our sample C++ corpus to describe the
meaning of an entity, label, or close brace on the same line:

```
namespace N {
  int n; // number of hats
  enum Mode {
    mode1, // first mode
    mode2 // second mode
  };
} // end namespace N
```

In all cases but the last, we expect it to be reasonable to move the comment to
before the declaration. For the case of an "end namespace" comment, the
underlying problem is solved in a different way, by not providing a syntax for
namespaces as delimited scopes.
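
Moving a trailing comment to before its declaration could be automated along the following lines. This Python sketch is deliberately naive and every name in it is hypothetical: it splits each line at the first `//`, so it would mishandle `//` inside a string literal, and it does not recognize a trailing comment that continues across several lines.

```python
from typing import List


def hoist_trailing_comments(lines: List[str]) -> List[str]:
    """Move each trailing `//` comment onto its own line above the code."""
    out = []
    for line in lines:
        code, sep, comment = line.partition("//")
        if sep and code.strip():
            # Reuse the indentation of the code for the hoisted comment.
            indent = code[: len(code) - len(code.lstrip())]
            out.append(f"{indent}// {comment.strip()}")
            out.append(code.rstrip())
        else:
            # No comment, or the comment already owns the whole line.
            out.append(line)
    return out
```

For example, `int n; // number of hats` becomes the two lines `// number of hats` and `int n;`.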

Intra-line comments present a challenge for code formatting tools, which would
need to understand what part of the program syntax the comment "attaches to" in
order to properly reflow the comment with the code. This concern is mitigated, but
not fully eliminated, by requiring comments to always be on their own line. We
could restrict text comments to appear in only certain syntactic locations to
fully resolve this concern, but doing so would remove the flexibility to insert
comments in arbitrary places:

```
match (x) {
  case .Foo(1, 2,
            // This might be 3 or 4 depending on the size of the Foo.
            Int: n) => { ... }
}
```

The decision to not support intra-line comments is **experimental** and should
be revisited if we find there is a need for such comments in the context of the
complete language design.

### Multi-line text comments

No support is provided for multi-line text comments. Instead, the intent is that
such comments are expressed by prepending each line with the same `// ` comment
marker.

Requiring each line to repeat the comment marker improves readability by
removing a source of non-local state, and removes a needless source of stylistic
variability. The resulting style of comment is common in other languages and
well-supported by editors. Even in C and C++ code that uses `/* ... */` to
comment out a block of human-readable text, it is common to include a `*` at the
start of each comment continuation line.
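
As an illustration of how little tooling this style needs, the Python sketch below turns a paragraph into a run of text comments. The helper name `to_text_comment` is hypothetical, and emitting a bare `//` for blank lines assumes that the end of a line counts as the whitespace that makes a comment a text comment.

```python
def to_text_comment(text: str, indent: str = "") -> str:
    """Prefix every line of `text` with the `// ` text comment marker."""
    commented = []
    for line in text.splitlines():
        # A bare `//` keeps blank lines inside the comment without
        # introducing trailing whitespace.
        commented.append(f"{indent}// {line}" if line else f"{indent}//")
    return "\n".join(commented)
```

Most editors provide an equivalent "toggle line comment" command, which is part of why this style is well supported.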

### Block comment alternatives

We considered various different options for block comments. Our primary goal was
to permit commenting out a large body of Carbon code, which may or may not be
well-formed. Alternatives considered included:

- Fully line-oriented block comments, which would remove lines without regard
for whether they are nested within a string literal, with the novel feature
of allowing some of the contents of a block string literal to be commented
out. This alternative has the disadvantage that it would result in
surprising behavior inside string literals containing Carbon code.
- Fully lexed block comments, in which a token sequence between the opening
and closing comment marker is produced and discarded, with the lexing rules
relaxed somewhat to avoid rejecting ill-formed code. This would be analogous
to C and C++'s `#if 0` ... `#endif`. This alternative has the disadvantage
that it would be inefficient to process.
- A hybrid approach, with `//\{` and `//\}` delimiters that are invalid in
non-raw string literals, and with an indentation requirement for raw string
literals only. This alternative has the disadvantage of introducing
additional complexity into the lexical rules by treating different kinds of
string literals differently.
- Use of `/*` and `*/` as comment markers. This alternative has the
disadvantage that it risks confusion by using similar syntax to C and C++
but with divergent semantics.

### Documentation comments

We could add a distinct comment syntax for documentation comments, perhaps
treating documentation comments as producing real tokens rather than being
stripped out by the lexer. However, during discussion, we decided that we prefer
using a syntax that does not resemble a comment for representing documentation.
For example, we could introduce an attribute syntax, such as using
`# <expression>` as a prefix to a declaration to attach attributes. Then a
string literal attribute can be treated as documentation:

```carbon
#"Get the size of the thing."
fn GetThingSize() -> Int;

#"""
Rate the quality of the widget.

Returns a quality factor between 0.0 and 1.0.
"""
fn RateQuality(
  #"The widget to rate."
  Widget: w,
  #"A widget quality database."
  QualityDB: db) -> Float;
```

This will be explored by a future proposal.