Clarify whitespace and newline rules. #264

Merged: 3 commits, Dec 17, 2014
50 changes: 29 additions & 21 deletions README.md
@@ -62,6 +62,7 @@ Spec

* TOML is case sensitive.
* Whitespace means tab (0x09) or space (0x20).
+* Newline means LF (0x0A) or CR (0x0D).

Member

What if Newline were defined as either `\r\n` or `\n`? This leaves out a lone `\r` as qualifying as a newline, but I think this is OK, unless it's still commonly used somewhere.

Contributor

👍

\r was once used on Mac, but Mac OS X changed that AFAIK.
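
To make the proposal concrete, here is a minimal Python sketch of the newline definition suggested above (LF or CRLF, with a lone CR not counting). The function name `newline_length` is hypothetical and not taken from any TOML implementation.

```python
def newline_length(text, pos):
    """Return the length of the newline starting at `pos`, or 0 if none.

    Assumption (from the review thread, not the merged spec text):
    a newline is either LF ("\n") or CRLF ("\r\n"); a lone CR is not one.
    """
    if text.startswith("\r\n", pos):
        return 2
    if text.startswith("\n", pos):
        return 1
    return 0
```

A parser following this sketch would treat a stray `\r` as an ordinary character rather than as a line break.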

Comment
-------
@@ -116,26 +117,34 @@ purpose.
Sometimes you need to express passages of text (e.g. translation files) or would
like to break up a very long string into multiple lines. TOML makes this easy.
**Multi-line basic strings** are surrounded by three quotation marks on each
-side and allow newlines. If the first character after the opening delimiter is a
-newline (`0x0A`), then it is trimmed. All other whitespace remains intact.
+side and allow newlines. Any newline characters (0x0A or 0x0D) immediately
+following the opening delimiter will be trimmed. All other whitespace and
+newline characters remain intact.

Member

I think this wording is a little ambiguous. From my reading, I could imagine that `"""\n\n\nThe string"""` would have all three `\n` characters trimmed, when I think we only want the first one trimmed.

Can we simply enumerate all cases in which characters are trimmed? So:

> If the first character of a multiline string is `\n`, then it is removed. Similarly, if the first two characters of a multiline string are `\r\n`, then they are removed.

If Newline is redefined, then I think we could simplify this to:

> If a multiline string starts with a Newline, then it is removed.

Contributor

👍
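
The enumerated rule from the comment above can be sketched in Python as follows. This is only an illustration of the reviewer's wording; `trim_first_newline` is a hypothetical helper, and `body` is assumed to be the raw text between the opening and closing delimiters.

```python
def trim_first_newline(body):
    """Remove at most one newline (LF or CRLF) from the start of `body`.

    Assumption: only the first newline after the opening delimiter is
    trimmed; any further newlines are part of the string.
    """
    if body.startswith("\r\n"):
        return body[2:]
    if body.startswith("\n"):
        return body[1:]
    return body
```

Under this reading, `"""\n\n\nThe string"""` keeps two of its three leading `\n` characters.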

```toml
-# The following strings are byte-for-byte equivalent:
+# On a Unix system, the following strings are byte-for-byte equivalent:
key1 = "One\nTwo"
key2 = """One\nTwo"""
key3 = """
One
Two"""
+
+# On a Windows system, the following strings are byte-for-byte equivalent:
+key4 = "One\r\nTwo"
+key5 = """One\r\nTwo"""
+key6 = """
+One
+Two"""
```
Member

I think "One\nTwo" is always equivalent to """One\nTwo""" regardless of OS. Similarly, "One\r\nTwo" is always equivalent to """One\r\nTwo""". That is, I don't think we need to mention OS here.

The key is that we permit either \n or \r\n to stand for a single line delimiter. I think everything after that is gravy.

To @ChristianSi's point, I'm not sure that we need to specify universal line handling in the spec. This lets the parser choose how to handle Newline in strings, which may include doing nothing at all and simply transmitting the string as is. We could suggest it, though:

> TOML parsers should feel free to normalize Newline to whatever makes sense for their platform.

Contributor

The latter sentence would address my concerns, yes.

👍
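
As a rough illustration of the optional normalization suggested above, a parser could post-process decoded string values like this. The helper `normalize_newlines` is hypothetical; whether a parser normalizes at all is left open in the discussion.

```python
import os


def normalize_newlines(value, target=os.linesep):
    """Rewrite every LF or CRLF in `value` to `target`.

    Assumption: only LF and CRLF count as newlines, so a lone CR is
    left untouched. `target` defaults to the platform line separator.
    """
    # Collapse CRLF to LF first so that CRLF is not expanded twice.
    return value.replace("\r\n", "\n").replace("\n", target)
```

For example, `normalize_newlines("One\r\nTwo\nThree", "\n")` yields `"One\nTwo\nThree"` on any platform, because the target is passed explicitly.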


For writing long strings without introducing extraneous whitespace, end a line
with a `\`. The `\` will be trimmed along with all whitespace (including
newlines) up to the next non-whitespace character or closing delimiter. If the
-first two characters after the opening delimiter are a backslash and a newline
-(`0x5C0A`), then they will both be trimmed along with all whitespace (including
-newlines) up to the next non-whitespace character or closing delimiter. All of
-the escape sequences that are valid for basic strings are also valid for
-multi-line basic strings.
+first characters after the opening delimiter are a backslash and a newline
+(`0x5C0A` or `0x5C0D0A`), then they will both be trimmed along with all
+whitespace and newlines up to the next non-whitespace character or closing
+delimiter. All of the escape sequences that are valid for basic strings are also
+valid for multi-line basic strings.
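
The line-ending backslash rule can be sketched in Python roughly as below. This is an illustrative pass over the raw text between the delimiters, run before ordinary escape sequences are decoded; `join_continued_lines` is a hypothetical name, and the sketch assumes newlines are LF or CRLF only.

```python
def join_continued_lines(raw):
    """Drop each line-ending backslash plus the whitespace/newlines after it.

    Other escape sequences (including an escaped backslash) are copied
    through unchanged for a later decoding step.
    """
    out = []
    i = 0
    n = len(raw)
    while i < n:
        if raw[i] != "\\":
            out.append(raw[i])
            i += 1
            continue
        if raw.startswith("\n", i + 1) or raw.startswith("\r\n", i + 1):
            i += 1  # skip the backslash itself
            # Skip spaces, tabs, and newlines up to the next other character.
            while i < n:
                if raw[i] in " \t\n":
                    i += 1
                elif raw.startswith("\r\n", i):
                    i += 2
                else:
                    break
            continue
        # Any other escape: keep the backslash and the character after it.
        out.append(raw[i:i + 2])
        i += 2
    return "".join(out)
```

Copying two characters for every other escape is what keeps an escaped backslash at the end of a line from being mistaken for a line continuation.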

```toml
# The following strings are byte-for-byte equivalent:
@@ -177,9 +186,9 @@ Since there is no escaping, there is no way to write a single quote inside a
literal string enclosed by single quotes. Luckily, TOML supports a multi-line
version of literal strings that solves this problem. **Multi-line literal
strings** are surrounded by three single quotes on each side and allow newlines.
-Like literal strings, there is no escaping whatsoever. If the first character
-after the opening delimiter is a newline (`0x0A`), then it is trimmed. All other
-content between the delimiters is interpreted as-is without modification.
+Like literal strings, there is no escaping whatsoever. Any newline characters
+(0x0A or 0x0D) immediately following the opening delimiter will be trimmed. All
+other content between the delimiters is interpreted as-is without modification.

```toml
regex2 = '''I [dw]on't need \d{2} apples'''
@@ -306,22 +315,21 @@ apart from arrays because arrays are only ever values.
```

Under that, and until the next table or EOF are the key/values of that table.
-Keys are on the left of the equals sign and values are on the right. Keys start
-with the first character that isn't whitespace or `[` and end with the last
-non-whitespace character before the equals sign. Keys cannot contain a `#`
-character. Key/value pairs within tables are not guaranteed to be in any
-specific order.
+Keys are on the left of the equals sign and values are on the right. Whitespace
+is ignored around key names and values.
+
+Key names may only consist of non-whitespace, non-newline characters excluding
+`=`, `#`, `.`, `[`, and `]`.

Contributor

Why did you delete these two lines?

Member Author

Indenting is covered by the previous statement that "Whitespace is ignored around key names and values.", so I thought it to be redundant. Also, the ability to indent is a weird way to segue into nested tables, and makes it sound as if indentation might carry some semantic value.

+
+Key/value pairs within tables are not guaranteed to be in any specific order.

```toml
[table]
key = "value"
```
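
The proposed key-name rule lends itself to a small check. The following Python sketch is an assumption-laden illustration: `is_valid_key_name` is a hypothetical helper, and rejecting the empty string is a choice made here, not something the diff states.

```python
# Characters the new wording excludes from key names.
FORBIDDEN_IN_KEYS = set("=#.[]")
# Whitespace (tab, space) and newline characters (LF, CR) per the Spec section.
WHITESPACE_OR_NEWLINE = set("\t \n\r")


def is_valid_key_name(name):
    """Return True if `name` uses only characters allowed in a key name."""
    if not name:
        return False  # assumption: an empty key name is rejected
    return not any(
        ch in FORBIDDEN_IN_KEYS or ch in WHITESPACE_OR_NEWLINE for ch in name
    )
```

A parser would apply such a check after stripping the surrounding whitespace that the new text says is ignored.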

-You can indent keys and their values as much as you like. Tabs or spaces. Knock
-yourself out. Why, you ask? Because you can have nested tables. Snap.
-
-Nested tables are denoted by table names with dots in them. Name your tables
-whatever crap you please, just don't use `#`, `.`, `[` or `]`.
+Dots are prohibited in key names because dots are used to signify nested tables!
+Naming rules for each dot separated part are the same as for keys (see above).

Member

👍
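
Since each dot-separated part of a table name follows the key-name rules, splitting a nested table name can be sketched as below. `split_table_name` is a hypothetical helper that takes the text between `[` and `]`; treating an empty part as an error is an assumption made for this example.

```python
def split_table_name(name):
    """Split a dotted table name into parts, validating each like a key name."""
    disallowed = set("=#[]\t \n\r")  # "." is consumed by the split itself
    parts = name.split(".")
    for part in parts:
        if not part or any(ch in disallowed for ch in part):
            raise ValueError("invalid table name part: %r" % part)
    return parts
```

For instance, `split_table_name("dog.tater")` gives the two nesting levels `dog` and `tater`.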

```toml
[dog.tater]