A Proposal for Keys #220

Closed
wycats opened this issue Jun 24, 2014 · 32 comments

@wycats
Contributor

wycats commented Jun 24, 2014

This is related to #65, #67, #185, #90, #180, #62, #126, #83, and probably others.

Motivation

The TL;DR is that keys are currently slightly ambiguous, but also don't support a number of commonly desired characters.

A couple of examples:

[ips.127.0.0.1]
[directories.Space Separated]

In Cargo, this comes up because we use dependency names as keys:

[dependencies]

hammer.rs = "1.0.0"

# or

[dependencies.hammer.rs] = "1.0.0"

Proposal

So this would be valid:

[dependencies]

"hammer.rs" = "1.0.0"

# or

[dependencies."hammer.rs"] = "1.0.0"
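A minimal sketch of how a parser might split a dotted key path under this proposal, assuming double quotes simply delimit one key part and that no escape sequences are handled. The name `split_key_path` is hypothetical and not taken from any TOML library.

```python
def split_key_path(path):
    """Split a dotted key path, treating double-quoted parts as opaque."""
    parts = []
    current = []
    in_quotes = False
    for ch in path:
        if ch == '"':
            # Toggle quoted mode; the quote character itself is dropped.
            in_quotes = not in_quotes
        elif ch == "." and not in_quotes:
            # An unquoted dot separates two key parts.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if in_quotes:
        raise ValueError("unterminated quoted key part")
    parts.append("".join(current))
    return parts


print(split_key_path('dependencies."hammer.rs"'))
print(split_key_path("dependencies.hammer.rs"))
```

The quoted form yields two parts, while the unquoted form is split at every dot, which is the ambiguity the proposal is meant to remove.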

I know there have been many discussions on this topic before. I have read them and have tried to include their considerations in this proposal.

One note: I think including - as a valid unadorned identifier character is extremely important.

@wycats
Contributor Author

wycats commented Jun 24, 2014

cc @mojombo @BurntSushi

@mojombo
Member

mojombo commented Jun 24, 2014

Of all the proposals I've seen for the "dots in key names" problem, I like this the best so far. One question: why not allow any character except .[]" in unquoted key parts?

@wycats
Contributor Author

wycats commented Jun 24, 2014

You would also have to disallow white space and =. But sure!

@BurntSushi
Member

I think this is a pretty graceful way to support more flexible key names, and your examples with the IP addresses and file names are compelling. But I'd like to include @mojombo's addendum. (The addendum makes this an almost-backwards-compatible change. But this would only affect users using " in their key names, which seems suspect.)

@wycats Would you like to submit a PR? I'm happy to do it otherwise. I think it should specifically state that key names may be in the string syntactic category, so that other types of strings may be used if they are added. (And I hope they are, e.g., raw strings.) This keeps the spec simple.

@wycats
Contributor Author

wycats commented Jun 25, 2014

I can submit a PR, sure.

@jleclanche

Copying my comment from another of the bugs:

I'd be strongly in favour of allowing spaces in keys. They are very commonly used in XDG desktop files and the like (example below), and since TOML is somewhat of a superset, this makes the formats implicitly compatible.

[Desktop Entry]
Name=Skype
Comment=Skype Internet Telephony
Exec=env skype %U
Icon=skype.png
Terminal=false
Type=Application
Encoding=UTF-8

@redhotvengeance
Contributor

@jleclanche Keys with spaces in them are allowed in the TOML spec right now. The rules defined for value keys are:

Keys start with the first non-whitespace character and end with the last non-whitespace character before the equals sign.

And the rules defined for table keys are:

Name your tables whatever crap you please, just don't use a dot. Dot is reserved. OBEY.

That leaves open the ability to have spaces in keys. @BurntSushi's toml-test test runner also assumes that keys with spaces are valid.

@jleclanche

Ah! Great. Then I'm happy with this. :)

@wycats
Contributor Author

wycats commented Jun 25, 2014

So the idea is to allow all characters but a few up until the first equals sign, but trimmed for whitespace on the right?

@BurntSushi
Member

Yup, that sounds right to me. (But also trimmed for whitespace on the left.)

@redhotvengeance
Contributor

Trimmed for whitespace on both sides, I think. But whitespace is still allowed in the middle of the key.

So this key:

   i am key       = "hear me roar"

...becomes i am key.
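A rough sketch of the trimming rule described here, assuming an unquoted key runs up to the first equals sign and is stripped on both sides while interior whitespace is kept. The function name `parse_key_value_line` is made up for illustration, and the value is left as raw text rather than parsed.

```python
def parse_key_value_line(line):
    """Return (key, raw_value) for a simple `key = value` line."""
    key, sep, raw_value = line.partition("=")  # split at the first '='
    if not sep:
        raise ValueError("no '=' found in line")
    # Trim both ends of the key; whitespace inside the key is preserved.
    return key.strip(), raw_value.strip()


print(parse_key_value_line('   i am key       = "hear me roar"'))
```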

@mojombo
Member

mojombo commented Jun 26, 2014

My goal is the principle of least surprise. As such, whitespace and whitespace trimming should act to be as unsurprising as possible. Some examples may serve best:

# key names
abc def = 1     #=> {"abc def": 1}
abc   def = 1   #=> {"abc   def": 1}
 abc  def  =  1 #=> {"abc  def": 1}

# table names
[foo bar]       #=> {"foo bar": ...}
[foo bar.baz]   #=> {"foo bar": {"baz": ...}}
[foo   bar.baz] #=> {"foo   bar": {"baz": ...}}
[ foo bar.baz ] #=> {"foo bar": {"baz": ...}}
[     foo     ] #=> {"foo": ...}
[foo . bar]     #=> {"foo ": {" bar": ...}}

That last one is pretty funky, so sane people will probably use the quoted syntax to clarify:

["foo "." bar"] #=> {"foo ": {" bar": ...}}

@redhotvengeance
Contributor

@mojombo This list is great, and super helpful!

I do question the last one, though ([foo . bar] #=> {"foo ": {" bar": ...}}). In my eyes, the . is the delimiter that separates keys, therefore everything on either side of the dot is a key, and should have the key rules applied to it. According to the key rules, the whitespace would be trimmed, so shouldn't it be:

[foo . bar]     #=> {"foo": {"bar": ...}}

If the goal is to include the whitespace in the keys, then users can fall back on your alternate:

["foo "." bar"] #=> {"foo ": {" bar": ...}}

@BurntSushi
Member

@mojombo Some of those definitely aren't clear from the spec, particularly the table names. I don't think the spec mentions anything about whitespace in table names, so, e.g., [ foo ] really does have two spaces around it. I'd support a clarification that says table names (and each component) are trimmed on both sides for whitespace. So, e.g., [foo . bar] would become [foo.bar].

(Of course, this wouldn't apply to quoted keys.)
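To make the two readings of `[foo . bar]` concrete, here is a small sketch that handles unquoted table names only. Both function names are hypothetical: `split_trim_whole` trims the header as a whole before splitting (matching the examples listed earlier in the thread), and `split_trim_each` trims every component after splitting (the clarification suggested here).

```python
def split_trim_whole(header):
    """Trim the bracket contents once, then split on dots."""
    inner = header.strip()[1:-1].strip()  # drop '[' and ']', trim outer whitespace
    return inner.split(".")


def split_trim_each(header):
    """Split on dots first, then trim every component."""
    inner = header.strip()[1:-1]
    return [part.strip() for part in inner.split(".")]


for header in ["[ foo bar.baz ]", "[foo . bar]"]:
    print(header, split_trim_whole(header), split_trim_each(header))
```

The two functions agree on every header in the list above except `[foo . bar]`, which is the only case where whitespace touches a dot.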

@redhotvengeance
Contributor

👍 for clarification of table names and having each table name component be trimmed on both sides for whitespace. Since those are the rules for value keys, I think it'll keep it consistent for the TOML user.

@lra

lra commented Aug 9, 2014

👍
I'd love to be able to put hostnames as keys to describe infrastructures.

This looks fine to me:

["some.host.tld"]
region = "us-east"

["domain.tld".us-west]
nameservers = [
  "ns-1.domain.tld",
  "ns-2.whatever.tld"
]

[us-west.dc-1a."host.domain.tld".master]
healthcheck = true

@jefferai

Found this issue looking for a way to do exactly what @lra wants to do: use FQDNs as keys leading to tables. 👍 from me.

@dhardy

dhardy commented Oct 8, 2014

Mostly nice, but two questions:

  1. Is allowing any Unicode in quoted keys a good idea? Normalisation?
  2. Is there any particular reason not to normalise spaces to _ and . to, say, !, thus giving all of the quoted keys in examples so far an equivalent unquoted form?

I guess this is less about name collisions than it is about whether keys can be used as values (being able to extract "domain.tld" or "hammer.rs" from the key and use it to find the intended server or file).

@cies
Contributor

cies commented Oct 8, 2014

@wycats I like the proposal you make, but I think (like @dhardy) that it is largely a trade-off between TOML's syntactic complexity and the verbosity of a TOML document (in the case you describe). In the following example I show the verbose document that fits your example:

[[dependency]]
packageName = "hammer.rs"
version = "1.0.0"

@lra As with the example of @wycats, I also translated your example to a slightly more verbose document, which keeps the syntax of TOML lean.

[[node]]
hostname = "some.host.tld"
region = "us-east"

[[node]]
hostname = "domain.tld"
region = "us-east"
nameservers = [
  "ns-1.domain.tld",
  "ns-2.whatever.tld"
]

[us-west.dc-1a."host.domain.tld".master]
healthcheck = true

@mojombo I think that whitespace in table header specifications should best be forbidden. And when allowed, I prefer that it needs to be "string'ed". But as I make my case above, I would rather not deal with the syntactic overhead of difficult keys. I would rather have keys be "easy on the eyes". In my example above I show how easy it is to move difficult values out of the keys/table headers into the values.

@dhardy

dhardy commented Oct 8, 2014

@wycats do you even want to use this syntax now? The manifest lists a somewhat different syntax.

@wycats
Contributor Author

wycats commented Oct 8, 2014

@dhardy

dhardy commented Oct 9, 2014

@wycats I mean the syntax there isn't [dependencies."hammer.rs"] but rather [dependencies.hammer]. I presume using the latter is okay?

For what it's worth, I actually find using lists of tables like @cies just proposed clearer to read than your proposal at the top of this page.

@cies
Contributor

cies commented Oct 9, 2014

@dhardy somehow i hope that toml can remain as K.I.S.S.-able as it is in its current shape (or more KISSable, by further restricting "difficult" stuff).

@dhardy

dhardy commented Oct 9, 2014

The KISS approach would be to restrict keys to [a-zA-Z0-9_]+ or similar. Absolutely fine with me.

@cies
Contributor

cies commented Oct 9, 2014

@dhardy indeed, I would like to keep allowing unicode. but no []. and whitespace. and allowing a bunch of special characters would also not hurt KISS imho.

another way, still quite KISS, would be [a-zA-Z0-9_]+ (or a bit more) and single/double/triple-quoted strings for all that is fancy.

@ChristianSi
Contributor

If we want to restrict keys, we should at least allow arbitrary Unicode letters and numbers, since the world isn't English-speaking only. The JavaScript definition of identifiers could serve as an example, except that there is no need to restrict the first character further.

Or, to keep it simple: key parts must be comprised of arbitrary sequences of characters belonging to the Unicode Categories Letter (L.), Number (N.) and Mark (M.), as well as _ and - (underscore, hyphen).
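As a sketch only, the rule just described could be checked with Python's standard `unicodedata` module, shown next to the stricter ASCII pattern `[a-zA-Z0-9_]+` mentioned earlier in the thread. The function names are invented for this example, and treating an empty string as invalid is an assumption.

```python
import re
import unicodedata


def is_ascii_bare_key(part):
    """Strict variant: ASCII letters, digits and underscore only."""
    return re.fullmatch(r"[a-zA-Z0-9_]+", part) is not None


def is_unicode_bare_key(part):
    """Looser variant: Unicode Letter, Number or Mark, plus '_' and '-'."""
    if not part:
        return False  # assumption: empty key parts are rejected
    return all(
        ch in "_-" or unicodedata.category(ch)[0] in "LNM"
        for ch in part
    )


for candidate in ["hammer-rs", "hammer.rs", "foo bar", "clé", "日本語"]:
    print(candidate, is_ascii_bare_key(candidate), is_unicode_bare_key(candidate))
```

Keys that fail the check, such as `hammer.rs` or `foo bar`, would need the quoted form.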

@dhardy

dhardy commented Oct 10, 2014

@ChristianSi, if you do that I think you already need normalisation. There's quite a discussion on that in #65.
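A short illustration of why normalisation comes up: the same visible key can be written with different code point sequences, and the two compare as different strings unless they are normalised. The choice of NFC and the helper name `normalize_key` are assumptions for this sketch; the thread does not settle on a form.

```python
import unicodedata


def normalize_key(key):
    """Normalise a key to NFC (an assumed choice, not prescribed by the thread)."""
    return unicodedata.normalize("NFC", key)


composed = "caf\u00e9"      # 'é' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

# Without normalisation these are two distinct keys.
print(composed == decomposed)
# After normalisation they collapse into one.
print(normalize_key(composed) == normalize_key(decomposed))
```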

@ChristianSi
Contributor

@dhardy I would leave that to applications rather than prescribing anything in the spec. JSON, I think, does the same.

@mk-pmb

mk-pmb commented Jan 7, 2015

So with current TOML, can I have Unicode characters 0 to 31 in a key name? I'm not familiar enough with TOML yet to guess about \n, but \r, \t and terminal control escape sequences will surely add to my experience of editing config files on a terminal with grep/sed/cat. Especially for terminals that support silent/password input mode.

Name your tables whatever crap you please, just don't use a dot. Dot is reserved. OBEY.

Probably well intentioned, but I'd still favor a whitelist approach. With a blacklist, many webdevs might remember to care about non-breaking space, some who know that JSON is not a JavaScript subset may even remember U+2028 (line separator) and U+2029 (paragraph separator), and might hope that the Unicode Consortium never adds fancy new whitespace.

Or, to keep it simple: key parts must be comprised of arbitrary sequences of characters belonging to the Unicode Categories Letter (L.), Number (N.) and Mark (M.), as well as _ and - (underscore, hyphen).

I couldn't find a good enough twin for . in just letters and digits yet, so maybe it's safe (for now) to whitelist all letters and digits. Depending on font, U+05C5 (hebrew mark lower dot) might work in some terminals and editors.

@mojombo
Member

mojombo commented Jan 7, 2015

@mk-pmb Please see #283 for the latest proposal on this matter.

@mk-pmb

mk-pmb commented Jan 7, 2015

thanks!

@mojombo
Member

mojombo commented Jan 15, 2015

Resolved by #283.
