Relax variant-name grammar #90

stasm · 2018-02-09T12:10:29Z

variant-name is currently defined as one or more words.

/* exclude whitespace and [ \ ] { } */
word ::= (((char - line-break) - inline-space) - [#x5b#x5c#x5d#x7b#x7d])+
variant-key ::= number | word (_ word)*
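
To make the current rule concrete, here is a rough Python sketch of the word production written as a regular expression. It assumes that inline-space means space or tab, that line-break means \n or \r, and that _ stands for one or more inline spaces; none of this is spelled out in the snippet above, and the number alternative is left out entirely. The names WORD, VARIANT_NAME and is_variant_name are made up for illustration.

```python
import re

# Assumption: inline-space is space or tab, line-break is \n or \r.
# A word is one or more characters that are none of those and none of [ \ ] { }.
WORD = r"[^ \t\r\n\[\\\]{}]+"

# Assumption: "_" in the grammar stands for one or more inline spaces.
VARIANT_NAME = re.compile(rf"{WORD}(?:[ \t]+{WORD})*")


def is_variant_name(text: str) -> bool:
    """Return True if text matches word (_ word)* in full."""
    return VARIANT_NAME.fullmatch(text) is not None
```

Under these assumptions, is_variant_name("variant name") and is_variant_name("männlich") hold, while is_variant_name("a]b") and is_variant_name("") do not.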

We don't use words anywhere else and I find it weird that they forbid so many arbitrary characters. Should we relax it to only forbid ] and maybe [? Should we allow escaping with \]? As in: [variant name with a closing bracket \] looks like this].

Additionally, I don't see why we have to force variant-name to be non-blank. [] should parse as a variant name of "" (a zero-width string). It's not terribly useful, but consistent.


zbraniecki · 2018-02-09T15:53:30Z

My concern here is that it opens up room for a lot of edge cases that may be confusing and potentially hard to handle. In particular:

Empty/accidental variant names with copy pasted invisible characters that are not matched later
Characters that are unlikely to be useful but since we allowed them will block us from using those characters anywhere else

I'm not completely against pursuing this direction (I know stas was in favor of it from the very beginning), but I'd like to see use cases and/or examples of users wanting to use this feature. I'm concerned about limiting our future options by relaxing the syntax with no one asking for it.

Pike · 2018-02-09T17:36:09Z

I think there's value in variant-key being strict. This is mostly about the readability of the message. At the point where we have variants, that already goes down a fair amount. Having complexity in there would further degrade that.

Side question, is 15 ducks an OK variant-key?

zbraniecki · 2018-02-09T18:45:03Z

15 ducks is not an OK variant-key because it starts with a digit (1) so we expect it to be a number type value.

stasm · 2018-05-21T13:34:53Z

Use-cases

There are two use-cases which would be well-served by relaxing the grammar of VariantName:

Term attribute values in the native language.
Free-form variable values.

Native term attribute values

Term attributes are private to the current language. They may represent grammatical concepts which are foreign to the English grammar. Consequently, they may be hard to give English names to. To fully embrace the asymmetric design of terms and their attributes, I'd like to make it possible for attribute values to be written in the target language of the localization.

-brand-name = Firefox
    .Genus = männlich

(Ideally, the attribute's name could also contain non-ASCII characters. I'll get back to this in a second.)

Variable values

There might be cases where the localizer wants to provide a specialized translation for one of the possible variable values.

# The ʻ character is U+02BB, outside of the Basic Latin block.
welcome = { $state ->
    [Hawaiʻi] Aloha!
   *[other] Welcome!
}

In most cases, dynamic references (#80) will be the preferred way of implementing such UIs, but they do have the drawback of requiring the developer's intervention. For lightweight one-off customizations simple SelectExpressions like the one above might still be good and cheap solutions.
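
As a rough illustration of the selection logic in the message above, the following Python sketch picks a variant by exact string comparison and falls back to the default (the variant marked with *). The function select_variant and the dictionary layout are hypothetical and are not taken from any Fluent implementation.

```python
def select_variant(variants: dict, default_key: str, selector: str) -> str:
    """Return the variant whose key equals the selector, else the default."""
    # Keys are compared as plain strings, so non-ASCII keys need no special handling.
    return variants.get(selector, variants[default_key])


# Mirrors the welcome message above; U+02BB is written as an escape for clarity.
WELCOME = {"Hawai\u02bbi": "Aloha!", "other": "Welcome!"}
```

Because the comparison is exact, a look-alike value such as "Hawai'i" with an ASCII apostrophe would fall through to the default in this sketch.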

Proposed design

I suggest that we restrict the [name] syntax to just identifiers. This is actually more restrictive than today because today's grammar also allows whitespace.

For names containing any other characters, including whitespace, ] or any non-Latin characters, I propose a new syntax: ["name"]. The standard StringExpression grammar would apply here.

This design has the advantage of being explicit about digits: [15 ducks] (illegal) vs. ["15 ducks"], as well as about whitespace: [ trim whitespace ] vs. [" keep whitespace "].
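
A minimal sketch of how the three key forms could be told apart, written in Python for illustration only. The exact shapes of identifiers, numbers and quoted text are assumptions here (ASCII identifiers, simple decimal numbers, quoted text with \" and \\ escapes and no line breaks), and classify_variant_key is a made-up helper, not part of the reference parser.

```python
import re

# Assumption: identifiers are an ASCII letter followed by letters, digits, _ or -.
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9_-]*")
# Assumption: numbers are an optional minus, digits, and an optional fraction.
NUMBER = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")
# Assumption: quoted text allows \" and \\ escapes and no line breaks.
QUOTED = re.compile(r'"((?:[^"\\\r\n]|\\["\\])*)"')


def classify_variant_key(source: str):
    """Classify the text between [ and ] as (kind, value), or None if illegal."""
    # Spaces and tabs around the key are treated as insignificant in this sketch.
    inner = source.strip(" \t")
    if NUMBER.fullmatch(inner):
        return ("number", inner)
    if IDENTIFIER.fullmatch(inner):
        return ("identifier", inner)
    quoted = QUOTED.fullmatch(inner)
    if quoted:
        # Whitespace inside the quotes is kept verbatim; escapes are left unprocessed.
        return ("string", quoted.group(1))
    return None
```

With these assumptions, classify_variant_key("15 ducks") yields None, classify_variant_key('"15 ducks"') yields a string key, and the padding in '" keep whitespace "' survives because it sits inside the quotes.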

~~Lastly, although this might be best discussed in a separate issue, we could consider allowing attributes names to contain arbitrary characters using a new ."attr name" = Value syntax.~~ (Moved to #117.)

zbraniecki · 2018-05-21T15:21:43Z

If I'm not mistaken this is something that we did not encounter anyone requesting yet. Can it wait for post-1.0 and any user request?

stasm · 2018-05-21T15:27:10Z

The sooner we relax, the less trouble we'll encounter due to cross-channel.

We haven't seen much use of custom term attributes yet, which is why I think we're not seeing these requests. I'd like to default to Unicode wherever possible. It's 2018.

zbraniecki · 2018-05-21T15:43:54Z

As I stated before, the risk of Unicode here is that it makes edge cases legitimate unless you blacklist them. Things like \n, \t, many forms of whitespace, etc. all become proper variant name characters.
It would be nice to research the concept of visible/non-breaking/non-whitespace unicode characters and maybe use this notion? Maybe we can use UAX31? My worry is that I don't feel I know enough about traps and limitations coming from allowing Unicode in syntax and I've seen how much time languages that wanted to introduce them spent trying.
Do you think we're somehow immune to such issues? Is there any programming language that successfully did this not via UAX31?

stasm · 2018-05-21T15:54:52Z

I'm trying to make the point that this is about matching strings, not identifiers. However, in the context of identifiers, UAX31 could help us, yes.

In my proposal, ["text"] would follow the same semantics and parsing logic as StringExpressions do. In fact, I'm proposing that we make the current syntax ([text]) even more restricted than it is right now, so your concern is mitigated.

zbraniecki · 2018-05-21T16:14:19Z

key = { key2["
"] }

Would that make such code a valid FTL?

stasm · 2018-05-21T16:16:49Z

No, because StringExpressions may not contain newlines.

zbraniecki · 2018-05-21T16:37:16Z

Oh! Thanks for explaining it to me! No objections.

stasm · 2018-05-21T19:11:30Z

Lastly, although this might be best discussed in a separate issue, we could consider allowing attributes names to contain arbitrary characters using a new ."attr name" = Value syntax.

I filed #117 to keep this out of scope of this issue.

stasm · 2018-05-23T12:26:56Z

I'd like to nominate this for Syntax 0.6 rather than 0.7. Or, perhaps we could separate this into two changes:

Syntax 0.6: Restrict variant keys to numbers and identifiers only.
Syntax 0.7: Allow string expressions as variant keys, too.

@Pike -- what do you think? #118 implements both in one go, but I'll be happy to split it in two.

Pike · 2018-05-23T13:12:38Z

TBH, I don't like the proposal. I didn't get to actually read it before just now, sorry.

What I don't like in particular is introducing identifier and its constraints in an unrelated context.

I think what we should do is more along the lines of just supporting StringExpressions as VariantNames, and un-special-casing the " for that context. That also implies dropping NumberExpression from grammar.mjs and doing the number detection in abstract.mjs, I think. Then 15 ducks and ducks 15 would both work (which they currently don't).

In particular splitting this between 0.6 and 0.7 leaves us without the ability to express [männlich], and that's not what 0.6 is about.

stasm · 2018-05-23T19:33:53Z

You're right about 0.6, let's keep this in 0.7 for now, thanks.

Do you mean introducing a new production, say, bracketed_text which is like quoted_text but delimited with [ and ]? Would you require only ] to be escaped when needed? I'm concerned about adding unnecessary complexity which extends the set of contextually-special characters.

What I think this boils down to is how we look at [ and ]. Are they supposed to delimit the variant name or to differentiate the variant's key from the variant's value? A useful test to perform is this:

padded-variant-keys =
    { $num ->
        [one  ] One
        [many ] Many
       *[other] Other
    }

In the example above, what are the variants' keys?

one, many and other, or
one__, many_, other (using _ for spaces).

If it's the former, then [ and ] are not delimiters. There's extra trimming logic involved, which means that variant keys are not simply StringExpressions with a different delimiter.

If it's the latter, then $num == 1 will likely not match against one__ because the plural category is called one, without the trailing space. I don't think we want this behavior.
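
The two readings can be compared side by side with a small Python sketch. The helper names are invented for this illustration, and matching is assumed to be plain string equality against the plural category.

```python
def key_as_trimmed(bracket_content: str) -> str:
    """First reading: surrounding spaces and tabs are dropped before matching."""
    return bracket_content.strip(" \t")


def key_as_delimited(bracket_content: str) -> str:
    """Second reading: [ and ] are pure delimiters, so padding is part of the key."""
    return bracket_content


def matches_category(key: str, category: str) -> bool:
    """A variant is selected when its key equals the plural category exactly."""
    return key == category
```

For the padded key "one  ", matches_category(key_as_trimmed("one  "), "one") holds, while matches_category(key_as_delimited("one  "), "one") does not; the unpadded key "other" matches under both readings.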

["string literal"] solves this by sticking to a single set of rules for what makes a character special. In other words, we already have a grammar production for delimited text, and it's quoted_text.

I'm also not sold on moving the parsing of numbers to abstract.mjs. I don't want to make them second-class citizens of the grammar. There might be ways around it, though. For instance, we could move the brackets inside of each of the VariantKey's alternates to make sure it parses fully. "[" (NumberExpression | VariantName) "]" could become ("[" NumberExpression "]" | "[" bracketed_text "]").
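
To show what moving the brackets inside each alternate would buy, here is a hedged Python sketch of that idea. The number alternative only succeeds if the closing bracket follows the number directly; otherwise the whole key is retried as text. The shape of bracketed_text is an assumption (any characters except ] and line breaks, no escapes), as are the number shape and the name parse_variant_key.

```python
import re

# Assumption: numbers are an optional minus, digits, and an optional fraction.
NUMBER_KEY = re.compile(r"\[(-?[0-9]+(?:\.[0-9]+)?)\]")
# Assumption: bracketed_text is any run of characters other than ] and line breaks.
TEXT_KEY = re.compile(r"\[([^\]\r\n]*)\]")


def parse_variant_key(source: str):
    """Try "[" number "]" first, then fall back to "[" bracketed_text "]"."""
    number = NUMBER_KEY.match(source)
    if number:
        return ("number", number.group(1))
    text = TEXT_KEY.match(source)
    if text:
        return ("text", text.group(1))
    return None
```

In this sketch [15] parses as a number, while [15 ducks] and [ducks 15] both parse as text, because the number alternative has to consume the closing bracket to succeed.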

stasm added this to the Syntax 0.7 milestone Feb 9, 2018

stasm added the syntax label Mar 26, 2018

stasm added syntax and removed syntax labels May 15, 2018

stasm mentioned this issue May 21, 2018

Allow non-ASCII identifiers #117

Open

stasm mentioned this issue May 22, 2018

Variant keys can now be numbers, identifiers or quoted text #118

Open

stasm added the status: has PR label May 22, 2018

stasm mentioned this issue May 22, 2018

New syntax for Unicode literals #115

Closed

stasm removed this from the Syntax 0.7 milestone May 23, 2018

stasm removed the status: has PR label May 23, 2018

stasm mentioned this issue May 23, 2018

Restrict variant keys to Numbers and Identifier #127

Closed

stasm added the forwards incompatible (old parsers won't parse newer files) and backwards incompatible (old files won't parse in new parsers) labels May 25, 2018

stasm added the FUTURE (ideas and requests to consider after Fluent 1.0) label Jul 26, 2018

stasm mentioned this issue Oct 19, 2018

Allow selectors to be lists #4

Open

Pike mentioned this issue Oct 23, 2018

Remove backslash escapes from TextElement #123

Closed

stasm mentioned this issue Nov 5, 2018

Unicode Escapes should cover more unicode planes #194

Closed

stasm mentioned this issue Mar 14, 2019

BaseNode.equals support, and other ast goodness from python-fluent projectfluent/fluent.js#172

Merged
