Relax variant-name grammar #90

Open
stasm opened this issue Feb 9, 2018 · 15 comments
Labels
backwards incompatible Old files won't parse in new parsers. forwards incompatible Old parsers won't parse newer files. FUTURE Ideas and requests to consider after Fluent 1.0 syntax

Comments

@stasm
Contributor

stasm commented Feb 9, 2018

variant-name is currently defined as one or more words.

/* exclude whitespace and [ \ ] { } */
word ::= (((char - line-break) - inline-space) - [#x5b#x5c#x5d#x7b#x7d])+
variant-key ::= number | word (_ word)*
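To make the character restrictions concrete, here is a minimal Python sketch of the `word` and `word (_ word)*` productions quoted above. It assumes that inline-space means space or tab, that line-break means `\n` or `\r`, and that `_` stands for one or more inline spaces; none of those definitions appear in this excerpt, and the helper name `is_word_sequence` is made up for illustration. The `number` alternative is deliberately left out.

```python
import re

# Assumption: inline-space is space or tab, line-break is \n or \r.
# A word is one or more characters other than those and [ \ ] { }.
WORD = r"[^\r\n \t\[\\\]{}]+"

# Assumption: "_" in the grammar means one or more inline spaces.
WORD_SEQUENCE = re.compile(rf"{WORD}(?:[ \t]+{WORD})*")


def is_word_sequence(text: str) -> bool:
    """Return True if text matches word (_ word)* under the assumptions above."""
    return WORD_SEQUENCE.fullmatch(text) is not None
```

Under these assumptions `other`, `variant name` and `Hawaiʻi` are accepted, while an empty name, a name with trailing spaces, or a name containing `]` is rejected. A no-break space (U+00A0) is not an inline space here, so it is accepted inside a word.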

We don't use words anywhere else and I find it weird that they forbid so many arbitrary characters. Should we relax it to only forbid ] and maybe [? Should we allow escaping with \]? As in: [variant name with a closing bracket \] looks like this].

Additionally, I don't see why we have to force variant-name to be non-blank. [] should parse as a variant name of "" (a zero-width string). It's not terribly useful, but consistent.

@stasm stasm added this to the Syntax 0.7 milestone Feb 9, 2018
@zbraniecki
Collaborator

My concern here is that it opens up room for a lot of edge cases that may be confusing and potentially hard to handle. In particular:

  • Empty/accidental variant names with copy pasted invisible characters that are not matched later
  • Characters that are unlikely to be useful but since we allowed them will block us from using those characters anywhere else

I'm not completely against pursuing this direction (I know stas was in favor of it from the very beginning), but I'd like to see use cases and/or examples of users wanting to use this feature. I'm concerned about limiting our future options by relaxing the syntax with no one asking for it.

@Pike
Contributor

Pike commented Feb 9, 2018

I think there's value in variant-key being strict. This is mostly about the readability of the message. At the point where we have variants, that already goes down a fair amount. Having complexity in there would further degrade that.

Side question, is 15 ducks an OK variant-key?

@zbraniecki
Collaborator

15 ducks is not an OK variant-key because it starts with a digit (1) so we expect it to be a number type value.
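The rule described here can be sketched as a dispatch on the first character: once the key starts with a digit, the whole key has to be a number. The Python below is only an illustration. The number pattern (digits with an optional fractional part), the definitions of inline space and line break, and the name `classify_variant_key` are assumptions, not taken from the actual parser.

```python
import re

# Assumption: a number is digits with an optional fractional part.
NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")

# Assumption: words exclude space, tab, line breaks and [ \ ] { }.
WORD = r"[^\r\n \t\[\\\]{}]+"
NAME = re.compile(rf"{WORD}(?:[ \t]+{WORD})*")


def classify_variant_key(text: str):
    """Return "number", "name", or None when the key does not parse."""
    if text and text[0] in "0123456789":
        # A leading digit commits the parser to the number branch.
        return "number" if NUMBER.fullmatch(text) else None
    return "name" if NAME.fullmatch(text) else None
```

With this dispatch `15` is a number, `ducks 15` is a name, and `15 ducks` fails because the number branch cannot consume ` ducks`.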

@stasm stasm added the syntax label Mar 26, 2018
@stasm stasm added syntax and removed syntax labels May 15, 2018
@stasm
Contributor Author

stasm commented May 21, 2018

Use-cases

There are two use-cases which would be well-served by relaxing the grammar of VariantName:

  1. Term attribute values in the native language.
  2. Free-form variable values.

Native term attribute values

Term attributes are private to the current language. They may represent grammatical concepts which are foreign to the English grammar. Consequently, they may be hard to give English names to. To fully embrace the asymmetric design of terms and their attributes, I'd like to make it possible for attribute values to be written in the target language of the localization.

-brand-name = Firefox
    .Genus = männlich 

(Ideally, the attribute's name could also contain non-ASCII characters. I'll get back to this in a second.)

Variable values

There might be cases where the localizer wants to provide a specialized translation for one of the possible variable values.

# The ʻ character is U+02BB, outside of the Basic Latin block.
welcome = { $state ->
    [Hawaiʻi] Aloha!
   *[other] Welcome!
}
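As a rough illustration of how such a select expression would resolve, the Python sketch below matches the variable's value against the variant keys by exact string comparison and falls back to the default variant. The function `select_variant` and the dictionary layout are hypothetical and do not reflect any Fluent implementation.

```python
def select_variant(selector: str, variants: dict, default_key: str) -> str:
    """Pick the variant whose key equals the selector, else the default one."""
    # Exact, code-point-by-code-point comparison; no normalization is applied.
    return variants.get(selector, variants[default_key])


welcome = {
    "Hawai\u02bbi": "Aloha!",  # key spelled with U+02BB, as in the message
    "other": "Welcome!",
}
```

In this sketch `select_variant("Hawai\u02bbi", welcome, "other")` picks `Aloha!`, whereas a value spelled with an ASCII apostrophe (`Hawai'i`) does not match the key and falls through to the default.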

In most cases, dynamic references (#80) will be the preferred way of implementing such UIs, but they do have the drawback of requiring the developer's intervention. For lightweight one-off customizations simple SelectExpressions like the one above might still be good and cheap solutions.

Proposed design

I suggest that we restrict the [name] syntax to just identifiers. This is actually more restrictive than today because today's grammar also allows whitespace.

For names containing any other characters, including whitespace, ] or any non-Latin characters, I propose a new syntax: ["name"]. The standard StringExpression grammar would apply here.

This design has the advantage of being explicit about digits: [15 ducks] (illegal) vs. ["15 ducks"], as well as about whitespace: [ trim whitespace ] vs. [" keep whitespace "].
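The two forms could be told apart roughly as in the Python sketch below. Several details are assumptions made for illustration only: the identifier pattern (ASCII letter followed by letters, digits, `_` or `-`), the number pattern, backslash escapes inside quotes, trimming of inline whitespace next to the brackets, and the function name `parse_variant_key`.

```python
import re

# Assumptions, not taken from the Fluent grammar files:
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9_-]*")
NUMBER = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")
# Quoted text: no raw line breaks, backslash escapes the next character.
QUOTED = re.compile(r'"((?:[^"\\\r\n]|\\.)*)"')


def parse_variant_key(source: str):
    """Parse "[...]" into a (kind, value) pair, or None if it is illegal."""
    if not (source.startswith("[") and source.endswith("]")):
        return None
    # Whitespace next to the brackets is trimmed; whitespace in quotes is kept.
    inner = source[1:-1].strip(" \t")
    quoted = QUOTED.fullmatch(inner)
    if quoted:
        return ("string", re.sub(r"\\(.)", r"\1", quoted.group(1)))
    if NUMBER.fullmatch(inner):
        return ("number", inner)
    if IDENTIFIER.fullmatch(inner):
        return ("identifier", inner)
    return None
```

Under these assumptions `[ other ]` yields the identifier `other`, `[" keep whitespace "]` keeps its padding, `[15 ducks]` and `[männlich]` are rejected, and their quoted counterparts are accepted as strings. A quoted key containing a raw line break is rejected as well.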

Lastly, although this might be best discussed in a separate issue, we could consider allowing attribute names to contain arbitrary characters using a new ."attr name" = Value syntax. (Moved to #117.)

@zbraniecki
Collaborator

If I'm not mistaken, nobody has requested this yet. Can it wait until after 1.0 and an actual user request?

@stasm
Contributor Author

stasm commented May 21, 2018

The sooner we relax, the less trouble we'll encounter due to cross-channel.

We haven't seen much use of custom term attributes yet, which is why I think we're not seeing these requests. I'd like to default to Unicode wherever possible. It's 2018.

@zbraniecki
Collaborator

zbraniecki commented May 21, 2018

As I stated before, the risk of Unicode here is that edge cases become legitimate unless you blacklist them. Things like \n, \t, many forms of whitespace etc. all become proper variant name characters.
It would be nice to research the concept of visible/non-breaking/non-whitespace unicode characters and maybe use this notion? Maybe we can use UAX31? My worry is that I don't feel I know enough about traps and limitations coming from allowing Unicode in syntax and I've seen how much time languages that wanted to introduce them spent trying.
Do you think we're somehow immune to such issues? Is there any programming language that successfully did this not via UAX31?
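For a feel of what a UAX31-style rule admits, Python's own identifier syntax is based on UAX31 (via PEP 3131), so `str.isidentifier()` can serve as a rough stand-in. This is only an analogy: Python's profile is not necessarily the one Fluent would choose, and the sample strings below are picked for illustration.

```python
samples = {
    "männlich": "non-ASCII letter",
    "Hawai\u02bbi": "modifier letter U+02BB",
    "two words": "ASCII space",
    "a\u00a0b": "no-break space",
    "a\u200bb": "zero-width space",
    "15ducks": "leading digit",
}

# Collect the UAX31-style verdict for every sample string.
verdicts = {text: text.isidentifier() for text in samples}

for text, note in samples.items():
    print(f"{text!r:14} {note:24} {verdicts[text]}")
```

The first two samples are accepted, while the ones containing whitespace or an invisible character and the one starting with a digit are rejected.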

@stasm
Contributor Author

stasm commented May 21, 2018

My point is that this is about matching strings, not identifiers. However, in the context of identifiers, UAX31 could help us, yes.

In my proposal, ["text"] would follow the same semantics and parsing logic as StringExpressions do. In fact, I'm proposing that we make the current syntax ([text]) even more restricted than it is right now, so your concern is mitigated.

@zbraniecki
Collaborator

key = { key2["
"] }

Would that make such code valid FTL?

@stasm
Contributor Author

stasm commented May 21, 2018

No, because StringExpressions may not contain newlines.

@zbraniecki
Collaborator

Oh! Thanks for explaining it to me! No objections.

@stasm
Contributor Author

stasm commented May 21, 2018

Lastly, although this might be best discussed in a separate issue, we could consider allowing attribute names to contain arbitrary characters using a new ."attr name" = Value syntax.

I filed #117 to keep this out of scope of this issue.

@stasm
Contributor Author

stasm commented May 23, 2018

I'd like to nominate this for Syntax 0.6 rather than 0.7. Or, perhaps we could separate this into two changes:

  • Syntax 0.6: Restrict variant keys to numbers and identifiers only.
  • Syntax 0.7: Allow string expressions as variant keys, too.

@Pike -- what do you think? #118 implements both in one go, but I'll be happy to split it in two.

@Pike
Contributor

Pike commented May 23, 2018

TBH, I don't like the proposal. I didn't get to actually read it before just now, sorry.

What I don't like in particular is introducing identifier and its constraints in an unrelated context.

I think what we should do is more along the lines of just supporting StringExpressions as VariantNames, and un-special-casing the " for that context. That also implies dropping NumberExpression from grammar.mjs and doing the number detection in abstract.mjs, I think. Then 15 ducks and ducks 15 both work (which they currently don't).

In particular splitting this between 0.6 and 0.7 leaves us without the ability to express [männlich], and that's not what 0.6 is about.

@stasm stasm removed this from the Syntax 0.7 milestone May 23, 2018
@stasm
Contributor Author

stasm commented May 23, 2018

You're right about 0.6, let's keep this in 0.7 for now, thanks.

Do you mean introducing a new production, say, bracketed_text which is like quoted_text but delimited with [ and ]? Would you require only ] to be escaped when needed? I'm concerned about adding unnecessary complexity which extends the set of contextually-special characters.

What I think this boils down to is how we look at [ and ]. Are they supposed to delimit the variant name or to differentiate the variant's key from the variant's value? A useful test to perform is this:

padded-variant-keys =
    { $num ->
        [one  ] One
        [many ] Many
       *[other] Other
    }

In the example above, what are the variants' keys?

  • one, many and other, or
  • one__, many_, other (using _ for spaces).

If it's the former, then [ and ] are not delimiters. There's extra trimming logic involved, which means that variant keys are not simply StringExpressions with a different delimiter.

If it's the latter, then $num == 1 will likely not match against one__ because the plural category is called one, without the trailing space. I don't think we want this behavior.
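The difference between the two readings can be shown with a small Python sketch. It assumes the selector has already been resolved to the plural category `one`; the function `find_variant` and its `trim` flag are hypothetical and exist only to contrast the two interpretations.

```python
# Raw key text between the brackets in the padded-variant-keys example.
RAW_KEYS = ["one  ", "many ", "other"]
DEFAULT_INDEX = 2  # the variant marked with "*"


def find_variant(category: str, raw_keys: list, trim: bool) -> int:
    """Return the index of the matching variant, or the default's index."""
    for index, raw in enumerate(raw_keys):
        key = raw.strip(" \t") if trim else raw
        if key == category:
            return index
    return DEFAULT_INDEX
```

With `trim=True` the category `one` selects the first variant. With `trim=False` the key is `one` followed by two spaces, nothing matches, and the default variant is used instead.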

["string literal"] solves this by sticking to a single set of rules for what makes a character special. In other words, we already have a grammar production for delimited text, and it's quoted_text.

I'm also not sold on moving the parsing of numbers to abstract.mjs. I don't want to make them second-class citizens of the grammar. There might be ways around it, though. For instance, we could move the brackets inside each of the VariantKey's alternates to make sure it parses fully. "[" (NumberExpression | VariantName) "]" could become ("[" NumberExpression "]" | "[" bracketed_text "]").
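A sketch of that idea in Python: each alternate has to consume its own closing bracket, so a key that merely starts like a number falls through to the text alternate. The `bracketed_text` rule used here (anything except `]` and line breaks, no escapes, no trimming) and the number pattern are assumptions for illustration, as is the function name `parse_key`.

```python
import re

# Each alternate includes both brackets, so it must match the whole key.
NUMBER_KEY = re.compile(r"\[(-?[0-9]+(?:\.[0-9]+)?)\]")
# Assumption: bracketed_text is any run of characters except "]" and line breaks.
TEXT_KEY = re.compile(r"\[([^\]\r\n]*)\]")


def parse_key(source: str):
    """Try the number alternate first, then fall back to bracketed text."""
    number = NUMBER_KEY.fullmatch(source)
    if number:
        return ("number", number.group(1))
    text = TEXT_KEY.fullmatch(source)
    if text:
        return ("text", text.group(1))
    return None
```

Here `[15]` parses as a number, while `[15 ducks]` and `[ducks 15]` both parse as text because the number alternate cannot reach its closing bracket.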

@stasm stasm added forwards incompatible Old parsers won't parse newer files. backwards incompatible Old files won't parse in new parsers. labels May 25, 2018

3 participants