Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisit the set of disallowed characters in a file type indicator #2140

Closed
zygoloid opened this issue Sep 1, 2022 · 4 comments
Closed

Revisit the set of disallowed characters in a file type indicator #2140

zygoloid opened this issue Sep 1, 2022 · 4 comments
Labels
leads question A question for the leads team

Comments

@zygoloid
Copy link
Contributor

zygoloid commented Sep 1, 2022

#1360 changed multiline string literals from using """ to using ''' as delimiters, but it left this rule alone:

A file type indicator is any sequence of non-whitespace characters other than " or #.

This doesn't seem like an appropriate restriction any more. Maybe one of these options would be more useful:

  1. Don't have any restriction.
  2. Disallow ' and #, so that for example #'''# doesn't run onto the next line, in case it was intended to be some kind of raw character literal.
  3. Disallow ' and #, but only at the start of the file type indicator.
  4. Disallow ' throughout and # at the start.
  5. Disallow anything that looks like the terminator (''' plus N hashes) from appearing within the string literal after the first character and before the first newline.

Some examples:

// Disallowed by status quo, allowed by all the above
var cpp20: String = '''c++ format="IndentWidth: 4"
''';
// Allowed by (1), (3), (5)
var haskell_prime: String = '''haskell'
x = 0
''';
// Allowed by (1), (3), (4), (5)
var c_sharp: String = '''c#
var n = 0;
''';
// Allowed by (1) and (3).
var looks_like_single_line_string = '''not_the_content''';
''';
// Allowed by (1) only.
var too_many_single_quotes = ''''
''';
// Allowed by (1) and (5).
var too_many_quotes_but_raw = #'''''
'''#;
// Allowed by (1) only.
var not_a_raw_character_literal = #'''#
'''#;
// Allowed by (1) and (5).
var still_not_a_raw_character_literal = #'''x'#
'''#;
@chandlerc
Copy link
Contributor

I think I prefer #2. I think failing to switch from " to ' was an oversight and certainly not my intent.

I don't see any real advantages of having the less restrictive options here, and the simplicity of the rule in addition to the ability to catch a wide variety of possible typos or syntax confusions seems reasonable to motivate the restruction.

@chandlerc chandlerc added the leads question A question for the leads team label Sep 2, 2022
@zygoloid
Copy link
Contributor Author

zygoloid commented Sep 12, 2022

I'm slightly motivated to allow c# in particular, and slightly motivated to allow an arbitrary string to be represented somehow (maybe with escaping?), but not enough to argue for a change at this time. I think it's OK to wait and see how these filetype indicators are used in practice and if these restrictions are problematic. In the mean time, #2 seems like the right continuation of our previous decisions.

For reference, in GitHub-Flavored Markdown, an info string following a ``` fence cannot contain ` characters at all, but an info string following a ~~~ fence can contain any characters (other than newline), and "```c#" is valid in GFM to start a C#-formatted code block.

@chandlerc
Copy link
Contributor

I'm a bit split on c# vs. csharp. The existence of spelled-out aliases in most existing syntax highlighting systems I'm aware of makes me suspect that the restriction is fine, but as you say, time will tell. I'm pretty happy just starting restrictive and seeing how it goes.

@zygoloid
Copy link
Contributor Author

The leads have converged on option (2) for the time being. We'll keep an eye open for this decision causing issues (such as the c# issue) in practice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
leads question A question for the leads team
Projects
None yet
Development

No branches or pull requests

2 participants