Tokenise suffixes on all literals #19103

Merged 7 commits on Nov 20, 2014

huonw (Member) commented Nov 19, 2014

Futureproof Rust for fancier suffixed literals. The Rust compiler now tokenises a literal followed immediately (no whitespace) by an identifier as a single token. For example, the text sequences "foo"bar, 1baz and 1u1024 are each a single token rather than the pairs "foo" bar, 1 baz and 1u 1024 respectively.
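To make the lexing rule concrete, here is a minimal toy sketch of it. This is not the compiler's lexer: the `Token` type and the `tokenise`, `take_while` and `take_suffix` helpers are hypothetical names invented for illustration, and escapes, floats and error handling are left out.

```rust
/// Toy token type (hypothetical): a literal carries an optional suffix.
#[derive(Debug, PartialEq)]
enum Token {
    Str { value: String, suffix: Option<String> },
    Int { digits: String, suffix: Option<String> },
    Ident(String),
}

fn is_ident_start(c: char) -> bool {
    c.is_alphabetic() || c == '_'
}

fn is_ident_continue(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Collects characters while `pred` holds, advancing `*pos` past them.
fn take_while(chars: &[char], pos: &mut usize, pred: fn(char) -> bool) -> String {
    let start = *pos;
    while *pos < chars.len() && pred(chars[*pos]) {
        *pos += 1;
    }
    chars[start..*pos].iter().collect()
}

/// An identifier starting exactly at `*pos` (no whitespace skipped)
/// is absorbed into the preceding literal as its suffix.
fn take_suffix(chars: &[char], pos: &mut usize) -> Option<String> {
    if *pos < chars.len() && is_ident_start(chars[*pos]) {
        Some(take_while(chars, pos, is_ident_continue))
    } else {
        None
    }
}

fn tokenise(src: &str) -> Vec<Token> {
    let chars: Vec<char> = src.chars().collect();
    let mut pos = 0;
    let mut tokens = Vec::new();
    while pos < chars.len() {
        let c = chars[pos];
        if c.is_whitespace() {
            pos += 1;
        } else if c == '"' {
            pos += 1; // opening quote
            let value = take_while(&chars, &mut pos, |ch: char| ch != '"');
            pos += 1; // closing quote (escapes and unterminated strings are not handled)
            let suffix = take_suffix(&chars, &mut pos);
            tokens.push(Token::Str { value, suffix });
        } else if c.is_ascii_digit() {
            let digits = take_while(&chars, &mut pos, |ch: char| ch.is_ascii_digit());
            let suffix = take_suffix(&chars, &mut pos);
            tokens.push(Token::Int { digits, suffix });
        } else if is_ident_start(c) {
            tokens.push(Token::Ident(take_while(&chars, &mut pos, is_ident_continue)));
        } else {
            pos += 1; // anything else is skipped in this sketch
        }
    }
    tokens
}
```

With this sketch, `tokenise("1u1024")` yields one `Int` token with suffix `u1024`, while `tokenise("\"foo\" bar")` yields a suffix-free `Str` followed by a separate `Ident`.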

The compiler rejects all such suffixes in the parser, except for the 12 numeric suffixes we have now.
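The parser-side check might look roughly like the sketch below. The `LitKind` enum, the `check_suffix` function and the error strings are hypothetical, and the suffix list is an assumption about which 12 numeric suffixes existed at the time (`i`/`u` plus the sized integer and float suffixes); it is not taken from the PR's diff.

```rust
/// Assumed set of the 12 numeric suffixes in use when this PR was written.
const NUMERIC_SUFFIXES: [&str; 12] = [
    "i", "i8", "i16", "i32", "i64",
    "u", "u8", "u16", "u32", "u64",
    "f32", "f64",
];

/// Hypothetical literal kinds, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LitKind {
    Str,
    Char,
    Number,
}

/// Parser-side check: the lexer has already accepted any suffix,
/// so the decision to reject happens here.
fn check_suffix(kind: LitKind, suffix: Option<&str>) -> Result<(), String> {
    match (kind, suffix) {
        // No suffix is always fine.
        (_, None) => Ok(()),
        // Numbers may carry one of the known numeric suffixes.
        (LitKind::Number, Some(s)) if NUMERIC_SUFFIXES.contains(&s) => Ok(()),
        (LitKind::Number, Some(s)) => {
            Err(format!("invalid suffix `{}` for numeric literal", s))
        }
        // Every other literal kind rejects all suffixes.
        (_, Some(s)) => Err(format!("suffix `{}` is not allowed on {:?} literals", s, kind)),
    }
}
```

Under these assumptions, `1u8` passes, while `1u1024` and `"foo"bar` lex as single tokens but are rejected by the check.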

I'm fairly sure this will affect very few programs, since it's not currently legal to have <literal><identifier> in a Rust program, except in a macro invocation. Any macro invocation relying on this behaviour can simply separate the two tokens with whitespace: foo!("bar"baz) becomes foo!("bar" baz).
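As an illustration of the whitespace-separated form, here is a small sketch written in present-day `macro_rules!` syntax rather than the 2014 syntax. The `tagged!` macro and the `demo` function are hypothetical names, not anything from this PR.

```rust
/// Hypothetical macro that takes a string literal followed by an identifier.
/// The two arrive as separate tokens because whitespace separates them.
macro_rules! tagged {
    ($s:literal $tag:ident) => {
        ($s, stringify!($tag))
    };
}

fn demo() -> (&'static str, &'static str) {
    // Written with a space: `"bar" baz`, not `"bar"baz`.
    tagged!("bar" baz)
}
```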

This implements RFC 463, and so closes #19088.

huonw (Member, Author) commented Nov 19, 2014

I made some changes to src/grammar and the makefiles along the way:

  • update to compile with master,
  • make the src/grammar/verify.rs executable built as part of the check-all target (apparently this is run on the bots?) to avoid it going completely out of date in future,
  • update the antlr4 grammar to hopefully reflect these lexical changes.

However, I don't have antlr4 working properly: I downloaded it and it generated a Java file (so my new grammar parses correctly), but building that just gave me a pile of "symbol not defined" errors. I don't know how to wrangle Java and classpaths into doing what I want very well at all...

This adds an optional suffix at the end of a literal token:
`"foo"bar`. An actual use of a suffix in an expression (or other literal
that the compiler reads) is rejected in the parser.

This doesn't switch the handling of numbers to this system, and doesn't
outlaw illegal suffixes for them yet.

This moves errors and all handling of numeric suffixes into the parser
rather than the lexer.

This makes the formal lexical grammar (more closely) reflect the one
implemented by the compiler.

This changes the stated grammar of literals to move all suffixes into
the generic literal production.

Successfully merging this pull request may close these issues.

Tracking issue for RFC 463 - tokenise idents immediately after a literal as part of the literal