Canonical Rust grammar distinct from parser (tracking issue for RFC #1331) #30942

nikomatsakis · 2016-01-15T19:45:17Z

This is the tracking issue for rust-lang/rfcs#1331, which specifies a procedure for creating a canonical grammar apart from the compiler. This is a multi-phase process. I think the first step, honestly, is just to lay out a firm plan of how to proceed -- what kind of automatic testing to use and so forth. The RFC provides a general plan but it needs to be made more concrete. Once this issue has an owner, I (or they) can update this summary and try to keep it up-to-date.

steveklabnik · 2016-01-26T22:32:55Z

@nagisa , as it was your RFC, do you have opinions on the strategy we take here? Or should I be thinking about this.

EDIT: @nagisa mentioned on IRC that they are too busy, so I'll come up with a plan.

SimonSapin · 2016-01-27T00:45:47Z

I’m curious what kind of testing can check that a grammar matches the not-grammar-based parser.

steveklabnik · 2016-01-27T00:59:29Z

Well, it can show the presence of bugs, but not the absence of them.

https://github.com/rust-lang/rust/blob/master/mk/grammar.mk is what we used to do. Just vestigial at this point :)

matklad · 2016-01-27T21:46:29Z

I’m curious what kind of testing can check that a grammar matches the not-grammar-based parser.

In intellij-rust, we have regression tests for the parser, which consist of a Rust file and a serialized AST. Then we check that the parser produces the expected AST. We also have some negative "you shall not parse" tests. The same technique can be applied to the reference grammar, if we ensure that the serialized AST is sufficiently abstract and doesn't leak grammar "implementation details".
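
A minimal sketch of that kind of regression test, in Rust. Everything here is an assumption made for illustration: `parse_to_sexpr` is a toy stand-in for "run the parser under test and serialize its AST" (it only understands `let NAME = INT;`), and `Expectation` and `check_case` are hypothetical names, not part of intellij-rust or rustc.

```rust
/// What a test case expects: a serialized AST, or a rejection
/// (a "you shall not parse" test).
enum Expectation<'a> {
    Ast(&'a str),
    Reject,
}

/// Toy stand-in for the parser under test. It accepts a sequence of
/// `let NAME = INT;` statements and serializes each as `(let NAME INT)`.
fn parse_to_sexpr(src: &str) -> Result<String, String> {
    let mut out = Vec::new();
    for stmt in src.split(';').map(str::trim).filter(|s| !s.is_empty()) {
        let words: Vec<&str> = stmt.split_whitespace().collect();
        match words.as_slice() {
            ["let", name, "=", value] if value.parse::<i64>().is_ok() => {
                out.push(format!("(let {} {})", name, value));
            }
            _ => return Err(format!("cannot parse `{}`", stmt)),
        }
    }
    Ok(out.join("\n"))
}

/// Returns true when the parser's behavior matches the recorded expectation.
fn check_case(src: &str, expected: &Expectation<'_>) -> bool {
    match (parse_to_sexpr(src), expected) {
        (Ok(ast), Expectation::Ast(want)) => ast == *want,
        (Err(_), Expectation::Reject) => true,
        _ => false,
    }
}
```

Because the expected output is an abstract S-expression rather than a dump of parser internals, the same recorded cases could in principle be replayed against a second parser generated from the reference grammar.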

The question here is what tool should be used for the reference grammar? I think it has two goals:

Create two rust parsers, which can be verified against each other.
Make it easier to learn and reason about Rust grammar.

One option is to continue to use Bison. In my opinion, it doesn't fulfill the second goal: the grammar is difficult to read because of the large amount of duplication and low-level detail. Regular expression extensions would be really nice to have in the canonical grammar.

Another extreme option is to build a custom parser generator (LALRPOP ?) :) It would have a nice side effect of making Rust more suitable for efficiently implementing programming languages: a realm currently occupied by C/C++.

nikomatsakis · 2016-01-27T22:10:30Z

I have opinions but not much time. I've been interested in porting the Rust grammar to LALRPOP -- I think it'd be much cleaner, as LALRPOP has a number of features that should allow us to avoid some of the abuse of precedence declarations and the like. (I found the existing grammar not very useful in evaluating syntactic changes because of those precedence declarations, for example.)

(As an aside, this would probably be a good stress test for LALRPOP, I assume we'd have to fix various bugs or add new features to scale it up that large.)

nikomatsakis · 2016-01-28T17:50:55Z

So this morning I started porting the yacc grammar in the repository to LALRPOP, just to see what would happen. I made some progress. But I was wondering: is that grammar the most up to date version, or is the one in the repo more up to date? Does anyone know what the differences are between them?

fhahn · 2016-01-28T18:14:05Z

I was thinking about working on that as well, but I unfortunately don't have too much
time in the next month to work on this. What do you think about creating a
repo to collaborate? (I could take care of that as well)

nagisa · 2016-01-28T19:46:45Z

The (parser-lalr) grammar originally comes from https://github.com/bleibig/rust-grammar, which hasn’t been updated in a while.

nikomatsakis · 2016-01-31T12:55:56Z

@fhahn

I was thinking about working on that as well, but I unfortunately don't have too much time in the next month to work on this. What do you think about creating a repo to collaborate? (I could take care of that as well)

I will do this. I am right now just working through some LALRPOP issues that arose (I'm not sure if it's a bug or if LALRPOP just needs optimization; I suspect both :) but once I get that sorted out a bit I will open up a repo and post the link here. Hopefully tomorrow. It seems @TyOverby may be interested as well, and no doubt others. For example, I was talking to @jorendorff, who developed https://github.com/jorendorff/rust-grammar/, as well.

nikomatsakis · 2016-02-01T13:26:48Z

OK, I put up a definite work in progress here: https://github.com/nikomatsakis/rustypop

I also described some conventions that I am aiming towards, but have not yet achieved.

This is hard work to parallelize in the early stages, but if you're interested in hacking on it, let me know over IRC or what have you (or just open some PRs). I'll probably alternate between working on it and improving LALRPOP (for example, I am now highly motivated to get a better printout on shift-reduce errors). =)

glaebhoerl · 2016-02-11T22:14:34Z

If we can generate a parser from the official grammar, and if we can run rustc in parse-only mode, and if we can make the two of them produce output and errors in identical format, then I wonder if we could use AFL to automatically generate test cases for them. Of course even this could not prove that they match, but it might give a higher level of assurance than a relatively smaller number of ad hoc human-devised test cases.
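
A rough sketch of the comparison step, under loud assumptions: the two functions below are toy recognizers for balanced parentheses, standing in for "rustc in parse-only mode" and "a parser generated from the official grammar", and exhaustive enumeration of short inputs stands in for a fuzzer such as AFL. None of the names come from rustc or AFL.

```rust
/// Stand-in for the hand-written production parser: a depth counter.
fn handwritten_accepts(src: &str) -> bool {
    let mut depth: i32 = 0;
    for c in src.chars() {
        match c {
            '(' => depth += 1,
            ')' => {
                depth -= 1;
                if depth < 0 {
                    return false;
                }
            }
            _ => return false,
        }
    }
    depth == 0
}

/// Stand-in for the grammar-derived parser: recursive descent for
/// S := "(" S ")" S | ε
fn grammar_accepts(src: &str) -> bool {
    // Returns the position just past the longest S starting at `pos`.
    fn s(input: &[u8], pos: usize) -> Option<usize> {
        if input.get(pos) == Some(&b'(') {
            let after_inner = s(input, pos + 1)?;
            if input.get(after_inner) == Some(&b')') {
                return s(input, after_inner + 1);
            }
            return None;
        }
        Some(pos) // the empty alternative
    }
    s(src.as_bytes(), 0) == Some(src.len())
}

/// Every string over `alphabet` with length 0..=max_len.
fn all_inputs(alphabet: &[char], max_len: usize) -> Vec<String> {
    let mut result = vec![String::new()];
    let mut frontier = vec![String::new()];
    for _ in 0..max_len {
        let mut next = Vec::new();
        for prefix in &frontier {
            for &c in alphabet {
                let mut s = prefix.clone();
                s.push(c);
                next.push(s);
            }
        }
        result.extend(next.iter().cloned());
        frontier = next;
    }
    result
}

/// The first input on which the two recognizers disagree, if any.
fn first_disagreement(inputs: &[String]) -> Option<&String> {
    inputs
        .iter()
        .find(|s| handwritten_accepts(s.as_str()) != grammar_accepts(s.as_str()))
}
```

As noted above, an empty result only means no disagreement was found on the inputs tried. It does not prove the two parsers match.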

eternaleye · 2016-02-12T04:31:52Z

@glaebhoerl: Well, there's also that the entire formalism the official grammar is based on is called "generative grammars" - using them to create working parsers came after using them to create exemplars, which is essentially a matter of transforming the grammar into a tree and executing a depth-first search.

For example:

start := possibility start?
possibility := light | neutral | dark
light := "Solitari" | "Gandalf"
neutral := "Lunitari" | "Switzerland"
dark := "Nuitari" | "Chocolate"

We form the following tree:

start
- possibility
  - light
    - "Solitari"
    - "Gandalf"
  - ...
- possibility start
- ...

With that, we then walk the tree, generating each possible string covered by the grammar.

One can do this efficiently, without actually generating the tree, by assigning each optional subrule a bit, treating the collection of those bits as a number, and counting.

Enabling a subrule will sometimes reveal another optional; in that case, push the current "number" onto a queue, and when you've finished counting the "basic optionals", pop items off the queue and count the newly-revealed optionals again, adding them to the queue if they reveal more.

That ensures one will generate every rule, in their shortest exemplars, before continuing on to the next shortest, etc.
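
Here is a small Rust sketch of generating exemplars from the example grammar above. It is not the bit-counting scheme described: it swaps in a plain breadth-first search over leftmost derivations, which is simpler to show. The optional `start?` is spelled out as two alternatives. The names `Sym`, `Rule`, `example_grammar` and `enumerate` are made up for this illustration.

```rust
use std::collections::VecDeque;

#[derive(Clone, Debug, PartialEq)]
enum Sym {
    T(&'static str), // terminal
    N(&'static str), // nonterminal
}

/// A nonterminal name together with its alternatives.
type Rule = (&'static str, Vec<Vec<Sym>>);

fn example_grammar() -> Vec<Rule> {
    use Sym::{N, T};
    vec![
        ("start", vec![vec![N("possibility")], vec![N("possibility"), N("start")]]),
        ("possibility", vec![vec![N("light")], vec![N("neutral")], vec![N("dark")]]),
        ("light", vec![vec![T("Solitari")], vec![T("Gandalf")]]),
        ("neutral", vec![vec![T("Lunitari")], vec![T("Switzerland")]]),
        ("dark", vec![vec![T("Nuitari")], vec![T("Chocolate")]]),
    ]
}

/// Breadth-first enumeration of the first `limit` sentences of the grammar.
fn enumerate(grammar: &[Rule], start: &'static str, limit: usize) -> Vec<String> {
    let mut queue = VecDeque::new();
    queue.push_back(vec![Sym::N(start)]);
    let mut sentences = Vec::new();
    while let Some(form) = queue.pop_front() {
        if sentences.len() >= limit {
            break;
        }
        match form.iter().position(|s| matches!(s, Sym::N(_))) {
            // No nonterminal left: the sentential form is a finished sentence.
            None => {
                let words: Vec<&str> = form
                    .iter()
                    .map(|s| match s {
                        Sym::T(w) | Sym::N(w) => *w,
                    })
                    .collect();
                sentences.push(words.join(" "));
            }
            // Expand the leftmost nonterminal with each of its alternatives.
            Some(i) => {
                let name = match &form[i] {
                    Sym::N(n) => *n,
                    Sym::T(_) => unreachable!(),
                };
                let alternatives = grammar
                    .iter()
                    .find(|(lhs, _)| *lhs == name)
                    .map(|(_, alts)| alts);
                for alt in alternatives.into_iter().flatten() {
                    let mut next = form[..i].to_vec();
                    next.extend(alt.iter().cloned());
                    next.extend(form[i + 1..].iter().cloned());
                    queue.push_back(next);
                }
            }
        }
    }
    sentences
}
```

The search orders sentences by number of derivation steps. For this particular grammar a sentence of k words always takes 3k steps, so all one-word sentences come out before any two-word sentence. A grammar with a nonterminal that has no rule simply produces nothing along that branch.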

nikomatsakis · 2016-02-22T21:07:52Z

OK, so there's probably something horribly wrong with it, but the rustypop crate now builds without any shift/reduce conflicts. I haven't actually tried RUNNING the code it produces, of course, and I have all empty actions, so it will only yield a "true/false" result. Plus I need to adapt the rustc tokenizer. But it seems like progress. :)

Enough progress that it may be possible to start parallelizing the work (debugging shift/reduce conflicts is kind of a serial task...).

Of course, I also expect that as soon as we try using this grammar we'll find that I did some bone-headed things that resolved all conflicts by just not parsing anything at all, or something like that.

Anyway, I plan to write up a blog post about the approach I took, since it made use of a number of LALRPOP features to try and pare down duplication. I also took the liberty of ripping out various bits of obsolete syntax from the existing rust-parser.y, and made a few arbitrary judgement calls about dubious bits of our grammar. ;)

nikomatsakis · 2016-02-22T21:09:08Z

Oh, I should mention that it generates a 500MB .rs file. Definitely need to do some work reducing the size of LALRPOP's output (@fhahn has actually been pursuing the most immediately obvious strategy, and I've been meaning to implement various other cool optimizations that I've read about...)

nikomatsakis · 2016-02-22T21:16:48Z

Hmm, now I see some more conflicts. So maybe premature. But still, getting close I think. :)

matklad · 2016-05-17T22:45:07Z

Another tool which can be used for the canonical grammar is antlr4. I have not used it myself, but I think it should be mentioned in this thread.

nagisa · 2016-05-17T22:54:52Z

I believe we used to use antlr4 before the yacc-based grammar got merged.

matklad · 2016-05-17T23:12:46Z

Looks like only the lexer was implemented in antlr4: https://github.com/rust-lang/rust/tree/29bd9a06efd2f8c8a7b1102e2203cc0e6ae2dcba/src/grammar

jorendorff · 2016-08-09T13:33:38Z

I used antlr4 to make https://github.com/jorendorff/rust-grammar and it has a few drawbacks:

antlr4 is too lax about ambiguity: if two productions match a program, antlr simply selects whichever production appeared first in the grammar. So my grammar is almost certainly full of ambiguities I don't know about.
antlr4 doesn't support lookahead except by a gruesome hack
There are places where we really want a production to take boolean arguments, instead of having two productions, like assign_expr and assign_expr_no_struct (the production for if-statements is IF expr_no_struct block to force parentheses around extremely strange conditions like if Range { start: a, stop: b }.count() > 0 { do_something() }). antlr4's support for arguments to productions is unusable, which is why my grammar has a second complete copy of the expression grammar tagged with _no_struct.
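
For readers unfamiliar with the restriction, a small Rust illustration. The `Range` type here is a made-up struct mirroring the example above, not `std::ops::Range`. A struct literal written directly in the condition of an `if` is rejected, because the `{` would be read as the start of the block, so the literal has to be parenthesized.

```rust
// Hypothetical struct, named after the example in the text.
struct Range {
    start: u32,
    stop: u32,
}

impl Range {
    fn count(&self) -> u32 {
        self.stop.saturating_sub(self.start)
    }
}

fn is_non_empty(a: u32, b: u32) -> bool {
    // Writing `if Range { start: a, stop: b }.count() > 0 { ... }` does not
    // parse: the condition uses the "no struct literal" expression grammar.
    // Parentheses switch back to the full expression grammar.
    if (Range { start: a, stop: b }).count() > 0 {
        true
    } else {
        false
    }
}
```

This is the distinction that the duplicated `_no_struct` productions encode: the same expression grammar, minus struct literals at the outermost level.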

DemiMarie · 2016-08-10T01:21:14Z

@jorendorff I suggest using a macro preprocessor or build step to reduce some of the code duplication.

alexcrichton · 2016-08-22T22:16:59Z

cc #15880 (we'll want a bot for this)

matklad · 2016-09-17T10:34:16Z

Hey, and what about lexer / parser split?

Perhaps we should create a canonical lexical structure grammar before jumping onto the grammar for the whole language?

cc @dns2utf8

nagisa · 2016-09-17T12:13:28Z

@matklad I’d consider formalising lexer an inherent part of grammar formalisation.

That being said, unlike the grammar, which has been extended significantly over time, lexer has stayed considerably constant (i.e. is a different problem space) and the reference is still pretty good at capturing the lexical structure of the language.

matklad · 2016-09-17T12:46:44Z

I’d consider formalising lexer an inherent part of grammar formalisation.

The point is that lexer can be formalized before the rest of the grammar, so it is a good independent first step. Having an executable semi declarative specification of the lexer would help for technical reasons:

We will be able to add grammar testing infrastructure and make sure that it is executed during tests.
Having a tool which takes a rust file and produces tokens and spans will be useful for creating the grammar itself, because you wouldn't need to implement your own lexer. And it is important, because if you test your grammar against a large corpus of Rust code, you'll hit a lot of lexer bugs with a hand written lexer.

is still pretty good at capturing the lexical structure of the language.

It can be better though! There are some corner cases like 16 >> 2 vs collect::<Vec<_>>, raw string literals, self::foo::bar vs self::foo ::bar vs self ::foo::bar. Also, the reference is not executable, so it must have some bugs (like this one)
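
A few of these corner cases written out as ordinary Rust, as a sketch. The function names are made up for illustration, and the path-spacing cases are not covered here.

```rust
fn shift_vs_generics() -> (u32, Vec<Vec<u8>>) {
    // Here `>>` is a single right-shift operator.
    let shifted: u32 = 16 >> 2;
    // Here `>>>` closes three nested generic argument lists.
    let nested = vec![vec![1u8, 2], vec![3]]
        .into_iter()
        .collect::<Vec<Vec<_>>>();
    (shifted, nested)
}

fn raw_strings() -> (&'static str, &'static str) {
    // A raw string ends at a quote followed by as many `#` as opened it,
    // so a regular-expression-style lexer rule cannot describe it directly.
    let one_hash = r#"a "quoted" word"#;
    let two_hashes = r##"contains "# inside"##;
    (one_hash, two_hashes)
}
```

Both functions compile with a current rustc, which means any lexer specification has to produce tokens the parser can interpret correctly in each of these contexts.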

dns2utf8 · 2016-09-17T14:27:00Z

I would like to create a complete grammar first.
If the src/grammar/*.g4 files covered everything, we would be able to run a generated lexer against the tests, spot cases where one of them accepts input that the other does not, and find the differences.

I am currently at RustFest.eu if somebody is here too and would like to talk a little about it.

Update grammar to parse current rust syntax Mainly addressing rust-lang#32723. This PR updates the bison grammar so that it can parse the current rust syntax, except for feature-gated syntax additions. It has been tested with all the tests in run-pass. The grammar in this repo doesn't have build logic anymore, but you can test it out in https://github.com/bleibig/rust-grammar, which has all of what's in this PR. If you are interested in having build logic and grammar tests again, I can look into implementing that as well. I'm aware that things are somewhat undecided as to what an official rust grammar should be from the discussion in rust-lang#30942. With this PR we can go back to having an up-to-date flex/bison based grammar, but the rustypop grammar looks interesting as well.

willy610 · 2018-01-19T22:10:13Z

Hello

Each open source jungle is difficult in the beginning.
Therefore, the following posts may be placed in the wrong place. Someone can certainly correct this to the right forum.

Well well

I lack a grammar for Rust. For users. There are fragments in some documents but nothing cohesive.
The grammar does not have to be the basis for compilation but should help to understand the structure of the language RUST.

I am 70 years old and have used RUST for a few months and have done a tool

https://github.com/willy610/bnf2railroad

which reads a grammar in EBNF style and produces so-called railroad views. The views are raw TTY, HTML with anchors, and SVG.

I have always thought that these railroad views have been a good complement to other documentation such as examples and small projects.
My long experience of C, Smalltalk, SQL, Mathematica, JS and Java has of course helped me to get RUST relatively quickly. RUST is a very ambitious and radical approach to programming.

Upon learning, I have interpreted the grammar files contained in RUST documentation. They are a bit incomplete and do not have the quality that other languages offer. Much is undefined.

Together with the tool there are grammar for Pascal, Smalltalk, Lua, Json, rege and fragments of RUST.

The working method could be:

Create an EBNF file for a section to be documented
Produce output in the form of railroad
Finally, run all the files in a scan to get complete documentation
If necessary, do an analysis using the tool for undefined, unused and duplicate

I have also used the tool to generate better help information for using the tool (parameters of the program)

Perhaps the tool could be useful internally for developers of RUST for, for example, specification, bug report etc

nikomatsakis · 2018-03-29T16:27:16Z

@harpocrates I'm curious, what is the status of your rust parser? @matklad and I have been talking about trying to get a better effort going here. The rough plan is to start with your grammar (probably converted to LALRPOP and then perhaps clean it up). Also, would you be interested in being involved in any such effort?

harpocrates · 2018-03-29T21:13:21Z

@nikomatsakis my rust parser is complete. I would be interested in helping out with any effort to convert to LALRPOP. What would be the best way to proceed?

Note that converting to LALRPOP may not be completely trivial. My grammar is currently written for Happy (a Haskell parser generator) and makes use of a handful of Happy features, namely:

parameterized productions (e.g. to reduce the boilerplate of defining many very similar productions for the different classes of expressions in Rust)
pushing back tokens (e.g. a >> token in something like Vec<Vec<i32>>, gets split into > and >)

nikomatsakis · 2018-03-29T21:25:24Z

@harpocrates looking at your parser, I was curious, how have you tested it? I see a few tests in the repository, but I was wondering if you tested against e.g. the sources of the Rust compiler, or crates.io, or something like that.

nikomatsakis · 2018-03-29T21:28:48Z

@harpocrates

Note that converting to LALRPOP may not be completely trivial. My grammar is currently written for Happy (a Haskell parser generator) and makes use of a handful of Happy features, namely:

LALRPOP supports parameterized productions. It does not support pushing back tokens, but there is an alternative way to handle >> in any case. (You distinguish "> followed by >" as one token and "> followed by something else" as another kind of token.)
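
One possible reading of that token scheme, as a toy Rust lexer. This is an assumption-laden sketch, not the rustc tokenizer or anything LALRPOP provides: the token names `GtJoint` and `Gt` are invented, and identifiers carry no text to keep it short.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Tok {
    Lt,
    GtJoint, // a `>` immediately followed by another `>`
    Gt,      // a `>` followed by anything else
    Ident,
    Other(char),
}

fn lex(src: &str) -> Vec<Tok> {
    let chars: Vec<char> = src.chars().collect();
    let mut toks = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
            continue;
        }
        if c.is_alphanumeric() || c == '_' {
            // Consume a whole identifier or number as one token.
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            toks.push(Tok::Ident);
            continue;
        }
        let tok = match c {
            '<' => Tok::Lt,
            '>' if chars.get(i + 1) == Some(&'>') => Tok::GtJoint,
            '>' => Tok::Gt,
            other => Tok::Other(other),
        };
        toks.push(tok);
        i += 1;
    }
    toks
}
```

With tokens like these, a grammar could define the shift operator as `GtJoint Gt` and let a generic argument list close on either kind of `>`, so `Vec<Vec<i32>>` and `a >> b` lex the same way but parse differently, and no token ever needs to be pushed back.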

The trickier thing will be precedence, since LALRPOP does not support those sorts of declarations. We ought to be able to refactor the grammar though (I actually converted the existing .y grammar once already, in rustypop, but I figured I would rather start with your grammar since it seems far more complete and better tested.)

The first thing I had planned to do, in any case, is to write a tool to convert your grammar into LALRPOP syntax, and see how LALRPOP feels about it. The next priority would be a testing harness. It's rather late here so I'll have to write more later but I'd love to collaborate, particularly since my personal time is quite limited.

Separately, @matklad and I have been talking about extending LALRPOP with a way to generate default actions that will fire off events suitable for building a generic parse tree, roughly as described in their RFC.

willy610 · 2018-03-30T08:23:48Z

Hej Niko

The status of my product is stable and no bigger issues other than proper Unicode/UTF processing remain. Today only the ASCII subset is supported.

I really will (and can!?) contribute, as I have a lot of time. I am retired but work at the computer at least 8 hours a day. As of writing I have a very tiny and expensive connection to the internet, just via phone, as the WLAN supplier in this big city of Gothenburg has had trouble with just my connection for more than a week. Crazy.

So please pass me the git repo(s) of the most comprehensive LALRPOP project so I don't have to surf around too much. I would also very much like some kind of use case or scenario for how the function/tool should be used.

Looking forward to the work and the result!

Kindly, Willy


willy610 · 2018-03-31T11:36:56Z

Hej

I copied and pasted grammar rules from https://doc.rust-lang.org/grammar.html for RUST. I did the same for Lua; for Smalltalk I read from 'Smalltalk-80: The Language', and for Pascal I read Niklaus Wirth's 'Algorithms + Data Structures = Programs'.

I have not looked into the RUST compiler source or any projects on crates.io. My vision with my work was to support writers of language documentation by giving them tools to work with (and reference) subsets of proper, versioned grammar rules as inclusions of both BNF rules and railroad graphs.

I assume that there are no explicit language rules in the compiler's sources, but that the rules appear in some pre-step for generating part of the compiler. That is why I didn't dig into the compiler source. So from some versioned preprocessed rule definitions there might be a possibility to extract rule snippets to be converted to BNF and railroad diagrams.

But I will look into the source of the RUST compiler, some crates.io projects, and LALRPOP too.

Kindly, Willy


ehuss · 2018-03-31T19:37:42Z

You may also be interested in following rust-lang/reference#221. The people working on the reference have been making great progress, and the up-to-date grammar (at https://brauliobz.github.io/rust-reference/grammar.html) appears to be getting closer to complete.

nikomatsakis · 2018-04-17T10:34:55Z

@willy610

The status of my product is stable and no bigger issues other than proper Unicode/UTF processing remain.

Sorry for disappearing! After writing those messages, I got totally overwhelmed (and this week I'm actually on vacation, so I'll probably be slow to reply again.) I'm a bit confused about which project you are referring to -- do you mean the railroad diagrams you referenced here? If so, that seems like a cool visualization technique, but presumably it requires an EBNF grammar to start? At the moment, that last part is what I am most interested in obtaining; note though that if we had a working LALRPOP grammar, it "desugars down" to a plain CFG internally, so we ought to be able to use your tool to visualize it.

@ehuss

The people working on the reference have been making great progress, and the up-to-date grammar...appears to be getting closer to complete

That's great! Is it presently being tested? And, if so, how?

willy610 · 2018-04-18T21:09:55Z

Hej Niko

The project I'm referencing is my project #30942 (comment).

My focus is to keep the possibility to generate railroad views from grammars of the EBNF kind, and to have the verification function there (unused rules, missing rules, etc.).

At the moment I'm investigating some RUST grammar approaches:

1. The old yacc from RUST source #30942 (comment)
2. From some md files in the Rust nursery https://github.com/rust-lang-nursery/reference/tree/master/src
3. From the work by Jason Orendorff. I think he said he wrote the grammar in order to understand RUST when writing the book Programming Rust. What a grammar and what a book!! https://github.com/jorendorff/rust-grammar/blob/master/Rust.g4
4. And from niko's older(?) rustypop https://github.com/nikomatsakis/rustypop/blob/master/src/rusty.lalrpop
5. But also LALRPOP https://github.com/lalrpop/lalrpop

A. In most sources I struggled with character set descriptions and all the regexps. So I introduced, in my EBNF, 'Character Set Expressions' with set, union, difference and range. It will show up soon.
B. I think Markdown 'marking' is insufficient for a BNF. Perhaps generate plain HTML snippets from rules. They can be styled to look like program source: color, font, etc.
C. EBNF decorated with types is a must. I'm not there yet.
D. Actions in the rules, like {} in yacc and => in LALRPOP, could perhaps be specified in the EBNF as a (parametric) reference to other source, like

   pub Expr: Box<Expr> = { Expr ExprOp Factor @action(ActionName,Expr), Factor, };

   So the @ or a similar unused sign could be an element in the syntax. Clean up the grammar from actions and relate them using '@' instead. The intersection of EBNF and LALRPOP could be ...
E. Or perhaps the best: have a tool exporting from LALRPOP as EBNF (with types), and then continue with generating railroad and markdown files from that source.

I'm really sorry for working mostly off road. I will probably not update the above bnf2railroad but add a heavily refactored one as another project.


nagisa · 2018-04-19T04:51:52Z

The grammar that's within reference (the markdown files) is way overdue for an update and you are not the first to notice it. In fact, this RFC happened specifically because of it. Replacing the grammar definitions in the reference markdown files with something else (such as railroads) would be super awesome, but before we can do that we need a complete grammar definition anyway.

ehuss · 2018-04-19T06:12:33Z

That's great! Is it presently being tested? And, if so, how?

I don't know, I don't think there is anything formal set up. I think @brauliobz is doing most of the work, and he once mentioned that he was using Antlr4 to test.

harpocrates · 2018-04-19T06:25:44Z

@nikomatsakis I'm sorry for the long overdue response.

@harpocrates looking at your parser, I was curious, how have you tested it? I see a few tests in the repository, but I was wondering if you tested against e.g. the sources of the Rust compiler, or crates.io, or something like that.

I have a script that automatically scrapes large files from repos under the rust-lang organization. I use rustc -Z ast-json -Z parse-only as an oracle to tell me if my parser's output is correct. Any difference in AST outputs causes tests to fail. Last time I ran the script, it collected (and ran tests) on upwards of a million lines of Rust code.

The next priority would be a testing harness. It's rather late here so I'll have to write more later but I'd love to collaborate, particularly since my personal time is quite limited.

I should have more free time in the coming months and I'd be happy to help on whatever you guys need the most help on (porting over a concrete grammar, generally improving LALRPOP, etc.)

steveklabnik · 2018-05-28T16:28:32Z

Triage: we have a grammar.md, but the intention is to move it into the reference.

steveklabnik · 2019-01-08T21:07:39Z

Triage: we have a grammar WG working on the grammar now, and their work will end up in the reference.

Is this issue still worth keeping open?

pmatos · 2019-01-09T06:52:11Z

@steveklabnik excellent. is there a way to keep up with how things are going with regards to the work the grammar wg is doing?

glaebhoerl · 2019-01-09T08:48:47Z

www.github.com/rust-lang-nursery/wg-grammar

DevQps · 2019-08-08T21:27:35Z

@steveklabnik Since there is now a dedicated repository and nobody has responded since January 8, I guess we can close this issue now right?

steveklabnik · 2019-08-09T12:37:21Z

Yep!

steveklabnik added A-docs B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. labels Jan 15, 2016

steveklabnik mentioned this issue Jan 15, 2016

Specification and Grammar incomplete with regards to lexing rules #24272

Closed

nagisa added the A-grammar Area: The grammar of Rust label Jan 18, 2016

nikomatsakis mentioned this issue Jan 31, 2016

Declarative precedence declarations lalrpop/lalrpop#67

Open

Mark-Simulacrum removed the C-feature-request Category: A feature request, i.e: not implemented / a PR. label Jul 27, 2017

chordowl mentioned this issue Aug 23, 2017

Bison grammar is outdated #32723

Closed

bleibig mentioned this issue Oct 9, 2017

Update grammar to parse current rust syntax #45125

Merged

ehuss mentioned this issue Apr 25, 2018

add deconstructor in function argument rust-lang/rust-enhanced#245

Closed

ehuss mentioned this issue May 26, 2018

Syntax Highlighting Improvements rust-lang/rust-enhanced#284

Open

33 tasks

ehuss mentioned this issue Aug 13, 2018

Syntax Diagrams rust-lang/reference#398

Open

ehuss mentioned this issue Oct 11, 2018

Post a list of useful/interesting links? rust-lang/wg-grammar#8

Closed

steveklabnik closed this as completed Aug 9, 2019

mattheww mentioned this issue Mar 4, 2024

Output of the lexer rust-lang/spec#42

Open
