The road to a custom parser #1363

jacqueswww · 2019-03-20T15:18:11Z


As we have discussed at length in the weekly calls, this is the proposed road map for swapping out the current parser, which relies on the Python standard library. There has also been a lot of discussion on GitHub (see #563).

Step 1
Create our own custom AST node classes to represent our AST:
- Quick example located at https://gist.github.com/jacqueswww/aa24d22d95a578429e0f25f5f2de0b36
- This is great for readability and for understanding what is available.
- We should only add support for the fields and/or attributes we need; this way no existing Python syntax can slip through (*args/**kwargs, for example).
- Add a translator that takes the Python ast classes and translates them to VyperAST classes.
- Rename classes to be less abbreviated (can be Step 1 or Step 2) (see #1363 (comment))
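A minimal sketch of what Step 1 could look like, assuming Python 3.8+ (where literals parse to ast.Constant). The names VyperNode, Name, Num, Assign and translate are hypothetical and are not taken from the gist; the point is only that each class whitelists its fields and the translator rejects anything it does not know.

```python
import ast as python_ast


class VyperNode:
    """Hypothetical base class: subclasses list the only fields they accept."""

    _fields = ()

    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(self._fields)
        if unknown:
            raise TypeError(
                f"{type(self).__name__} got unexpected fields: {sorted(unknown)}"
            )
        for name in self._fields:
            setattr(self, name, kwargs.get(name))


class Name(VyperNode):
    _fields = ("id",)


class Num(VyperNode):
    _fields = ("value",)


class Assign(VyperNode):
    _fields = ("target", "value")


def translate(node):
    """Translate a Python ast node into a VyperNode, rejecting everything else."""
    if isinstance(node, python_ast.Name):
        return Name(id=node.id)
    if isinstance(node, python_ast.Constant) and type(node.value) is int:
        return Num(value=node.value)
    if isinstance(node, python_ast.Assign) and len(node.targets) == 1:
        return Assign(
            target=translate(node.targets[0]),
            value=translate(node.value),
        )
    # Anything not listed above (calls, *args/**kwargs, ...) cannot slip through.
    raise SyntaxError(f"unsupported syntax: {type(node).__name__}")


# Usage: translate a single assignment statement.
node = translate(python_ast.parse("x = 1").body[0])
```

Because the translator is a whitelist, adding support for a new construct is an explicit decision rather than something inherited from Python.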
Step 2
Once we have our own AST classes (and have improved readability by being able to see where nodes originate):
- create a serializer for our AST nodes, preferably JSON (both encode & decode)
- add type annotations to each class (where appropriate)
- create a type checking step that works directly on the VyperAST nodes
- add any other needed AST functionality on the classes directly
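A rough sketch of the JSON encode/decode part of Step 2, using dataclasses so the type annotations come for free. The "ast_type" key, the NODE_TYPES registry and the node classes are assumptions made for illustration, not an agreed format.

```python
import json
from dataclasses import dataclass, fields, is_dataclass

# Hypothetical registry used to find the class again when decoding.
NODE_TYPES = {}


def register(cls):
    NODE_TYPES[cls.__name__] = cls
    return cls


@register
@dataclass
class Name:
    id: str


@register
@dataclass
class Num:
    value: int


@register
@dataclass
class BinOp:
    left: object
    op: str
    right: object


def to_dict(node):
    """Encode a node tree as plain dicts, lists and scalars."""
    if is_dataclass(node):
        data = {"ast_type": type(node).__name__}
        for field in fields(node):
            data[field.name] = to_dict(getattr(node, field.name))
        return data
    if isinstance(node, list):
        return [to_dict(item) for item in node]
    return node


def from_dict(data):
    """Decode the output of to_dict back into node instances."""
    if isinstance(data, dict):
        kwargs = {k: from_dict(v) for k, v in data.items() if k != "ast_type"}
        return NODE_TYPES[data["ast_type"]](**kwargs)
    if isinstance(data, list):
        return [from_dict(item) for item in data]
    return data


def dumps(node):
    return json.dumps(to_dict(node), sort_keys=True)


def loads(text):
    return from_dict(json.loads(text))


# Usage: a round trip should give back an equal tree.
tree = BinOp(left=Name("x"), op="+", right=Num(1))
restored = loads(dumps(tree))
```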
Step 3
Implement other parsers that produce or translate to VyperAST classes.
- This is quite open, but it allows us to choose any type of parsing library or parser generator at this point without making the Vyper code generator dependent on the library.
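To illustrate the decoupling Step 3 is after, here is a toy sketch with two interchangeable front ends that build the same node classes, plus a consumer that only ever sees those classes. Everything here (Num, Add, parse_with_python_ast, parse_by_hand, evaluate) is hypothetical and handles only integer addition; it assumes Python 3.8+.

```python
import ast as python_ast
import re
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: object
    right: object


def parse_with_python_ast(source):
    """Front end 1: lean on Python's parser, then translate."""

    def translate(node):
        if isinstance(node, python_ast.Constant) and type(node.value) is int:
            return Num(node.value)
        if isinstance(node, python_ast.BinOp) and isinstance(node.op, python_ast.Add):
            return Add(translate(node.left), translate(node.right))
        raise SyntaxError(f"unsupported syntax: {type(node).__name__}")

    return translate(python_ast.parse(source, mode="eval").body)


def parse_by_hand(source):
    """Front end 2: a tiny hand-written parser for `int (+ int)*`."""
    tokens = re.findall(r"\d+|\S", source)
    if not tokens or not tokens[0].isdecimal():
        raise SyntaxError("expected an integer")
    tree = Num(int(tokens[0]))
    rest = tokens[1:]
    if len(rest) % 2:
        raise SyntaxError("dangling operator or operand")
    for op, operand in zip(rest[0::2], rest[1::2]):
        if op != "+" or not operand.isdecimal():
            raise SyntaxError(f"unexpected tokens {op!r} {operand!r}")
        tree = Add(tree, Num(int(operand)))  # left-associative, like Python
    return tree


def evaluate(node):
    """Stand-in for a later compiler stage: it depends on the node classes only."""
    if isinstance(node, Num):
        return node.value
    return evaluate(node.left) + evaluate(node.right)


# Usage: both front ends should agree on the resulting tree.
same = parse_with_python_ast("1 + 2 + 3") == parse_by_hand("1 + 2 + 3")
```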


jacqueswww · 2019-03-20T15:18:41Z

I plan on starting with Step 1 some time next week ;) Let me know if I should add anything.

pipermerriam · 2019-03-20T15:25:45Z

Some naming suggestions to conform more closely to what I think are standard-ish names (feel free to ignore any of these and use your best judgement).

GtE -> Ge
Compare -> RelOp
Mult -> Mul
LtE -> Le
NotEq -> Ne

And maybe missing

Else
ElseIf

Forgive any lack of context on my part for these suggestions. Very much like the direction of this.

jacqueswww · 2019-03-20T15:51:03Z

@pipermerriam definitely keen on renaming the classes to be more sensible (and less abbreviated)! Will add a link to this on Step 1 (or maybe will make it part of Step 2, to make the PR smaller - will see).

Side note: Else is quite interesting. It is currently all part of If, via ast.If.orelse.

davesque · 2019-03-20T20:11:24Z

@pipermerriam Yeah, I would expect else to be a property on an If ast object; rather like body is a list of statements on a function definition object.
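For anyone unfamiliar with how Python's own ast represents this: there is no separate Else or ElseIf node. An elif shows up as a nested ast.If inside the outer node's orelse list. The flatten_if helper below is hypothetical and only there to make the nesting visible.

```python
import ast

source = """
if a:
    x = 1
elif b:
    x = 2
else:
    x = 3
"""


def flatten_if(node):
    """Return (tests, else_body) for an if/elif/.../else chain.

    Note: in the AST an `elif` is indistinguishable from an `else:` block
    whose only statement is another `if`.
    """
    tests = []
    while True:
        tests.append(node.test)
        if len(node.orelse) == 1 and isinstance(node.orelse[0], ast.If):
            node = node.orelse[0]  # the elif is a nested If in orelse
        else:
            return tests, node.orelse


outer = ast.parse(source).body[0]
inner = outer.orelse[0]
tests, else_body = flatten_if(outer)
```

Here outer and inner are both ast.If instances, and else_body is the single assignment from the final else branch.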

fubuloubu · 2019-03-20T21:43:58Z

I think it is noted, but exposing an API to get the AST for a given source program would be quite handy for Step 1.

For Step 3, my wishlist might include being able to "extend" the Vyper module by inserting tools "in between" stages of the compiler. One example is advanced static analysis on the AST that reduces the AST nodes through some custom rewrite rules and then hands the optimized AST to the next stage of the compiler (code translation to IR). A similar idea applies to the IR as well (for example, symbolic analysis-based optimizations that exhaust all possible paths and remove assertion checks that can be shown never to fail).
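A sketch of the "tools in between stages" idea, using Python's own ast module as a stand-in for the Vyper AST (Python 3.8+ assumed). compile_with_passes and FoldConstantAdds are hypothetical names; a real hook would operate on Vyper's node classes and feed the next compiler stage.

```python
import ast


class FoldConstantAdds(ast.NodeTransformer):
    """Example rewrite rule: replace `<int> + <int>` with its result."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite children first
        if (
            isinstance(node.op, ast.Add)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and type(node.left.value) is int
            and type(node.right.value) is int
        ):
            folded = ast.Constant(value=node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node


def compile_with_passes(source, passes=()):
    """Parse, run each user-supplied pass over the tree, return the result.

    In a real compiler the returned tree would go on to IR generation.
    """
    tree = ast.parse(source)
    for rewrite in passes:
        tree = rewrite(tree)
    ast.fix_missing_locations(tree)
    return tree


# Usage: plug the rewrite in between parsing and the (omitted) later stages.
tree = compile_with_passes("x = 1 + 2 + 3", passes=[FoldConstantAdds().visit])
folded_value = tree.body[0].value.value
```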

davesque · 2019-03-20T22:11:58Z

@pipermerriam Also, not sure if you noticed but I think @jacqueswww chose those names largely based on the naming scheme in the python standard ast library.

pipermerriam · 2019-03-20T22:22:12Z

👍 on ignoring my name suggestions, or only using the ones you like. I don't have strong feelings on any of them.

davesque · 2019-03-21T22:11:15Z

I've begun some experimental work on a parser using lark. I've got it hosted here for the time being. It might also be wise to create a CST parser using TatSu since that seems like a very good potential alternative to lark. However, I'm using lark for now since it was easier to get up and going with it.

I've currently got a working parser that parses code files into a CST. I've begun adding some custom AST classes so that I have something to convert the CST into. After that's done, some routines will need to be written to convert the CST into an AST represented with AST class instances. This file in the cpython implementation should act as a good guide for that: https://github.com/python/cpython/blob/master/Python/ast.c .

davesque · 2019-03-21T22:14:18Z

Also, a note about my repo that I linked to. It's really just meant to act as a place to store code that I'm hacking on. So there are no tests and no real structure to the repo. So don't freak out :).

davesque · 2019-03-23T19:36:25Z

Custom Vyper AST classes in the vyper_parser repo are done (see here). Currently, they attempt to mirror the naming scheme and structure of the ASDL-derived Python AST classes as precisely as possible.

As a quick aside, Python's ASDL definitions are found here. CPython uses a specific grammar to define its collection of AST classes. The C implementation parses the ASDL grammar and uses it to create all the AST classes using the C API.

Next, I'll work on defining some routines to convert the lark parse tree into an AST composed of our custom AST classes. Currently, I'm targeting the entire Python AST. I think this will be useful in verifying the correctness of the implementation. Our test routines can just compare the results of Python's native ast.parse function to our custom parser for all python files in the standard library. After this is done, we can cut back the definition to just what we need.
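The comparison test could be sketched roughly like this. The candidate parser is passed in as a function returning a dump string, since the custom parser's API is not fixed yet; compare_parsers and stdlib_sources are hypothetical helpers, and ast.dump is used as the common format only for illustration.

```python
import ast
import pathlib
import sysconfig
import tokenize


def stdlib_sources(limit=20):
    """Read up to `limit` top-level standard library modules as text."""
    stdlib = pathlib.Path(sysconfig.get_paths()["stdlib"])
    sources = {}
    for path in sorted(stdlib.glob("*.py"))[:limit]:
        # tokenize.open honours the file's encoding declaration.
        with tokenize.open(path) as handle:
            sources[str(path)] = handle.read()
    return sources


def compare_parsers(candidate_dump, sources):
    """Return the names of sources where the candidate disagrees with ast.parse.

    `candidate_dump` takes source text and returns a string in the same
    format as ast.dump.
    """
    mismatches = []
    for name, text in sources.items():
        expected = ast.dump(ast.parse(text))
        if candidate_dump(text) != expected:
            mismatches.append(name)
    return mismatches


def reference_dump(text):
    """Stand-in candidate: Python's own parser, which trivially agrees."""
    return ast.dump(ast.parse(text))


# Usage with in-memory sources; swap in stdlib_sources() for the real run.
mismatches = compare_parsers(reference_dump, {"example.py": "x = 1\n"})
```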

davesque · 2019-03-26T00:23:42Z

I went ahead and migrated my code into a proper project (found here). I don't necessarily intend to have this work exist in a separate package, but I wanted to take advantage of our project template test rigging. I added a test that automatically generates fixtures from python files in the standard library.

davesque · 2019-03-28T21:59:52Z

Here's a fun development from today. For the string literal parsing portion of our custom parser, I'm going to just defer to python's parsing facilities. Parsing string literals is pretty complicated and I think we can safely just use python's parser and convert the result to a tree of our custom AST classes. In order to do this, I needed to write some code to convert python AST trees into trees of our Vyper-specific AST classes. Here it is: https://github.com/davesque/vyper-parser/blob/master/vyper_parser/ast.py#L59

It passes tests for all python files in my standard lib directory: https://github.com/davesque/vyper-parser/blob/master/tests/test_ast.py#L21

This is something we could potentially begin using today if we're interested in converting Vyper to using custom AST classes. Although I'd probably recommend holding off a little longer in case I want to change some things before committing to the API that I've come up with.

davesque · 2019-03-28T22:08:06Z

The VyperAST.from_python_ast method I just mentioned has one potential issue at this point. It automatically infers the proper VyperAST subclass to use for an instance of python's standard ast.AST subclasses. It does this by automatically collecting all subclasses of VyperAST. This means that someone could potentially create one of their own subclasses of VyperAST and get weird behavior when it happens to shadow an existing class name. This can be fixed by just explicitly creating a mapping from python AST classes to vyper AST classes. Just wanted to make a note of that here.
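To make the proposed fix concrete, here is a rough sketch of the explicit-mapping approach. It is not the code in the vyper-parser repo: the classes are cut down to a handful, from_python_ast is written as a module-level function, and PYTHON_TO_VYPER is a hypothetical name. Python 3.8+ is assumed.

```python
import ast as python_ast


class VyperAST:
    _fields = ()

    def __init__(self, **kwargs):
        for name in self._fields:
            setattr(self, name, kwargs[name])


class Module(VyperAST):
    _fields = ("body",)


class Assign(VyperAST):
    _fields = ("targets", "value")


class Name(VyperAST):
    _fields = ("id",)


class Constant(VyperAST):
    _fields = ("value",)


# Explicit mapping: a later user-defined subclass that happens to be called
# `Name` has no effect on which class is chosen here.
PYTHON_TO_VYPER = {
    python_ast.Module: Module,
    python_ast.Assign: Assign,
    python_ast.Name: Name,
    python_ast.Constant: Constant,
}


def from_python_ast(node):
    """Recursively convert a Python ast tree into VyperAST instances."""
    if isinstance(node, list):
        return [from_python_ast(item) for item in node]
    if not isinstance(node, python_ast.AST):
        return node  # plain value such as str, int or None
    try:
        cls = PYTHON_TO_VYPER[type(node)]
    except KeyError:
        raise SyntaxError(f"unsupported syntax: {type(node).__name__}") from None
    # Only the fields declared on the Vyper class are copied over.
    return cls(**{name: from_python_ast(getattr(node, name)) for name in cls._fields})


# Usage
tree = from_python_ast(python_ast.parse("x = 1"))
```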

jacqueswww · 2019-03-28T22:42:43Z

@davesque very cool! It's coming together nicely. I like the layout of the VyperASTs and suggest we adopt those, but that we start with empty _fields on each class and then only handle the fields we need :)

rocky · 2019-04-01T14:44:16Z

Added a comment to https://gist.github.com/jacqueswww/aa24d22d95a578429e0f25f5f2de0b36 which I won't duplicate here. The gist of it is to follow solc's AST where it makes sense (e.g. in how to specify a location in a general, flexible and compact way), and that some thought should perhaps be given to how this is specified when writing to JSON, so as to be compatible with solc's JSON AST.

jacqueswww · 2019-04-01T17:54:13Z

@rocky is this id field unique per node, or can nodes share IDs? If they cannot, I am not sure how one would make the IDs deterministic, unless there is a global auto-increment counter for each file compiled (global context in our code base).

davesque · 2019-04-01T17:57:00Z

@jacqueswww I think all of that is clarified in the link that @rocky posted in his gist comment: https://solidity.readthedocs.io/en/develop/miscellaneous.html#source-mappings
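For reference, the linked documentation describes source locations in the s:l:f notation: byte offset of the start, length in bytes, and source file index. A rough sketch of producing that string, together with a per-compilation counter for ids, might look like the following. It uses Python's ast (3.8+) as a stand-in, annotate is a hypothetical helper, and the counter is just one way to get deterministic ids, not necessarily what solc does.

```python
import ast
import itertools


def annotate(source, source_index=0):
    """Yield (node_id, src, node) for every node that carries a position.

    `src` uses the s:l:f form: start byte offset, length in bytes, file index.
    Ids come from a counter restarted per call, so the same input always
    yields the same ids.
    """
    # Byte offset at which each line starts. ast's col_offset values are
    # UTF-8 byte offsets within the line. Assumes "\n" or "\r\n" line endings.
    line_starts = [0]
    for line in source.split("\n"):
        line_starts.append(line_starts[-1] + len(line.encode("utf-8")) + 1)

    counter = itertools.count()
    for node in ast.walk(ast.parse(source)):
        if not hasattr(node, "lineno"):
            continue  # e.g. Module and Load/Store carry no position
        start = line_starts[node.lineno - 1] + node.col_offset
        end = line_starts[node.end_lineno - 1] + node.end_col_offset
        yield next(counter), f"{start}:{end - start}:{source_index}", node


# Usage
annotated = list(annotate("x = 1\ny = x\n"))
first_id, first_src, first_node = annotated[0]  # first_src == "0:5:0"
```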

fubuloubu · 2019-04-04T22:41:24Z

Really cool extra for hypothesis that can generate strategies for a lark grammar: https://hypothesis.readthedocs.io/en/latest/extras.html#hypothesis-lark

fubuloubu · 2020-07-15T15:55:49Z

Thanks to @iamdefinitelyahuman for helping us achieve Step 2 here!

jacqueswww · 2020-07-16T09:45:13Z

Thanks to @iamdefinitelyahuman for helping us achieve Step 2 here!

0xalpharush · 2023-09-09T20:14:44Z

I think this is a worthwhile investment. Many of the issues I've opened recently have to do with trying to use the Python AST while supporting Vyper syntax that is semantically different from the node produced by the Python parser. I think it would simplify the compiler by making AST nodes single-purpose, removing the need to keep up with changes to the Python AST. It would also mitigate issues like #3475, since a custom parser would require types rather than backfilling the AST of a dynamically typed language with type annotations. I would imagine there's also the added benefit of no longer having to worry about anything related to Python when discussing features and syntax.

fubuloubu · 2023-09-09T20:29:19Z

Definitely something we want to do. We have a Lark grammar that has been tested and has been part of the core for over a year now. We haven't had issues with it for a while.

The biggest problem with a custom grammar is that we would get a ton of parsing issues frustrating users; the Python ast avoids that by being extremely well used and debugged. Hopefully we could work on switching over to the custom parser as part of a major revision.

What's still missing is that, even though the parser parses reliably, we have not yet done the work of having it produce Vyper AST nodes, nor have we checked for equivalence with the current parser.

Those would be the next steps before we can swap it in.
(Taking any contributors)

charles-cooper · 2023-09-10T15:24:22Z

> I would imagine there's also the added benefit of no longer having to worry about anything related to Python when discussing features and syntax.

This is a double-edged sword, really. Actually the main reason I haven't gotten rid of the existing parser flow so far is not technical, it's that in some sense it "keeps us honest" -- we can't really break away from python syntax too easily.

jacqueswww added the "enhancement" and "work in progress" labels Mar 20, 2019

This was referenced Mar 25, 2019

Meeting 25th March 2019 #1341

Closed

Codebase Hardening Strategy #1360

Closed

leonprou mentioned this issue Apr 1, 2019

[WIP] Vyper support crytic/slither#191

Closed

jacqueswww mentioned this issue Apr 4, 2019

VIP: Custom Parser #563

Closed

jacqueswww mentioned this issue Apr 4, 2019

1363 vyper ast #1382

Merged

fubuloubu mentioned this issue Jul 15, 2020

Standalone Binary #1953

Closed


fubuloubu mentioned this issue Nov 5, 2020

Use Vyper AST vyperlang/blackadder#7

Open

fubuloubu mentioned this issue Feb 1, 2021

VIP: Structured AST Output #2276

Open

charles-cooper mentioned this issue Oct 20, 2021

Meeting November 01 2021 #2492

Closed

