-
-
Notifications
You must be signed in to change notification settings - Fork 804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The road to a custom parser #1363
Comments
I plan on starting with Step1 some time next week ;) let me know if I should add anythin. |
Some naming suggestions to closer conform to what I think are standard-ish names (feel free to ignore any of these an use your best judgement).
And maybe missing
Forgive any lack of context on my part for these suggestions. Very much like the direction of this. |
@pipermerriam definitely keen to renaming the classes to be more sensible (and less abbreviated) ! Will add link to this on Step 1 (or maybe will make it part of Step 2, to make the PR smaller - will see). Side Note: |
@pipermerriam Yeah, I would expect else to be a property on an |
I think it is noted, but exposing an API to get the AST for a given source program would be quite handy for Step 1. For Step 3, my wishlist might include being able to "extend" the Vyper module by inserting tools "in between" stages of the compiler, such as advanced static analysis on the AST that may be able to reduce the AST nodes through some custom rewrite rules, then inserting the optimized AST back into the next stage of the compiler (code translation to IR). Similar idea for the IR as well (for example, symbolic analysis-based optimizations that exhaust all possible paths and remove assertion checks as impossible) |
@pipermerriam Also, not sure if you noticed but I think @jacqueswww chose those names largely based on the naming scheme in the python standard |
👍 on ignoring my name suggestions, or only using the ones you like. I don't have strong feelings on any of them. |
I've begun some experimental work on a parser using lark. I've got it hosted here for the time being. It might also be wise to create a CST parser using TatSu since that seems like a very good potential alternative to lark. However, I'm using lark for now since it was easier to get up and going with it. I've currently got a working parser that parses code files into a CST. I've begun adding some custom AST classes so that I have something to convert the CST into. After that's done, some routines will need to be written to convert the CST into an AST represented with AST class instances. This file in the cpython implementation should act as a good guide for that: https://github.com/python/cpython/blob/master/Python/ast.c . |
Also, a note about my repo that I linked to. It's really just meant to act as a place to store code that I'm hacking on. So there are no tests and no real structure to the repo. So don't freak out :). |
Custom Vyper AST classes in the As a quick aside, Python's ASDL definitions are found here. CPython uses a specific grammar to define its collection of AST classes. The C implementation parses the ASDL grammar and uses it to create all the AST classes using the C API. Next, I'll work on defining some routines to convert the lark parse tree into an AST composed of our custom AST classes. Currently, I'm targeting the entire Python AST. I think this will be useful in verifying the correctness of the implementation. Our test routines can just compare the results of Python's native |
I went ahead and migrated my code into a proper project (found here). I don't necessarily intend to have this work exist in a separate package, but I wanted to take advantage of our project template test rigging. I added a test that automatically generates fixtures from python files in the standard library. |
Here's a fun development from today. For the string literal parsing portion of our custom parser, I'm going to just defer to python's parsing facilities. Parsing string literals is pretty complicated and I think we can safely just use python's parser and convert the result to a tree of our custom AST classes. In order to do this, I needed to write some code to convert python AST trees into trees of our Vyper-specific AST classes. Here it is: https://github.com/davesque/vyper-parser/blob/master/vyper_parser/ast.py#L59 It passes tests for all python files in my standard lib directory: https://github.com/davesque/vyper-parser/blob/master/tests/test_ast.py#L21 This is something we could potentially begin using today if we're interested in converting Vyper to using custom AST classes. Although I'd probably recommend holding off a little longer in case I want to change some things before committing to the API that I've come up with. |
The |
@davesque very cool! It's coming together nicely, I like the layout of the VyperASTs - I suggest we adopt those, but we start with empty the |
Added a comment to https://gist.github.com/jacqueswww/aa24d22d95a578429e0f25f5f2de0b36 which I won't duplicate here. This gist is to follow solc's AST where it makes sense (e.g. in how to specify a location in a general, flexible and compact way), that maybe some thought should be given to how this is specified when writing to JSON so as to be compatible again with solc's JSON AST. |
@rocky this id field is unique per node? Or can nodes share ID's, if not I am not sure how would one would make the deterministic - unless there is a global auto increment counter on each file compiled (global context in our code base). |
@jacqueswww I think all of that is clarified in the link that @rocky posted in his gist comment: https://solidity.readthedocs.io/en/develop/miscellaneous.html#source-mappings |
Really cool extra for |
Thanks to @iamdefinitelyahuman for helping us achieve Step 2 here! |
|
I think this is a worthwhile investment. Many of the issues I've opened recently have to do with trying to use the Python AST while supporting syntax in Vyper that is semantically different than the node produced by the Python parser. I think it would simplify the compiler by making AST nodes single purpose, removing the need to interoperate with changes to the Python AST. It would also mitigate issues like #3475 since a custom parser would require types and not backfill the AST of a dynamically type language with type annotations. I would imagine there's also the added benefit of no longer having to worry about anything related to Python when discussing features and syntax. |
Definitely something we want to do, we have a Lark grammar that's been tested and a part of the core now for over a year. We haven't had issues with it for a while. The biggest problem with custom grammar is that we would get a ton of parsing issues grieving the users, the python ast solves that by being extremely well used and debugged. Hopefully could work on switching over to it as a part of a major revision What's still missing is that even though the parser parses reliably, we have not yet done the work of having it produces Vyper AST nodes or checked for equivalency with the current parser That would be next steps before we can swap it in |
This is a double edged sword, really. Actually the main reason I haven't gotten rid of the existing parser flow so far is not technical, it's that in some sense it "keeps us honest" -- we can't really break away from python syntax too easily. |
The road to a custom parser
As we have discussed this at lengths as parts of weekly calls, this is the proposed road map to swap out the current python standard library. Also lots of discussion on github (see #563).
Step 1
Create own custom AST node classes to represent our AST:
ast
classes and translate the to VyperAST classes.Step 2
After having our own AST classes (and improving readability by seeing where nodes originate from).
Step 3
Implement other parses that produce or translate to VyperAST classes.
The text was updated successfully, but these errors were encountered: