[RFC] "Template"/Macro/Generic rules #261
Comments
CCing some of the linked project authors whose grammars looked, at a glance, like they might benefit from this (and that currently use the derive grammar): @sunng87 (handlebars-rust), @jturner314 (py_literal), @wahn (rs_pbrt), @Keats (tera), […]
I wrote this RFC because I currently wish I had this feature in my personal project. If we want to write a large language grammar using pest, this reusability is probably even more useful than #197.
I like the idea, and I think it could be included in 2.0 or 2.1. However, I would prefer a slightly different approach: what if every rule could take arguments, like a normal function? The implementation would be somewhat more demanding, since the AST would need to change to some degree. Monomorphization would also be needed for good performance, probably implemented as an optimization step in […]. @CAD97, would you be willing to take a stab at it once we have everything in order?
Yep, I can work on an MVP implementation. I like the idea of making every rule a function that takes rule arguments. If we translate this to generics at the Rust level, Rust will handle the monomorphization pass for us. This is definitely a 2.1 feature rather than a 2.0 blocker, though. Either formulation of […]
There is little comma-separated syntax in handlebars, but I think this would be a good addition to pest. I'm also a fan of a more generic […] Also, how about a […]
Sounds like a good idea to me!
+1 on that
Motivation
Macros (also called templates or generics) are one of the advertised features of LALRPOP, and they are a useful tool for factoring out common parts of a grammar. The common example is `CommaSeparated<production>` to represent `production ~ ("," ~ production)* ~ ","?`. (This can also be written `(production ~ ",")* ~ production?`, which additionally accepts an empty list, but I prefer the former formulation.) In this RFC I lay out what a design for generic rules might look like in pest, and attempt to make a case for implementing them.
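To make the difference between the two formulations concrete, here is a small hand-written recognizer in plain Rust over already-tokenized input. This is not pest-generated code: the token representation, the rule that any token other than "," counts as a `production`, and the names `form_a`/`form_b` are all assumptions made for illustration.

```rust
/// Matches one `production` token at `pos`. In this sketch any token other
/// than "," counts as a production (a hypothetical stand-in for a real rule).
fn production(toks: &[&str], pos: usize) -> Option<usize> {
    match toks.get(pos) {
        Some(&t) if t != "," => Some(pos + 1),
        _ => None,
    }
}

/// Matches the literal separator "," at `pos`.
fn comma(toks: &[&str], pos: usize) -> Option<usize> {
    match toks.get(pos) {
        Some(&",") => Some(pos + 1),
        _ => None,
    }
}

/// production ~ ("," ~ production)* ~ ","?   (needs at least one production)
fn form_a(toks: &[&str]) -> bool {
    let mut pos = match production(toks, 0) {
        Some(p) => p,
        None => return false,
    };
    // ("," ~ production)*: an iteration that only half-matches is discarded.
    while let Some(next) = comma(toks, pos).and_then(|p| production(toks, p)) {
        pos = next;
    }
    // ","?
    pos = comma(toks, pos).unwrap_or(pos);
    pos == toks.len()
}

/// (production ~ ",")* ~ production?   (also matches an empty list)
fn form_b(toks: &[&str]) -> bool {
    let mut pos = 0;
    // (production ~ ",")*
    while let Some(next) = production(toks, pos).and_then(|p| comma(toks, p)) {
        pos = next;
    }
    // production?
    pos = production(toks, pos).unwrap_or(pos);
    pos == toks.len()
}
```

For example, `form_a(&[])` returns `false` while `form_b(&[])` returns `true`; both accept `["x", ",", "y", ","]` and both reject `["x", "y"]`.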
A proposal for standard casing
In the 2.0 version of pest, the standard casing puts builtin rules in `SHOUT_CASE` and recommends `snake_case` for user rules. This RFC proposes that generic rules, along with their arguments, be `TitleCase` by convention, to distinguish them from normal rules.
Guide-Level Explanation
(Shamelessly adapted from the LALRPOP book, which is licensed MIT/Apache, as is LALRPOP itself.)
When writing grammars, we encounter repetitive constructs that might otherwise be copied and pasted. A common example is a "comma-separated list". If we want to parse a comma-separated list of expressions, it might look something like `expressions = { expression ~ ("," ~ expression)* ~ ","? }`.
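As a rough illustration of the copy-and-paste problem, here are hand-written Rust recognizers for `expressions` and `terms` in the style of a recursive-descent parser. The definitions of `expression` (a single ASCII digit) and `term` (a single lowercase ASCII letter) are hypothetical placeholders, and each function returns the position just after a successful match. The two list rules differ only in which item rule they call.

```rust
/// Hypothetical stand-in for `expression`: a single ASCII digit.
fn expression(s: &[u8], pos: usize) -> Option<usize> {
    s.get(pos).filter(|b| b.is_ascii_digit()).map(|_| pos + 1)
}

/// Hypothetical stand-in for `term`: a single lowercase ASCII letter.
fn term(s: &[u8], pos: usize) -> Option<usize> {
    s.get(pos).filter(|b| b.is_ascii_lowercase()).map(|_| pos + 1)
}

/// The literal ",".
fn comma(s: &[u8], pos: usize) -> Option<usize> {
    (s.get(pos) == Some(&b',')).then(|| pos + 1)
}

/// expressions = { expression ~ ("," ~ expression)* ~ ","? }
fn expressions(s: &[u8], pos: usize) -> Option<usize> {
    let mut pos = expression(s, pos)?;
    while let Some(next) = comma(s, pos).and_then(|p| expression(s, p)) {
        pos = next;
    }
    Some(comma(s, pos).unwrap_or(pos))
}

/// terms = { term ~ ("," ~ term)* ~ ","? }
/// A copy of `expressions` with only the item rule swapped out.
fn terms(s: &[u8], pos: usize) -> Option<usize> {
    let mut pos = term(s, pos)?;
    while let Some(next) = comma(s, pos).and_then(|p| term(s, p)) {
        pos = next;
    }
    Some(comma(s, pos).unwrap_or(pos))
}
```

`expressions(b"1,2,3", 0)` returns `Some(5)`, and `terms(b"a,b,", 0)` returns `Some(4)` because the trailing comma is consumed.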
But what happens if we later want a comma-separated list of `term`s, or of anything else? For this, pest offers generic rules. By using a generic rule, we can factor this common functionality out into one place. Because `CommaSeparated` is marked as a silent rule with a `_`, it is functionally equivalent to inlining its structure into both `expressions` and `terms`. If a generic rule is not silenced, it will be included in the output structure just like any other rule.
Implementation-Level Explanation
There are two ways to handle generic rules. In the first, we treat the generic rule as a template and generate a separate parsing function for each instantiation. In the second, we pass the generic arguments through to the generated Rust code.
I will explain both by walking through the following example in pseudocode (note that, unlike above, no rules are silent):
Template desugaring
For each unique rule (or combination of rules) passed to a generic rule, desugar the generic rule into a new rule instantiated with the concrete rule(s) passed to it.
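A minimal sketch of the desugaring step, assuming a generic rule can be stored as its name, its parameter names, and a whitespace-tokenized body. The `GenericRule` type, the `desugar` function, and the `CommaSeparated<expression>` naming scheme for generated rules are hypothetical, not part of pest. Substitution happens per token rather than by substring replacement, so a parameter named `Rule` would not clobber an unrelated rule such as `Rules`, and a map keyed on the generated name ensures each distinct instantiation is emitted only once.

```rust
use std::collections::BTreeMap;

/// A generic rule: its name, parameter names, and a body whose tokens are
/// separated by whitespace (a simplification assumed for this sketch).
struct GenericRule {
    name: &'static str,
    params: Vec<&'static str>,
    body: &'static str,
}

impl GenericRule {
    /// Builds one concrete rule: a generated name plus the body with every
    /// token that names a parameter replaced by the matching argument.
    fn instantiate(&self, args: &[&str]) -> (String, String) {
        assert_eq!(args.len(), self.params.len(), "wrong number of rule arguments");
        let body: Vec<&str> = self
            .body
            .split_whitespace()
            .map(|tok| match self.params.iter().position(|p| *p == tok) {
                Some(i) => args[i],
                None => tok,
            })
            .collect();
        // Hypothetical naming scheme for generated rules; any unique name works.
        let name = format!("{}<{}>", self.name, args.join(", "));
        (name, body.join(" "))
    }
}

/// Desugars every use of `rule`, generating each distinct instantiation once.
/// A BTreeMap keyed on the generated name deduplicates and keeps output sorted.
fn desugar(rule: &GenericRule, uses: &[&[&str]]) -> BTreeMap<String, String> {
    let mut rules = BTreeMap::new();
    for args in uses {
        let (name, body) = rule.instantiate(args);
        rules.entry(name).or_insert(body);
    }
    rules
}
```

Desugaring the uses `["expression"]`, `["term"]`, and `["expression"]` again yields two rules: `CommaSeparated<expression>` with body `expression ~ ( "," ~ expression )* ~ ","?`, and `CommaSeparated<term>`.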
The generated rules do not correspond to unique `Rule` enum variants in the output, however; all rules generated from the same generic rule map to the same `Rule` enum variant.
Generic implementation
In addition to the parser state, the generated function for parsing this rule takes an argument representing the rule passed as its generic argument. It then calls that function wherever the generic argument appears in the definition.
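Here is a sketch of the second approach in Rust, where the rule argument becomes a generic function parameter. As suggested in the discussion above, rustc then monomorphizes `comma_separated` once per concrete argument, doing the template expansion for us. `State`, `Rule`, `digit`, and `lower` are simplified, hypothetical stand-ins rather than pest's real parser state or generated types, and spans are recorded when a rule finishes rather than as paired start/end tokens. Every instantiation records the same `Rule::CommaSeparated` variant, mirroring the one-variant-per-generic-rule behavior described for template desugaring.

```rust
/// Hypothetical `Rule` enum: every use of the generic rule reports this
/// single variant, whatever rule it was instantiated with.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Rule {
    CommaSeparated,
}

/// Simplified parser state: input, cursor, and recorded (rule, start, end) spans.
struct State {
    input: Vec<u8>,
    pos: usize,
    spans: Vec<(Rule, usize, usize)>,
}

impl State {
    fn new(input: &str) -> Self {
        State { input: input.as_bytes().to_vec(), pos: 0, spans: Vec::new() }
    }

    /// Matches one literal byte, advancing only on success.
    fn byte(&mut self, b: u8) -> bool {
        let ok = self.input.get(self.pos) == Some(&b);
        if ok {
            self.pos += 1;
        }
        ok
    }

    /// Matches one byte satisfying `pred`, advancing only on success.
    fn class(&mut self, pred: fn(&u8) -> bool) -> bool {
        match self.input.get(self.pos) {
            Some(b) if pred(b) => {
                self.pos += 1;
                true
            }
            _ => false,
        }
    }
}

/// Hypothetical concrete rules that can be passed as the generic argument.
fn digit(s: &mut State) -> bool {
    s.class(u8::is_ascii_digit)
}
fn lower(s: &mut State) -> bool {
    s.class(u8::is_ascii_lowercase)
}

/// CommaSeparated<Item> = { Item ~ ("," ~ Item)* ~ ","? }
/// `item` is called wherever `Item` appears in the definition.
fn comma_separated<F: Fn(&mut State) -> bool>(s: &mut State, item: F) -> bool {
    let (start, mark) = (s.pos, s.spans.len());
    if !item(s) {
        s.pos = start;
        s.spans.truncate(mark);
        return false;
    }
    loop {
        let (before, before_mark) = (s.pos, s.spans.len());
        if !(s.byte(b',') && item(s)) {
            // Undo a partial `"," ~ Item` before leaving the repetition.
            s.pos = before;
            s.spans.truncate(before_mark);
            break;
        }
    }
    s.byte(b','); // optional trailing comma
    s.spans.push((Rule::CommaSeparated, start, s.pos));
    true
}
```

Parsing `"1,2,3,"` with `comma_separated(&mut s, digit)` consumes all six bytes and records a single `(Rule::CommaSeparated, 0, 6)` span; the same function called with `lower` parses `"a,b"`.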
Grammar changes
The `terminal` rule is changed to accommodate generic rules:
In the future, we may wish to relax this so that a generic rule can take a `term` or even an `expression` instead. Conversely, we may wish to accept only a single `terminal`, rather than a list, to begin with.
Prior Art
Unresolved Questions
- Generic rules with more than one argument, such as `Separated<Term, By> = { Term ~ (By ~ Term)* ~ By? }`: do we need to support them?
- Does taking a `term` or `expression` instead of a `terminal` give the user any more convenience? Any generic rule can be expressed by taking only terminals: just define a silent rule for the desired, more complicated expression and pass that in.
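To make both questions concrete, here is a sketch, again in plain Rust with hypothetical names, of a two-argument `Separated<Term, By>` written as a function with one generic parameter per rule argument. It also illustrates the point about terminals: passing the named helper `ab`, which stands in for a silent rule like `ab = _{ "a" | "b" }`, behaves the same as passing the inline expression as a closure.

```rust
/// Minimal input cursor (a simplified stand-in, not pest's parser state).
struct State {
    input: Vec<u8>,
    pos: usize,
}

impl State {
    fn new(input: &str) -> Self {
        State { input: input.as_bytes().to_vec(), pos: 0 }
    }

    /// Matches one literal byte, advancing only on success.
    fn byte(&mut self, b: u8) -> bool {
        let ok = self.input.get(self.pos) == Some(&b);
        if ok {
            self.pos += 1;
        }
        ok
    }
}

/// Separated<Term, By> = { Term ~ (By ~ Term)* ~ By? }
/// One generic parameter per rule argument.
fn separated<T, B>(s: &mut State, term: T, by: B) -> bool
where
    T: Fn(&mut State) -> bool,
    B: Fn(&mut State) -> bool,
{
    let start = s.pos;
    if !term(s) {
        s.pos = start;
        return false;
    }
    loop {
        let before = s.pos;
        if !(by(s) && term(s)) {
            s.pos = before; // undo a partial `By ~ Term`
            break;
        }
    }
    let before = s.pos;
    if !by(s) {
        s.pos = before; // `By?` is optional
    }
    true
}

/// Named helper standing in for a hypothetical silent rule `ab = _{ "a" | "b" }`.
fn ab(s: &mut State) -> bool {
    s.byte(b'a') || s.byte(b'b')
}

/// Hypothetical separator rule `semi = _{ ";" }`.
fn semi(s: &mut State) -> bool {
    s.byte(b';')
}
```

For example, `separated(&mut s, ab, semi)` on `"a;b;a"` succeeds and leaves `s.pos` at 5; passing the closure `|st| st.byte(b'a') || st.byte(b'b')` in place of `ab` gives the same result.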