-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds Pest Grammar to Ion Conversion #38
Conversation
The purpose of this commit is the start of a tool to generate Ion formatted ASTs from Pest grammars, particularly that of PartiQL. We could then easily process the grammar outside of Rust in scripts that could be used to generate things like TextMate grammars for IDE support or LaTeX fragments for the specification document. Adds the `pestion` crate to provide a simple trait `PestToElement` and implementations over the Pest AST and `&str` syntax definitions to create Ion `Element` serialized forms. Resolves partiql#35.
Codecov Report
@@ Coverage Diff @@
## main #38 +/- ##
==========================================
- Coverage 90.66% 89.42% -1.24%
==========================================
Files 7 11 +4
Lines 482 700 +218
==========================================
+ Hits 437 626 +189
- Misses 45 74 +29
Continue to review full report at Codecov.
|
pestion/Cargo.toml
Outdated
@@ -0,0 +1,29 @@ | |||
[package] | |||
name = "pestion" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless this is a pun/reference that I'm missing, I suggest naming this pest-ion
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We chatted about this offline, but the idea here was just a portmanteau of Pest and Ion into something that is pronounced like "question". I was trying to avoid a hyphenated name, so let's get a second opinion from someone like @dlurton / @alancai98 / @abhikuhikar to see if they don't like the name and suggest any alternatives.
If we do like it, I will add something to the README. If we don't I can refactor the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slight preference to pest-ion
so it doesn't get too closely associated with something David sent offline: https://www.pestions.com/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another possibility is pest2ion
. It avoids the hyphen and any association with that pest control website.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the consensus is pest-ion
so I will refactor it. The problem with pest2ion
is that it will look very jarring as a crate (e.g. use pest2ion
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to pest-ion
.
pestion/src/lib.rs
Outdated
//! // parse a Pest grammar and convert it to Ion element | ||
//! let element = r#"a = @{ "a" | "b" ~ "c" }"#.pest_to_element()?; | ||
//! | ||
//! // the grammar is a struct containing a field for each rule |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These examples are fine, but I think it would be more immediately illuminating to dump the element
to pretty Ion text so folks could see how it's being modeled.
I'm not sure what the most idiomatic way to show that in a doc comment is. With a println!
and a comment showing the expected output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I can replicate this by parsing an element from Ion text and comparing the two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 783de7f
pestion/src/lib.rs
Outdated
(text_token("type"), self.ty.pest_to_element()?), | ||
(text_token("expression"), self.expr.pest_to_element()?), | ||
]; | ||
Ok(Self::Element::new_struct(fields)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Element::new_struct
takes an iterator:
/// Builds a `struct` from Struct using Builder.
fn new_struct<K, V, I>(structure: I) -> Self::Element
where
K: Into<Self::SymbolToken>,
V: Into<Self::Element>,
I: IntoIterator<Item = (K, V)>;
You could make fields
into a slice to save yourself an allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not make it into a slice because I need ownership of the pair (I would get a &(_, _)
) that we construct the struct
with (without a clone), However, I am able to write it as:
let fields = std::array::IntoIter::new([
(text_token("type"), self.ty.pest_to_element()?),
(text_token("expression"), self.expr.pest_to_element()?),
]);
Ok(Self::Element::new_struct(fields))
Note that the explicit IntoIter::new
is needed because sadly, arrays don't implement IntoIter
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 783de7f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The explicit IntoIter::new
is a bit unfortunate, but happily it looks like arrays will get a stable implementation of IntoIterator
in the next Rust release (v1.53), which I believe happens in ~2 weeks.
pestion/src/lib.rs
Outdated
Expr::Str(text) => vec![ | ||
text_token("string").into(), | ||
text_token("exact").into(), | ||
String(text).into(), | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Element::new_sexp
takes an iterator:
/// Builds a `sexp` from Sequence using Builder.
fn new_sexp<I: IntoIterator<Item = Self::Element>>(seq: I) -> Self::Element;
so each of these vec![]
s could be a slice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, this is challenging because the slice iterator will yield &OwnedElement
which is not compatible to Iterator<Item = OwnedElement>
, I could rewrite this using ArrayVec
/SmallVec
, but I don't know if that is worth it. What do you think about that, is there something I am missing with the slice suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a factoring with SmallVec
, could not do it with ArrayVec
because of the limitations in that data type (specifically Copy
requirements).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 783de7f
pestion/src/lib.rs
Outdated
{ | ||
a: { | ||
type: non_atomic, | ||
expression: (predicate positive (identifier "b")) | ||
}, | ||
b: { | ||
type: non_atomic, | ||
expression: (predicate negative (string exact "hi")) | ||
} | ||
}"# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the intent is to generate BNF, it would make sense output that in the same order they were defined in the pest grammar, but structs are unordered... so... just wondering how you would control the ordering of the production rules in your BNF in that case.
I guess one could work around this by using the reader API of their favorite Ion library, but in the case of an element API the order may not be preserved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rules are not ordered and PEG (and BNF) rules are declarative. I could've modeled this as a list
of rules where the name was a field in the struct, but I hoisted them up to the top-level because I figured that made more logical sense.
In the above example, it shouldn't matter what the order of a
and b
, the meaning is the same.
pestion/src/lib.rs
Outdated
(sequence | ||
(sequence | ||
(string exact "a") | ||
(string insensitive "b") | ||
) | ||
(string exact "c") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be flattened?
i.e. something like:
(sequence (string exact "a") (string insensitive "b") (string exact "c"))
It seems like that would be a more useful representation of this production rule. I'm guess not sure why the nesting is needed. The BNF would be flattened, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a TODO (see #37).
As per feedback from @zslayton. Renames `PestToElement` to `TryPestToElement` which captures the fallible conversion (e.g. from `str` slices) and makes `PestToElement` be an infallible version of the API. This is a more correct factoring, but also needed to avoid the intermediate copy for `Vec<AstNode>` conversion. Changes `AstRule` structure generation to use an array as it has a fixed number of fields. Adds `smallvec` to make it easier to model dynamically sized yet stack allocated fields for sub-structures for `Expr` conversion. Adjusts the doc test to be more illustrative.
Updates surrounding meta-data and README. Also renames the following types: * `PestionResult` => `PestToIonResult` * `PestionError` => `PestToIonError`
The purpose of this is the beginning of a tool to generate Ion ASTs for Pest grammars. This leverages the Pest AST in the
pest_meta
crate to construct a reasonable Ion representation. The idea is that this could be bridged to a CLI and then piped into a script to generate things like TextMate grammars and the LaTeX fragments for our specification.Adds the
pestion
crate to provide a simple traitPestToElement
andimplementations over the Pest AST and
&str
syntax definitions tocreate Ion
Element
serialized forms.Resolves #35.
Here is a rendering of the doc comment example:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.