Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Pest Grammar to Ion Conversion #38

Merged
merged 3 commits into from
Jun 8, 2021
Merged

Adds Pest Grammar to Ion Conversion #38

merged 3 commits into from
Jun 8, 2021

Conversation

almann
Copy link
Contributor

@almann almann commented Jun 7, 2021

The purpose of this is the beginning of a tool to generate Ion ASTs for Pest grammars. This leverages the Pest AST in the pest_meta crate to construct a reasonable Ion representation. The idea is that this could be bridged to a CLI and then piped into a script to generate things like TextMate grammars and the LaTeX fragments for our specification.

Adds the pestion crate to provide a simple trait PestToElement and
implementations over the Pest AST and &str syntax definitions to
create Ion Element serialized forms.

Resolves #35.

Here is a rendering of the doc comment example:

image

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

The purpose of this commit is the start of a tool to generate Ion
formatted ASTs from Pest grammars, particularly that of PartiQL.
We could then easily process the grammar outside of Rust in scripts that
could be used to generate things like TextMate grammars for IDE support
or LaTeX fragments for the specification document.

Adds the `pestion` crate to provide a simple trait `PestToElement` and
implementations over the Pest AST and `&str` syntax definitions to
create Ion `Element` serialized forms.

Resolves partiql#35.
@codecov
Copy link

codecov bot commented Jun 7, 2021

Codecov Report

Merging #38 (5ae9e61) into main (e08a7cf) will decrease coverage by 1.23%.
The diff coverage is 89.32%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #38      +/-   ##
==========================================
- Coverage   90.66%   89.42%   -1.24%     
==========================================
  Files           7       11       +4     
  Lines         482      700     +218     
==========================================
+ Hits          437      626     +189     
- Misses         45       74      +29     
Impacted Files Coverage Δ
pest-ion/src/result.rs 10.00% <10.00%> (ø)
pest-ion/src/lib.rs 94.04% <94.04%> (ø)
partiql-parser/src/result.rs 79.31% <0.00%> (-0.41%) ⬇️
partiql-parser/src/peg/gen.rs 100.00% <0.00%> (ø)
partiql-parser/src/peg/mod.rs 77.27% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b6fa75b...5ae9e61. Read the comment docs.

@@ -0,0 +1,29 @@
[package]
name = "pestion"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless this is a pun/reference that I'm missing, I suggest naming this pest-ion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We chatted about this offline, but the idea here was just a portmanteau of Pest and Ion into something that is pronounced like "question". I was trying to avoid a hyphenated name, so let's get a second opinion from someone like @dlurton / @alancai98 / @abhikuhikar to see if they don't like the name and suggest any alternatives.

If we do like it, I will add something to the README. If we don't I can refactor the name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slight preference to pest-ion so it doesn't get too closely associated with something David sent offline: https://www.pestions.com/

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possibility is pest2ion. It avoids the hyphen and any association with that pest control website.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the consensus is pest-ion so I will refactor it. The problem with pest2ion is that it will look very jarring as a crate (e.g. use pest2ion).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to pest-ion.

//! // parse a Pest grammar and convert it to Ion element
//! let element = r#"a = @{ "a" | "b" ~ "c" }"#.pest_to_element()?;
//!
//! // the grammar is a struct containing a field for each rule
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These examples are fine, but I think it would be more immediately illuminating to dump the element to pretty Ion text so folks could see how it's being modeled.

I'm not sure what the most idiomatic way to show that in a doc comment is. With a println! and a comment showing the expected output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I can replicate this by parsing an element from Ion text and comparing the two?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 783de7f

(text_token("type"), self.ty.pest_to_element()?),
(text_token("expression"), self.expr.pest_to_element()?),
];
Ok(Self::Element::new_struct(fields))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Element::new_struct takes an iterator:

    /// Builds a `struct` from Struct using Builder.
    fn new_struct<K, V, I>(structure: I) -> Self::Element
    where
        K: Into<Self::SymbolToken>,
        V: Into<Self::Element>,
        I: IntoIterator<Item = (K, V)>;

You could make fields into a slice to save yourself an allocation.

Copy link
Contributor Author

@almann almann Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not make it into a slice because I need ownership of the pair (I would get a &(_, _)) that we construct the struct with (without a clone), However, I am able to write it as:

        let fields = std::array::IntoIter::new([
            (text_token("type"), self.ty.pest_to_element()?),
            (text_token("expression"), self.expr.pest_to_element()?),
        ]);
        Ok(Self::Element::new_struct(fields))

Note that the explicit IntoIter::new is needed because sadly, arrays don't implement IntoIter...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 783de7f

Copy link

@zslayton zslayton Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The explicit IntoIter::new is a bit unfortunate, but happily it looks like arrays will get a stable implementation of IntoIterator in the next Rust release (v1.53), which I believe happens in ~2 weeks.

Comment on lines 138 to 142
Expr::Str(text) => vec![
text_token("string").into(),
text_token("exact").into(),
String(text).into(),
],
Copy link

@zslayton zslayton Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Element::new_sexp takes an iterator:

    /// Builds a `sexp` from Sequence using Builder.
    fn new_sexp<I: IntoIterator<Item = Self::Element>>(seq: I) -> Self::Element;

so each of these vec![]s could be a slice.

Copy link
Contributor Author

@almann almann Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, this is challenging because the slice iterator will yield &OwnedElement which is not compatible to Iterator<Item = OwnedElement>, I could rewrite this using ArrayVec/SmallVec, but I don't know if that is worth it. What do you think about that, is there something I am missing with the slice suggestion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a factoring with SmallVec, could not do it with ArrayVec because of the limitations in that data type (specifically Copy requirements).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 783de7f

Comment on lines 271 to 280
{
a: {
type: non_atomic,
expression: (predicate positive (identifier "b"))
},
b: {
type: non_atomic,
expression: (predicate negative (string exact "hi"))
}
}"#
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the intent is to generate BNF, it would make sense output that in the same order they were defined in the pest grammar, but structs are unordered... so... just wondering how you would control the ordering of the production rules in your BNF in that case.

I guess one could work around this by using the reader API of their favorite Ion library, but in the case of an element API the order may not be preserved.

Copy link
Contributor Author

@almann almann Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rules are not ordered and PEG (and BNF) rules are declarative. I could've modeled this as a list of rules where the name was a field in the struct, but I hoisted them up to the top-level because I figured that made more logical sense.

In the above example, it shouldn't matter what the order of a and b, the meaning is the same.

Comment on lines 289 to 295
(sequence
(sequence
(string exact "a")
(string insensitive "b")
)
(string exact "c")
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be flattened?

i.e. something like:

(sequence (string exact "a") (string insensitive "b") (string exact "c"))

It seems like that would be a more useful representation of this production rule. I'm guess not sure why the nesting is needed. The BNF would be flattened, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a TODO (see #37).

As per feedback from @zslayton.

Renames `PestToElement` to `TryPestToElement` which captures the
fallible conversion (e.g. from `str` slices) and makes `PestToElement`
be an infallible version of the API.  This is a more correct factoring,
but also needed to avoid the intermediate copy for `Vec<AstNode>`
conversion.

Changes `AstRule` structure generation to use an array as it has a fixed
number of fields.

Adds `smallvec` to make it easier to model dynamically sized yet stack
allocated fields for sub-structures for `Expr` conversion.

Adjusts the doc test to be more illustrative.
@almann
Copy link
Contributor Author

almann commented Jun 8, 2021

Updates surrounding meta-data and README.

Also renames the following types:
* `PestionResult` => `PestToIonResult`
* `PestionError` => `PestToIonError`
@almann almann merged commit fb2628f into partiql:main Jun 8, 2021
@almann almann deleted the pestion branch June 8, 2021 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Pest Syntax to Ion Conversion
5 participants