-
Grammar Change: simplified handling of whitespace/comments
Since atomic rules are cascading, it is not immediately obvious whether two sequenced expressions a ~ b accept trivia; that depends wholly on whether the current rule inherits atomicity. The idea is to make it more explicit by being able to define the infix sequence operator
-
Grammar Change: better reusability of expressions using macro/template/generic rules
This change is to be able to parametrize rules at definition time.
-
Grammar and API Change: token parametrization
There was also an idea for parametrizing tokens that can be replaced at runtime.
-
Grammar Change: module/namespace system
pest can now define multiple grammar files, but it is a simple "rule concat" mechanism and it does not work with pest_vm. The idea is to have modules more akin to Rust modules. This change would remove the need for capitalization of built-in rules. As suggested in #333 and #660:
-
Grammar Change: stack slicing, additional stack operations or alternatives
pest 2.X has
A few issues regarding the stack were posted:
Stack operations add complexity to the grammar, so the question is whether the parsing problems they address really need them, i.e. whether there are simpler alternatives. For example, for indentation-sensitive languages, one may use two operators for indentation and alignment. A separate lexer may also allow the use of the "lexer trick".
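One way to picture a lexer-side alternative to stack operations is a layout pass that turns leading spaces into explicit block tokens before the grammar ever sees them. The following is a minimal sketch in plain Rust, not pest's API: `Token` and `layout` are hypothetical names, and it assumes space-only indentation and that blank lines carry no layout meaning.

```rust
/// Hypothetical token type for a separate layout pass; not part of pest.
#[derive(Debug, PartialEq, Eq)]
pub enum Token<'a> {
    Indent,
    Dedent,
    Line(&'a str),
}

/// Converts leading-space indentation into explicit Indent/Dedent tokens.
/// Assumptions: spaces only, blank lines are skipped.
/// On a dedent to a level that was never opened, returns the 1-based line number.
pub fn layout(src: &str) -> Result<Vec<Token<'_>>, usize> {
    let mut stack = vec![0usize]; // indentation widths of the open blocks
    let mut out = Vec::new();
    for (idx, raw) in src.lines().enumerate() {
        let text = raw.trim_start_matches(' ');
        if text.trim().is_empty() {
            continue;
        }
        let width = raw.len() - text.len();
        if width > *stack.last().unwrap() {
            stack.push(width);
            out.push(Token::Indent);
        } else {
            // Close every block that is deeper than the current line.
            while width < *stack.last().unwrap() {
                stack.pop();
                out.push(Token::Dedent);
            }
            if width != *stack.last().unwrap() {
                return Err(idx + 1);
            }
        }
        out.push(Token::Line(text.trim_end()));
    }
    // Close the blocks that are still open at end of input.
    while stack.len() > 1 {
        stack.pop();
        out.push(Token::Dedent);
    }
    Ok(out)
}
```

With tokens like these, the grammar itself would only need to match `Indent ... Dedent` pairs, so the indentation stack lives in the lexer rather than in grammar-level stack operations.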
-
Grammar Change: grammar versioning
Either inside the grammar file directly or in the derivation expression.
-
Grammar Change: small bikeshedding
As suggested in #333 (comment):
-
Grammar Change: better error-reporting
Chumsky has a nice
And non-token generating symbols in errors can also help: #327
-
Grammar Change: custom hooks
As suggested in #815
For all terms beginning with

    #[derive(Parser)]
    #[grammar = "../examples/hook.pest"]
    // Proposed attribute: names the user state type that hooks receive.
    #[custom_state(crate::parser::CustomState)]
    pub struct Parser;

    pub struct CustomState {
        pub max_int_visited: usize,
    }

    impl Parser {
        // Called for the hooked term; returning false makes the match fail.
        #[allow(non_snake_case)]
        fn hook__HOOK_INT<'a>(state: &mut CustomState, span: Span<'a>) -> bool {
            let val: usize = span.as_str().parse().unwrap();
            if val >= state.max_int_visited {
                state.max_int_visited = val;
                true
            } else {
                false
            }
        }
    }

The state will need to support snapshot / recovery so that the hook can be placed anywhere. Users could also opt out of recovery and modify their grammar to avoid this situation. Examples in
This is powerful, but it is an "escape hatch" that goes against pest's portability (e.g. how would this work in pest_vm?).
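To illustrate why the state needs snapshot / recovery, here is a minimal sketch in plain Rust, independent of pest. It reuses the `CustomState` shape and the hook logic from the proposal, but works on plain `&str` tokens instead of `Span`; `try_alternatives` is a hypothetical stand-in for an ordered choice that backtracks, and returning `false` on a non-integer (rather than panicking) is a choice made only for this sketch.

```rust
/// Hypothetical user state, mirroring the `CustomState` in the proposal.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct CustomState {
    pub max_int_visited: usize,
}

/// Same check as the proposed hook, on a plain string instead of a Span.
/// A token that is not an integer simply fails the hook in this sketch.
fn hook_int(state: &mut CustomState, text: &str) -> bool {
    match text.parse::<usize>() {
        Ok(val) if val >= state.max_int_visited => {
            state.max_int_visited = val;
            true
        }
        _ => false,
    }
}

/// Stand-in for an ordered choice: each alternative is a sequence of tokens
/// that must all pass the hook. The state is snapshotted before an
/// alternative is tried and restored if that alternative fails.
pub fn try_alternatives(state: &mut CustomState, alternatives: &[&[&str]]) -> Option<usize> {
    for (index, alternative) in alternatives.iter().enumerate() {
        let snapshot = state.clone();
        if alternative.iter().all(|tok| hook_int(state, tok)) {
            return Some(index);
        }
        // Undo whatever the failed alternative wrote into the state.
        *state = snapshot;
    }
    None
}
```

Without the restore step, a failed first alternative such as `3 7 5` would leave `max_int_visited` at 7 and make a later alternative `3 4` fail even though it is valid on its own.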
-
Grammar Change: supporting left recursion
As suggested in #533
-
Grammar Change: parsing binary data
pest's focus has been on textual data, but it may be something to consider.
-
Grammar Change: separating a lexer
As suggested in #580
-
API Change: typed AST and alternatives to Pairs API
As suggested in #882, #416, #440 and #806
-
API Change: crate restructuring or not depending on pest_derive
pest could be like serde, i.e. there would be no need to explicitly declare a dependency on pest_derive and one could just do
Alternatively, as suggested in #333 (comment)
-
API Change: streaming input
As suggested in #370 and #153
-
API Change: detailed character offsets in Pair (UTF-16 and UTF-32)
As mentioned in #370
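To make the difference between the offset kinds concrete, here is a minimal sketch in plain Rust. It assumes the parser reports byte (UTF-8) offsets into a `&str` and derives the other two on demand; `Offsets` and `offsets_at` are hypothetical names, not part of pest.

```rust
/// The same input position expressed in three units.
#[derive(Debug, PartialEq, Eq)]
pub struct Offsets {
    /// Bytes from the start of the input (what a Rust `&str` index is).
    pub utf8: usize,
    /// UTF-16 code units from the start (what many editors and LSP use).
    pub utf16: usize,
    /// Unicode scalar values from the start (UTF-32 code units).
    pub utf32: usize,
}

/// Converts a byte offset into all three units.
/// Returns `None` if `byte_pos` is out of range or splits a character.
pub fn offsets_at(input: &str, byte_pos: usize) -> Option<Offsets> {
    if !input.is_char_boundary(byte_pos) {
        return None;
    }
    let prefix = &input[..byte_pos];
    Some(Offsets {
        utf8: byte_pos,
        utf16: prefix.encode_utf16().count(),
        utf32: prefix.chars().count(),
    })
}
```

This conversion is linear in the length of the prefix, which is one reason to consider tracking the extra offsets during parsing instead of recomputing them per lookup.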
-
API Change: multiple error return (i.e. parser not early terminating on errors)
As mentioned e.g. in #711
-
API Change: pluggable backends
As mentioned in #178
-
Grammar or API Change: easier precedence specification
As mentioned in #386

    #[rule]
    #[precedence = [
        left(plus, minus),
        left(times, divided),
        right(power)
    ]]
    fn binary_expression(lhs: Expr, op: OP, rhs: Expr) -> Expr { ... }
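For reference, a declarative table like the one above could drive a standard precedence-climbing loop. The sketch below is plain Rust over a flat token slice and is not how pest implements anything; `Tok`, `binding`, `climb` and `parse` are hypothetical names, and the output is a parenthesized string so that the grouping is visible.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Tok {
    Num(i64),
    Op(char),
}

#[derive(Clone, Copy, PartialEq)]
enum Assoc {
    Left,
    Right,
}

/// Precedence level and associativity per operator, mirroring
/// left(plus, minus), left(times, divided), right(power).
fn binding(op: char) -> Option<(u8, Assoc)> {
    match op {
        '+' | '-' => Some((1, Assoc::Left)),
        '*' | '/' => Some((2, Assoc::Left)),
        '^' => Some((3, Assoc::Right)),
        _ => None,
    }
}

/// Precedence climbing: parses an operand, then folds in operators whose
/// level is at least `min_prec`. `pos` is advanced past what was consumed.
fn climb(toks: &[Tok], pos: &mut usize, min_prec: u8) -> Option<String> {
    let mut lhs = match toks.get(*pos)? {
        Tok::Num(n) => n.to_string(),
        Tok::Op(_) => return None,
    };
    *pos += 1;
    while let Some(Tok::Op(op)) = toks.get(*pos).copied() {
        let (prec, assoc) = binding(op)?;
        if prec < min_prec {
            break;
        }
        *pos += 1;
        // Left-associative operators must not re-enter their own level on the right.
        let next_min = if assoc == Assoc::Left { prec + 1 } else { prec };
        let rhs = climb(toks, pos, next_min)?;
        lhs = format!("({} {} {})", lhs, op, rhs);
    }
    Some(lhs)
}

/// Parses the whole slice; returns `None` on malformed input.
pub fn parse(toks: &[Tok]) -> Option<String> {
    let mut pos = 0;
    let tree = climb(toks, &mut pos, 1)?;
    if pos == toks.len() {
        Some(tree)
    } else {
        None
    }
}
```

Under these assumptions, `1 - 2 - 3` groups to the left and `2 ^ 3 ^ 2` groups to the right, which is the behaviour the `left(...)` / `right(...)` entries are meant to declare.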
-
API Change: better debugging support
One way may be to have
-
Experimental: egraph-based optimizer
@dragostis mentioned that in simpler grammars, it may be possible to use egraphs to capture the rewriting rules using egg and ruler.
-
Experimental: better correctness and performance
I am a fan of paguroidea, an early effort by @SchrodingerZhu, @QuarticCat and @CyanPineapple, for a few reasons, mainly:
Finally, pest3's grammar does not need to remain PEG-based, and it may be worth exploring whether ideas used in paguroidea can be applied in pest.
-
Grammar Change: Parametrized rules
As I explained in #886, it would be really nice to be able to parametrize some rules. For example,
Could expand to:
And could be used like
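As a rough analogy in plain Rust, a parametrized rule can be pictured as a function that takes other rules as arguments and returns a new rule. The sketch below is not pest syntax and not pest's API: a "rule" is modelled as a function from input to the remaining input, and `literal`, `digits` and `separated` are hypothetical names chosen for illustration.

```rust
/// A "rule" here is any function that consumes a prefix of the input and
/// returns the remaining input, or `None` if it does not match.
/// Hypothetical model for illustration only.
pub fn literal<'a>(lit: &'static str) -> impl Fn(&'a str) -> Option<&'a str> {
    move |input| input.strip_prefix(lit)
}

/// Matches one or more ASCII digits.
pub fn digits(input: &str) -> Option<&str> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some(&input[end..])
    }
}

/// Parametrized rule: `item (sep item)*`, for any `item` and `sep` rules.
pub fn separated<'a>(
    item: impl Fn(&'a str) -> Option<&'a str>,
    sep: impl Fn(&'a str) -> Option<&'a str>,
) -> impl Fn(&'a str) -> Option<&'a str> {
    move |input| {
        let mut rest = item(input)?;
        // Only consume a separator if another item follows it.
        while let Some(after) = sep(rest).and_then(|r| item(r)) {
            rest = after;
        }
        Some(rest)
    }
}
```

Here `separated(digits, literal(","))` plays the role of instantiating the generic rule with concrete arguments; a grammar-level feature would presumably perform a similar expansion at definition time.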
-
Grammar Change:
-
Grammar Change: Grammar and Rule Attributes
Syntax can be the same as Rust's attributes (except that the leading # is replaced by some other character, for obvious reasons), I guess? Unlike in Rust, though, attributes here would be used exclusively for metadata. Instead of requiring code changes to both the actual implementation and pest_derive (or whatever replaces it in pest3) each time we want to add a feature, attributes would not have a fixed list of allowed values/keywords; feature implementations would just consume the values/keywords they recognise. As an example, several features from this discussion can be changed to use pest attributes:

    #[derive(Parser)]
    #[grammar = "../examples/hook.pest"]
    #[custom_state(crate::parser::CustomState)]
    pub struct Parser;

    pub struct CustomState {
        pub max_int_visited: usize,
    }

    impl Parser {
        fn hook_int<'a>(state: &mut CustomState, span: Span<'a>) -> bool {
            let val: usize = span.as_str().parse().unwrap();
            if val >= state.max_int_visited {
                state.max_int_visited = val;
                true
            } else {
                false
            }
        }
    }

Better debugging support can be complemented by attributes:
I am no expert on the topic, but perhaps
Easier precedence climbing can be implemented using attributes too.
-
Tag V1(Node Tag)Tag is an important feature leading to strongly typed AST. It is divided into the following three steps:
Why do we need node tags?
Node tags indicate which elements should be captured in the AST. More importantly, they distinguish elements of the same type (such as identifiers) that play different roles.
Tag V2 (Branch Tag)
Now that the first phase has been merged, the next two phases can continue to evolve once the first phase is stable.
Why do we need branch tags?
A strongly typed AST must conform to the ADT model, so we need to distinguish whether it is a
For example: #832 (comment)
Tag V3 (Group Tag)
This step is optional; it is not required.
The main appeal of this feature is that if there are a lot of elements, it is very cumbersome to mark which ones are nodes (semantic elements, such as ID) and which ones are leaves (link elements, such as
Mark all elements at once with a token, e.g.:
#(string_prefix #(string_start #(string_inner) string_end) string_suffix)
Automatically capture all odd-numbered layer elements, that is,
Once these features are implemented, strongly typed ASTs (example) can be automatically derived and parsed based on tags (example).
-
Custom parser
Supports embedding any parsing function, as long as it conforms to
For example, if you want to match a URL with a parser that comes from a third-party library, you can wrap that parser and embed it.
Custom inspector
Supports checking a successfully matched rule, that is, deciding whether to convert
For example, you may want to match the string first, then scan it a second time for illegal escapes and reject it if any are found.
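The two-pass string example can be sketched in plain Rust as follows. This is only an illustration of the idea, not a proposed pest API: `string_literal` stands in for the matched rule, `escapes_are_valid` for the inspector, and the set of allowed escapes (`\n`, `\t`, `\\`, `\"`) is an assumption made for the example.

```rust
/// Inspector: accepts a string body only if every backslash starts a known
/// escape. The allowed set (n, t, backslash, double quote) is an assumption.
pub fn escapes_are_valid(body: &str) -> bool {
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next() {
                Some('n') | Some('t') | Some('\\') | Some('"') => {}
                _ => return false,
            }
        }
    }
    true
}

/// First pass: match a double-quoted literal at the start of `input`,
/// skipping over any escaped character without judging it.
/// Second pass: run the inspector and turn the success into a failure
/// if the body contains an illegal escape.
pub fn string_literal(input: &str) -> Option<&str> {
    let after_quote = input.strip_prefix('"')?;
    let mut escaped = false;
    for (i, c) in after_quote.char_indices() {
        match c {
            _ if escaped => escaped = false,
            '\\' => escaped = true,
            '"' => {
                let body = &after_quote[..i];
                return if escapes_are_valid(body) { Some(body) } else { None };
            }
            _ => {}
        }
    }
    None // no closing quote
}
```

Keeping the inspector separate means the matching pass stays simple, and the rejection logic can be swapped without touching the rule itself.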
-
Parser parameters
Unlike parameterized rules, parser parameters are used to pass environment variables that stay fixed for the whole parse (e.g. a &ctx). This is very useful when you are writing a template engine, because you need to customize the slot starting symbol.
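The template-engine case can be sketched in plain Rust like this. It is an illustration under assumptions, not a proposed API: `Ctx` and `slots` are hypothetical names, the context is passed by shared reference and never changes during the scan, and both delimiters are assumed to be non-empty.

```rust
/// Hypothetical parse-time context: fixed for the whole parse.
pub struct Ctx<'c> {
    pub slot_open: &'c str,
    pub slot_close: &'c str,
}

/// Returns the (trimmed) names of all slots in `template`, using the
/// delimiters supplied by the context instead of hard-coded literals.
pub fn slots<'t>(ctx: &Ctx<'_>, template: &'t str) -> Vec<&'t str> {
    let mut found = Vec::new();
    // Empty delimiters would match everywhere; treat them as "no slots".
    if ctx.slot_open.is_empty() || ctx.slot_close.is_empty() {
        return found;
    }
    let mut rest = template;
    while let Some(start) = rest.find(ctx.slot_open) {
        let after_open = &rest[start + ctx.slot_open.len()..];
        match after_open.find(ctx.slot_close) {
            Some(end) => {
                found.push(after_open[..end].trim());
                rest = &after_open[end + ctx.slot_close.len()..];
            }
            None => break, // unterminated slot: stop scanning
        }
    }
    found
}
```

The same scanning code then serves `{{ name }}` and `<% name %>` style templates, with only the context value differing between the two.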
-
Trap rules
Add a keyword. When this rule is encountered, the remaining alternatives are no longer tried and a failure is returned directly, which allows the failure content to be customized. It may be necessary to distinguish whether this is a branch failure or a global failure. It is also possible to provide a recovery variable, directly inserting the given EndToken such as
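The distinction between a branch failure and a trap can be sketched in plain Rust as two kinds of error that an ordered choice treats differently. This is a hypothetical model, not pest's API: `Failure`, `Rule`, `choice` and the three example rules are names invented for the illustration, and each rule returns the number of bytes it consumed.

```rust
#[derive(Debug, PartialEq)]
pub enum Failure {
    /// Ordinary branch failure: the next alternative may be tried.
    Backtrack,
    /// Trap: stop trying alternatives and report this message directly.
    Trap(&'static str),
}

pub type Rule = fn(&str) -> Result<usize, Failure>;

/// Ordered choice that backtracks on `Backtrack` but propagates `Trap`.
pub fn choice(input: &str, alternatives: &[Rule]) -> Result<usize, Failure> {
    for &alternative in alternatives {
        match alternative(input) {
            Err(Failure::Backtrack) => continue,
            other => return other,
        }
    }
    Err(Failure::Backtrack)
}

/// A quoted string: once the opening quote is seen, a missing closing
/// quote is a trap rather than a reason to try other alternatives.
pub fn string_lit(input: &str) -> Result<usize, Failure> {
    let body = input.strip_prefix('"').ok_or(Failure::Backtrack)?;
    match body.find('"') {
        Some(end) => Ok(end + 2), // body plus the two quotes
        None => Err(Failure::Trap("unterminated string")),
    }
}

/// One or more ASCII letters.
pub fn word(input: &str) -> Result<usize, Failure> {
    let n = input.chars().take_while(|c| c.is_ascii_alphabetic()).count();
    if n == 0 { Err(Failure::Backtrack) } else { Ok(n) }
}

/// Any single character; would happily match a stray quote.
pub fn any_char(input: &str) -> Result<usize, Failure> {
    input.chars().next().map(|c| c.len_utf8()).ok_or(Failure::Backtrack)
}
```

With the alternatives `string_lit`, `word`, `any_char`, the input `"ab` reports "unterminated string" instead of silently matching the stray quote through `any_char`, which is the kind of customized failure the thread describes.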
-
Experimental: Incremental parsing and other goodies in Ohm
I was looking at Ohm, a PEG-based parser generator in the JS/TS ecosystem, and it's got some interesting ideas under the hood:
(As a side note, its online editor is pretty cool: https://ohmjs.org/editor/)
-
UPDATE (May 14, 2024): an early pest3 prototype is here: #1016
pest3 started as an effort to improve pest's language grammar and parser API, i.e., as per pest's focus, pest3 aimed to improve their accessibility, correctness, and performance. 💪 (Note that flexibility was not a primary goal.)
As @dragostis put it during our brainstorming call a few days ago, the idea was to have a language that is easier to use than the current pest 2's grammar and a better API that would leverage Rust's type system (unlike the existing Pairs API). Unfortunately, @dragostis burned out, and that effort did not lead to completion. 😢
A lot has changed since then, so this discussion aims to gather more feedback on what pest3 could be. I imagine pest3's future steps are the following:
💡 This discussion will run for a while to gather different ideas and current preferences;
🧪 A period of experimentation will follow and produce a more realistic idea for pest3's scope (to avoid the second-system syndrome, or in other words: "perfect is the enemy of good");
🚧 After that, the maintenance focus will be on implementation, documentation, tooling, and transition to pest3.
Given the uncertainty at this stage (from pest3's scope to people interested in helping out), this comes without a concrete timeline.
In this discussion, I will post many separate threads and label each one with a single potential grammar-breaking or API-breaking change. You can take the following actions:
⬆️ UPVOTE a thread if that breaking change interests you and you wish to see that in pest3. (This will help to sort potential preferences for the focus during experimentations.)
💬 COMMENT on a thread if you have more ideas for its proposed change or if you can help to implement it.
🧵 START A NEW THREAD if you find a missing feature among existing threads and wish to see it in pest3 (ideally, you can link existing GH issues).