Add doc for load multiple pest files and include! syntax.
huacnlee committed Jan 9, 2023
1 parent ef5c9fc commit 0d3e966
Showing 2 changed files with 77 additions and 32 deletions.
26 changes: 22 additions & 4 deletions src/grammars/grammars.md
@@ -1,8 +1,8 @@
# Grammars

Like many parsing tools, `pest` operates using a *formal grammar* that is
distinct from your Rust code. The format that `pest` uses is called a *parsing
expression grammar*, or *PEG*. When building a project, `pest` automatically
Like many parsing tools, `pest` operates using a _formal grammar_ that is
distinct from your Rust code. The format that `pest` uses is called a _parsing
expression grammar_, or _PEG_. When building a project, `pest` automatically
compiles the PEG, located in a separate file, into a plain Rust function that
you can call.

@@ -36,6 +36,24 @@ exists during compilation. However, you can use `Rules` just like any other
enum, and you can use `parse(...)` through the [`Pairs`] interface described in
the [Parser API chapter](../parser_api.html).

## Load multiple grammars

If you have multiple grammars, you can load them all at once:

```rust
use pest::Parser;

#[derive(Parser)]
#[grammar = "parser/base.pest"]
#[grammar = "parser/grammar.pest"]
struct MyParser;
```

Then `pest` will generate a `Rule` enum that contains all the rules from both files.
This is useful if you have a base grammar that you want to extend in several other grammars.

> You can also use [include!](./syntax.md#include) to load rules from other pest files.
## Warning about PEGs!

Parsing expression grammars look quite similar to other parsing tools you might
@@ -50,7 +68,7 @@ tripped up by comparisons to other tools.
If you have used other parsing tools before, be sure to read the next section
carefully. We'll mention some common mistakes regarding PEGs.

[`Pairs`]: https://docs.rs/pest/2.0/pest/iterators/struct.Pairs.html
[`pairs`]: https://docs.rs/pest/2.0/pest/iterators/struct.Pairs.html
[`include_str!`]: https://doc.rust-lang.org/std/macro.include_str.html
[eager]: peg.html#eagerness
[non-backtracking]: peg.html#non-backtracking
83 changes: 55 additions & 28 deletions src/grammars/syntax.md
@@ -25,7 +25,7 @@ atomic_rule = @{ ... }

## Expressions

Grammar rules are built from *expressions* (hence "parsing expression
Grammar rules are built from _expressions_ (hence "parsing expression
grammar"). These expressions are a terse, formal description of how to parse an
input string.

@@ -85,7 +85,7 @@ successfully, `and_then` is attempted next. However, if `first` fails, the
entire expression fails.

A list of expressions can be chained together with sequences, which indicates
that *all* of the components must occur, in the specified order.
that _all_ of the components must occur, in the specified order.
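
The sequence behaviour described above can be modelled in plain Rust. This is a hand-written sketch, not the code `pest` generates: the `Matcher` type, the `literal` and `sequence` helpers, and the literals `"ab"` and `"cd"` standing in for `first` and `and_then` are all assumptions made for illustration. It also ignores implicit whitespace.

```rust
/// Hypothetical matcher signature: given the input and a byte position,
/// return the position after the match, or `None` on failure.
type Matcher = fn(&str, usize) -> Option<usize>;

/// Matches an exact string at `pos` (illustrative helper, not a pest API).
fn literal(input: &str, pos: usize, text: &str) -> Option<usize> {
    let rest = input.get(pos..)?;
    if rest.starts_with(text) {
        Some(pos + text.len())
    } else {
        None
    }
}

/// Stand-in for a rule such as `first = { "ab" }`.
fn first(input: &str, pos: usize) -> Option<usize> {
    literal(input, pos, "ab")
}

/// Stand-in for a rule such as `and_then = { "cd" }`.
fn and_then(input: &str, pos: usize) -> Option<usize> {
    literal(input, pos, "cd")
}

/// Models `a ~ b`: `b` is attempted only where `a` ended,
/// and a failure of either part fails the whole sequence.
fn sequence(a: Matcher, b: Matcher, input: &str, pos: usize) -> Option<usize> {
    let after_a = a(input, pos)?;
    b(input, after_a)
}

fn main() {
    // Both parts match in order, so the sequence consumes all four bytes.
    assert_eq!(sequence(first, and_then, "abcd", 0), Some(4));
    // `first` matches but `and_then` does not: the whole sequence fails.
    assert_eq!(sequence(first, and_then, "abxx", 0), None);
    // `first` fails, so `and_then` is never attempted.
    assert_eq!(sequence(first, and_then, "xxcd", 0), None);
    println!("sequence model behaves as described");
}
```

Returning the end position rather than a boolean is what lets the second part start exactly where the first one stopped.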

### Ordered choice

@@ -98,7 +98,7 @@ first | or_else
```

When matching a choice expression, `first` is attempted. If `first` matches
successfully, the entire expression *succeeds immediately*. However, if `first`
successfully, the entire expression _succeeds immediately_. However, if `first`
fails, `or_else` is attempted next.

Note that `first` and `or_else` are always attempted at the same position, even
@@ -124,7 +124,7 @@ It is somewhat tempting to borrow terminology and think of this operation as
"alternation" or simply "OR", but this is misleading. The word "choice" is used
specifically because [the operation is *not* merely logical "OR"].

[the operation is *not* merely logical "OR"]: peg.html#ordered-choice
[the operation is *not* merely logical "or"]: peg.html#ordered-choice

### Repetition

@@ -163,19 +163,19 @@ Thus `expr*` is equivalent to `expr{0, }`; `expr+` is equivalent to `expr{1,
### Predicates

Preceding an expression with an ampersand `&` or exclamation mark `!` turns it
into a *predicate* that never consumes any input. You might know these
into a _predicate_ that never consumes any input. You might know these
operators as "lookahead" or "non-progressing".

The **positive predicate**, written as an ampersand `&`, attempts to match its
inner expression. If the inner expression succeeds, parsing continues, but at
the *same position* as the predicate — `&foo ~ bar` is thus a kind of
the _same position_ as the predicate — `&foo ~ bar` is thus a kind of
"AND" statement: "the input string must match `foo` AND `bar`". If the inner
expression fails, the whole expression fails too.

The **negative predicate**, written as an exclamation mark `!`, attempts to
match its inner expression. If the inner expression *fails*, the predicate
*succeeds* and parsing continues at the same position as the predicate. If the
inner expression *succeeds*, the predicate *fails* — `!foo ~ bar` is thus
match its inner expression. If the inner expression _fails_, the predicate
_succeeds_ and parsing continues at the same position as the predicate. If the
inner expression _succeeds_, the predicate _fails_ — `!foo ~ bar` is thus
a kind of "NOT" statement: "the input string must match `bar` but NOT `foo`".

This leads to the common idiom meaning "any character but":
@@ -239,7 +239,7 @@ my_rule = {

## Start and end of input

The rules `SOI` and `EOI` match the *start* and *end* of the input string,
The rules `SOI` and `EOI` match the _start_ and _end_ of the input string,
respectively. Neither consumes any text. They only indicate whether the parser
is currently at one edge of the input.

@@ -292,8 +292,8 @@ ws = _{ " " }
com = _{ "/*" ~ (!"*/" ~ ANY)* ~ "*/" }
```

Note that implicit whitespace is *not* inserted at the beginning or end of rules
— for instance, `expression` does *not* match `" 4+5 "`. If you want to
Note that implicit whitespace is _not_ inserted at the beginning or end of rules
— for instance, `expression` does _not_ match `" 4+5 "`. If you want to
include implicit whitespace at the beginning and end of a rule, you will need to
sandwich it between two empty rules (often `SOI` and `EOI` [as above]):

@@ -384,14 +384,41 @@ fstring = @{ "\"" ~ ... }
expr = !{ ... }
```

## Include

`include!` allows you to load rules from other pest files.

Sometimes, you'll want to split rules into multiple parts. For instance, you
might want to define rules, then use them in multiple places.

`base.pest`:

```pest
WHITESPACE = _{ " " | "\t" | "\r" | "\n" }
identifier = { (ASCII_ALPHANUMERIC | "_" | "-")+ }
```

Then we can use `include!("base.pest")` in other files to load the rules
of `base.pest` into the current file.

`toml.pest`:

```pest
include!("base.pest")
pair = { key ~ WHITESPACE* ~ "=" ~ WHITESPACE* ~ value }
key = @{ identifier }
value = { (!NEWLINE ~ ANY)+ }
```

## The stack (WIP)

`pest` maintains a stack that can be manipulated directly from the grammar. An
expression can be matched and pushed onto the stack with the keyword `PUSH`,
then later matched exactly with the keywords `PEEK` and `POP`.

Using the stack allows *the exact same text* to be matched multiple times,
rather than *the same pattern*.
Using the stack allows _the exact same text_ to be matched multiple times,
rather than _the same pattern_.

For example,

@@ -442,20 +469,20 @@ raw_string_interior = {

# Cheat sheet

| Syntax | Meaning | Syntax | Meaning |
|:----------------:|:---------------------------------:|:-----------------------:|:--------------------:|
| `foo = { ... }` | [regular rule] | `baz = @{ ... }` | [atomic] |
| `bar = _{ ... }` | [silent] | `qux = ${ ... }` | [compound-atomic] |
| | | `plugh = !{ ... }` | [non-atomic] |
| `"abc"` | [exact string] | `^"abc"` | [case insensitive] |
| `'a'..'z'` | [character range] | `ANY` | [any character] |
| `foo ~ bar` | [sequence] | <code>baz \| qux</code> | [ordered choice] |
| `foo*` | [zero or more] | `bar+` | [one or more] |
| `baz?` | [optional] | `qux{n}` | [exactly *n*] |
| `qux{m, n}` | [between *m* and *n* (inclusive)] | | |
| `&foo` | [positive predicate] | `!bar` | [negative predicate] |
| `PUSH(baz)` | [match and push] | | |
| `POP` | [match and pop] | `PEEK` | [match without pop] |
| Syntax | Meaning | Syntax | Meaning |
| :--------------: | :-------------------------------: | :---------------------: | :------------------: |
| `foo = { ... }` | [regular rule] | `baz = @{ ... }` | [atomic] |
| `bar = _{ ... }` | [silent] | `qux = ${ ... }` | [compound-atomic] |
| | | `plugh = !{ ... }` | [non-atomic] |
| `"abc"` | [exact string] | `^"abc"` | [case insensitive] |
| `'a'..'z'` | [character range] | `ANY` | [any character] |
| `foo ~ bar` | [sequence] | <code>baz \| qux</code> | [ordered choice] |
| `foo*` | [zero or more] | `bar+` | [one or more] |
| `baz?` | [optional] | `qux{n}` | [exactly *n*] |
| `qux{m, n}` | [between *m* and *n* (inclusive)] | | |
| `&foo` | [positive predicate] | `!bar` | [negative predicate] |
| `PUSH(baz)` | [match and push] | | |
| `POP` | [match and pop] | `PEEK` | [match without pop] |

[regular rule]: #syntax-of-pest-parsers
[silent]: #silent-and-atomic-rules