Add docs for loading multiple pest files and the include! syntax.

Ref:
- pest-parser/pest#759
- pest-parser/pest#758
huacnlee · Jan 9, 2023 · 0d3e966
1 parent ef5c9fc

Showing 2 changed files with 77 additions and 32 deletions.
diff --git a/src/grammars/grammars.md b/src/grammars/grammars.md
@@ -1,8 +1,8 @@
 # Grammars
 
-Like many parsing tools, `pest` operates using a *formal grammar* that is
-distinct from your Rust code. The format that `pest` uses is called a *parsing
-expression grammar*, or *PEG*. When building a project, `pest` automatically
+Like many parsing tools, `pest` operates using a _formal grammar_ that is
+distinct from your Rust code. The format that `pest` uses is called a _parsing
+expression grammar_, or _PEG_. When building a project, `pest` automatically
 compiles the PEG, located in a separate file, into a plain Rust function that
 you can call.
 
@@ -36,6 +36,24 @@ exists during compilation. However, you can use `Rules` just like any other
 enum, and you can use `parse(...)` through the [`Pairs`] interface described in
 the [Parser API chapter](../parser_api.html).
 
+## Load multiple grammars
+
+If you have multiple grammars, you can load them all at once:
+
+```rust
+use pest::Parser;
+
+#[derive(Parser)]
+#[grammar = "parser/base.pest"]
+#[grammar = "parser/grammar.pest"]
+struct MyParser;
+```
+
+Then `pest` will generate a single `Rule` enum that contains the rules from both files.
+This is useful if you have a base grammar that you want to extend in multiple grammars.
+
+> You can also use [include!](./syntax.md#include) to load rules from other pest files.
+
 ## Warning about PEGs!
 
 Parsing expression grammars look quite similar to other parsing tools you might
@@ -50,7 +68,7 @@ tripped up by comparisons to other tools.
 If you have used other parsing tools before, be sure to read the next section
 carefully. We'll mention some common mistakes regarding PEGs.
 
-[`Pairs`]: https://docs.rs/pest/2.0/pest/iterators/struct.Pairs.html
+[`pairs`]: https://docs.rs/pest/2.0/pest/iterators/struct.Pairs.html
 [`include_str!`]: https://doc.rust-lang.org/std/macro.include_str.html
 [eager]: peg.html#eagerness
 [non-backtracking]: peg.html#non-backtracking
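
Note on the `grammars.md` change above: the practical effect of stacking several `#[grammar = "..."]` attributes is that the rules of every listed file end up in one shared namespace. The sketch below models only that idea in plain Rust, without depending on `pest`. The helper `merge_rule_names`, its line-based parsing, and its rejection of duplicate names are all assumptions made for illustration. They are not how `pest_derive` is implemented.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not part of `pest`): gathers the rule names defined
/// across several grammar sources into one sorted list, mimicking a single
/// enum that holds the rules of every loaded file.
/// Assumption: each rule starts on its own line as `name = ...`.
fn merge_rule_names(sources: &[&str]) -> Result<Vec<String>, String> {
    let mut names = BTreeSet::new();
    for source in sources {
        for line in source.lines() {
            // Take the text before the first `=`; skip lines without one.
            let name = match line.split_once('=') {
                Some((lhs, _)) => lhs.trim(),
                None => continue,
            };
            // Only plain identifiers count as rule names in this sketch.
            if name.is_empty() || !name.chars().all(|c| c.is_alphanumeric() || c == '_') {
                continue;
            }
            // Assumption: defining the same name twice is treated as an error.
            if !names.insert(name.to_string()) {
                return Err(format!("rule `{}` is defined more than once", name));
            }
        }
    }
    Ok(names.into_iter().collect())
}

fn main() {
    let base = "WHITESPACE = _{ \" \" }\nidentifier = { ASCII_ALPHANUMERIC+ }";
    let grammar = "pair = { identifier ~ \"=\" ~ identifier }";
    // Both sources contribute to the same list of rule names.
    println!("{:?}", merge_rule_names(&[base, grammar]));
}
```

Passing the same source twice makes the sketch return an `Err`, which is one reason to keep shared rules in a single base file instead of copying them into each grammar.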

diff --git a/src/grammars/syntax.md b/src/grammars/syntax.md
@@ -25,7 +25,7 @@ atomic_rule = @{ ... }
 
 ## Expressions
 
-Grammar rules are built from *expressions* (hence "parsing expression
+Grammar rules are built from _expressions_ (hence "parsing expression
 grammar"). These expressions are a terse, formal description of how to parse an
 input string.
 
@@ -85,7 +85,7 @@ successfully, `and_then` is attempted next. However, if `first` fails, the
 entire expression fails.
 
 A list of expressions can be chained together with sequences, which indicates
-that *all* of the components must occur, in the specified order.
+that _all_ of the components must occur, in the specified order.
 
 ### Ordered choice
 
@@ -98,7 +98,7 @@ first | or_else
 ```
 
 When matching a choice expression, `first` is attempted. If `first` matches
-successfully, the entire expression *succeeds immediately*. However, if `first`
+successfully, the entire expression _succeeds immediately_. However, if `first`
 fails, `or_else` is attempted next.
 
 Note that `first` and `or_else` are always attempted at the same position, even
@@ -124,7 +124,7 @@ It is somewhat tempting to borrow terminology and think of this operation as
 "alternation" or simply "OR", but this is misleading. The word "choice" is used
 specifically because [the operation is *not* merely logical "OR"].
 
-[the operation is *not* merely logical "OR"]: peg.html#ordered-choice
+[the operation is *not* merely logical "or"]: peg.html#ordered-choice
 
 ### Repetition
 
@@ -163,19 +163,19 @@ Thus `expr*` is equivalent to `expr{0, }`; `expr+` is equivalent to `expr{1,
 ### Predicates
 
 Preceding an expression with an ampersand `&` or exclamation mark `!` turns it
-into a *predicate* that never consumes any input. You might know these
+into a _predicate_ that never consumes any input. You might know these
 operators as "lookahead" or "non-progressing".
 
 The **positive predicate**, written as an ampersand `&`, attempts to match its
 inner expression. If the inner expression succeeds, parsing continues, but at
-the *same position* as the predicate &mdash; `&foo ~ bar` is thus a kind of
+the _same position_ as the predicate &mdash; `&foo ~ bar` is thus a kind of
 "AND" statement: "the input string must match `foo` AND `bar`". If the inner
 expression fails, the whole expression fails too.
 
 The **negative predicate**, written as an exclamation mark `!`, attempts to
-match its inner expression. If the inner expression *fails*, the predicate
-*succeeds* and parsing continues at the same position as the predicate. If the
-inner expression *succeeds*, the predicate *fails* &mdash; `!foo ~ bar` is thus
+match its inner expression. If the inner expression _fails_, the predicate
+_succeeds_ and parsing continues at the same position as the predicate. If the
+inner expression _succeeds_, the predicate _fails_ &mdash; `!foo ~ bar` is thus
 a kind of "NOT" statement: "the input string must match `bar` but NOT `foo`".
 
 This leads to the common idiom meaning "any character but":
@@ -239,7 +239,7 @@ my_rule = {
 
 ## Start and end of input
 
-The rules `SOI` and `EOI` match the *start* and *end* of the input string,
+The rules `SOI` and `EOI` match the _start_ and _end_ of the input string,
 respectively. Neither consumes any text. They only indicate whether the parser
 is currently at one edge of the input.
 
@@ -292,8 +292,8 @@ ws = _{ " " }
 com = _{ "/*" ~ (!"*/" ~ ANY)* ~ "*/" }
 ```
 
-Note that implicit whitespace is *not* inserted at the beginning or end of rules
-&mdash; for instance, `expression` does *not* match `" 4+5 "`. If you want to
+Note that implicit whitespace is _not_ inserted at the beginning or end of rules
+&mdash; for instance, `expression` does _not_ match `" 4+5 "`. If you want to
 include implicit whitespace at the beginning and end of a rule, you will need to
 sandwich it between two empty rules (often `SOI` and `EOI` [as above]):
 
@@ -384,14 +384,41 @@ fstring = @{ "\"" ~ ... }
 expr = !{ ... }
 ```
 
+## Include
+
+`include!` allows you to load rules from other pest files.
+
+Sometimes, you'll want to split a grammar into multiple files. For instance, you
+might want to define shared rules once, then use them in multiple grammars.
+
+`base.pest`:
+
+```pest
+WHITESPACE = _{ " " | "\t" | "\r" | "\n" }
+identifier = { (ASCII_ALPHANUMERIC | "_" | "-")+ }
+```
+
+Then we can use `include!("base.pest")` in other files to load the rules
+of `base.pest` into the current file.
+
+`toml.pest`:
+
+```pest
+include!("base.pest")
+
+pair  = { key ~ WHITESPACE* ~ "=" ~ WHITESPACE* ~ value }
+key   = @{ identifier }
+value = { (!NEWLINE ~ ANY)+ }
+```
+
 ## The stack (WIP)
 
 `pest` maintains a stack that can be manipulated directly from the grammar. An
 expression can be matched and pushed onto the stack with the keyword `PUSH`,
 then later matched exactly with the keywords `PEEK` and `POP`.
 
-Using the stack allows *the exact same text* to be matched multiple times,
-rather than *the same pattern*.
+Using the stack allows _the exact same text_ to be matched multiple times,
+rather than _the same pattern_.
 
 For example,
 
@@ -442,20 +469,20 @@ raw_string_interior = {
 
 # Cheat sheet
 
-| Syntax           | Meaning                           | Syntax                  | Meaning              |
-|:----------------:|:---------------------------------:|:-----------------------:|:--------------------:|
-| `foo =  { ... }` | [regular rule]                    | `baz = @{ ... }`        | [atomic]             |
-| `bar = _{ ... }` | [silent]                          | `qux = ${ ... }`        | [compound-atomic]    |
-|                  |                                   | `plugh = !{ ... }`      | [non-atomic]         |
-| `"abc"`          | [exact string]                    | `^"abc"`                | [case insensitive]   |
-| `'a'..'z'`       | [character range]                 | `ANY`                   | [any character]      |
-| `foo ~ bar`      | [sequence]                        | <code>baz \| qux</code> | [ordered choice]     |
-| `foo*`           | [zero or more]                    | `bar+`                  | [one or more]        |
-| `baz?`           | [optional]                        | `qux{n}`                | [exactly *n*]        |
-| `qux{m, n}`      | [between *m* and *n* (inclusive)] |                         |                      |
-| `&foo`           | [positive predicate]              | `!bar`                  | [negative predicate] |
-| `PUSH(baz)`      | [match and push]                  |                         |                      |
-| `POP`            | [match and pop]                   | `PEEK`                  | [match without pop]  |
+|      Syntax      |              Meaning              |         Syntax          |       Meaning        |
+| :--------------: | :-------------------------------: | :---------------------: | :------------------: |
+| `foo =  { ... }` |          [regular rule]           |    `baz = @{ ... }`     |       [atomic]       |
+| `bar = _{ ... }` |             [silent]              |    `qux = ${ ... }`     |  [compound-atomic]   |
+|                  |                                   |   `plugh = !{ ... }`    |     [non-atomic]     |
+|     `"abc"`      |          [exact string]           |        `^"abc"`         |  [case insensitive]  |
+|    `'a'..'z'`    |         [character range]         |          `ANY`          |   [any character]    |
+|   `foo ~ bar`    |            [sequence]             | <code>baz \| qux</code> |   [ordered choice]   |
+|      `foo*`      |          [zero or more]           |         `bar+`          |    [one or more]     |
+|      `baz?`      |            [optional]             |        `qux{n}`         |    [exactly *n*]     |
+|   `qux{m, n}`    | [between *m* and *n* (inclusive)] |                         |                      |
+|      `&foo`      |       [positive predicate]        |         `!bar`          | [negative predicate] |
+|   `PUSH(baz)`    |         [match and push]          |                         |                      |
+|      `POP`       |          [match and pop]          |         `PEEK`          | [match without pop]  |
 
 [regular rule]: #syntax-of-pest-parsers
 [silent]: #silent-and-atomic-rules