RFC: Use a non-regex parser #138

epage · 2017-10-23T13:33:32Z

This is a possible enhancement that could make it easier to address some of our bugs and missing features.

The downside is that Shopify/liquid uses regex, iirc, so it'll be more difficult to follow their grammar. I have heard that differences between the regex engines already cause some problems.

Goals

Performance (faster than what we have now at least)
Making it easier to implement features, for example
- More complex indexing (Allow expressions in indexed variables #145)
- Variable includes (Add variable support ({% include {{ page.my_variable }} %}) to our Jekyll-style include #142)
- Named arguments for filters (Support for named arguments in filters #92, also requires changes to filter API)
- nil literals (Support nil literals #223)
- blank / empty keywords (Support blank and empty literals #222)
Generally easier to maintain
A concise top-level that reduces churn (all helpers can be moved into a sub-crate and break compat whenever)
File location information in errors (formerly Display line (and column?) in syntax errors #232)

Background

There are really two main issues with the current parser

The performance of the lexer (I'm assuming that's what's so slow)
The maintainability of the parser

The parser right now is a set of cobbled-together functions that each have too much logic in them, so other code ends up duplicating that logic. For example, the for-block has named arguments but filters do not, because we don't have reusable argument parsing. The parser is also inconsistent in how it consumes tokens. A consistent, composable model would be a big help.

Solutions

Patch up the current approach.

The parser could be made more maintainable while keeping the current token stream, if desired. We could write combinators for token processing that return where they left off parsing (maybe even on failure, like nom's IResult). We can then refactor all of our parsers to use these combinators, giving us a code base that is easier to maintain.
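As a rough illustration of the combinator idea, here is a minimal sketch using only the standard library. Everything in it is an assumption made for the example: the `Token` enum, the `TokenResult` alias, and the `identifier`, `expect`, `integer`, `named_argument` and `many0` helpers are hypothetical names, not the crate's actual lexer or parser API. Each combinator returns the remaining tokens on success and the position where it stopped on failure.

```rust
// Hypothetical token type; the real lexer's tokens are not shown in this issue.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Identifier(String),
    Colon,
    Integer(i64),
}

// Success: remaining tokens plus the parsed value.
// Failure: the tokens where parsing stopped plus a message.
type TokenResult<'a, T> = Result<(&'a [Token], T), (&'a [Token], String)>;

fn identifier(input: &[Token]) -> TokenResult<'_, String> {
    match input.split_first() {
        Some((Token::Identifier(name), rest)) => Ok((rest, name.clone())),
        _ => Err((input, "expected identifier".to_string())),
    }
}

fn expect<'a>(input: &'a [Token], expected: &Token) -> TokenResult<'a, ()> {
    match input.split_first() {
        Some((tok, rest)) if tok == expected => Ok((rest, ())),
        _ => Err((input, format!("expected {:?}", expected))),
    }
}

fn integer(input: &[Token]) -> TokenResult<'_, i64> {
    match input.split_first() {
        Some((Token::Integer(n), rest)) => Ok((rest, *n)),
        _ => Err((input, "expected integer".to_string())),
    }
}

// One reusable `name: value` parser that any tag or filter could share.
fn named_argument(input: &[Token]) -> TokenResult<'_, (String, i64)> {
    let (input, name) = identifier(input)?;
    let (input, ()) = expect(input, &Token::Colon)?;
    let (input, value) = integer(input)?;
    Ok((input, (name, value)))
}

// Apply `parser` until it fails; return the unconsumed tokens and all results.
// Assumes `parser` consumes at least one token whenever it succeeds.
fn many0<'a, T>(
    mut input: &'a [Token],
    parser: impl Fn(&'a [Token]) -> TokenResult<'a, T>,
) -> (&'a [Token], Vec<T>) {
    let mut out = Vec::new();
    while let Ok((rest, value)) = parser(input) {
        out.push(value);
        input = rest;
    }
    (input, out)
}
```

With something like this, a for-block and a filter could both call `many0(tokens, named_argument)` instead of each hand-rolling its own argument loop.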

Parser comparison

Benchmarks

Good

nom
pest
chomp

Bad

Combine
pom (order of magnitude slower than pest)

Line numbers

Good:

Pest has it native
Nom has nom_locate
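For context on what line-number support has to provide, here is a minimal standard-library sketch that turns a byte offset into a 1-based line and column. The `line_col` function is a hypothetical helper written for this example; it is not the API of pest or nom_locate, which track positions in their own ways.

```rust
/// Convert a byte offset into a 1-based (line, column) pair.
/// Columns are counted in characters, not bytes.
/// Assumes `offset` lies on a char boundary within `source`.
fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let before = &source[..offset];
    // Every newline before the offset starts a new line.
    let line = before.matches('\n').count() + 1;
    // The column is the distance from the last newline (or the start of input).
    let col = match before.rfind('\n') {
        Some(newline) => before[newline + 1..].chars().count() + 1,
        None => before.chars().count() + 1,
    };
    (line, col)
}
```

Rescanning the source like this is only reasonable on the error path; a parser that tracks positions natively avoids the extra pass.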

Composability

Can plugins reuse the grammar without centralizing all token processing which also hurts correctness?

Good:

Nom

Bad:

Pest

Partial parsing

An idea to improve performance and API stability.

Good:

Nom: slice of str
- Downside is if you need to store the parse results with what you parsed
Pest: byte position
- Good for parse results being in a struct with what was parsed
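The trade-off between the two styles can be sketched with the standard library alone. This is an illustration under assumptions, not the nom or pest API: `SliceIdent`, `Span`, `Parsed`, `ident_len`, `ident_slice` and `ident_span` are all hypothetical names. The slice-style result borrows from the input, so it cannot easily live in the same struct as an owned copy of the source. The byte-position result is plain data and can.

```rust
/// Length in bytes of the leading identifier (ASCII letters, digits, `_`).
fn ident_len(input: &str) -> usize {
    input
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(input.len())
}

/// Slice style: the result borrows from the input text.
#[derive(Debug, PartialEq)]
struct SliceIdent<'a> {
    text: &'a str,
}

/// Returns the remaining input and the identifier, both as sub-slices.
fn ident_slice(input: &str) -> Option<(&str, SliceIdent<'_>)> {
    let end = ident_len(input);
    if end == 0 {
        return None;
    }
    Some((&input[end..], SliceIdent { text: &input[..end] }))
}

/// Byte-position style: the result is plain offsets with no borrow.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

fn ident_span(input: &str, start: usize) -> Option<Span> {
    let len = ident_len(&input[start..]);
    if len == 0 {
        None
    } else {
        Some(Span { start, end: start + len })
    }
}

/// Owns the source and the parse result together, which the
/// borrowing `SliceIdent` could not do without a self-reference.
struct Parsed {
    source: String,
    ident: Span,
}

impl Parsed {
    fn new(source: String) -> Option<Self> {
        let ident = ident_span(&source, 0)?;
        Some(Parsed { source, ident })
    }

    /// Resolve the stored span back into text on demand.
    fn ident(&self) -> &str {
        &self.source[self.ident.start..self.ident.end]
    }
}
```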

Maintenance

Bad

Chomp hasn't had a commit in over a year

Brittleness

Bad

Pest
- Requires 100% test coverage to ensure grammar changes don't break code.
- See https://www.reddit.com/r/rust/comments/9xt5sh/introducing_pest_into_glsl_and_hindsight_about/?st=jooy4jhz&sh=dab05392
- See https://www.reddit.com/r/rust/comments/9xz9c7/introducing_pest_into_glsl_and_hindsight_about/?st=jooy4k81&sh=39cafabf

Ergonomics and Docs

Good

Pest: only thing to complain about is the syntax is less familiar

Mixed

Nom: bad ergonomics due to the macros but docs are pretty ok


Fixes cobalt-org#105, mostly. You get enough context. This is also more universal (hard to track line numbers in every context). Any further improvements will be a part of cobalt-org#138. This mostly locks down the error API, minus `FilterError`, which won't happen until filters get refactored. See cobalt-org#114.

epage · 2018-10-22T14:05:35Z

Looks like nom can take in some amount of state. How much we can "plugin" other parsers, no idea.
https://users.rust-lang.org/t/nom-referencing-external-variables-from-parser-s/21249/5

Replace the old regex-based parser with a pest-based one. Introduce line/column context in some (syntax) errors (see cobalt-org#232).

closes cobalt-org#138
fixes cobalt-org#145
fixes cobalt-org#226
fixes cobalt-org#227
fixes cobalt-org#242

BREAKING CHANGES

Behavior
- Expressions no longer support tags (they weren't supposed to)
- More strictness in tokens accepted (Tags will raise an error if given a surplus of arguments. This is to alert the user for possible mistakes)

API
- Changed signature for tags and blocks
- `compiler::parse` takes a `&str` directly as an argument, instead of requiring the midstep of `tokenize`

epage · 2018-11-30T22:15:43Z

Reopening because I want to consider this topic some more.


epage mentioned this issue Oct 23, 2017

Bug: Array Indexes #127

Closed


This was referenced Nov 8, 2017

Add variable support ({% include {{ page.my_variable }} %}) to our Jekyll-style include #142

Open

Allow expressions in indexed variables #145

Closed


epage added the api-break label Dec 16, 2017

This was referenced Jan 20, 2018

Provide context to errors #164

Merged

Display file and location of liquid errors cobalt-org/cobalt.rs#136

Closed

epage mentioned this issue Oct 5, 2018

Unable to use a variable or value index for object in assign tag #207

Closed

epage mentioned this issue Oct 29, 2018

feat(filters): array manipulation filters #220

Merged

Goncalerta mentioned this issue Nov 5, 2018

Pest Parser #221

Merged

epage closed this as completed in dcac742 Nov 30, 2018

epage reopened this Nov 30, 2018

Goncalerta mentioned this issue Jan 19, 2019

fix(parser): blocks can accept invalid liquid #328

Merged

epage closed this as completed Jan 23, 2020

jmackie mentioned this issue May 31, 2022

Reimplement the parser with LALRPOP and logos lexer 🙌 ditto-lang/ditto#56

Merged
