Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Use a non-regex parser #138

Closed
epage opened this issue Oct 23, 2017 · 9 comments
Closed

RFC: Use a non-regex parser #138

epage opened this issue Oct 23, 2017 · 9 comments

Comments

@epage
Copy link
Member

epage commented Oct 23, 2017

This is a possible enhancement that could make it easier to address some of our bugs and missing features.

The downside is that shopofy/liquid uses regex iirc so it'll be more difficult to follow their grammar. I have heard that there are some differences in the regex engines that already cause some problems.

Goals

Background

There are really to main issues with the current parser

  • The performance of the lexer (I'm assuming thats whats so slow)
  • The maintainability of the parser

The parser right now is cobbled together functions that have too much logic in them, causing others to duplicate it. For example, the for-block has named arguments but not filters, because we don't have reusable argument parsing. The parser is also inconsistent on how it consumes tokens. A consistent, composible model would be a big help.

Solutions

Patch up the current approach.

The parser could be made more maintainable while keeping the current token stream if desired. We could make combinators for token processing, that return where they left off parsing (maybe even on failure like nom's IResult). We can then refactor all of our parsers into using these combinators so we can have an easier to maintain code base.

Parser comparison

Benchmarks

Good

  • nom
  • pest
  • chomp

Bad

  • Combine
  • pom (order of magnitude slower than pest)

Line numbers

Good:

Composability

Can plugins reuse the grammar without centralizing all token processing which also hurts correctness?

Good:

  • Nom

Bad:

  • Pest

Partial parsing

An idea to improve performance and API stability.

Good:

  • Nom: slice of str
    • Downside is if you need to store the parse results with what you parsed
  • Pest: byte position
    • Good for parse results being in a struct with what was parsed

Maintenance

Bad

  • Chomp hasn't had a commit in over a year

Brittleness

Bad

Ergonomics and Docs

Good

  • Pest: only thing to complain about is the syntax is less familiar

Mixed

  • Nom: bad ergonomics due to the macros but docs are pretty ok
@epage

This comment has been minimized.

@Albibek

This comment has been minimized.

@epage

This comment has been minimized.

@gmalette

This comment has been minimized.

@epage

This comment has been minimized.

@gmalette

This comment has been minimized.

@epage

This comment has been minimized.

epage added a commit to epage/liquid-rust that referenced this issue Jan 20, 2018
Fixes cobalt-org#105, mostly.  You get enough context.  This is also more
universal (hard to track line numbers in every context).  Any further
improvements will be a part of cobalt-org#138.

This locks mostly locks down the error API, minus `FilterError` which
won't happen until filters get refactored.  See cobalt-org#114.
@epage
Copy link
Member Author

epage commented Oct 22, 2018

Looks like nom can take in some amount of state. How much we can "plugin" other parsers, no idea.
https://users.rust-lang.org/t/nom-referencing-external-variables-from-parser-s/21249/5

@Goncalerta Goncalerta mentioned this issue Nov 5, 2018
Goncalerta added a commit to Goncalerta/liquid-rust that referenced this issue Nov 30, 2018
Replace the old regex-based parser with a pest-based one.
Introduce line/column context in some (syntax) errors (see cobalt-org#232).

closes cobalt-org#138
fixes cobalt-org#145
fixes cobalt-org#226
fixes cobalt-org#227
fixes cobalt-org#242

BREAKING CHANGES
Behavior
- Expressions no longer support tags (they weren't supposed to)
- More strictness in tokens accepted (Tags will raise an error if given a surplus of arguments. This is to alert the user for possible mistakes)
API
- Changed signature for tags and blocks
- `compiler::parse` takes a `&str` directly as an argument, instead of requiring the midstep of `tokenize`
@epage epage closed this as completed in dcac742 Nov 30, 2018
@epage
Copy link
Member Author

epage commented Nov 30, 2018

Re-openeing because I want to consider this topic some more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants