Is parsing markdown necessary? #288

websiddu · 2024-11-27T15:14:47Z

I have been experimenting with Harper lately. When you pass it Markdown content, a parser converts that content into an AST. Do we need that parsing?

Alternatively, I wrote a function that cleans the markup by replacing it with spaces and then runs the result as plain text. Here is my version: https://github.com/websiddu/harper/blob/master/harper-wasm/src/lib.rs#L21
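To illustrate the idea, here is a minimal sketch of length-preserving masking. It is not the linked implementation: `mask_markup` is a hypothetical helper that handles only `*`, `_`, backticks, and leading `#` characters, and uses no regexes. Because every masked character becomes exactly one space, word offsets in the masked text still line up with the original source.

```rust
/// Replace a few Markdown markers with spaces while keeping the length
/// unchanged, so word offsets still map back to the original source.
/// Hypothetical helper for illustration; deliberately crude (for example,
/// it also masks the underscore in `snake_case`).
fn mask_markup(source: &str) -> String {
    let mut out = String::with_capacity(source.len());
    for line in source.split_inclusive('\n') {
        // True until the first non-`#` character of the line is seen.
        let mut in_leading_hashes = true;
        for c in line.chars() {
            let is_markup = match c {
                '*' | '_' | '`' => true,
                '#' if in_leading_hashes => true,
                _ => false,
            };
            if c != '#' {
                in_leading_hashes = false;
            }
            // One masked char becomes one space, so offsets are preserved.
            out.push(if is_markup { ' ' } else { c });
        }
    }
    out
}
```

Calling `mask_markup("## Hello *world*\n")` yields a string of the same length in which "Hello" and "world" sit at their original offsets.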

This implementation is currently live on https://stubby.io/

I'm really not sure whether this is more efficient than building a full syntax tree and then getting the word positions from it. I'm just sharing the idea, as I thought it could simplify a lot of the code.

elijah-potter · 2024-11-27T17:17:15Z

Hey, thanks for reaching out!

Harper's parsing infrastructure is admittedly poorly documented at the moment, so I'll try to explain it enough to answer your question here. Expect a proper guide on it in the future.

> So, I have been experimenting with harper lately, for example if you pass markdown content the content is parsed by a parser and converted into AST, do we need such parsing?

To directly answer your question: yes, and it takes negligible time. The Markdown library we use is really fast (I think it actually might be the fastest CommonMark implementation out there), so it consumes a trivial percentage of our execution time, while significantly improving Harper's internal document model.

Your implementation, while interesting, is not spec-compliant, and recompiling and running that many regular expressions on every call is quite slow. I intend to properly support MDX in the future, but in the meantime you can probably get significantly better results by applying the same regex stripping inside the Markdown parser (whose code you can find here). A cheap solution would be to make a copy of that file and paste your stripping code into it.

If you would like to parse MDX properly (which would give Harper the best internal document model and therefore significantly better linting), you just have to implement the Parser trait. That can be done by wrapping another existing parser, including one generated by Tree-sitter.
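A rough sketch of the wrapping approach, under loud assumptions: the `Parser` trait, `Span`, `PlainWords`, and `MaskTags` below are hypothetical stand-ins written for this example, not Harper's actual types, so check the real trait definition before adapting it. The wrapper masks JSX-like tags with spaces and delegates to an inner parser, so the spans it returns still index into the original source.

```rust
/// Hypothetical stand-in for a parser interface. Harper's real `Parser`
/// trait may have a different signature and return type.
trait Parser {
    fn parse(&self, source: &[char]) -> Vec<Span>;
}

/// Half-open character range of one word in the source.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Span {
    start: usize,
    end: usize,
}

/// Toy inner parser: emits one span per run of alphabetic characters.
struct PlainWords;

impl Parser for PlainWords {
    fn parse(&self, source: &[char]) -> Vec<Span> {
        let mut spans = Vec::new();
        let mut start = None;
        for (i, c) in source.iter().enumerate() {
            match (c.is_alphabetic(), start) {
                (true, None) => start = Some(i),
                (false, Some(s)) => {
                    spans.push(Span { start: s, end: i });
                    start = None;
                }
                _ => {}
            }
        }
        if let Some(s) = start {
            spans.push(Span { start: s, end: source.len() });
        }
        spans
    }
}

/// Wraps any inner parser and hides `<...>` tags from it by masking them
/// with spaces (newlines are kept), then delegates to the inner parser.
struct MaskTags<P: Parser> {
    inner: P,
}

impl<P: Parser> Parser for MaskTags<P> {
    fn parse(&self, source: &[char]) -> Vec<Span> {
        let mut in_tag = false;
        let masked: Vec<char> = source
            .iter()
            .map(|&c| {
                if c == '<' {
                    in_tag = true;
                }
                let out = if in_tag && c != '\n' { ' ' } else { c };
                if c == '>' {
                    in_tag = false;
                }
                out
            })
            .collect();
        self.inner.parse(&masked)
    }
}
```

The design point is that the wrapper never changes the length of the input, so positions reported by the inner parser need no remapping. A real MDX parser would need far more than tag masking (expressions, imports, nested components).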

P.S. I'm so glad you're using Harper for your project. I'm honored. We've got significant JS API improvements on the way, so stay tuned!
