Parser: Revert to PEG parser by default #947

dmsnell · 2017-05-30T21:50:03Z

Fixes several small issues created with the TinyMCE parser.

Focuses efforts on building a declarative formal grammar to handle our
parsing needs.

@nylen how would you feel here about switching back to the PEG parser and then making our needed updates to the grammar and focusing on it?

Fixes several small issues created with the TinyMCE parser. Focuses efforts on building a declarative formal grammar to handle our parsing needs.

nylen · 2017-05-31T16:58:32Z

Which issues does this fix? Can we get test cases for them to ensure they don't regress?

If you're talking about #844, I'm actually only seeing that issue with the PEG parser and not the TinyMCE parser (see latest comment for details).

More generally, my thoughts on this are tied up with #914. I am disappointed in the lack of interest in solving this critical set of problems, and I don't think we should make the change in this PR until the PEG parser returns a tree of block delimiters + HTML tags.

dmsnell · 2017-05-31T18:54:37Z

Which issues does this fix?

well #890 and yeah #844 are examples. with the grammar the invalid block is just that: invalid; but we don't mangle the content. with the TinyMCE parser it breaks the entire page.

Can we get test cases for them to ensure they don't regress?

sounds like a good idea. would be nice to do in a follow-up PR.

I am disappointed in the lack of interest in solving this critical set of problems

please try not to assume anyone else's intent on this. we have disagreements on what the problem is and how it should be resolved. I trust you that you are working to produce the best design for the project; that is what I am also doing.

I don't think we should make the change in this PR until the PEG parser returns a tree of block delimiters + HTML tags

when I test it out it appears to match the behavior of the TinyMCE-based parser at the moment. if that's not the case can you help me identity what needs to change? I don't want to add new behavior in this PR.

I'm on board with your goals. this PR is about wanting to focus on the formal grammar and the corresponding parser that it generates. there's nothing holding us back from changing it however we want, but I'd rather get this started now so we can iteratively improve it in a well-defined central spot than let the implicit spec continue to develop implicitly outside of the grammar.

nylen · 2017-05-31T19:41:06Z

I have no doubt that we are all trying to produce the best design for the project, and that we'll continue to improve the grammar. However, merging this PR now means that #914 is effectively stalled until that happens, because our default parser would not be providing a parse tree that includes HTML tags. I have to say I'd be pretty disappointed with that outcome.

I would prefer instead that we enhance the PEG grammar to return a tree of HTML tags first (aka bring it more into line with wp-post-grammar), then enable it as the default once that is done and tested.

Also, if this PR is intended to fix bugs, then I think the test cases for those bugs should be added at the same time.

nylen · 2017-05-31T19:42:09Z

enable it as the default once that is done and tested.

And then remove the TinyMCE-based parser. The latest comments on #844 provide a clear example of how limited this approach is.

dmsnell · 2017-05-31T20:00:39Z

merging this PR now means that #914 is effectively stalled

yes, though switching now means swapping out at a point where the editor's behavior doesn't change and #914 isn't merged yet.

I've been trying to give #914 the weight and consideration it deserves (which is why I've been slow to respond to it) and I'm very uncomfortable merging it in the way it's designed. Even if TinyMCE were the only parser I don't like the performance characteristics of #914 or the nondeterminism it introduces to converting raw HTML into blocks.

If we're going to add an HTML tree to the parser output then let's do that, but let's do it one step at a time instead of squashing together changes. Also, if you aren't going to approve of the grammar parser anyway (which seems to be where your comments point) why should we waste time building it up the way we want the parser to work?

mtias · 2017-06-01T12:09:36Z

Thanks for all the discussions around these important issues. I'm going to merge this as it fixes the immediate problem, but I trust we'll iterate on this crucial foundation until we are satisfied. I'd like us to remain open to layering solutions in order to strengthen the system without adding unnecessary burdens for the optimal case (a person using just the new editor for new content) which is the one that will get measured against other platforms.

I don't feel comfortable removing the TinyMCE parser until we can run different tests against the two, more faithfully determine shortcomings around performance and reliability, and iterate on the grammar offering. The changes in #914 to parseWithTinyMCE should continue regardless, I think.

dmsnell added [Feature] Block API API that allows to express the block paradigm. [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f labels May 30, 2017

dmsnell requested review from nylen and mtias May 30, 2017 21:50

Parser: Revert to PEG parser by default

664ef45

Fixes several small issues created with the TinyMCE parser. Focuses efforts on building a declarative formal grammar to handle our parsing needs.

dmsnell force-pushed the parser/back-to-peg branch from cfaa2b6 to 664ef45 Compare May 30, 2017 21:50

nylen mentioned this pull request May 31, 2017

Automated Testing Guidelines for Gutenberg #939

Closed

mtias merged commit 2bf6c15 into master Jun 1, 2017

mtias deleted the parser/back-to-peg branch June 1, 2017 12:09

nylen mentioned this pull request Jun 12, 2017

Parser: Add "void" block signature #1134

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parser: Revert to PEG parser by default #947

Parser: Revert to PEG parser by default #947

dmsnell commented May 30, 2017

nylen commented May 31, 2017

dmsnell commented May 31, 2017

nylen commented May 31, 2017

nylen commented May 31, 2017

dmsnell commented May 31, 2017

mtias commented Jun 1, 2017

Parser: Revert to PEG parser by default #947

Parser: Revert to PEG parser by default #947

Conversation

dmsnell commented May 30, 2017

nylen commented May 31, 2017

dmsnell commented May 31, 2017

nylen commented May 31, 2017

nylen commented May 31, 2017

dmsnell commented May 31, 2017

mtias commented Jun 1, 2017