Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser: Revert to PEG parser by default #947

Merged
merged 1 commit into from
Jun 1, 2017
Merged

Conversation

dmsnell
Copy link
Member

@dmsnell dmsnell commented May 30, 2017

Fixes several small issues created with the TinyMCE parser.

Focuses efforts on building a declarative formal grammar to handle our
parsing needs.

@nylen how would you feel here about switching back to the PEG parser and then making our needed updates to the grammar and focusing on it?

@dmsnell dmsnell added [Feature] Block API API that allows to express the block paradigm. [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f labels May 30, 2017
@dmsnell dmsnell requested review from nylen and mtias May 30, 2017 21:50
Fixes several small issues created with the TinyMCE parser.

Focuses efforts on building a declarative formal grammar to handle our
parsing needs.
@dmsnell dmsnell force-pushed the parser/back-to-peg branch from cfaa2b6 to 664ef45 Compare May 30, 2017 21:50
@nylen
Copy link
Member

nylen commented May 31, 2017

Which issues does this fix? Can we get test cases for them to ensure they don't regress?

If you're talking about #844, I'm actually only seeing that issue with the PEG parser and not the TinyMCE parser (see latest comment for details).

More generally, my thoughts on this are tied up with #914. I am disappointed in the lack of interest in solving this critical set of problems, and I don't think we should make the change in this PR until the PEG parser returns a tree of block delimiters + HTML tags.

@dmsnell
Copy link
Member Author

dmsnell commented May 31, 2017

Which issues does this fix?

well #890 and yeah #844 are examples. with the grammar the invalid block is just that: invalid; but we don't mangle the content. with the TinyMCE parser it breaks the entire page.

Can we get test cases for them to ensure they don't regress?

sounds like a good idea. would be nice to do in a follow-up PR.

I am disappointed in the lack of interest in solving this critical set of problems

please try not to assume anyone else's intent on this. we have disagreements on what the problem is and how it should be resolved. I trust you that you are working to produce the best design for the project; that is what I am also doing.

I don't think we should make the change in this PR until the PEG parser returns a tree of block delimiters + HTML tags

when I test it out it appears to match the behavior of the TinyMCE-based parser at the moment. if that's not the case can you help me identity what needs to change? I don't want to add new behavior in this PR.


I'm on board with your goals. this PR is about wanting to focus on the formal grammar and the corresponding parser that it generates. there's nothing holding us back from changing it however we want, but I'd rather get this started now so we can iteratively improve it in a well-defined central spot than let the implicit spec continue to develop implicitly outside of the grammar.

@nylen
Copy link
Member

nylen commented May 31, 2017

I have no doubt that we are all trying to produce the best design for the project, and that we'll continue to improve the grammar. However, merging this PR now means that #914 is effectively stalled until that happens, because our default parser would not be providing a parse tree that includes HTML tags. I have to say I'd be pretty disappointed with that outcome.

I would prefer instead that we enhance the PEG grammar to return a tree of HTML tags first (aka bring it more into line with wp-post-grammar), then enable it as the default once that is done and tested.

Also, if this PR is intended to fix bugs, then I think the test cases for those bugs should be added at the same time.

@nylen
Copy link
Member

nylen commented May 31, 2017

enable it as the default once that is done and tested.

And then remove the TinyMCE-based parser. The latest comments on #844 provide a clear example of how limited this approach is.

@dmsnell
Copy link
Member Author

dmsnell commented May 31, 2017

merging this PR now means that #914 is effectively stalled

yes, though switching now means swapping out at a point where the editor's behavior doesn't change and #914 isn't merged yet.

I've been trying to give #914 the weight and consideration it deserves (which is why I've been slow to respond to it) and I'm very uncomfortable merging it in the way it's designed. Even if TinyMCE were the only parser I don't like the performance characteristics of #914 or the nondeterminism it introduces to converting raw HTML into blocks.


If we're going to add an HTML tree to the parser output then let's do that, but let's do it one step at a time instead of squashing together changes. Also, if you aren't going to approve of the grammar parser anyway (which seems to be where your comments point) why should we waste time building it up the way we want the parser to work?

@mtias
Copy link
Member

mtias commented Jun 1, 2017

Thanks for all the discussions around these important issues. I'm going to merge this as it fixes the immediate problem, but I trust we'll iterate on this crucial foundation until we are satisfied. I'd like us to remain open to layering solutions in order to strengthen the system without adding unnecessary burdens for the optimal case (a person using just the new editor for new content) which is the one that will get measured against other platforms.

I don't feel comfortable removing the TinyMCE parser until we can run different tests against the two, more faithfully determine shortcomings around performance and reliability, and iterate on the grammar offering. The changes in #914 to parseWithTinyMCE should continue regardless, I think.

@mtias mtias merged commit 2bf6c15 into master Jun 1, 2017
@mtias mtias deleted the parser/back-to-peg branch June 1, 2017 12:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Block API API that allows to express the block paradigm. [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants