Better BR elements handling #12033

niegowski · 2022-07-07T15:09:44Z

Suggested merge commit message (convention)

Fix (enter): A   element between blocks should be converted to an empty paragraph. Closes #10217.

Test (clipboard): Added test for pasting content with   element between blocks. See #10217.

Internal (engine): Handling of   elements in the data pipeline moved from DomConverter to the softBreak conversion.

Additional information

The original issue is only about pasting such content (trimmed for readability):

<p>A</p>
<br />
<p>B</p>

This content expects 3 paragraphs and the middle one is empty but got converted (because of auto-paragraphing, softBreak is not allowed in root context) as:

<paragraph>A</paragraph>
<paragraph><softBreak/></paragraph>
<paragraph>B</paragraph>

But since this is very generic HTML and the behavior of   elements is very specific in web browsers I did some research to find out how different   scenarios are handled by a web browser and CKE5.

Here is a result (from a manual test: http://fake.ckeditor.com:8125/ckeditor5-enter/tests/manual/softbreak-paragraphing.html):

The broken cases

Input	Actual	Expected
`<br> <p>foo</p>`	`<p><br> </p> <p>foo</p>`	`<p> </p> <p>foo</p>`
`<p>before</p> <br> <p>after</p>`	`<p>before</p> <p><br> </p> <p>after</p>`	`<p>before</p> <p> </p> <p>after</p>`
`<p>before</p> foo<br> <p>after</p>`	`<p>before</p> <p>foo<br> </p> <p>after</p>`	`<p>before</p> <p>foo</p> <p>after</p>`
`<p>before</p> foo<br> {whitespaces} <p>after</p>`	`<p>before</p> <p>foo<br> </p> <p>after</p>`	`<p>before</p> <p>foo</p> <p>after</p>`
`<p>before</p> <p>foo<br> {whitespaces} </p> <p>after</p>`	`<p>before</p> <p>foo<br> </p> <p>after</p>`	`<p>before</p> <p>foo</p> <p>after</p>`
`<p>foo</p> <br>`	`<p>foo</p> <p><br> </p>`	`<p>foo</p> <p> </p>`

Above cases were failing also while   was wrapped with some inline elements (like a link or strong).

The only case that was explicitly handled by the editor was   and it was fragile to any whitespace around BR element.

First solution attempt

I was trying to solve the issue by extending the   case that was handled in DomConverter:

ckeditor5/packages/ckeditor5-engine/src/view/domconverter.js

Lines 1080 to 1083 in 2a22cd9

    
           // Special case for <p><br></p> in which <br> should be treated as filler even when we are not in the 'br' mode. See ckeditor5#5564. 
        
           if ( domNode.tagName === 'BR' && hasBlockParent( domNode, this.blockElements ) && domNode.parentNode.childNodes.length === 1 ) { 
        
           	return true; 
        
           }

But it had problems:

it required using a lot of text trimming logic (that logic is already implemented in DomConverter but it would require triggering it multiple times to handle look-ahead checks (for checking block fillers and to trim text itself)
it was a "hack" because in the data pipeline only NBSP block fillers are legit, BRs were an additional workaround because they would not land normally in the data pipeline
while replacing   by a  in some cases, it generated paragraphs nested in other paragraphs or other block elements that could contain text so the generated view structure would be messed up (but upcast conversion to the model would handle it properly)

Final solution

I decided to move the main solution to softBreak upcast converter because:

the proper NBSP block fillers would be already removed
all text nodes would be properly trimmed (especially useful for lookahead checks)
scenarios of handling BR elements in a specific contexts are much clearer there

You could check how browsers and the editor are handling  s in some cases in the new manual test.

…ata.

oleq

The functionality looks OK.

There's a problem with GHS.
I'd add a regular test for paste from office just for the reference and future developers who may come across this issue.

niegowski · 2022-07-13T12:03:41Z

I just checked how it works while BR is wrapped with some inline element and it differs from the browser rendering. Checking it.

Reinmar · 2022-07-18T09:12:56Z

Internal (engine): Handling of   elements in the data pipeline moved from DomConverter to the softBreak conversion.

I don't know whether we can call this an internal change. The feature was moved from the engine to a plugin. Did you think on how the behavior of the editor without SoftBreak changed? A thing to consider is that it also touches pasting. Editors without this plugin will be quite rare so we can be rather liberal here, but let's at least have this thought through.

Reinmar · 2022-07-18T09:42:58Z

Thanks for writing down these thoughts. It's an interesting idea to move the code to the plugin. It's definitely more elegant if   handling is done on a conversion layer – fewer things to worry about this at that stage.

OTOH, we're now moving logic that was resolving issues related to how semantic data has to be enhanced to show well in a browser (or strip from these enhancements) to the conversion layer. But on the third hand, that's where we clean up the content and fix where it was misformatted (e.g. autoparagraphs, wrong nesting, disallowed children, etc.).

So, an important thing to consider here is that those BRs are not part of the enhancement that we did. We use  s only in the BR mode and those are still "deenhanced" correctly by DomConverter: which guarantees its symmetricity. Other  s can be part of the cleanup done on the conversion layer.

So after giving it a deeper thought, I think the change makes sense.

Reinmar · 2022-07-18T09:45:03Z

One more thing to consider: Performance.

We had poor implementation, but probably negligible from the performance pov. Now, it's more complete, but at what cost?

Outdated.

packages/ckeditor5-clipboard/tests/clipboardpipeline.js

Reinmar · 2022-07-18T10:16:24Z

packages/ckeditor5-typing/tests/input.js

@@ -514,6 +514,43 @@ describe( 'Input feature', () => {
 			expect( getViewData( view ) ).to.equal( '<p><strong>Foo</strong> {}</p>' );
 		} );

+		it( 'should handle bogus br correctly (with custom br to softBreak converter)', () => {


How did you come up with these tests? Did you encounter some regressions after making this change? Does it anyhow affect beforeinput-related work?

The case is that those bogus BRs are stripped while converting to the model (input compares the current model and the model generated from the modified DOM). This case won't affect the beforeinput because it's not used there anymore.

Reinmar · 2022-07-18T10:22:08Z

The first test I done was ctrl+a, ctrl+c in this page (this PR) and paste it to http://localhost:8125/ckeditor5/tests/manual/all-features.html

Unfortunately, on this branch crashes with:

converters.js:849 Uncaught TypeError: Cannot read properties of null (reading 'start')
    at viewToModelListItemChildrenConverter (converters.js:849:1)
    at UpcastDispatcher.viewModelConverter (converters.js:388:1)
    at UpcastDispatcher.fire (emittermixin.js:200:1)
    at UpcastDispatcher._convertItem (upcastdispatcher.js:258:1)
    at UpcastDispatcher._convertChildren (upcastdispatcher.js:291:1)
    at UpcastDispatcher.<anonymous> (upcasthelpers.js:441:1)
    at UpcastDispatcher.fire (emittermixin.js:200:1)
    at UpcastDispatcher._convertItem (upcastdispatcher.js:258:1)
    at UpcastDispatcher._convertChildren (upcastdispatcher.js:291:1)
    at UpcastDispatcher.<anonymous> (upcasthelpers.js:790:1)

The best way to test conversion and anything related to that is to take random pieces of HTML from various websites and just paste :D

Reinmar · 2022-07-18T10:27:13Z

Scenarios to check:

Using shft+enter in "creative" places (including creating multiple breaks) and doing editor.setData( editor.getData() ) to check that nothing was lost.
Pasting content from various websites to check the stability.
Smoketesting pasting from Google Docs and Word.
Performance comparison before vs after.

Reinmar · 2022-07-18T12:23:33Z

I was curious whether the PR indeed fixes the original issue. So, how paste from this document works: https://docs.google.com/document/d/1abo1Ca_RHEmuXnAX6k_z0itHihQb3g9rJRM3x4vQ25w/edit

NOTE: This document is really oddly formatted. When you test a normal document, created with normal paragraphs in Google Docs (I tested some other docs I had), the result is better. So, this entire PR is about fixing a rather specific case, although, apparently, relatively frequent as well.

Before

<p>
    <span style="background-color:transparent;color:#000000;">Sometimes, there’s nothing scarier than a blank page waiting to be filled - especially when it comes to figuring out how to start a resume.&nbsp;</span>
</p>
<p>
    <br>
    &nbsp;
</p>
<p>
    <span style="background-color:transparent;color:#000000;">For every position that you apply for, you will need to stand out in a pile of applications through your professional experience, achievements, and education, to impress recruiters.&nbsp;</span>
</p>
<p>
    <br>
    &nbsp;
</p>
<p>
    <span style="background-color:transparent;color:#000000;">Just thinking about all this may make starting your resume seem like a Herculean task.&nbsp;</span>
</p>
<p>
    <br>
    &nbsp;
</p>

On this PR

<p>
    <span style="background-color:transparent;color:#000000;">Sometimes, there’s nothing scarier than a blank page waiting to be filled - especially when it comes to figuring out how to start a resume.&nbsp;</span>
</p>
<p>
    &nbsp;
</p>
<p>
    <span style="background-color:transparent;color:#000000;">For every position that you apply for, you will need to stand out in a pile of applications through your professional experience, achievements, and education, to impress recruiters.&nbsp;</span>
</p>
<p>
    &nbsp;
</p>
<p>
    <span style="background-color:transparent;color:#000000;">Just thinking about all this may make starting your resume seem like a Herculean task.&nbsp;</span>
</p>
<p>
    &nbsp;
</p>

Hmm... there are still empty paragraphs that aren't visible in the original document. If paragraphs are styled to have margins (as in our docs) it looks like this:

So, differently than in Google Docs. However, as I mentioned, the problem is with the styling, not data.

I decided to check CKE4 and TinyMCE's powerpaste out of curiosity too as they worked on their filtering systems for a couple more years. Yet, the results are identical.

CKEditor 4

TinyMCE

To sum up

The PR resolves the reported issue but it's probably even impossible to ensure that the content look very much alike between Google Docs and other systems.

Reinmar · 2022-07-18T12:38:45Z

I did some quick performance tests and for a 12kb of content that's relatively heavy with  s, the difference was negligible: 190ms -> 210ms.

Co-authored-by: Piotrek Koszuliński <[email protected]>

niegowski · 2022-07-18T13:08:11Z

Unfortunately, on this branch crashes with:

converters.js:849 Uncaught TypeError: Cannot read properties of null (reading 'start')
    at viewToModelListItemChildrenConverter (converters.js:849:1)
    at UpcastDispatcher.viewModelConverter (converters.js:388:1)
    at UpcastDispatcher.fire (emittermixin.js:200:1)
    at UpcastDispatcher._convertItem (upcastdispatcher.js:258:1)
    at UpcastDispatcher._convertChildren (upcastdispatcher.js:291:1)
    at UpcastDispatcher.<anonymous> (upcasthelpers.js:441:1)
    at UpcastDispatcher.fire (emittermixin.js:200:1)
    at UpcastDispatcher._convertItem (upcastdispatcher.js:258:1)
    at UpcastDispatcher._convertChildren (upcastdispatcher.js:291:1)
    at UpcastDispatcher.<anonymous> (upcasthelpers.js:790:1)

I narrowed the crashing content to:

<ul>
	<li>foo<br><custom-element></custom-element></li>
</ul>

niegowski · 2022-07-18T14:25:45Z

I narrowed the crashing content to:
<ul>
	<li>foo <custom-element></custom-element></li>
</ul>

So the problem is that <custom-element> and any inline object (like <img>) were ignored while looking for the BR's sibling so the BR converter consumed it but without providing any model element. This crashed on this line:

ckeditor5/packages/ckeditor5-list/src/list/converters.js

Line 849 in 833f395

const convertedChild = result.modelRange.start.nodeAfter;

because there is no result.modelRange for not converted element.

The case of known inline object elements (like <img>) is now solved but the case where there is an unknown inline object is not solved. These cases are fixed:

foo<br><img/>
foo<br><custom-element><img/></custom-element>
<p>foo<br><img/></p>
<p>foo<br><custom-element><img/></custom-element></p>

But if the custom element does not contain any content then it crashes:

foo<br><custom-element></custom-element>

But there might be more cases where this would fail because of the line mentioned above.

Reinmar · 2022-07-19T07:02:58Z

I'm worried that in ab3e3ad you added just a single test. I'd generate a couple more for weird variations: elements before, multiple elements one by one, other inline elements, etc. I found that the tree-walking logic is often incomplete.

Reinmar · 2022-07-19T07:22:45Z

I did some more smoke tests and also checked whether we didn't start losing more  s. The results are good:

(left is on the PR, right is master)

If there are differences, they look like this one.

niegowski · 2022-07-19T09:12:04Z

I found another case that I missed before:

<p><br><br><br></p>

should remove only the last BR element, others should be converted as softBreaks (it's fixed in the last commit).

niegowski · 2022-07-19T09:40:52Z

More details on the crashing case (fixed in one of the previous commits):

The input data:

<ul>
	<li>foo<br></li>
</ul>

The crashing line was:

ckeditor5/packages/ckeditor5-list/src/list/converters.js

Line 849 in 685dfdb

const convertedChild = result.modelRange.start.nodeAfter;

because of the assumption that conversionApi.convertItem() would always convert a given view item and the result.modelRange would be there. This is mostly true because even if some element does not get converted by any dedicated feature then the fallback converter kicks in and converts the element content so there almost always is some modelRange:

ckeditor5/packages/ckeditor5-engine/src/conversion/upcasthelpers.js

Lines 440 to 445 in 685dfdb

    
           if ( !data.modelRange && conversionApi.consumable.consume( data.viewItem, { name: true } ) ) { 
        
           	const { modelRange, modelCursor } = conversionApi.convertChildren( data.viewItem, data.modelCursor ); 
        
           	data.modelRange = modelRange; 
        
           	data.modelCursor = modelCursor; 
        
           }

there is one exception, the fallback converter expects that the element was not consumed and modelRange was not set by any converter. In the BR case, the element itself was consumed so the fallback converter did not kick in and modelRange was left unset. BR converter must consume the view element because otherwise the GHS conversion would try to convert it and we don't want that.

So if any feature is using conversionApi.convertItem() and is not verifying whether the given view node got converted then it could crash (in the feature code). Of course, this applies to generic cases that do not use it to convert a specific view element (we use it in the table, and image features but we pass a very specific element that we know will be converted - img or table). The conversionApi.convertChildren() is safe because it verifies whether the modelRange was returned from the single child conversion and also always returns the range (sometimes collapsed at the modelCursor position).

Reinmar · 2022-07-19T10:45:21Z

So if any feature is using conversionApi.convertItem() and is not verifying whether the given view node got converted then it could crash (in the feature code). Of course, this applies to generic cases that do not use it to convert a specific view element (we use it in the table, and image features but we pass a very specific element that we know will be converted - img or table). The conversionApi.convertChildren() is safe because it verifies whether the modelRange was returned from the single child conversion and also always returns the range (sometimes collapsed at the modelCursor position).

In other words, the frequency will be really low. There has to be a combination of a custom converter, using convertItem(), not checking the range, and having   in the content inside that element.

niegowski · 2022-07-19T11:20:48Z

In other words, the frequency will be really low. There has to be a combination of a custom converter, using convertItem(), not checking the range, and having   in the content inside that element.

Yes.

dufipl · 2022-07-19T14:24:25Z

At this moment after some first testing and going through those points mainly:

Scenarios to check:

Using shft+enter in "creative" places (including creating multiple breaks) and doing editor.setData( editor.getData() ) to check that nothing was lost.

Pasting content from various websites to check the stability.

Smoketesting pasting from Google Docs and Word.

I was not able to crash the editor or see other issues.

I can see improvement regarding the original ticket and only the styling issue (line spacing) remained, but it was discussed above.

So to sum up, I haven't spotted new issues, but I think we should test this heavily during the Testing Phase.

Reinmar · 2022-07-20T05:00:53Z

CI was clean apart from a usual 404 in manual tests.

Reinmar · 2022-07-29T08:19:24Z

This change will be reverted in #12183.

niegowski added 4 commits July 6, 2022 18:07

PoC of better handling of BR elements while processing editor input d…

bb78d2d

…ata.

Added special handling for elements between blocks.

8f1aead

Adding tests.

0c25747

The handling moved to upcast converter of softBreak elements.

648ef79

niegowski changed the title ~~[WiP] Better block filler handling~~ [WiP] Better BR elements handling Jul 8, 2022

niegowski added 6 commits July 8, 2022 16:54

Moved manual test to correct package.

d0ebe88

Added tests.

decefca

Added tests for missing coverage.

5963fe5

Added more tests.

962eaec

Added test for pasting content with BRs between blocks.

b5fd820

Added manual test description.

bdaac34

niegowski marked this pull request as ready for review July 11, 2022 13:52

niegowski changed the title ~~[WiP] Better BR elements handling~~ Better BR elements handling Jul 11, 2022

Merge branch 'master' into ck/10217

95dd887

oleq previously requested changes Jul 12, 2022

View reviewed changes

niegowski added 2 commits July 12, 2022 13:19

Fixed GHS integration - ignored br elements must be consumed.

71d4cad

Added test in paste-from-office.

afc1a81

niegowski marked this pull request as draft July 13, 2022 15:58

niegowski added 3 commits July 13, 2022 18:30

Expanded BR upcast to support wrapping with inline elements.

99344dc

Code cleaning.

2a0ec86

Added tests.

cddc44e

niegowski marked this pull request as ready for review July 14, 2022 15:53

Updated manual test styles.

34654e9

Reinmar requested a review from oleq July 18, 2022 09:48

Reinmar reviewed Jul 18, 2022

View reviewed changes

packages/ckeditor5-clipboard/tests/clipboardpipeline.js Show resolved Hide resolved

Reinmar reviewed Jul 18, 2022

View reviewed changes

Merge branch 'master' into ck/10217

254f877

Apply review comment.

04d66a0

Co-authored-by: Piotrek Koszuliński <[email protected]>

Fixed integration with inline objects.

ab3e3ad

Fixed list upcast converter.

cb54e9a

niegowski added 2 commits July 19, 2022 11:08

Fixed multiple BR elements handling at the end of a block.

6b707b3

Added test.

e6c098c

Merge branch 'master' into ck/10217

f2bffce

Reinmar merged commit 394a19b into master Jul 20, 2022

Reinmar deleted the ck/10217 branch July 20, 2022 05:03

Reinmar mentioned this pull request Jul 29, 2022

Revert "Better BR elements handling" #12183

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better BR elements handling #12033

Better BR elements handling #12033

niegowski commented Jul 7, 2022 •

edited

Loading

oleq left a comment

niegowski commented Jul 13, 2022

Reinmar commented Jul 18, 2022

Reinmar commented Jul 18, 2022

Reinmar commented Jul 18, 2022

Reinmar Jul 18, 2022

niegowski Jul 18, 2022

Reinmar commented Jul 18, 2022 •

edited

Loading

Reinmar commented Jul 18, 2022 •

edited

Loading

Reinmar commented Jul 18, 2022 •

edited

Loading

Reinmar commented Jul 18, 2022

niegowski commented Jul 18, 2022

niegowski commented Jul 18, 2022

Reinmar commented Jul 19, 2022

Reinmar commented Jul 19, 2022

niegowski commented Jul 19, 2022

niegowski commented Jul 19, 2022

Reinmar commented Jul 19, 2022

niegowski commented Jul 19, 2022

dufipl commented Jul 19, 2022 •

edited

Loading

Reinmar commented Jul 20, 2022

Reinmar commented Jul 29, 2022

	// Special case for <p><br></p> in which <br> should be treated as filler even when we are not in the 'br' mode. See ckeditor5#5564.
	if ( domNode.tagName === 'BR' && hasBlockParent( domNode, this.blockElements ) && domNode.parentNode.childNodes.length === 1 ) {
	return true;
	}

Better BR elements handling #12033

Better BR elements handling #12033

Conversation

niegowski commented Jul 7, 2022 • edited Loading

Suggested merge commit message (convention)

Additional information

The broken cases

First solution attempt

Final solution

oleq left a comment

Choose a reason for hiding this comment

niegowski commented Jul 13, 2022

Reinmar commented Jul 18, 2022

Reinmar commented Jul 18, 2022

Reinmar commented Jul 18, 2022

Reinmar Jul 18, 2022

Choose a reason for hiding this comment

niegowski Jul 18, 2022

Choose a reason for hiding this comment

Reinmar commented Jul 18, 2022 • edited Loading

Reinmar commented Jul 18, 2022 • edited Loading

Reinmar commented Jul 18, 2022 • edited Loading

Before

On this PR

CKEditor 4

TinyMCE

To sum up

Reinmar commented Jul 18, 2022

niegowski commented Jul 18, 2022

niegowski commented Jul 18, 2022

Reinmar commented Jul 19, 2022

Reinmar commented Jul 19, 2022

niegowski commented Jul 19, 2022

niegowski commented Jul 19, 2022

Reinmar commented Jul 19, 2022

niegowski commented Jul 19, 2022

dufipl commented Jul 19, 2022 • edited Loading

Reinmar commented Jul 20, 2022

Reinmar commented Jul 29, 2022

niegowski commented Jul 7, 2022 •

edited

Loading

Reinmar commented Jul 18, 2022 •

edited

Loading

Reinmar commented Jul 18, 2022 •

edited

Loading

Reinmar commented Jul 18, 2022 •

edited

Loading

dufipl commented Jul 19, 2022 •

edited

Loading