Support for pasting flat lists #7

f1ames · 2018-08-23T07:02:15Z (edited by pomek)

Suggested merge commit message (convention)

Feature: Support for pasting flat lists from Word. Closes ckeditor/ckeditor5#2482.

Additional information

This PR provides support for handling lists pasted from Microsoft Word (2016). This is the first feature in the Paste from Word plugin that uses any filtering/normalization, so the whole base of the plugin was created as part of it.

The normalization/filtering itself is pipeline-like (https://github.com/ckeditor/ckeditor5-paste-from-word/blob/3d105fc433f498b04907289387ef3ebed6d39e97/src/pastefromword.js#L62-L67). I tried to make each function do one simple thing so that they are reusable and can be called separately and in any order. This should provide more flexibility when working with different Paste from Something cases.
As this is the first feature, this is the moment when we decide on the architecture that will be used (to some degree) throughout this plugin, so it is important to take a detailed look at it :)
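To make the pipeline idea concrete, here is a minimal sketch, assuming each filter is a pure function whose output feeds the next one. `createPipeline()`, `removeHtmlComments()` and `collapseWhitespace()` are hypothetical names used only for illustration; they are not the filters from the linked `pastefromword.js`.

```javascript
// Composes single-purpose filters, left to right, into one normalization function.
function createPipeline( ...filters ) {
	return input => filters.reduce( ( result, filter ) => filter( result ), input );
}

// Hypothetical filter: drops HTML comments.
function removeHtmlComments( html ) {
	return html.replace( /<!--[\s\S]*?-->/g, '' );
}

// Hypothetical filter: collapses runs of whitespace into a single space.
function collapseWhitespace( html ) {
	return html.replace( /\s+/g, ' ' ).trim();
}

// Different sources can reuse the same filters in different combinations.
const normalizeWordInput = createPipeline( removeHtmlComments, collapseWhitespace );
const normalizeOtherInput = createPipeline( collapseWhitespace );
```

Whether filters can truly run in any order depends on them being independent of each other: with these two toy filters, swapping them leaves a double space where the comment used to be.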

This PR also relies on ckeditor/ckeditor5-engine#1503, so it will not pass on CI without it and should not be merged until https://github.com/ckeditor/ckeditor5-engine/issues/1501 is closed.

Mgsy · 2018-08-23T11:05:38Z

I've tested these changes and pasting flat lists from MS Word works fine 👌

Reinmar · 2018-08-24T14:44:31Z

src/filters/common.js

+ * @license Copyright (c) 2003-2018, CKSource - Frederico Knabben. All rights reserved.
+ * For licensing, see LICENSE.md.
+ */
+


Missing @module definition. Also, I think we can mark it as @protected.

Besides, we usually call such modules utils.js. Although, since those 3 functions here do the base preparations, I'd consider calling them init.js, base.js, preprocessing.js, etc.

Another thing – what's the chance that these functions will ever be used (not counting tests) alone, rather than all at once? Isn't the only use case for them being called together to create a data[body/styles] object? If so, a single parse() function would do better. It can do all those things at once. You can still export the util functions, but only for testing purposes.

By focusing on one function you limit the API surface and make the code less prone to changes.

The idea was that for content pasted from Word you need to extract and parse the body and do the same for CSS. For content from Google Docs you just need to parse the input as a body, because there is no head or style tag. For input from another text editor you may need a different combination of functions to extract what is needed to properly transform the content.

So it really depends on the input. Of course, a parse() function may be written in such a way that it checks which elements are in the provided input and parses them accordingly. Or a different parse() function can be used for each type of input.
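A rough sketch of a single `parse()` entry point that adapts to what the input contains. It assumes regular-expression extraction purely to keep the example self-contained; a real implementation would rather rely on a proper HTML parser such as `DOMParser`. The `bodyString`/`stylesString` result shape is a hypothetical choice for illustration.

```javascript
// A single, hypothetical entry point that adapts to what the input contains.
// Regular expressions keep the sketch self-contained; they are not a robust HTML parser.
function parse( html ) {
	const bodyMatch = html.match( /<body[^>]*>([\s\S]*?)<\/body>/i );
	const styleContents = [ ...html.matchAll( /<style[^>]*>([\s\S]*?)<\/style>/gi ) ]
		.map( match => match[ 1 ].trim() );

	return {
		// Input with no <body> (e.g. a bare fragment) is treated as body content as a whole.
		bodyString: bodyMatch ? bodyMatch[ 1 ] : html,
		// Contents of all <style> tags joined together, or null when there are none.
		stylesString: styleContents.length ? styleContents.join( '\n' ) : null
	};
}
```

With one function, Word-like input (with a `<style>` tag in `<head>`) and a bare fragment (a simplified stand-in for the Google Docs case) go through the same call; the caller only has to check whether `stylesString` is `null`.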

Besides, we usually call such modules utils.js. Although, since those 3 functions here do the base preparations, I'd consider calling them init.js, base.js, preprocessing.js, etc.

Do you mean to put each function in a separate file? And then create a parse() function which will import and use them?

Reinmar · 2018-08-24T14:58:39Z

src/filters/common.js

+/**
+ * Extracts `body` tag contents from the provided HTML string.
+ *
+ * @param {Object} data


I don't like how these functions are designed to work. From the point of view of these utils there's very little sense in working on some arbitrary data object. E.g. this specific function accepts an HTML string and returns another HTML string. Period. The data object is a noise here. Plus, modifying an object that was passed to a function isn't usually a good practice.

Imagine, for instance, that in a month's time you'll need to use it somewhere else to process some HTML. This time not together with the transformInput() function, but alone. You will either have to change this function or call it with an object created just for that call.

True, I had some doubts about it too TBH. Makes perfect sense to refactor them in a way you mentioned 👍
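The refactoring agreed on here could look roughly like this: a hypothetical regex-based `extractBody()` that takes an HTML string and returns a string (or `null`, matching the documented contract quoted below), instead of reading from and writing to a shared `data` object. The regular expression is an assumption for the sketch, not the plugin's implementation.

```javascript
// A pure helper: HTML string in, string (or null) out. No shared `data` object is read or mutated.
function extractBody( html ) {
	const match = html.match( /<body[^>]*>([\s\S]*)<\/body>/i );
	const contents = match ? match[ 1 ].trim() : '';

	// Null when the <body> tag is missing or empty.
	return contents || null;
}

// Any caller can use it on its own, without building an object just for the call.
const html = '<html><body>\n<p>Foo</p>\n</body></html>';
const body = extractBody( html );
```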

Reinmar · 2018-08-24T15:01:37Z

src/filters/common.js

+ * @returns {Object} result
+ * @returns {String|null} result.body Extracted `body` contents. If `body` tag was not present or empty, `null` is returned.
+ */
+export function extractBody( data ) {


I wonder whether this function could/should use DOMParser. If so, it'd be good to use DOMParser once for the whole process, so it'd have to look a bit differently.

It should make sense, but then it will have to return something like a DocumentFragment or a View instance, so it will basically be extract+parseBody. For now it seems there is no need to operate on the body string, so we may just combine these two steps into one (because HtmlDataProcessor uses DOMParser internally). WDYT?

Reinmar · 2018-08-24T15:15:50Z

src/pastefromword.js

+	 * @param {String} input Word input.
+	 * @returns {module:engine/view/node~Node|module:engine/view/documentfragment~DocumentFragment} view Normalized input.
+	 */
+	_normalizeWordInput( input, editor ) {


Is this function necessary? The listener in which it's used is fairly short, so it doesn't seem so.

TBH, I extracted it so it can be easily unit tested; that was the main reason. We may inline it, but then tests will have to listen to the inputTransformation event to get the normalized data. Since those unit tests use only the Paste from Word and Clipboard plugins, there is only a small chance that something in the Clipboard plugin will change and interfere with the normalized input before it is validated in a test 🤔
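The trade-off described here can be sketched with a hypothetical, heavily simplified class (not CKEditor 5's plugin or event API): the listener only delegates, and the `@protected` method can be unit tested directly without going through clipboard events. The `<o:p>` cleanup is a placeholder normalization chosen for illustration.

```javascript
// Hypothetical, heavily simplified stand-in for the plugin; not CKEditor 5's API.
class PasteFromWordSketch {
	constructor() {
		this._listeners = [];
	}

	// The listener stays short and delegates all the work to the protected method.
	init( onNormalized ) {
		this._listeners.push( html => onNormalized( this._normalizeWordInput( html ) ) );
	}

	// @protected Exposed mainly so unit tests can call it without firing clipboard events.
	_normalizeWordInput( html ) {
		// Placeholder normalization: drop empty Office <o:p> elements.
		return html.replace( /<o:p>\s*<\/o:p>/g, '' );
	}

	// Simulates the clipboard pipeline delivering pasted HTML.
	simulatePaste( html ) {
		this._listeners.forEach( listener => listener( html ) );
	}
}
```

A unit test can call `_normalizeWordInput()` directly, while an integration test goes through `simulatePaste()`; both paths run the same normalization.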

Reinmar · 2018-08-24T15:31:10Z

src/pastefromword.js

+		const editor = this.editor;
+		const document = editor.editing.view.document;
+
+		this.listenTo( document, 'clipboardInput', ( evt, data ) => {


Can't we make this plugin listen to inputTransformation? We talked that it misses some information now (dataTransfer), but it'd be perfect if it fitted into the existing architecture (where transformation was supposed to happen on inputTransformation). Otherwise, we need to think whether the architecture works fine.

I moved this to a further discussion here: https://github.com/ckeditor/ckeditor5-clipboard/issues/52#issuecomment-413853141, and then it somehow slipped through. @Reinmar, could you take a look at my linked comment? I'm still unsure about this one.

Reinmar · 2018-08-27T15:08:18Z

src/filters/common.js

+// @param {String} cssString String containing CSS rules/stylesheet to be parsed.
+// @param {Document} domDocument Document used to create helper element in which stylesheet will be injected.
+// @returns {CSSStyleSheet} Native `CSSStyleSheet` object containing parsed styles.
+function parseCSS( cssString, domDocument ) {


It should be parseCss().

But parseJS() ;> It's even documented: https://github.com/ckeditor/ckeditor5-design/wiki/Code-Style-Naming-Guidelines#acronyms-and-proper-names :D

f1ames · 2018-08-28T14:32:36Z

I have combined all common filters into one parseHtml() function in filters/utils.js. This made the code much shorter and simpler. It uses DOMParser internally which helped with extracting body and styles contents.

Is this function necessary? The listener in which it's used is fairly short, so it doesn't seem so.

As for _normalizeWordInput, I left the function as is, but marked it as @protected and added a note in its description that it is exposed mainly for testing purposes.

The only unresolved issue is #7 (comment):

Can't we make this plugin listen to inputTransformation? We talked that it misses some information now (dataTransfer), but it'd be perfect if it fitted into the existing architecture (where transformation was supposed to happen on inputTransformation). Otherwise, we need to think whether the architecture works fine.

So we need to discuss https://github.com/ckeditor/ckeditor5-clipboard/issues/52#issuecomment-413853141.

f1ames · 2018-08-31T14:11:29Z

The 4 failing tests on CI are basic styles integration tests, which also fail on master due to #8.

Reinmar

A couple of minor issues.

Reinmar · 2018-09-21T10:07:19Z

src/pastefromoffice.js

+
+/**
+ * This plugin handles content pasted from Word and transforms it (if necessary)
+ * to format suitable for editor {@link module:engine/model/model~Model}.


The Paste from Office plugin. This plugin handles content pasted from Office apps (for now only Word) and transforms it (if necessary) to a valid structure which can then be understood by the editor features. For more information about this feature check the {@glink api/paste-from-office package page}.

f1ames · 2018-09-24T11:54:16Z

I have refactored the code and docs as suggested and built the docs to check that everything looks good. I also skipped the 4 failing unit tests mentioned earlier, so CI should be green.

f1ames · 2018-09-24T13:01:53Z

Ready for another review round.

Reinmar · 2018-09-24T20:35:06Z

src/filters/list.js

+ * @returns {module:engine/view/node~Node|module:engine/view/documentfragment~DocumentFragment} The view
+ * structure instance with list-like elements transformed into semantic lists.
+ */
+export function transformParagraphsToLists( bodyView, stylesString ) {


So, how I read this function is:

Get first child of body view... Why of a body? Can't this be of any container?

Then... if the first child exists do something. Why is this first child that important? I thought we're filtering all children.

Now, we want to find all list nodes... at a position before the first child?

And we create lists for these... list nodes? Where do we create them? If we create them why isn't createLists() returning them?

And then we return the bodyView again. Which can be a Node? Or a doc frag? How can a body be both? This function is used in one place in the code, so why do we support both things?

How to improve this code? You need to think about signatures of the functions that you create. And how you use them and combine them together.

transformParagraphsToLists() should support just one thing – either a container or a doc frag. Not both. You can choose what you need based on the most common use of this function. And there's just one place where you use it so it'll be easy :P.

If you want to keep the outline of this function similar to what it looks like now, then findAllListNodes() should be corrected as well. Given its signature, there's no justification for this function to work on a position. In 99.9% of cases you look for stuff in a container or a range. Even TreeWalker does that. It can accept a start position, but it isn't necessary to pass one there (in this case, where you scan the whole body anyway). Nor should this detail leak into and pollute an unrelated method like transformParagraphsToLists().

Then, once you have all list nodes... Wait... are these list nodes already? I thought they were paragraphs (based on the name of this outer function). So they are neither lists nor nodes. They are list-item-like elements. The naming should be consistent throughout your code. Furthermore, from the docs I saw that <h1> may represent a list item as well, so... transformParagraphsToLists() should also be renamed.

OK, so we finally have that listItemLikeElements array. Let's transform these elements now. But not via a createLists() function unless we're creating lists. We are transforming X into Y. Again – naming needs to be adjusted.

Finally, to balance the amount of logic between functions, I'd recommend moving the loop out of the current createLists() to the outer function because it'll make that singular transformXIntoY() easier to test and debug.

Returning bodyView isn't necessary at this stage. I know that you anticipate there to be filters which may return something else than they accepted, but I can't see such a filter today so KISS. Don't build abstractions/mechanisms which you don't need yet.
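A sketch of the container-based lookup suggested above, on a hypothetical simplified tree of plain `{ name, attributes, children }` objects rather than the engine's view classes. It assumes list-item-like elements can be recognized by an inline `mso-list` style such as `mso-list:l0 level1 lfo1` and, purely for the sketch, uses the `lN` number as the list id; the plugin's actual matching rules may differ. Because any element is checked, an `<h1>` with that style is found too.

```javascript
// Assumption: list-item-like elements carry an inline style such as
// "mso-list:l0 level1 lfo1"; this sketch uses the "lN" number as the list id.
const LIST_STYLE_PATTERN = /mso-list:\s*l(\d+)\s+level\d+/i;

// Scans the whole container (no start position needed) and returns the
// list-item-like elements in document order, each with its list id.
function findAllListItemLikeElements( container ) {
	const found = [];

	for ( const child of container.children || [] ) {
		if ( typeof child === 'string' ) {
			continue; // Text nodes cannot be list items.
		}

		const style = ( child.attributes && child.attributes.style ) || '';
		const match = LIST_STYLE_PATTERN.exec( style );

		if ( match ) {
			found.push( { element: child, id: match[ 1 ] } );
		} else {
			// Not a list item itself, but its descendants may be.
			found.push( ...findAllListItemLikeElements( child ) );
		}
	}

	return found;
}
```

The function takes a container and returns everything it found, so no position has to leak into its callers, and the outer function can loop over the result and call a singular transformation per element.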

Reinmar

Code improvements needed.

Reinmar · 2018-09-25T11:24:35Z

src/filters/list.js

+import Element from '@ckeditor/ckeditor5-engine/src/view/element';
+import Matcher from '@ckeditor/ckeditor5-engine/src/view/matcher';
+import Range from '@ckeditor/ckeditor5-engine/src/view/range';
+import TreeWalker from '@ckeditor/ckeditor5-engine/src/view/treewalker';


PS. I forgot to add – you don't need to import the tree walker. You can use Range#getWalker() or, if you don't need to set any walker params, simply iterate over the range.
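The same idea in plain JavaScript: if a range-like object implements `[Symbol.iterator]`, callers can simply use `for...of` instead of constructing a walker. `ContainerRange` and `walk()` are hypothetical stand-ins over a simplified tree of `{ name, children }` objects, not the engine's `Range` or `TreeWalker`.

```javascript
// Yields every descendant of `node` in document order (depth-first).
function* walk( node ) {
	for ( const child of node.children || [] ) {
		yield child;
		yield* walk( child );
	}
}

// Hypothetical stand-in for an iterable range spanning a whole container.
class ContainerRange {
	constructor( container ) {
		this.container = container;
	}

	// Makes the range usable directly in for...of loops and spread syntax.
	*[ Symbol.iterator ]() {
		yield* walk( this.container );
	}
}

const root = { name: 'root', children: [ { name: 'p', children: [ 'a' ] }, { name: 'h1', children: [] } ] };
const visited = [];

for ( const node of new ContainerRange( root ) ) {
	visited.push( typeof node === 'string' ? node : node.name );
}
```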

f1ames · 2018-09-25T11:43:21Z

I have refactored the entire list filter to be more readable, following @Reinmar's suggestions. The code definitely looks better IMHO 👍

Ready for review. cc @Reinmar

Reinmar · 2018-10-25T18:35:07Z

src/filters/list.js

+		let currentList = null;
+
+		for ( let i = 0; i < listLikeItems.length; i++ ) {
+			if ( !currentList || listLikeItems[ i - 1 ].id !== listLikeItems[ i ].id ) {


I had to read this code very carefully and find out what that id property is to understand what this fragment does. Once you realise that we indeed have to care about creating <ul/ol> elements it gets obvious, but the logic isn't that evident. So, I'll propose a slightly different solution which will also avoid having to return from insertEmptyList().

Actually, it's enough to improve its name a bit to indicate it's a new element.
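A sketch of the grouping logic with more explicit naming, over hypothetical `{ id, element }` items like those produced by a list-item-like element lookup: a new list starts whenever there is no previous item or the previous item has a different `id`. `isNewListNeeded()` and `groupIntoLists()` are illustrative names; the sketch returns the lists instead of inserting them into a tree, and always creating `<ul>` (rather than choosing between `ul` and `ol`) is a simplification.

```javascript
// True when the item starts a new list: there is no previous item,
// or the previous item belongs to a different list (different id).
function isNewListNeeded( previousItem, currentItem ) {
	return !previousItem || previousItem.id !== currentItem.id;
}

function groupIntoLists( listItemLikeElements ) {
	const lists = [];
	let currentList = null;

	listItemLikeElements.forEach( ( item, index ) => {
		if ( isNewListNeeded( listItemLikeElements[ index - 1 ], item ) ) {
			// The name makes it explicit that this is a freshly created, still empty list.
			const newEmptyList = { name: 'ul', children: [] };

			lists.push( newEmptyList );
			currentList = newEmptyList;
		}

		currentList.children.push( { name: 'li', children: item.element.children } );
	} );

	return lists;
}
```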

Reinmar · 2018-10-25T18:35:38Z

src/filters/list.js

+
+	const listLikeItems = findAllListItemLikeElements( documentFragment );
+
+	if ( listLikeItems.length ) {


Is there anything to do here if this array is empty?

BTW, "list like items"? Should be "list item like elements".

Reinmar · 2018-10-25T18:55:23Z

src/filters/list.js

+				currentList = insertEmptyList( listStyle, listLikeItems[ i ].element, writer );
+			}
+
+			const listItem = transformElementIntoListItem( listLikeItems[ i ].element, writer );


I had to read deep into the code to understand why we don't remove the old element if this function returns a new element. You need to know about the internals of this function to figure this out.

In short – I wouldn't use rename() in such a context for its complicated nature. I'd rather explicitly create a new element and remove the old one.

But this is a bit too much work now, so I'm skipping it.
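What "explicitly create a new element and remove the old one" could look like, on a hypothetical plain-object tree (not the engine's view or its writer): the function returns the new element, and it is visible at the call site that the old one is gone. `replaceWithNewElement()` is an illustrative name.

```javascript
// Replaces `oldElement` (a child of `parent`) with a newly created element named `newName`,
// moving the old element's children into it. Unlike a rename, the caller can see
// that a new element exists and that the old one has been removed.
function replaceWithNewElement( parent, oldElement, newName ) {
	const index = parent.children.indexOf( oldElement );

	if ( index === -1 ) {
		throw new Error( 'oldElement is not a child of parent.' );
	}

	const newElement = { name: newName, attributes: {}, children: oldElement.children };

	// The old element no longer owns the moved children.
	oldElement.children = [];

	// Remove the old element and insert the new one at the same index.
	parent.children.splice( index, 1, newElement );

	return newElement;
}
```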

f1ames added 14 commits August 13, 2018 13:43

Tests: lists integration tests.

6a2f58c

Tests: Added PFW plugin to manual test.

3138aa7

Tests: Updated lists integration tests.

8d3f491

Paste from Word plugin added with flat lists support.

9069ff0

Removed 'stringifyView' filter wrapper.

dc40c3d

Tests: common filters unit tests.

b71c6c6

Tests: list filter unit tests.

a64bcdf

Tests: general plugin tests.

17b8fa8

The 'bodyToView' filter return type adjustments.

13eccae

Tests: updated tests structure.

681da17

Tests: Moved integration tests to different directory.

10b2890

Tests: Lists normalization unit tests.

054f704

Tests: list integration tests now validates input of 'insertContent()' function call.

ff66a56

Improved docs.

3d105fc

Reinmar reviewed Aug 24, 2018

Merge branch 'master' into t/5

3bee569

Reinmar reviewed Aug 27, 2018

f1ames added 4 commits August 28, 2018 11:28

Adjustments to new 'Paste from Office' name.

8595645

Common filters combined as one 'parseHtml()' function.

4e5f298

Tests: adjusted tests to new filters structure.

db5cc8e

Docs rewording.

9a28f5b

f1ames added 2 commits August 31, 2018 15:58

'UpcastWriter' calls adjusted.

f702773

Missing dev dependencies added.

c88136b

Reinmar suggested changes Sep 21, 2018

Reinmar added the status:review- label Sep 21, 2018

f1ames added 2 commits September 24, 2018 13:49

Code and docs adjustments.

1154c56

Tests: skip 4 failing unit test.

0173707

f1ames added 2 commits September 24, 2018 14:29

Use 'cssRules' instead of 'rules' when processing styles.

8954a84

Fix for 'TypeError: Object doesn't support property or method Symbol.iterator' on Edge.

4408c1b

f1ames requested a review from Reinmar September 24, 2018 13:01

Reinmar removed the status:review- label Sep 24, 2018

Reinmar reviewed Sep 24, 2018

Reinmar suggested changes Sep 24, 2018

f1ames added 2 commits September 25, 2018 13:08

List filter refactoring.

ffec498

Wording. [skip ci]

84a87ef

Reinmar reviewed Sep 25, 2018

f1ames added 2 commits September 25, 2018 13:28

Tests: Unit test for empty style tag handling. Bring CC back to 100%.

7389dfe

Get rid of TreeWalker.

1671ea0

f1ames requested a review from Reinmar September 25, 2018 11:43

Other: Updated dev deps versions.

9ffb255

Reinmar reviewed Oct 25, 2018

Reinmar added 2 commits October 25, 2018 21:01

Various improvements.

c374a26

Updated dependencies.

9811665

Reinmar approved these changes Oct 25, 2018

Reinmar merged commit d72e6cd into master Oct 25, 2018

Reinmar deleted the t/5 branch October 25, 2018 19:12

This was referenced Oct 9, 2019

Paste from Office support for Safari ckeditor/ckeditor5#2511 (closed)

Support for basic list indentation when pasting from Word ckeditor/ckeditor5#2518 (closed)
