Skip to content
This repository has been archived by the owner on Jun 26, 2020. It is now read-only.

Support for pasting flat lists #7

Merged
merged 34 commits into from
Oct 25, 2018
Merged

Support for pasting flat lists #7

merged 34 commits into from
Oct 25, 2018

Conversation

f1ames
Copy link
Contributor

@f1ames f1ames commented Aug 23, 2018

Suggested merge commit message (convention)

Feature: Support for pasting flat lists from Word. Closes ckeditor/ckeditor5#2482.


Additional information

This PR provides support for handling lists pasted from Microsoft Word (2016). This is the first feature in the Paste from Word plugin which uses any filtering/normalization so the whole base of the plugin was created.

The normalization/filtering itself is a pipeline-like: https://github.com/ckeditor/ckeditor5-paste-from-word/blob/3d105fc433f498b04907289387ef3ebed6d39e97/src/pastefromword.js#L62-L67 I tried to make each function to do one simple thing so they are reusable, could be called separately and in any order. This should provide more flexibility when working with different Paste from Soemthing cases.
As this is the first feature, that is the moment we are deciding on an architecture which will be used (to some degree) with this plugin so it is important to take detailed look into it :)

This PR relays also on ckeditor/ckeditor5-engine#1503, so it will not pass on CI without it and should not be merged untill https://github.com/ckeditor/ckeditor5-engine/issues/1501 is closed.

@Mgsy
Copy link
Member

Mgsy commented Aug 23, 2018

I've tested these changes and pasting flat lists from MS Word works fine 👌

* @license Copyright (c) 2003-2018, CKSource - Frederico Knabben. All rights reserved.
* For licensing, see LICENSE.md.
*/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing @module definition. Also, I think we can mark it as @protected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, we usually call such modules utils.js. Although, since those 3 functions here do the base preparations, I'd consider calling them init.js, base.js, preprocessing.js, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing – what's the chance that these functions will be ever used (not counting tests) alone, not all at once? Isn't the only use case for them being called together so to create a data[body/styles] object? If so – a single parse() function would do better. It can do all those things at once. You can still export utils functions but then only for testing purposes.

By focusing on one function you limit the API surface and make the code less prone to changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was that for content pasted from Word you need to extract and parse the body and do same for CSS. For content from google docs you just need to parse the input as a body because there is not head or style tag. For input from another text editor you may need other combination of functions to extract what is needed to properly transform the content.

So it really depends on an input. Ofc parse() function may be created in such way that it checks what elements are in the provided input and parses them accordingly. Or for each type of input, different parse() function can be used.

Besides, we usually call such modules utils.js. Although, since those 3 functions here do the base preparations, I'd consider calling them init.js, base.js, preprocessing.js, etc.

Do you mean to put each function in a separate file? And then create a parse() function which will import and use them?

/**
* Extracts `body` tag contents from the provided HTML string.
*
* @param {Object} data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like how these functions are designed to work. From the point of view of these utils there's very little sense in working on some arbitrary data object. E.g. this specific function accepts an HTML string and returns another HTML string. Period. The data object is a noise here. Plus, modifying an object that was passed to a function isn't usually a good practice.

Imagine, for instance, than in a month's time you'll need to use it somewhere else to process some HTML. This time, not together with the transformInput() function, but alone. You will either have to change this function or use it with some object created for that call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I had some doubts about it too TBH. Makes perfect sense to refactor them in a way you mentioned 👍

* @returns {Object} result
* @returns {String|null} result.body Extracted `body` contents. If `body` tag was not present or empty, `null` is returned.
*/
export function extractBody( data ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether this function could/should use DOMParser. If so, it'd be good to use DOMParser once for the whole process, so it'd have to look a bit differently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should make sense, but then it will have to return something like DocumentFragment or a View instance. So it will be extract+parseBody basically. It seems for now that there is no need to operate on body string so we may just combine these two steps into one (because HtmlDataProcessor uses DOMParser internally). WDYT?

* @param {String} input Word input.
* @returns {module:engine/view/node~Node|module:engine/view/documentfragment~DocumentFragment} view Normalized input.
*/
_normalizeWordInput( input, editor ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function necessary? The listener, in which it's used is fairly short, so it doesn't seem so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I have extracted it so it can be easily unit tested, that was the main reason. We may inline it but then tests will have to listen to inputTransformation event to get the normalized data. Since those unit tests use only Paste from Word and Clipboard plugin there is rather a small chance that something in Clipboard plugin will change that can interfere with the normalized input before it is validated in a test 🤔

const editor = this.editor;
const document = editor.editing.view.document;

this.listenTo( document, 'clipboardInput', ( evt, data ) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we make this plugin listen to inputTransformation? We talked that it misses some information now (dataTransfer), but it'd be perfect if it fitted into the existing architecture (where transformation was supposed to happen on inputTransformation). Otherwise, we need to think whether the architecture works fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put this to a further discussion here - https://github.com/ckeditor/ckeditor5-clipboard/issues/52#issuecomment-413853141 and then it slipped out somehow - @Reinmar could you take a look on my linked comment, because I'm still unsure about this one.

// @param {String} cssString String containing CSS rules/stylsheet to be parsed.
// @param {Document} domDocument Document used to create helper element in which stylesheet will be injected.
// @returns {CSSStyleSheet} Native `CSSStyleSheet` object containing parsed styles.
function parseCSS( cssString, domDocument ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be parseCss().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@f1ames
Copy link
Contributor Author

f1ames commented Aug 28, 2018

I have combined all common filters into one parseHtml() function in filters/utils.js. This made the code much shorter and simpler. It uses DOMParser internally which helped with extracting body and styles contents.

Is this function necessary? The listener, in which it's used is fairly short, so it doesn't seem so.

As for _normalizedWordInput, I left the function as is, but marked as @protected and added a notice in a description that it was exposed mainly for testing purposes.


The only unresolved issue is #7 (comment):

Can't we make this plugin listen to inputTransformation? We talked that it misses some information now (dataTransfer), but it'd be perfect if it fitted into the existing architecture (where transformation was supposed to happen on inputTransformation). Otherwise, we need to think whether the architecture works fine.

So we need to discuss https://github.com/ckeditor/ckeditor5-clipboard/issues/52#issuecomment-413853141.

@f1ames
Copy link
Contributor Author

f1ames commented Aug 31, 2018

4 failing tests on CI are basic styles integration tests also failing on master due to #8.

image

Copy link
Member

@Reinmar Reinmar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple minor issues.


/**
* This plugin handles content pasted from Word and transforms it (if necessary)
* to format suitable for editor {@link module:engine/model/model~Model}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Paste from Office plugin.

This plugin handles content pasted from Office apps (for now only Word) and transforms it (if necessary)
to a valid structure which can then be understood by the editor features.

For more information about this feature check the {@glink api/paste-from-office package page}.

@f1ames
Copy link
Contributor Author

f1ames commented Sep 24, 2018

I have refactored the code and docs as suggested. Built docs to check if everything looks good. Also skipped 4 failing unit tests mentioned earlier so CI should be green.

@f1ames f1ames requested a review from Reinmar September 24, 2018 13:01
@f1ames
Copy link
Contributor Author

f1ames commented Sep 24, 2018

Ready for another review round.

* @returns {module:engine/view/node~Node|module:engine/view/documentfragment~DocumentFragment} The view
* structure instance with list-like elements transformed into semantic lists.
*/
export function transformParagraphsToLists( bodyView, stylesString ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, how I read this function is:

  1. Get first child of body view... Why of a body? Can't this be of any container?
  2. Then... if the first child exists do something. Why is this first child that important? I thought we're filtering all children.
  3. Now, we want to find all list nodes... at a position before the first child?
  4. And we create lists for these... list nodes? Where do we create them? If we create them why isn't createLists() returning them?
  5. And then we return the bodyView again. Which can be a Node? Or a doc frag? How can a body be both. This function is used in one place in the code, so why do we support both things?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to improve this code? You need to think about signatures of the functions that you create. And how you use them and combine them together.

  1. transformParagraphsToLists() should support just one thing – either a container or a doc frag. Not both. You can choose what you need based on the most common use of this function. And there's just one place where you use it so it'll be easy :P.
  2. If you want to keep the outline of this function similar to what it looks now then findAllListNodes() should be corrected as well. It's unjustified from the signature of this function to work on a position. In 99.9% of cases you look for stuff in a container or a range. Even TreeWalker does that. It can accept a start position but this isn't necessary to pass it there (in this case where you scan the whole body anyway). Nor should this fact leak and pollute an unrealted method like transformParagraphsToLists().
  3. Then, once you'll have all list nodes... Wait... are these list nodes already? I thought they are paragraphs (based on the name of this outer function). So they are neither lists nor nodes. They are list-item-like-elements. The naming should be consequent in your code. Furthermore, from the docs I saw that <h1> may represent a list item as well so... transformParagraphsToLists() should also be renamed.
  4. OK, so we finally have that listItemLikeElements array. Let's transform these elements now. But not via a createLists() function unless we're creating lists. We are transforming X into Y. Again – naming needs to be adjusted.
  5. Finally, to balance the amount of logic between functions, I'd recommend moving the loop out of the current createLists() to the outer function because it'll make that singular transformXIntoY() easier to test and debug.
  6. Returning bodyView isn't necessary at this stage. I know that you anticipate there to be filters which may return something else than they accepted, but I can't see such a filter today so KISS. Don't build abstractions/mechanisms which you don't need yet.

Copy link
Member

@Reinmar Reinmar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code improvements needed.

import Element from '@ckeditor/ckeditor5-engine/src/view/element';
import Matcher from '@ckeditor/ckeditor5-engine/src/view/matcher';
import Range from '@ckeditor/ckeditor5-engine/src/view/range';
import TreeWalker from '@ckeditor/ckeditor5-engine/src/view/treewalker';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS. I forgot to add – you don't need to import the tree walker. You can use Range#getWalker() or, if you don't need to set any walker params, simply iterate over the range.

@f1ames
Copy link
Contributor Author

f1ames commented Sep 25, 2018

I have refactored entire list filter to be more readable following @Reinmar suggestions. The code definitely looks better IMHO 👍

Ready for review. cc @Reinmar

@f1ames f1ames requested a review from Reinmar September 25, 2018 11:43
let currentList = null;

for ( let i = 0; i < listLikeItems.length; i++ ) {
if ( !currentList || listLikeItems[ i - 1 ].id !== listLikeItems[ i ].id ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to read this code very carefully and find what's that id property to understand what this fragment does. Once you realise that we indeed have to care about creating <ul/ol> elements it gets obvious but the logic isn't that evident. So, I'll propose a bit different solution which will also avoid having to return from insertEmptyList().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I'll propose a bit different solution which will also avoid having to return from insertEmptyList().

Actually, it's enough to improve its name a bit to indicate it's a new element.


const listLikeItems = findAllListItemLikeElements( documentFragment );

if ( listLikeItems.length ) {
Copy link
Member

@Reinmar Reinmar Oct 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything to do here if this array is empty?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, "list like items"? Should be "list item like elements"

currentList = insertEmptyList( listStyle, listLikeItems[ i ].element, writer );
}

const listItem = transformElementIntoListItem( listLikeItems[ i ].element, writer );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to read deep into the code to understand why we don't remove the old element if this function returns a new element. You need to know about the internals of this function to figure this out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In short – I wouldn't use rename() in such a context for its complicated nature. I'd rather explicitly create a new element and remove the old one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is a bit too much work now, so I'm skipping it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handling flat lists
3 participants