-
Notifications
You must be signed in to change notification settings - Fork 18
Support for pasting flat lists #7
Conversation
I've tested these changes and pasting flat lists from MS Word works fine 👌 |
src/filters/common.js
Outdated
* @license Copyright (c) 2003-2018, CKSource - Frederico Knabben. All rights reserved. | ||
* For licensing, see LICENSE.md. | ||
*/ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing @module
definition. Also, I think we can mark it as @protected
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides, we usually call such modules utils.js
. Although, since those 3 functions here do the base preparations, I'd consider calling them init.js
, base.js
, preprocessing.js
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing – what's the chance that these functions will be ever used (not counting tests) alone, not all at once? Isn't the only use case for them being called together so to create a data[body/styles]
object? If so – a single parse()
function would do better. It can do all those things at once. You can still export utils functions but then only for testing purposes.
By focusing on one function you limit the API surface and make the code less prone to changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was that for content pasted from Word you need to extract and parse the body and do same for CSS. For content from google docs you just need to parse the input as a body
because there is not head
or style
tag. For input from another text editor you may need other combination of functions to extract what is needed to properly transform the content.
So it really depends on an input. Ofc parse()
function may be created in such way that it checks what elements are in the provided input and parses them accordingly. Or for each type of input, different parse()
function can be used.
Besides, we usually call such modules utils.js. Although, since those 3 functions here do the base preparations, I'd consider calling them init.js, base.js, preprocessing.js, etc.
Do you mean to put each function in a separate file? And then create a parse()
function which will import and use them?
src/filters/common.js
Outdated
/** | ||
* Extracts `body` tag contents from the provided HTML string. | ||
* | ||
* @param {Object} data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like how these functions are designed to work. From the point of view of these utils there's very little sense in working on some arbitrary data
object. E.g. this specific function accepts an HTML string and returns another HTML string. Period. The data
object is a noise here. Plus, modifying an object that was passed to a function isn't usually a good practice.
Imagine, for instance, than in a month's time you'll need to use it somewhere else to process some HTML. This time, not together with the transformInput()
function, but alone. You will either have to change this function or use it with some object created for that call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I had some doubts about it too TBH. Makes perfect sense to refactor them in a way you mentioned 👍
src/filters/common.js
Outdated
* @returns {Object} result | ||
* @returns {String|null} result.body Extracted `body` contents. If `body` tag was not present or empty, `null` is returned. | ||
*/ | ||
export function extractBody( data ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether this function could/should use DOMParser
. If so, it'd be good to use DOMParser
once for the whole process, so it'd have to look a bit differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should make sense, but then it will have to return something like DocumentFragment
or a View
instance. So it will be extract+parseBody
basically. It seems for now that there is no need to operate on body
string so we may just combine these two steps into one (because HtmlDataProcessor
uses DOMParser
internally). WDYT?
src/pastefromword.js
Outdated
* @param {String} input Word input. | ||
* @returns {module:engine/view/node~Node|module:engine/view/documentfragment~DocumentFragment} view Normalized input. | ||
*/ | ||
_normalizeWordInput( input, editor ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this function necessary? The listener, in which it's used is fairly short, so it doesn't seem so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I have extracted it so it can be easily unit tested, that was the main reason. We may inline it but then tests will have to listen to inputTransformation
event to get the normalized data. Since those unit tests use only Paste from Word
and Clipboard
plugin there is rather a small chance that something in Clipboard
plugin will change that can interfere with the normalized input before it is validated in a test 🤔
src/pastefromword.js
Outdated
const editor = this.editor; | ||
const document = editor.editing.view.document; | ||
|
||
this.listenTo( document, 'clipboardInput', ( evt, data ) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we make this plugin listen to inputTransformation
? We talked that it misses some information now (dataTransfer
), but it'd be perfect if it fitted into the existing architecture (where transformation was supposed to happen on inputTransformation
). Otherwise, we need to think whether the architecture works fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put this to a further discussion here - https://github.com/ckeditor/ckeditor5-clipboard/issues/52#issuecomment-413853141 and then it slipped out somehow - @Reinmar could you take a look on my linked comment, because I'm still unsure about this one.
src/filters/common.js
Outdated
// @param {String} cssString String containing CSS rules/stylsheet to be parsed. | ||
// @param {Document} domDocument Document used to create helper element in which stylesheet will be injected. | ||
// @returns {CSSStyleSheet} Native `CSSStyleSheet` object containing parsed styles. | ||
function parseCSS( cssString, domDocument ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be parseCss()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But parseJS()
;> It's even documented: https://github.com/ckeditor/ckeditor5-design/wiki/Code-Style-Naming-Guidelines#acronyms-and-proper-names :D
I have combined all
As for The only unresolved issue is #7 (comment):
So we need to discuss https://github.com/ckeditor/ckeditor5-clipboard/issues/52#issuecomment-413853141. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple minor issues.
src/pastefromoffice.js
Outdated
|
||
/** | ||
* This plugin handles content pasted from Word and transforms it (if necessary) | ||
* to format suitable for editor {@link module:engine/model/model~Model}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Paste from Office plugin.
This plugin handles content pasted from Office apps (for now only Word) and transforms it (if necessary)
to a valid structure which can then be understood by the editor features.
For more information about this feature check the {@glink api/paste-from-office package page}.
I have refactored the code and docs as suggested. Built docs to check if everything looks good. Also skipped 4 failing unit tests mentioned earlier so CI should be green. |
Ready for another review round. |
src/filters/list.js
Outdated
* @returns {module:engine/view/node~Node|module:engine/view/documentfragment~DocumentFragment} The view | ||
* structure instance with list-like elements transformed into semantic lists. | ||
*/ | ||
export function transformParagraphsToLists( bodyView, stylesString ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, how I read this function is:
- Get first child of body view... Why of a body? Can't this be of any container?
- Then... if the first child exists do something. Why is this first child that important? I thought we're filtering all children.
- Now, we want to find all list nodes... at a position before the first child?
- And we create lists for these... list nodes? Where do we create them? If we create them why isn't
createLists()
returning them? - And then we return the
bodyView
again. Which can be aNode
? Or a doc frag? How can a body be both. This function is used in one place in the code, so why do we support both things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to improve this code? You need to think about signatures of the functions that you create. And how you use them and combine them together.
transformParagraphsToLists()
should support just one thing – either a container or a doc frag. Not both. You can choose what you need based on the most common use of this function. And there's just one place where you use it so it'll be easy :P.- If you want to keep the outline of this function similar to what it looks now then
findAllListNodes()
should be corrected as well. It's unjustified from the signature of this function to work on a position. In 99.9% of cases you look for stuff in a container or a range. EvenTreeWalker
does that. It can accept a start position but this isn't necessary to pass it there (in this case where you scan the whole body anyway). Nor should this fact leak and pollute an unrealted method liketransformParagraphsToLists()
. - Then, once you'll have all list nodes... Wait... are these list nodes already? I thought they are paragraphs (based on the name of this outer function). So they are neither lists nor nodes. They are list-item-like-elements. The naming should be consequent in your code. Furthermore, from the docs I saw that
<h1>
may represent a list item as well so...transformParagraphsToLists()
should also be renamed. - OK, so we finally have that
listItemLikeElements
array. Let's transform these elements now. But not via acreateLists()
function unless we're creating lists. We are transforming X into Y. Again – naming needs to be adjusted. - Finally, to balance the amount of logic between functions, I'd recommend moving the loop out of the current
createLists()
to the outer function because it'll make that singulartransformXIntoY()
easier to test and debug. - Returning
bodyView
isn't necessary at this stage. I know that you anticipate there to be filters which may return something else than they accepted, but I can't see such a filter today so KISS. Don't build abstractions/mechanisms which you don't need yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code improvements needed.
src/filters/list.js
Outdated
import Element from '@ckeditor/ckeditor5-engine/src/view/element'; | ||
import Matcher from '@ckeditor/ckeditor5-engine/src/view/matcher'; | ||
import Range from '@ckeditor/ckeditor5-engine/src/view/range'; | ||
import TreeWalker from '@ckeditor/ckeditor5-engine/src/view/treewalker'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS. I forgot to add – you don't need to import the tree walker. You can use Range#getWalker()
or, if you don't need to set any walker params, simply iterate over the range.
src/filters/list.js
Outdated
let currentList = null; | ||
|
||
for ( let i = 0; i < listLikeItems.length; i++ ) { | ||
if ( !currentList || listLikeItems[ i - 1 ].id !== listLikeItems[ i ].id ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to read this code very carefully and find what's that id
property to understand what this fragment does. Once you realise that we indeed have to care about creating <ul/ol>
elements it gets obvious but the logic isn't that evident. So, I'll propose a bit different solution which will also avoid having to return from insertEmptyList()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I'll propose a bit different solution which will also avoid having to return from
insertEmptyList()
.
Actually, it's enough to improve its name a bit to indicate it's a new element.
src/filters/list.js
Outdated
|
||
const listLikeItems = findAllListItemLikeElements( documentFragment ); | ||
|
||
if ( listLikeItems.length ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything to do here if this array is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, "list like items"? Should be "list item like elements"
src/filters/list.js
Outdated
currentList = insertEmptyList( listStyle, listLikeItems[ i ].element, writer ); | ||
} | ||
|
||
const listItem = transformElementIntoListItem( listLikeItems[ i ].element, writer ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to read deep into the code to understand why we don't remove the old element if this function returns a new element. You need to know about the internals of this function to figure this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In short – I wouldn't use rename()
in such a context for its complicated nature. I'd rather explicitly create a new element and remove the old one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is a bit too much work now, so I'm skipping it.
Suggested merge commit message (convention)
Feature: Support for pasting flat lists from Word. Closes ckeditor/ckeditor5#2482.
Additional information
This PR provides support for handling lists pasted from Microsoft Word (2016). This is the first feature in the Paste from Word plugin which uses any filtering/normalization so the whole base of the plugin was created.
The normalization/filtering itself is a pipeline-like: https://github.com/ckeditor/ckeditor5-paste-from-word/blob/3d105fc433f498b04907289387ef3ebed6d39e97/src/pastefromword.js#L62-L67 I tried to make each function to do one simple thing so they are reusable, could be called separately and in any order. This should provide more flexibility when working with different Paste from Soemthing cases.
As this is the first feature, that is the moment we are deciding on an architecture which will be used (to some degree) with this plugin so it is important to take detailed look into it :)
This PR relays also on ckeditor/ckeditor5-engine#1503, so it will not pass on CI without it and should not be merged untill https://github.com/ckeditor/ckeditor5-engine/issues/1501 is closed.