<script> or <style> that's used at the beginning of an HTML data string is lost #11109

Reinmar · 2022-01-14T12:58:49Z

The HTML parser moves it to <head>.

This is the same issue we had with HTML comments in the past (see "First comment node is missing after calling DOMParser.parseFromString()" #9861 and

ckeditor5/packages/ckeditor5-engine/src/dataprocessor/htmldataprocessor.js

Lines 122 to 133 in 82ce2e9

    
    // The rules for parsing an HTML string can be read on https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inhtml.
    //
    // In short, parsing tokens in an HTML string starts with the so-called "initial" insertion mode. When a DOM parser is in this
    // state and encounters a comment node, it inserts this comment node as the last child of the newly-created `HTMLDocument` object.
    // The parser then proceeds to successive insertion modes during parsing subsequent tokens and appends in the `HTMLDocument` object
    // other nodes (like <html>, <head>, <body>). This causes that the first leading comments from HTML string become the first nodes
    // in the `HTMLDocument` object, but not in the <body> collection, because they are ultimately located before the <html> element.
    //
    // Therefore, so that such leading comments do not disappear, they all are moved from the `HTMLDocument` object to the document
    // fragment, until the <html> element is encountered.
    //
    // See: https://github.com/ckeditor/ckeditor5/issues/9861.

)

The first thing to check is why we cannot wrap the entire content with <body> before passing it here:

ckeditor5/packages/ckeditor5-engine/src/dataprocessor/htmldataprocessor.js

Line 119 in 82ce2e9

    const document = this.domParser.parseFromString( data, 'text/html' );

- If it works, also get rid of the hack for comments.
- You can check with @psmyrek why we didn’t go this way.
- One hypothesis: we sometimes parse full HTML content, e.g. coming from the clipboard. It may already contain a doctype, <html> and <body>.
- If the above is true, we can still check the original HTML with a simple regexp for <body *>. If the <body> already exists, don’t wrap the HTML.
If that does not work, let’s use the same hack as we used for comments.
- However, if we apply the same hack, we may start passing through some <script> tags that were originally in the <head> (e.g. in the <head> of pasted content).
- 👆 We are not able to verify what all external applications do and whether they put any <script> tags in the content, so we just need to wait for feedback.
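
To make the `<body *>` check from the list above more concrete, here is a minimal sketch. The names `BODY_TAG_REGEXP`, `hasBodyTag()` and `wrapWithBodyIfNeeded()` are hypothetical and do not exist in CKEditor 5; whether wrapping actually solves the problem inside `HtmlDataProcessor` has not been verified here.

```javascript
// Hypothetical helpers, not part of the CKEditor 5 API.

// Matches an opening <body> tag with or without attributes, case-insensitively.
// It does not match tags that merely start with "body" (e.g. <bodyguard>).
const BODY_TAG_REGEXP = /<body(\s[^>]*)?>/i;

// Returns true if the HTML string appears to contain its own <body> tag.
function hasBodyTag( html ) {
    return BODY_TAG_REGEXP.test( html );
}

// Wraps the data with <body> unless it already looks like a full document
// (for example, HTML coming from the clipboard).
function wrapWithBodyIfNeeded( html ) {
    return hasBodyTag( html ) ? html : `<body>${ html }</body>`;
}
```

The returned string would then be passed to `this.domParser.parseFromString( ..., 'text/html' )` instead of the raw data. Note that a plain regexp is only a heuristic: it would also match a `<body>` string that appears inside a comment or a script, so this is a starting point rather than a complete solution.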


Reinmar · 2022-01-16T21:51:21Z

DUP of #11110.

Reinmar added the type:bug label Jan 14, 2022

Reinmar closed this as completed Jan 16, 2022

Reinmar added the resolution:duplicate label Jan 16, 2022
