<script> or <style> that's used at the beginning of an HTML data string is lost #11109

Closed
Reinmar opened this issue Jan 14, 2022 · 1 comment
Labels
resolution:duplicate (This issue is a duplicate of another issue and was merged into it.)
type:bug (This issue reports a buggy (incorrect) behavior.)

Comments

@Reinmar
Member

Reinmar commented Jan 14, 2022

  • The element is moved to <head> by the HTML parser.
    • This is the same issue we had with HTML comments in the past (see "First comment node is missing after calling DOMParser.parseFromString()" #9861 and the related code comment:
      // The rules for parsing an HTML string can be read on https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inhtml.
      //
      // In short, parsing tokens in an HTML string starts with the so-called "initial" insertion mode. When a DOM parser is in this
      // state and encounters a comment node, it inserts this comment node as the last child of the newly-created `HTMLDocument` object.
      // The parser then proceeds to successive insertion modes during parsing subsequent tokens and appends in the `HTMLDocument` object
      // other nodes (like <html>, <head>, <body>). This causes that the first leading comments from HTML string become the first nodes
      // in the `HTMLDocument` object, but not in the <body> collection, because they are ultimately located before the <html> element.
      //
      // Therefore, so that such leading comments do not disappear, they all are moved from the `HTMLDocument` object to the document
      // fragment, until the <html> element is encountered.
      //
      // See: https://github.com/ckeditor/ckeditor5/issues/9861.
      )
  • The first thing to check is why we cannot wrap the entire content with <body> before passing it here:
    const document = this.domParser.parseFromString( data, 'text/html' );
    • If it works, also get rid of the hack for comments.
    • You can check with @psmyrek why we didn’t go this way.
    • One hypothesis: because we also parse full HTML content, e.g. coming from the clipboard, which may already contain a doctype, <html> and <body>.
    • If the above is true, we can still check the original HTML with a simple regexp for <body *>. If <body> already exists, don’t wrap the HTML.
  • If wrapping does not work, let’s use the same hack as we used for comments.
    • However, if we apply the same hack, we may start passing through some <script> tags that were originally in the <head> (e.g. in the <head> of pasted content).
    • 👆 We are not able to verify what all external applications do and whether they put any <script> tags in the content, so we just need to wait for feedback.
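
The regexp check suggested above could be sketched roughly as follows. This is only an illustration of the idea, not the editor's actual code: the helper name `wrapInBodyIfNeeded` and the exact pattern are assumptions, and whether wrapping really keeps a leading <script> or <style> inside <body> is still the open question from this issue.

```javascript
// Hypothetical helper (not part of CKEditor 5): wraps an HTML string in <body>
// unless the string already seems to contain a <body> start tag.
// Assumption: a simple, case-insensitive regexp is good enough here. It can be
// fooled by "<body>" appearing inside a comment, a script or an attribute value.
const BODY_START_TAG = /<body(\s[^>]*)?>/i;

function wrapInBodyIfNeeded( html ) {
	if ( BODY_START_TAG.test( html ) ) {
		// Most likely full HTML content (e.g. from the clipboard): leave it untouched.
		return html;
	}

	return `<body>${ html }</body>`;
}

// A fragment that starts with <script> gets wrapped.
console.log( wrapInBodyIfNeeded( '<script>alert( 1 )</script><p>Foo</p>' ) );

// A full document that already has <body> is returned as is.
console.log( wrapInBodyIfNeeded( '<html><body class="x"><p>Foo</p></body></html>' ) );
```

In the browser, the returned string would then be passed to `this.domParser.parseFromString( ..., 'text/html' )` in place of the raw `data`. Keeping the check as a plain string test avoids parsing the input twice, at the cost of occasional false positives.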
@Reinmar Reinmar added the type:bug This issue reports a buggy (incorrect) behavior. label Jan 14, 2022
@Reinmar
Member Author

Reinmar commented Jan 16, 2022

Duplicate of #11110.

@Reinmar Reinmar closed this as completed Jan 16, 2022
@Reinmar Reinmar added the resolution:duplicate This issue is a duplicate of another issue and was merged into it. label Jan 16, 2022