Should parse in Quirks Mode if doctype not set #2197

Closed
Muthukirthan opened this issue Sep 2, 2024 · 2 comments
@Muthukirthan

Input:

<html>
    <head></head>
    <body>
        <p>
            <span>
                <table>
                    <tbody>
                        <tr>
                            <td><span>Hello table data</span></td>
                        </tr>
                    </tbody>
                </table>
            </span>
        </p>
    </body>
</html>

Jsoup output:

<html>
    <head></head>
    <body>
        <p>
            <span>
                </span></p><table>
                    <tbody>
                        <tr>
                            <td><span>Hello table data</span></td>
                        </tr>
                    </tbody>
                </table>
            
        <p></p>
    </body>
</html>

Ref link: https://try.jsoup.org/~PNrKdofSo_QE8KX2IbZKZ-xxyq0

The table inside the span within the p is moved out and becomes the next sibling of the p. In addition, an empty p is created as the next sibling of the relocated table. Chrome and Firefox do not produce this output.

@jhy
Owner

jhy commented Sep 10, 2024

Hi,

The parse you are getting is coming from the implementation of this rule:

A start tag whose tag name is "table".

In HtmlTreeBuilderState, we are treating the document as No Quirks, but because there is no doctype, it should be treated as Quirks, and so the p should not be closed when the table gets opened.
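To illustrate the rule, here is a minimal sketch of how the quirks flag changes where the table ends up. It is a toy model, not jsoup's `HtmlTreeBuilderState` code: the class and method names are hypothetical, only start tags are handled, and the spec's "p element in button scope" check is simplified to "a p is anywhere on the stack of open elements".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model of the "table" start tag rule; not jsoup's implementation.
final class TableStartTagSketch {

    /**
     * Feeds start tags to a toy stack of open elements and returns the name of
     * the element that becomes the parent of the first "table".
     */
    static String parentOfTable(List<String> startTags, boolean quirksMode) {
        Deque<String> open = new ArrayDeque<>();
        open.push("body");
        for (String tag : startTags) {
            if (tag.equals("table")) {
                // Outside quirks mode an open <p> is closed before the table is inserted.
                // Simplification: the real check is "p element in button scope".
                if (!quirksMode && open.contains("p")) {
                    closeP(open);
                }
                return open.peek();
            }
            open.push(tag);
        }
        throw new IllegalArgumentException("no table start tag in input");
    }

    /** Pops elements up to and including the nearest open "p". */
    static void closeP(Deque<String> open) {
        while (!open.pop().equals("p")) {
            // keep popping until the p itself has been removed
        }
    }

    public static void main(String[] args) {
        List<String> tags = List.of("p", "span", "table");
        System.out.println("no-quirks parent: " + parentOfTable(tags, false));
        System.out.println("quirks parent:    " + parentOfTable(tags, true));
    }
}
```

For the input in this issue (`p`, `span`, `table`), the sketch places the table under `body` in no-quirks mode, because the `p` and the `span` inside it are popped first, and under `span` in quirks mode. In the real parser, the later `</p>` end tag then finds no open `p`, which is where the extra empty `<p></p>` comes from.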

jsoup doesn't currently implement Quirks Mode completely. We do go into quirks mode when parsing invalid doctypes, but we don't have the expected doctype tests. Implementing those would allow this to be fixed.
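As a rough sketch of the kind of doctype test involved, the snippet below picks a mode from the start of the input. It assumes a heavily simplified version of the HTML spec's rule: a missing doctype, or a doctype whose name is not `html`, means quirks mode. The public and system identifier checks, limited-quirks mode, and comments before the doctype are all ignored. The class, enum, and method names are hypothetical and are not part of jsoup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified mode selection; not jsoup's implementation.
final class QuirksModeSketch {

    enum Mode { NO_QUIRKS, QUIRKS }

    // Matches an optional run of whitespace, then "<!doctype", then the doctype name.
    private static final Pattern DOCTYPE =
        Pattern.compile("\\s*<!doctype\\s+([^\\s>]+)", Pattern.CASE_INSENSITIVE);

    static Mode modeFor(String html) {
        Matcher m = DOCTYPE.matcher(html);
        if (!m.lookingAt()) {
            return Mode.QUIRKS; // no doctype at the start of the input
        }
        // Simplification: only the doctype name is checked, not its identifiers.
        return m.group(1).equalsIgnoreCase("html") ? Mode.NO_QUIRKS : Mode.QUIRKS;
    }

    public static void main(String[] args) {
        System.out.println(modeFor("<html><head></head><body></body></html>"));
        System.out.println(modeFor("<!DOCTYPE html><html></html>"));
    }
}
```

Under these assumptions, the input from this issue (which starts directly with `<html>`) selects quirks mode, while the same markup prefixed with `<!DOCTYPE html>` selects no-quirks mode.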

I checked in Chrome: with a valid doctype, and so in No-Quirks mode, Chrome parses this the same way jsoup currently does:

Screenshot 2024-09-10 at 3 19 26 PM

@jhy jhy closed this as completed in 8601e85 Sep 10, 2024
@jhy jhy changed the title from "Jsoup Issue: <table> tag inside <span> of <p> tags are getting unwrapped outside and empty <p> tags are created" to "Should parse in Quirks Mode if doctype not set" Sep 10, 2024
@jhy jhy self-assigned this Sep 10, 2024
@jhy jhy added the fixed label Sep 10, 2024
@jhy jhy added this to the 1.18.2 milestone Sep 10, 2024
@jhy
Owner

jhy commented Sep 10, 2024

Thanks for reporting! Fixed.
