I confirmed this with an (admittedly brittle) test on the doctype branch: 89148e8.
I don't have any immediate thoughts on a good fix. My first impression is that part of the issue is that lxml.html.fromstring paves over the inconsistencies between handling documents and fragments, abandoning some of the associated root-level metadata (such as the doctype) in the process.
It would probably make sense for the fix to provide a new API method (e.g. toronado.from_document) to avoid any backwards-incompatible implementation changes, unless it's possible to continue supporting both documents and fragments with the same method.
I don't necessarily have time to work on this in the very near future, but am happy to merge any fixes you're able to provide, as long as they come with test coverage.
It looks like the fix could be as simple as providing toronado.from_document that uses document_fromstring and tostring instead, providing doctype as a keyword argument to tostring.
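For illustration, here is a rough sketch of that idea using lxml alone. It is not toronado code: the helper name `serialize_with_doctype` is hypothetical, and reading the doctype back from `docinfo` is an assumption about how a `from_document` method might recover it. The first part shows the current behaviour (the doctype is dropped when serializing the root element); the second passes the doctype back to `tostring`.

```python
import lxml.html

HTML = (
    "<!DOCTYPE html>\n"
    "<html><head><title>t</title></head><body><p>hi</p></body></html>"
)

# Current behaviour: serializing the root element omits the doctype.
without_doctype = lxml.html.tostring(
    lxml.html.fromstring(HTML), encoding="unicode"
)


def serialize_with_doctype(html):
    """Hypothetical helper: parse a full document and keep its doctype."""
    root = lxml.html.document_fromstring(html)
    # The doctype lives on the tree's docinfo, not on the root element.
    doctype = root.getroottree().docinfo.doctype or None
    return lxml.html.tostring(root, encoding="unicode", doctype=doctype)


with_doctype = serialize_with_doctype(HTML)
```

A real `toronado.from_document` would presumably run the inlining step between parsing and serializing; that part is left out here.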
If I process an HTML document that has a doctype declaration with toronado, the doctype gets removed.