Via ArchiveBot job b4cobsdfap6j2kzjo3i4jwnsx:
This recurses to wonderful URLs such as https://www.e-gov.am/gov-decrees/item/23174/1clip_themedata.thmx%22%20rel=%22themeData%22%20/%3E (and it only gets worse from there).
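Percent-decoding the example URL above hints at what is being extracted. The following is a minimal sketch using only the Python standard library; reading the result as "the rest of the tag's markup ended up inside the link value" is an inference from the URL itself, not something confirmed in this report.

```python
from urllib.parse import unquote, urlsplit

# Example URL copied from the report above.
url = (
    "https://www.e-gov.am/gov-decrees/item/23174/"
    "1clip_themedata.thmx%22%20rel=%22themeData%22%20/%3E"
)

# Decode the percent-escapes in the path component only.
decoded_path = unquote(urlsplit(url).path)
print(decoded_path)
# The tail after the file name is raw tag markup (a closing quote,
# a rel attribute, and the end of the tag), not part of a real path.
```

The decoded path ends in `1clip_themedata.thmx" rel="themeData" />`, i.e. the file name followed by the remainder of a `<link>` tag.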
The page contains three <link> tags with NUL bytes (^@).
This only happens with the libxml2-lxml parser; the html5lib parser handles it correctly, i.e. does not extract any extra URLs.
Tested on two machines, both with Python 3.6.10. One has lxml 4.4.2 and libxml2 2.9.4 with wpull 2.0.3, the other has lxml 4.6.2 and libxml2 2.9.10 with wpull The Blocking PR 393.
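To check whether a fetched page contains tags with embedded NUL bytes like the ones described above, something along these lines may help. It is a rough sketch using only the standard library: `find_nul_tags` is a hypothetical helper name, the regular expression is a simple heuristic rather than a real HTML tokenizer, and the sample markup is made up for illustration, not copied from the affected page.

```python
import re

# Heuristic: any "<...>" span that contains at least one NUL byte.
NUL_TAG = re.compile(rb"<[^<>]*\x00[^<>]*>")


def find_nul_tags(html: bytes) -> list:
    """Return all tag-like spans in the raw bytes that contain a NUL byte."""
    return NUL_TAG.findall(html)


# Hypothetical sample input, NOT the markup from the affected page.
sample = b'<p>ok</p><link rel="x" href="a.css"\x00 /><link href="b.css">'
nul_tags = find_nul_tags(sample)
print(nul_tags)
```

Running this against the raw response body (before any parser sees it) shows which tags are candidates for the behaviour reported here.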