html5lib.treebuilders.dom.dom2sax crashes on 'xml:lang' attribute #6

Closed
gsnedders opened this issue Apr 9, 2013 · 0 comments

@gsnedders
Member

http://code.google.com/p/html5lib/issues/detail?id=200

Reported by vovanec, Mar 6, 2012

A simple test case (my program has a more complex handler implementation, but the problem is reproducible with the default handler):

import xml.sax.handler
import html5lib

def test(html):
    handler = xml.sax.handler.ContentHandler()
    parser = html5lib.HTMLParser(tree=html5lib.treebuilders.getTreeBuilder('dom'))
    dom = parser.parse(html)
    html5lib.treebuilders.dom.dom2sax(dom, handler)

html = '<html xml:lang="en">'
test(html)

With html5lib 0.95 it produces the following traceback:

python test.py 
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    test(html)
  File "test.py", line 10, in test
    html5lib.treebuilders.dom.dom2sax(dom, handler)
  File "/home/vkuznets/packages/html5lib-0.95/html5lib-0.95/html5lib/treebuilders/dom.py", line 271, in dom2sax
    for child in node.childNodes: dom2sax(child, handler, nsmap)
  File "/home/vkuznets/packages/html5lib-0.95/html5lib-0.95/html5lib/treebuilders/dom.py", line 256, in dom2sax
    del attributes[(attr.namespaceURI, attr.nodeName)]
KeyError: (None, u'xml:lang')

With previous versions (at least 0.11) there is no error. I assume this attribute may be invalid in the xml namespace, but either way I don't think it is OK for the parser to simply crash. I've seen A LOT of HTML documents in the real world that have this attribute.

Tested it with Python 2.6.5, Linux

Please advise.

Thanks,
--Vladimir
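The traceback shows the failing line `del attributes[(attr.namespaceURI, attr.nodeName)]`, i.e. the attribute table has no key `(None, u'xml:lang')`. One plausible mechanism can be reproduced with the Python 3 standard library alone, assuming (this is not confirmed by the report) that `dom2sax` built that table from minidom's `NamedNodeMap.itemsNS()`: `itemsNS()` keys attributes by `(namespaceURI, localName)`, and for a non-namespaced `xml:lang` attribute minidom reports the local name as `"lang"`, while the deletion uses `nodeName`, which is `"xml:lang"`. The helper name `attribute_keys` is hypothetical, and this sketch does not show why `xml:lang` reached that deletion in the first place.

```python
from xml.dom import minidom


def attribute_keys():
    """Hypothetical helper: build the two keys that (by assumption) dom2sax compared."""
    doc = minidom.Document()
    html = doc.createElement("html")
    # Plain, non-namespaced setAttribute, as for <html xml:lang="en"> in HTML content
    html.setAttribute("xml:lang", "en")
    # itemsNS() keys each attribute by (namespaceURI, localName); localName here is "lang"
    table = dict(html.attributes.itemsNS())
    attr = html.getAttributeNode("xml:lang")
    # The deletion key in the traceback uses nodeName, which keeps the "xml:" prefix
    return table, (attr.namespaceURI, attr.nodeName)


table, key = attribute_keys()
print(table)  # {(None, 'lang'): 'en'}
print(key)    # (None, 'xml:lang')
try:
    del table[key]
except KeyError as exc:
    # Same shape as the KeyError in the report
    print("KeyError:", exc)
```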

ghost assigned ambv May 4, 2013
ambv added a commit to ambv/html5lib-python that referenced this issue May 4, 2013
ambv added a commit to ambv/html5lib-python that referenced this issue May 5, 2013
ghost assigned gsnedders May 5, 2013
gsnedders added a commit to gsnedders/html5lib-python that referenced this issue May 5, 2013
…alker

This moves the function to a new treeadapters module (where later
the adapters from test_treewalker.py will get moved). dom2sax
remains for backwards-compatibility, calling the new function.
gsnedders added a commit to gsnedders/html5lib-python that referenced this issue Jun 16, 2013
This moves the functionality to a new treeadapters module (where
later the adapters from test_treewalker.py will get moved) and
removes the previous dom2sax function.
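The commits above moved this DOM-to-SAX conversion into a new treeadapters module rather than patching the old function in place. For a converter written directly against minidom, one way to avoid this class of KeyError is to classify every attribute in a single pass and key the result consistently, so no later deletion can miss. The sketch below is hypothetical (`split_attributes`, its rules, and `XMLNS_NAMESPACE` as a local constant are illustrative, not html5lib's code): only `xmlns` and `xmlns:*` count as namespace declarations, so `xml:lang` stays an ordinary attribute.

```python
from xml.dom import minidom

XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/"


def split_attributes(element):
    """Hypothetical helper: separate namespace declarations from ordinary attributes.

    Nothing is deleted afterwards, so mismatched keys cannot raise KeyError.
    """
    declarations = {}  # prefix (None for the default namespace) -> namespace URI
    attributes = {}    # (namespaceURI, localName) -> value, mirroring itemsNS()
    attrs = element.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        name = attr.nodeName
        is_declaration = (
            attr.namespaceURI == XMLNS_NAMESPACE
            or name == "xmlns"
            or name.startswith("xmlns:")
        )
        if is_declaration:
            prefix = None if name == "xmlns" else attr.localName
            declarations[prefix] = attr.value
        else:
            # xml:lang and similar prefixed names land here, not in declarations
            attributes[(attr.namespaceURI, attr.localName)] = attr.value
    return declarations, attributes


doc = minidom.Document()
html = doc.createElement("html")
html.setAttribute("xml:lang", "en")
html.setAttribute("xmlns:svg", "http://www.w3.org/2000/svg")
decls, plain = split_attributes(html)
print(decls)  # {'svg': 'http://www.w3.org/2000/svg'}
print(plain)  # {(None, 'lang'): 'en'}
```

Keying ordinary attributes by `(namespaceURI, localName)` mirrors `itemsNS()`; a converter that must preserve the `xml:` prefix in SAX qualified names would need to carry `nodeName` alongside.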
hugovk added a commit to hugovk/html5lib-python that referenced this issue Feb 25, 2020