Sanitizer and lxml tree walker: TypeError: unhashable type #68

Closed
gsnedders opened this issue Jun 21, 2013 · 2 comments

Comments

@gsnedders
Member

http://code.google.com/p/html5lib/issues/detail?id=210

Reported by r.kintzi, Aug 14, 2012

What steps will reproduce the problem?

from html5lib import HTMLParser
from html5lib.treebuilders import getTreeBuilder
from html5lib.treewalkers import getTreeWalker
from html5lib.filters.sanitizer import Filter as Sanitizer
html = "<html><body><h1>Header"

parser = HTMLParser(tree = getTreeBuilder("lxml"),
        namespaceHTMLElements = False)
doc = parser.parse(html)
root = doc.getroot()
body = doc.xpath('/html/body')
walker = getTreeWalker('lxml')
stream = walker(body)
stream = Sanitizer(stream)
for token in stream:
    print token

What is the expected output? What do you see instead?

I do not know exactly what should be printed. Instead, an exception is raised:

$ python t.py
{'namespace': u'None', 'type': 'Characters', 'data': u'<body>'}
Traceback (most recent call last):
  File "t.py", line 17, in <module>
    for token in stream:
  File "/home/radek/.virtualenvs/blog/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/filters/sanitizer.py", line 7, in __iter__
    token = self.sanitize_token(token)
  File "/home/radek/.virtualenvs/blog/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/sanitizer.py", line 171, in sanitize_token
    token["data"][::-1] 
TypeError: unhashable type

Please provide any additional information below.

The faulty token is:

{'namespace': u'None', 'type': 'StartTag', 'name': u'h1', 'data': {}}

@gsnedders
Member Author

At first glance, this is the manifestation of the sanitizer trying to handle tokenizer and treewalker tokens at once.
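
To illustrate the mismatch: the traceback shows the sanitizer slicing `token["data"]`, which only works on a sequence, while the faulty token from the treewalker has a dict there. The sketch below assumes tokenizer-style tokens carry attributes as a list of (name, value) pairs, as that slice suggests. `attrs_as_pairs` is a hypothetical helper written for this example, not part of html5lib, and the token literals are made up to mirror the shapes in the report.

```python
def attrs_as_pairs(data):
    """Return attributes as a list of (name, value) pairs.

    Hypothetical helper, not an html5lib API: accepts either a
    tokenizer-style list of pairs or a treewalker-style dict.
    """
    if isinstance(data, dict):
        # Sort so the order is deterministic regardless of dict ordering.
        return sorted(data.items())
    return list(data)


# Assumed tokenizer-style shape: attributes as a list of pairs.
tokenizer_token = {"type": "StartTag", "name": "a",
                   "data": [("href", "/x"), ("title", "t")]}
# Treewalker-style shape, matching the faulty token in the report.
treewalker_token = {"type": "StartTag", "name": "h1", "data": {}}

tokenizer_pairs = attrs_as_pairs(tokenizer_token["data"])
treewalker_pairs = attrs_as_pairs(treewalker_token["data"])

# After normalizing, the slice from the traceback is valid for both shapes.
reversed_tokenizer_pairs = tokenizer_pairs[::-1]
reversed_treewalker_pairs = treewalker_pairs[::-1]
```

Normalizing at the point of use is only one way to reconcile the two shapes. Splitting the sanitizer into separate tokenizer and treewalker code paths would avoid the ambiguity altogether.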

@gsnedders
Member Author

Dupe of #72.
