Sanitizing filter broken in 0.90 #72
Comments
So the status is:
My gut says the filter level is the right level for the sanitizer to operate at (the middle of the parser doesn't make much sense, as what you really want to do is post-process the tree to remove what you dislike). I think we should probably go as far as to remove the ability to change the tokenizer. The big downside of that, obviously, is that we go from the sanitizer only working as a tokenizer in 1.0b1 to that being unsupported and only working as a filter in 1.0b2… Thoughts, @jgraham, @garethrees, @jsocol?
#24 has some relevance, but given duck-typing doesn't help much.
I'm garethrees.co.uk. GL!
Oops! Sorry! Attempt two: Thoughts, @gareth-rees, on the above?
Ostensibly I agree that the filters are the more "correct" place to do sanitization, even if it means huge changes in bleach for 1.0, but I haven't really done it that way so my one question is: do filters enable both dropping the tag completely (with or without any content) and replacing it with an escaped version (e.g. |
It's much easier to do with filters, as you're guaranteed a matching start tag and an end tag for each node, so you can maintain a simple stack to drop content. Obviously with tokenizers you either have to reimplement half the parser or accept you'll never get it quite right.
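To make the stack idea concrete, here is a minimal sketch of a filter that drops an element together with its content. It does not use html5lib itself: the token dicts (`type`, `name`, `data`) are only loosely modelled on treewalker tokens, and `drop_elements` and `banned` are hypothetical names chosen for illustration. It relies on the guarantee mentioned above, that every start tag in the stream has a matching end tag.

```python
from typing import Dict, Iterable, Iterator, List, Set

# Assumption: a token is a plain dict, loosely modelled on treewalker tokens.
Token = Dict[str, str]


def drop_elements(tokens: Iterable[Token], banned: Set[str]) -> Iterator[Token]:
    """Yield tokens, skipping banned elements together with their content."""
    open_dropped: List[str] = []  # elements opened since dropping started
    for token in tokens:
        kind = token["type"]
        if open_dropped:
            # Inside a dropped element: track nesting, emit nothing.
            if kind == "StartTag":
                open_dropped.append(token["name"])
            elif kind == "EndTag":
                open_dropped.pop()
            continue
        if kind in ("StartTag", "EmptyTag") and token["name"] in banned:
            # Only a start tag has content (and an end tag) to skip.
            if kind == "StartTag":
                open_dropped.append(token["name"])
            continue
        yield token


# Usage: <p>Hello <script>alert(1)<b>x</b></script>world</p>
tokens = [
    {"type": "StartTag", "name": "p"},
    {"type": "Characters", "data": "Hello "},
    {"type": "StartTag", "name": "script"},
    {"type": "Characters", "data": "alert(1)"},
    {"type": "StartTag", "name": "b"},
    {"type": "Characters", "data": "x"},
    {"type": "EndTag", "name": "b"},
    {"type": "EndTag", "name": "script"},
    {"type": "Characters", "data": "world"},
    {"type": "EndTag", "name": "p"},
]
result = list(drop_elements(tokens, {"script"}))
```

Because the stream is balanced, the stack empties exactly at the end tag of the banned element, so no parser state has to be reimplemented. Replacing a tag with an escaped version instead would mean yielding a `Characters` token in place of the start and end tags rather than skipping them.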
This drops support for the tokenizing side of things, which is sadly the only side that works in previous releases.
…ound for https://code.google.com/p/html5lib/issues/detail?id=210 https://code.google.com/p/html5lib/issues/detail?id=210 The sanitizer filter seems to be buggy (see also html5lib/html5lib-python#72) so we rely on the sanitizing tokenizer instead
As we no longer need the sanitizer to be shared between a filter and a tokenizer, move the entire sanitizer to the filter module. Also, replace the existing, tiny sanitizer testsuite with the one in html5lib-tests.
Howdy! I'm working to migrate the HTML sanitizer in feedparser to rely on html5lib. However, some of the feedparser unit tests are triggering the TypeError bug referenced in #68. If @gsnedders has written viable code to resolve this, would it be possible to coordinate feedparser's migration with the integration of the fix for this?
@kurtmckee the fix for that is currently to use the sanitizer as a filter when tokenizing and not when serializing; this will change once #72 gets fixed (which is PR #110), as then you'll use the sanitizer as a filter when serializing and not when tokenizing.
Any chance someone could write replacement code for the following?

    from html5lib import sanitizer
    sanitizer.HTMLSanitizer.acceptable_elements.extend(settings.TEXT_ADDITIONAL_TAGS)

This would really help me out. Thanks.
http://code.google.com/p/html5lib/issues/detail?id=162
Reported by [email protected], Oct 10, 2010