Convert tags, skip_tags, recognized_tags to sets; fix doctests; f-strings #694
This converts the "tags" argument to BleachHTMLParser to be a set. This converts the "skip_tags" and "recognized_tags" to linkify things to be sets. This updates the documentation fixing example code so that tags, skip_tags, and recognized_tags are all sets. This also converts some string interpolation from %s style to f-strings.
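As a minimal illustration of the %s to f-string conversion mentioned above, the two forms below build the same string. The variable names and the message text are hypothetical and are not taken from the Bleach source.

```python
# Hypothetical values for illustration only; not taken from the Bleach source.
tag = "a"
attr = "href"

old_style = "<%s %s>" % (tag, attr)  # %-style interpolation
new_style = f"<{tag} {attr}>"        # equivalent f-string

# Both spellings produce the same text.
print(old_style == new_style)
```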
I need to go through this again and double-check that we've got the docs, tests, and everything correct.
This fixes the test data to pass sets instead of lists for "tags", "skip_tags", "recognized_tags", and "protocols".
bleach/linkifier.py
      :arg bool parse_email: whether or not to linkify email addresses

      :arg url_re: url matching regex

      :arg email_re: email matching regex

    - :arg list recognized_tags: the list of tags that linkify knows about;
    + :arg set recognized_tags: the list of tags that linkify knows about;
Oops--missed this "list".
docs/clean.rst
    @@ -224,9 +228,10 @@ This adds smb to the Bleach-specified set of allowed protocols:

        >>> import bleach
        >>> my_protocols = bleach.ALLOWED_PROTOCOLS.union({'smb'})
Switching these to sets makes them slightly more awkward to manipulate. :(
I forgot we can use `|`. That makes this better.
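For reference, a small sketch of the two spellings discussed in this thread: the `.union()` method and the `|` operator. To keep it runnable without Bleach installed, it uses a stand-in frozenset rather than `bleach.ALLOWED_PROTOCOLS`; the stand-in's contents are an assumption for illustration.

```python
# Stand-in for bleach.ALLOWED_PROTOCOLS so the snippet runs without Bleach;
# the values listed here are an assumption for illustration.
ALLOWED_PROTOCOLS = frozenset({"http", "https", "mailto"})

via_union = ALLOWED_PROTOCOLS.union({"smb"})  # method form, as in the doctest
via_operator = ALLOWED_PROTOCOLS | {"smb"}    # operator form

# Both forms build a new set and leave the original untouched.
print(via_union == via_operator)
print("smb" in ALLOWED_PROTOCOLS)
```

Either form returns a new set, so the shared default is never mutated.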
@g-k This is the last code change before the next release. Can you look it over? I verified the tests are passing in the right stuff by adding:
and then running the tests and building the docs (which runs the doctests). That sussed out a bunch of issues. I did the same thing for |
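The snippet that was added is not preserved in this thread. As a rough sketch of the general idea only, a temporary type check like the one below would make any test or doctest that still passes a list fail loudly. `require_set` and `fake_clean` are hypothetical names and are not part of Bleach.

```python
def require_set(name, value):
    """Hypothetical helper: raise if an argument is not a set or frozenset."""
    if not isinstance(value, (set, frozenset)):
        raise TypeError(f"{name} must be a set, got {type(value).__name__}")
    return value


def fake_clean(text, tags):
    # Stand-in for a function that accepts a "tags" argument (not Bleach's API).
    require_set("tags", tags)
    return text


fake_clean("hi", tags={"a", "b"})  # passes silently

try:
    fake_clean("hi", tags=["a", "b"])  # a list trips the check
except TypeError as exc:
    print(exc)
```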
As a side note, while fiddling with sets and f-strings, I wondered if we'd get any performance gain. I use the Standup data (remember Standup?) to test bleach.clean since it's a pretty straight-forward data set of messages with tags and urls and a bunch of stuff in it. I used Python 3.10.9.
Seems like we got a small speedup with |
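The measurements themselves are not preserved here. A minimal sketch of how one could compare membership checks on a list against a set with `timeit` follows. It uses made-up tag names, not the Standup data, and says nothing about the actual numbers from this PR.

```python
import timeit

# Synthetic data (assumption): 50 made-up tag names, not the Standup data set.
tags_list = [f"tag{i}" for i in range(50)]
tags_set = set(tags_list)
probe = "tag49"  # last element, so the list has to scan every entry

list_time = timeit.timeit(lambda: probe in tags_list, number=100_000)
set_time = timeit.timeit(lambda: probe in tags_set, number=100_000)

# Timings vary by machine and Python version, so none are quoted here.
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```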
r+ lgtm and it's nice to get perf and readability wins
is the Standup data from the app people used to post their (stand up) status?
Yeah! pmac and I ran Standup which was the last project I worked on that used Bleach. I have a data file of the message strings. I'll do another pass on this tomorrow and then land it and finish up the release. Thank you!