Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert tags, skip_tags, recognized_tags to sets; fix doctests; f-strings #694

Merged
merged 5 commits into from
Jan 23, 2023
Merged

Convert tags, skip_tags, recognized_tags to sets; fix doctests; f-strings #694

merged 5 commits into from
Jan 23, 2023

Conversation

willkg
Copy link
Member

@willkg willkg commented Jan 21, 2023

This converts the tags argument to BleachHTMLParser to be a set.

This converts the skip_tags and recognized_tags to linkify things to be sets.

This updates the documentation fixing example code so that tags, skip_tags, and recognized_tags are all sets.

This also converts some string interpolation from %s style to f-strings.

…ings

This converts the "tags" argument to BleachHTMLParser to be a set.

This converts the "skip_tags" and "recognized_tags" to linkify things to
be sets.

This updates the documentation fixing example code so that tags,
skip_tags, and recognized_tags are all sets.

This also converts some string interpolation from %s style to f-strings.
@willkg
Copy link
Member Author

willkg commented Jan 22, 2023

I need to go through this again and double-check that we've got the docs, tests, and everything correct.

This fixes the test data to pass sets instead of lists for "tags",
"skip_tags", "recognized_tags", and "protocols".

:arg bool parse_email: whether or not to linkify email addresses

:arg url_re: url matching regex

:arg email_re: email matching regex

:arg list recognized_tags: the list of tags that linkify knows about;
:arg set recognized_tags: the list of tags that linkify knows about;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops--missed this "list".

docs/clean.rst Outdated
@@ -224,9 +228,10 @@ This adds smb to the Bleach-specified set of allowed protocols:

>>> import bleach

>>> my_protocols = bleach.ALLOWED_PROTOCOLS.union({'smb'})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching these to sets makes them slightly more awkward to manipulate. :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot we can use |. That makes this better.

@willkg
Copy link
Member Author

willkg commented Jan 22, 2023

@g-k This is the last code change before the next release. Can you look it over?

I verified the tests are passing in the right stuff by adding:

assert isinstance(tags, (type(None),  set, frozenset)), f"wrong type {type(tags) = }"

and then running the tests and building the docs (which runs the doctests). That sussed out a bunch of issues. I did the same thing for skip_tags, recognized_tags, and protocols.

@willkg willkg requested a review from g-k January 22, 2023 03:04
@willkg
Copy link
Member Author

willkg commented Jan 22, 2023

As a side note, while fiddling with sets and f-strings, I wondered if we'd get any performance gain.

I use the Standup data (remember Standup?) to test bleach.clean since it's a pretty straight-forward data set of messages with tags and urls and a bunch of stuff in it.

I used Python 3.10.9.

  • 5.0.0: bleach.clean on 58,630 items 10x: minimum 2.793s
  • with this PR: bleach.clean on 58,630 items 10x: minimum 2.304s

Seems like we got a small speedup with bleach.clean, so that's cool.

Copy link
Collaborator

@g-k g-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r+ lgtm and it's nice to get perf and readability wins

is the Standup data from the app people used to post their (stand up) status?

@willkg
Copy link
Member Author

willkg commented Jan 22, 2023

Yeah! pmac and I ran Standup which was the last project I worked on that used Bleach. I have a data file of the message strings.

I'll do another pass on this tomorrow and then land it and finish up the release.

Thank you!

@willkg willkg merged commit b7e8da3 into mozilla:main Jan 23, 2023
@willkg willkg deleted the minor-fixes branch January 23, 2023 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants