-
-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parse hashtags for Bluesky #515
Conversation
link: str | ||
card: SocialCard | None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's assign a default value to these new variables. This will help us avoid the UnboundLocalError
which can occur if the variable is referenced before it's explicitly assigned a value within the code's scope. Providing a default value ensures the variable is always initialized and has a defined state.
link: str | |
card: SocialCard | None | |
link: str | None = None | |
card: SocialCard | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, dropping the != None
works.
# given that our needs of a hashtag is very simple, we can do away with | ||
# only parsing alphanumeric characters | ||
tag_regex = rb"(?:^|\s)#(?P<tag>[0-9]*[a-zA-Z][a-zA-Z0-9]*)" | ||
text_bytes = text.encode("UTF-8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the text is a string, which it appears to be, then there's no need to cast everything to bytes
objects.
The regex doesn't need the b
prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnludwigm Thanks for taking the time to review the PR! I apologize for the late reply.
I'd like to explain the reason behind casting everything to bytes
: Bluesky strings are internally UTF-8 encoded, and the documentation suggests handling strings in UTF-8 to compute "valid indexes" for elements like Links, Mentions, and Hashtags (you can learn more about it in this link). While UTF-8 encoding wouldn't be a problem for plain text posts, we often use emojis in our templates. Emojis can be encoded as multiple characters, leading to incorrect indexes if not handled properly.
I'll create a PR to tweak this helper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oooh right yeah I forgot about it needing to be in UTF-8 specifically
Hi @mary-ext and @johnludwigm! Thank you both! I'm the director here at Free Law Project, and I'm amazed that this just sort of got done. Open source, man — every once in a while it blows you away. I'll also introduce @ERosendo, who is the maintainer for this area of the code, so that's why he's chiming in with the review. @mary-ext, I know you just went ahead and did this work, but if you want to get in touch with me at [email protected], I'll be happy to PayPal you (or whatever) some amount of money for your time on this one (probably should have agreed on that before you jumped in, but I imagine we'll come up with a good number!). Thank you all! |
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
7dbaf14
to
a0393a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Let's give it a try!
It works! The embed doesn't have functional hashtags, but here's the first post with them:
Thank you @mary-ext!!! |
Fixes #514
A little rusty with Python, and wasn't sure how I'd get the Bluesky bot to work, thought I'd chip in just a little.