Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse hashtags for Bluesky #515

Merged
merged 7 commits into from
Apr 20, 2024
Merged

Conversation

mary-ext
Copy link
Contributor

@mary-ext mary-ext commented Apr 19, 2024

Fixes #514

A little rusty with Python, and wasn't sure how I'd get the Bluesky bot to work, thought I'd chip in just a little.

@CLAassistant
Copy link

CLAassistant commented Apr 19, 2024

CLA assistant check
All committers have signed the CLA.

Comment on lines 360 to 361
link: str
card: SocialCard | None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's assign a default value to these new variables. This will help us avoid the UnboundLocalError which can occur if the variable is referenced before it's explicitly assigned a value within the code's scope. Providing a default value ensures the variable is always initialized and has a defined state.

Suggested change
link: str
card: SocialCard | None
link: str | None = None
card: SocialCard | None = None

Copy link
Contributor Author

@mary-ext mary-ext Apr 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed that checking if the value isn't None would narrow the type down, which Pylance seems to agree, but Mypy seems to be arguing otherwise, is there something I'm missing on?

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, dropping the != None works.

# given that our needs of a hashtag is very simple, we can do away with
# only parsing alphanumeric characters
tag_regex = rb"(?:^|\s)#(?P<tag>[0-9]*[a-zA-Z][a-zA-Z0-9]*)"
text_bytes = text.encode("UTF-8")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the text is a string, which it appears to be, then there's no need to cast everything to bytes objects.
The regex doesn't need the b prefix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnludwigm Thanks for taking the time to review the PR! I apologize for the late reply.

I'd like to explain the reason behind casting everything to bytes: Bluesky strings are internally UTF-8 encoded, and the documentation suggests handling strings in UTF-8 to compute "valid indexes" for elements like Links, Mentions, and Hashtags (you can learn more about it in this link). While UTF-8 encoding wouldn't be a problem for plain text posts, we often use emojis in our templates. Emojis can be encoded as multiple characters, leading to incorrect indexes if not handled properly.

I'll create a PR to tweak this helper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oooh right yeah I forgot about it needing to be in UTF-8 specifically

@mlissner
Copy link
Member

Hi @mary-ext and @johnludwigm! Thank you both!

I'm the director here at Free Law Project, and I'm amazed that this just sort of got done. Open source, man — every once in a while it blows you away. I'll also introduce @ERosendo, who is the maintainer for this area of the code, so that's why he's chiming in with the review.

@mary-ext, I know you just went ahead and did this work, but if you want to get in touch with me at [email protected], I'll be happy to PayPal you (or whatever) some amount of money for your time on this one (probably should have agreed on that before you jumped in, but I imagine we'll come up with a good number!).

Thank you all!

@mary-ext mary-ext force-pushed the feat/bluesky-tags branch from 7dbaf14 to a0393a4 Compare April 20, 2024 01:49
Copy link
Member

@mlissner mlissner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's give it a try!

@mlissner mlissner merged commit 95adc95 into freelawproject:main Apr 20, 2024
5 checks passed
@mlissner
Copy link
Member

mlissner commented Apr 21, 2024

It works! The embed doesn't have functional hashtags, but here's the first post with them:

I'm now following San Francisco v. Oakland (Frisco/Bay/Oakland airport):[View Full Case]#CL68449160

[image or embed]

— Big Cases Bot (@bigcases.bots.law) Apr 20, 2024 at 9:19 PM

Thank you @mary-ext!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Fix hashtags in BlueSky posts
5 participants