Help a n00b understand how to build up a thread from a CAR? :) #2368

enn-nafnlaus · 2024-03-27T16:56:38Z

So I'd like to develop a Python moderation tool for suspect posts from the firehose. To do it well, one really needs the full thread, including images, along with each user's profile (username, header, avatar, description), because as we all know, people can put awful stuff in those elements too, and they can also hold context that affects a moderation decision.

I've got a working firehose setup here, and I'm extracting CARs from it. So I'm now looking at the root and the blocks. In the blocks I see a lot of CIDs for some sort of cryptic "k", "p", "t", "v" thing; one containing useful info like the DID; and then the main post, such as:

CID(_cid='bafyreidsfuwkyp74lrytiyet3bdn24uc6gg4nqiheixlbbnunixwih7yha', version=1, codec=113, hash=Multihash(code=18, size=32,digest=b'r-,\xac?\xfc\q4`\x93\xd8F\xddr\x82\xf1\x8d\xc6\xc1\x07".\xb0\x85\xb4j/d\x1f\xf88')):
{
    'text': 'Mario Wonder the way God intended',
    '$type': 'app.bsky.feed.post',
    'embed':
    {
        '$type': 'app.bsky.embed.images',
        'images':
        [
            {
                'alt': '',
                'image':
                {
                    'ref': 'bafkreibz4uj3zsjx22id7eyucpnmvpha6umqraku4sosxyknlbkeaiv7j4',
                    'size': 337499,
                    '$type': 'blob',
                    'mimeType': 'image/jpeg'
                },
                'aspectRatio':
                {
                    'width': 2000,
                    'height': 1125
                }
            }
        ]
    },
    'langs': ['en'],
    'createdAt': '2024-03-27T16:26:17.186Z'
},

Given this, how would one:

A) Get the image(s) from the ref(s)?
B) Get the user's profile info (name, avatar, header, info)?
C) Get the parent record of the thread, if any?

Sorry, I'm sure these are total n00b questions - I have only done relatively basic stuff with ATProto before :)

snarfed · 2024-03-28T16:09:05Z

If you don't need to work with CARs, MST nodes, etc, I'd recommend you use a higher level library like https://atproto.blue/en/latest/atproto_firehose/ or https://lexrpc.readthedocs.io/en/latest/#client to subscribe to the firehose and get user-level data.

To answer your specific questions, the nodes you're seeing are repo commits and MST nodes. Details in https://atproto.com/guides/data-repos (follow links from there). Here's info on how to fetch the data you mentioned: https://docs.bsky.app/docs/category/tutorials
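For questions A and C, a minimal Python sketch of how the decoded post record could be turned into fetchable references. It assumes the record has already been decoded to a plain dict shaped like the example above, that blobs are served by the `com.atproto.sync.getBlob` XRPC endpoint on the author's PDS, and that replies carry a `reply.parent.uri` field as in the `app.bsky.feed.post` lexicon. The helper names and the `pds.example.com` host are hypothetical; the real PDS host has to be resolved from the author's DID document.

```python
from urllib.parse import urlencode


def blob_cid(ref):
    """Return the CID string for a blob ref.

    Decoders differ: some yield a plain string (as in the dump above),
    JSON encodings use {"$link": "<cid>"}. Anything else is passed
    through str(), which is an assumption about the decoder in use.
    """
    if isinstance(ref, dict):
        return ref.get("$link", "")
    return str(ref)


def image_cids(record):
    """List the blob CIDs of all images embedded in a post record."""
    embed = record.get("embed") or {}
    # Quote posts with images nest the images under "media".
    if embed.get("$type") == "app.bsky.embed.recordWithMedia":
        embed = embed.get("media") or {}
    if embed.get("$type") != "app.bsky.embed.images":
        return []
    return [blob_cid(img["image"]["ref"]) for img in embed.get("images", [])]


def get_blob_url(pds_host, did, cid):
    """Build a com.atproto.sync.getBlob URL (pds_host is caller-supplied)."""
    query = urlencode({"did": did, "cid": cid})
    return f"https://{pds_host}/xrpc/com.atproto.sync.getBlob?{query}"


def parent_uri(record):
    """Return the at:// URI of the parent post, or None for a top-level post."""
    reply = record.get("reply") or {}
    return (reply.get("parent") or {}).get("uri")
```

The example record above has no `reply` field, so `parent_uri` returns `None` for it: it is a top-level post rather than a reply.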

1 reply

DavidBuchanan314 Mar 28, 2024

To add to this, for thread context you might want to call out to an AppView - getPostThread in particular: https://docs.bsky.app/docs/api/app-bsky-feed-get-post-thread

However, you won't be able to do this for every record you see, or you'll hit rate limits.
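A rough sketch of what that could look like in Python, using only the standard library. It assumes the public AppView host `https://public.api.bsky.app` accepts unauthenticated `app.bsky.feed.getPostThread` requests, and that the response nests ancestors under `thread.parent`. The `Throttle` class and the helper names are hypothetical, and the interval you pick is a guess on your side, not a documented limit.

```python
import json
import time
import urllib.request
from urllib.parse import urlencode

APPVIEW = "https://public.api.bsky.app"  # assumption: public AppView host


class Throttle:
    """Enforce a minimum delay between calls to stay under rate limits."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            delay = self.min_interval - (now - self._last)
            if delay > 0:
                time.sleep(delay)
        self._last = time.monotonic()


def thread_url(at_uri, depth=0, parent_height=80):
    """Build a getPostThread URL; depth=0 skips replies, keeps ancestors."""
    query = urlencode(
        {"uri": at_uri, "depth": depth, "parentHeight": parent_height}
    )
    return f"{APPVIEW}/xrpc/app.bsky.feed.getPostThread?{query}"


def fetch_thread(at_uri, throttle):
    """Fetch one thread view, waiting on the throttle first."""
    throttle.wait()
    with urllib.request.urlopen(thread_url(at_uri), timeout=10) as resp:
        return json.load(resp)["thread"]


def ancestors_root_first(thread):
    """Flatten the nested parent chain into a list, root post first.

    Stops at the first node without a "post" key, e.g. a deleted or
    blocked ancestor.
    """
    chain = []
    node = thread
    while node and "post" in node:
        chain.append(node["post"])
        node = node.get("parent")
    return chain[::-1]
```

Calling `fetch_thread` only for posts that some cheaper check has already flagged as suspect, rather than for every record on the firehose, keeps the request volume down.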

enn-nafnlaus · 2024-03-28T16:54:26Z

Author

Thanks for that, but I'm a bit confused - the link you provided is indeed what I've been using already, and the result is CARs, MST nodes, etc. :) Is there something higher level out there? Also, where do rate limits apply? Building effective moderation tools is kinda hard if you can't look at everything. Thanks!

bnewbold · 2024-03-28T18:20:52Z

Maintainer

You might be interested in looking at automod and hepa, written in golang:

This is a tool that listens to the firehose, unpacks user data, and runs a set of "rules" on each record. It handles things like caching account-level metadata (profile records) and resolving identities (handles). You may still run into some rate limits when initially warming up an instance, but the service will keep chugging, and after a day or two things should stabilize.
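The caching idea carries over to a Python tool. The following is a minimal sketch of a time-limited profile cache, not a port of how automod does it. It assumes profiles are fetched from `app.bsky.actor.getProfile` on the public AppView host `https://public.api.bsky.app`; `ProfileCache`, `profile_url`, and `fetch_profile` are hypothetical names, and the one-hour TTL is an arbitrary choice.

```python
import json
import time
import urllib.request
from urllib.parse import urlencode

APPVIEW = "https://public.api.bsky.app"  # assumption: public AppView host


def profile_url(actor):
    """Build an app.bsky.actor.getProfile URL for a DID or handle."""
    return f"{APPVIEW}/xrpc/app.bsky.actor.getProfile?{urlencode({'actor': actor})}"


def fetch_profile(actor):
    """Fetch one profile view (display name, description, avatar, banner)."""
    with urllib.request.urlopen(profile_url(actor), timeout=10) as resp:
        return json.load(resp)


class ProfileCache:
    """Remember profiles per DID so repeat posters cost one request per TTL."""

    def __init__(self, fetch=fetch_profile, ttl_seconds=3600.0,
                 clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # did -> (fetched_at, profile)

    def get(self, did):
        now = self._clock()
        hit = self._entries.get(did)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        profile = self._fetch(did)
        self._entries[did] = (now, profile)
        return profile
```

The cache is unbounded, so a long-running process would want an eviction policy on top. Accounts that post often are served from memory, which is why request volume drops once the cache has warmed up.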

This tool can pull images from posts, but generally doesn't try to pull a full reply thread for each reply post: that would be pretty expensive! It does have some helpers to query things like graph relationships (does the posting account follow the parent or OP account, or vice versa).
