Help a n00b understand how to build up a thread from a CAR? :) #2368
Replies: 3 comments 1 reply
-
If you don't need to work with CARs, MST nodes, etc., I'd recommend you use a higher-level library like https://atproto.blue/en/latest/atproto_firehose/ or https://lexrpc.readthedocs.io/en/latest/#client to subscribe to the firehose and get user-level data. To answer your specific questions, the nodes you're seeing are repo commits and MST nodes. Details are in https://atproto.com/guides/data-repos (follow the links from there). Here's info on how to fetch the data you mentioned: https://docs.bsky.app/docs/category/tutorials
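For anyone who does want to poke at the raw blocks, here is a rough sketch of telling the block types apart and rebuilding record keys from an MST node. It goes by the field names in the atproto repository spec: an MST node has `e` (entries), each with `p` (length of the prefix shared with the previous key), `k` (rest of the key), `v` (record CID) and `t` (right subtree), while a commit carries `did`, `data` and `sig`. The helper names `classify_block` and `mst_entry_keys` are made up for illustration, and the code assumes each block has already been decoded from DAG-CBOR into a plain dict with `k` as bytes.

```python
def classify_block(block: dict) -> str:
    """Guess which kind of repo block a decoded DAG-CBOR map is."""
    if "did" in block and "data" in block and "sig" in block:
        return "commit"      # signed repo commit, points at the MST root
    if isinstance(block.get("e"), list):
        return "mst_node"    # tree node holding the k/p/t/v entries
    if "$type" in block:
        return "record"      # e.g. app.bsky.feed.post
    return "unknown"


def mst_entry_keys(node: dict) -> list[tuple[str, object]]:
    """Rebuild full record keys from the prefix-compressed MST entries."""
    out = []
    prev = b""
    for entry in node["e"]:
        suffix = entry["k"]
        if isinstance(suffix, str):  # assumption: some decoders may hand back str
            suffix = suffix.encode("ascii")
        # Each key reuses the first `p` bytes of the previous key in this node.
        key = prev[: entry["p"]] + suffix
        out.append((key.decode("ascii"), entry["v"]))
        prev = key
    return out
```

Each rebuilt key has the form `collection/rkey`, and the paired `v` value is the CID of the record block.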
-
Thanks for that, but I'm a bit confused - the link you provided is indeed
what I've been using already, and the result is CARs, MST nodes, etc. :)
Is there something higher level out there?
Also, where do rate limits apply? Building effective moderation tools is
kinda hard if you can't look at everything.
Thanks!
On Thu, 28 Mar 2024 at 16:09, Ryan Barrett <***@***.***> wrote:
… If you don't need to work with CARs, MST nodes, etc, I'd recommend you use
a higher level library like
https://atproto.blue/en/latest/atproto_firehose/ to subscribe to the
firehose and get user-level data.
To answer your specific questions, the nodes you're seeing are repo
commits and MST nodes. Details in https://atproto.com/guides/data-repos
(follow links from there). Here's info on how to fetch the data you
mentioned: https://docs.bsky.app/docs/category/tutorials
-
You might be interested in looking at automod and hepa, written in Go:
This is a tool that listens to the firehose, unpacks user data, and runs a set of "rules" on each record. It handles things like caching account-level metadata (profile records) and resolving identities (handles). You may still run into some rate limits when initially warming up an instance, but the service will keep chugging, and after a day or two things should stabilize. This tool can pull images from posts, but generally doesn't try to pull a full reply thread for each reply post: that would be pretty expensive! It does have some helpers to query things like graph relationships (does the posting account follow the parent or OP account, or vice versa).
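If you'd rather do those lookups by hand from Python, here is a minimal sketch of where the pieces live in a post record and which XRPC queries fetch them: image blobs via `com.atproto.sync.getBlob`, the profile (handle, display name, description, avatar, banner) via `app.bsky.actor.getProfile`, and the surrounding thread via `app.bsky.feed.getPostThread`. It only builds URLs, so nothing here touches the network. The helper names are made up, the `public.api.bsky.app` host and the `pds_host` argument are assumptions you'd swap for your own setup, and the record is assumed to be in its JSON form (blob refs as `{"$link": ...}`).

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: public AppView host; blobs are served by the account's own PDS.
APPVIEW = "https://public.api.bsky.app"


def xrpc_url(host: str, nsid: str, **params: str) -> str:
    """Build an XRPC query URL: {host}/xrpc/{nsid}?key=value."""
    return f"{host}/xrpc/{nsid}?{urlencode(params)}"


def image_cids(record: dict) -> list[str]:
    """Collect blob CIDs from an app.bsky.embed.images embed, if present."""
    embed = record.get("embed") or {}
    # recordWithMedia nests the image embed one level down, under "media".
    if embed.get("$type") == "app.bsky.embed.recordWithMedia":
        embed = embed.get("media") or {}
    cids = []
    for img in embed.get("images", []):
        ref = img["image"]["ref"]
        # JSON form is {"$link": "<cid>"}; a CBOR decoder may give a CID object.
        cids.append(ref["$link"] if isinstance(ref, dict) else str(ref))
    return cids


def reply_uri(record: dict, which: str) -> Optional[str]:
    """Return the at:// URI of the reply "parent" or "root", or None."""
    reply = record.get("reply") or {}
    return (reply.get(which) or {}).get("uri")


def lookup_urls(did: str, record: dict, pds_host: str) -> dict:
    """URLs a moderation tool could fetch for one post by `did`."""
    urls = {
        "images": [
            xrpc_url(pds_host, "com.atproto.sync.getBlob", did=did, cid=c)
            for c in image_cids(record)
        ],
        "profile": xrpc_url(APPVIEW, "app.bsky.actor.getProfile", actor=did),
    }
    root = reply_uri(record, "root")
    if root:  # only replies carry a "reply" field
        urls["parent_uri"] = reply_uri(record, "parent")
        urls["thread"] = xrpc_url(APPVIEW, "app.bsky.feed.getPostThread", uri=root)
    return urls
```

Fetching the thread from the root URI rather than the parent is a design choice here: one call then covers the whole conversation, at the cost of a bigger response.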
-
I'd like to develop a Python moderation tool for suspect posts from the firehose. To do it well, one really needs the full thread, including images, along with each user's profile (username, header, avatar, description): people can put awful stuff in those elements too, and they can carry context that affects a moderation decision.
I've got a working firehose setup here and I'm extracting CARs from it, so I'm now looking at the root and the blocks. In the blocks I see a lot of CIDs for some sort of cryptic "k", "p", "t", "v" thing; one block containing useful info like the DID; and then the main post, such as:
Given this, how would one:
A) Get the image(s) from the ref(s)?
B) Get the user's profile info (name, avatar, header, info)?
C) Get the parent record of the thread, if any?
Sorry, I'm sure these are total n00b questions - I have only done relatively basic stuff with ATProto before :)