Idea: meta-repos #3243
-
A couple first impressions:
We considered building some kind of "higher tree" over account trees earlier on in the sync design process. In particular, we looked at it as a way to do periodic batch synchronization via a bidirectional wire protocol similar to the "git protocol", where the two parties recursively work out where they are out of sync and then transfer just those blocks. One big-picture problem with this is that the top of the tree churns far too fast (every commit to any account changes the root), even for a moderately large (100k-account) PDS instance, let alone a full-network relay. Processing account repos concurrently is important for performance and scaling. "Sequencing" the firehose out of a PDS instance (or the relay) can already be a scaling challenge for implementations (definitely doable with the right architecture, but naive implementations can get clogged or deadlock).
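To make the "git protocol"-style idea concrete, here is a minimal sketch of the recursive comparison: both sides hash their subtrees, and the parties only descend into children whose hashes differ. It assumes a hypothetical fixed-shape binary Merkle tree with the same number of leaves on both sides (atproto repos actually use a Merkle Search Tree), and it compares trees in-process rather than exchanging hashes over the wire. It also illustrates the churn problem: changing a single leaf changes the root digest.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A node in a hypothetical binary Merkle tree; only leaves carry data."""
    digest: bytes
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    data: Optional[bytes] = None


def leaf(data: bytes) -> Node:
    return Node(hashlib.sha256(b"leaf:" + data).digest(), data=data)


def parent(left: Node, right: Node) -> Node:
    return Node(hashlib.sha256(b"node:" + left.digest + right.digest).digest(), left, right)


def build(blocks: list[bytes]) -> Node:
    """Build the tree bottom-up; assumes len(blocks) is a power of two."""
    level = [leaf(b) for b in blocks]
    while len(level) > 1:
        level = [parent(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def diff(ours: Node, theirs: Node, path: str = "") -> list[str]:
    """Return paths ("0" = left, "1" = right) of differing leaves.

    Assumes both trees have the same shape. Identical subtrees are skipped
    after a single hash comparison, so only mismatched branches are visited.
    """
    if ours.digest == theirs.digest:
        return []  # identical subtree: nothing to transfer
    if ours.data is not None:
        return [path]  # differing leaf: this block must be transferred
    return diff(ours.left, theirs.left, path + "0") + diff(ours.right, theirs.right, path + "1")


a = build([b"alice", b"bob", b"carol", b"dave"])
b = build([b"alice", b"bob", b"carol!", b"dave"])
print(diff(a, b))  # ['10']: only the third leaf differs
```

Note that `a.digest != b.digest` even though only one leaf changed; on a busy PDS, a root over all accounts would change with every commit.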
-
TL;DR: a special repo that describes the current state of all other repos hosted by a PDS
Every PDS has a DID (usually a `did:web`), returned by `describeServer`, but that DID isn't really used for anything at the moment. What if each PDS also hosted a special "meta" repo, identified by the PDS's own DID?
The meta-repo would house a collection named something like `com.atproto.sync.repoMeta`, with records whose schema is similar to the individual entries of `com.atproto.sync.listRepos`, with one difference: I added a `handle` field, which can be omitted if the handle is unset or invalid. The record keys would be DIDs (with some escaping scheme to ensure they're valid rkeys). There would be one record for every other repo hosted by the PDS. Every time a regular repo is updated, the meta-repo is updated likewise. Normally, these meta-repo updates would not be broadcast on the firehose, because the information would be redundant.
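A rough sketch of what a `com.atproto.sync.repoMeta` record and its key might look like. The fields mirror the `did`, `head`, `rev`, `active` and `status` fields of `listRepos` entries plus the proposed `handle`; the `RepoMeta` class, the `$type` value and the `did_to_rkey` escaping scheme are all hypothetical, not part of any spec. The escaping relies on two syntax rules: DIDs may contain `%` (percent-encoding, e.g. a `did:web` with a port), which record keys do not allow, while `~` is allowed in record keys but never appears in a DID, so swapping one for the other is reversible.

```python
import re
from dataclasses import asdict, dataclass
from typing import Optional

# Character sets taken from the atproto DID and record-key syntax rules.
DID_RE = re.compile(r"^did:[a-z]+:[a-zA-Z0-9._:%-]*[a-zA-Z0-9._-]$")
RKEY_RE = re.compile(r"^[A-Za-z0-9._:~-]{1,512}$")


def did_to_rkey(did: str) -> str:
    """Hypothetical escaping: '%' is legal in DIDs but not rkeys; '~' is the reverse."""
    if not DID_RE.match(did):
        raise ValueError(f"not a valid DID: {did!r}")
    rkey = did.replace("%", "~")
    if not RKEY_RE.match(rkey):
        raise ValueError("escaped DID exceeds record-key limits")
    return rkey


def rkey_to_did(rkey: str) -> str:
    return rkey.replace("~", "%")


@dataclass
class RepoMeta:
    """Hypothetical repoMeta record: a listRepos entry plus a handle."""
    did: str
    head: str                     # CID of the repo's current commit
    rev: str                      # revision (TID) of the current commit
    active: bool
    status: Optional[str] = None  # e.g. "takendown" or "deactivated" when inactive
    handle: Optional[str] = None  # omitted if unset or invalid

    def to_record(self) -> dict:
        record = {"$type": "com.atproto.sync.repoMeta"}
        # Drop unset optional fields rather than writing nulls.
        record.update({k: v for k, v in asdict(self).items() if v is not None})
        return record


meta = RepoMeta(did="did:web:localhost%3A2583", head="<cid placeholder>",
                rev="<tid placeholder>", active=True, handle="alice.example.com")
print(did_to_rkey(meta.did))  # did:web:localhost~3A2583
```

Most DIDs (e.g. `did:plc:...`) contain no `%` and would be used as record keys unchanged.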
But if the `active` or `status` fields changed, that diff would be broadcast, replacing `#account` events on the firehose. Similarly, if `handle` changes, that would be broadcast, replacing `#identity` events on the firehose.
`listRepos` is now redundant, since you can just call `listRecords` on the meta-repo. The `rev` and `head` of the meta-repo itself could additionally be returned in the `describeServer` response.
There are two big benefits to this system:
`#identity` and `#account` events (or rather, their equivalents) are now signed by PDSes (via Merkle proofs), so there's no need for downstream services to double-check them by querying the PDS.
A relay could also have a meta-meta-repo (!), storing `com.atproto.sync.repoMeta` records for the meta-repos of each PDS it's syncing from. Thus, the whole state of the known network (from the relay's PoV) at any given time could be boiled down to a single Merkle root. The meta-meta-repo might be challenging to implement from a performance perspective, however.
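The broadcast rule from the proposal (keep routine meta-repo updates off the firehose, but emit a diff when `active`/`status` or `handle` changes) can be sketched as a small comparison function. The `#account` and `#identity` names are the existing firehose event types; the `firehose_events` function and the plain-dict record shape are hypothetical.

```python
from typing import Optional


def firehose_events(old: Optional[dict], new: dict) -> list[str]:
    """Decide which meta-repo record changes should be broadcast (hypothetical rule).

    Changes to `head`/`rev` alone are skipped, since the regular repo's own
    commit event already carries that information.
    """
    old = old or {}  # assumption: a brand-new record is diffed against an empty one
    events = []
    if (old.get("active"), old.get("status")) != (new.get("active"), new.get("status")):
        events.append("#account")   # stands in for today's #account event
    if old.get("handle") != new.get("handle"):
        events.append("#identity")  # stands in for today's #identity event
    return events


before = {"did": "did:plc:abc", "rev": "1", "active": True, "handle": "alice.test"}
after_commit = {**before, "rev": "2"}
after_takedown = {**after_commit, "active": False, "status": "takendown"}
print(firehose_events(before, after_commit))          # []
print(firehose_events(after_commit, after_takedown))  # ['#account']
```

An ordinary commit only bumps `rev`/`head`, so it produces no extra event; only status and handle transitions reach the firehose.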