Right now, in subscribeRepos, we query the datastore separately for each connected subscriber (client). This is fine for historical blocks, but it's duplicative for new blocks. For ongoing subscribers, ideally we should only do a given datastore query once, and then fan out the results to all subscribers.
Current design:

- Events to emit are stored in a thread-safe ring buffer
- Live iterator that returns new events as they happen; blocking, thread-safe
- Rollback iterator that starts reading events from a given seq, then switches to live
- New singleton thread that runs the current code in `xrpc_sync.subscribe_repos`: reads blocks by seq from the datastore, assembles them into events, and stores those events in the ring buffer
  - Load the entire rollback window eagerly, on startup? Or lazily, on demand?
  - Phase two: start and stop this thread and its datastore query on demand
- New minimal `subscribeRepos` handler that reads from the ring buffer (a rough sketch of this design follows the list)
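A rough sketch of the ring buffer and its two iterators, assuming the design above. `EventBuffer`, `live`, and `read_from` are hypothetical names, not arroba's actual API, and seqs are assumed to be monotonically increasing integers:

```python
import threading
from collections import deque


class EventBuffer:
    """Thread-safe ring buffer of firehose events, keyed by seq."""

    def __init__(self, maxlen=5000):
        self.cond = threading.Condition()
        self.events = deque(maxlen=maxlen)  # (seq, event) pairs, oldest first

    def append(self, seq, event):
        """Called only by the single datastore-reader thread."""
        with self.cond:
            self.events.append((seq, event))
            self.cond.notify_all()  # wake any live iterators blocked in wait()

    def live(self, after_seq=None):
        """Live iterator: yields events with seq > after_seq, blocking when caught up."""
        with self.cond:
            if after_seq is None:
                after_seq = self.events[-1][0] if self.events else 0
        while True:
            with self.cond:
                # linear scan is fine for a sketch; a real version would track indices
                new = [(s, e) for s, e in self.events if s > after_seq]
                if not new:
                    self.cond.wait()
                    continue
            for seq, event in new:
                after_seq = seq
                yield seq, event

    def read_from(self, start_seq):
        """Rollback iterator: replays buffered events from start_seq, then goes live.

        A real version would fall back to the datastore for seqs that have
        already rotated out of the buffer.
        """
        return self.live(after_seq=start_seq - 1)
```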
This would take some rearchitecting. Right now, we do all of this inside the request handler, per client:

arroba/arroba/xrpc_sync.py, lines 199 to 206 at 351d43f

We'd need to start a separate, shared thread for the realtime datastore queries, collect the resulting blocks into events in memory, and have each client's request handler read and emit from there.
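A hedged sketch of that fan-out, building on the `EventBuffer` above. `datastore_reader`, `subscribe_repos`, and `query_blocks_since` are hypothetical names; `query_blocks_since` stands in for the existing per-client datastore query logic:

```python
import time


def datastore_reader(buffer, query_blocks_since, poll_interval=1.0):
    """Body of the single shared thread: runs the datastore query once for
    everyone, assembles blocks into events, and buffers them for fan-out."""
    last_seq = 0
    while True:
        for seq, event in query_blocks_since(last_seq):
            buffer.append(seq, event)
            last_seq = seq
        time.sleep(poll_interval)


def subscribe_repos(buffer, cursor=None):
    """Minimal per-client handler: never touches the datastore, just reads
    from the shared buffer and yields events for the websocket layer to frame."""
    events = buffer.read_from(cursor) if cursor is not None else buffer.live()
    for seq, event in events:
        yield event
```

The reader thread would be started once at server startup (e.g. `threading.Thread(target=datastore_reader, args=(buffer, query), daemon=True).start()`), so each additional subscriber only costs an in-memory iterator rather than another datastore query.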
This may be getting acute: Bridgy Fed's atproto-hub is capped out on CPU serving 8 subscribeRepos clients, and it's falling behind processing Bluesky's own firehose. 😕
snarfed added a commit to snarfed/bridgy-fed that referenced this issue on Dec 19, 2024.