Unify subscribeRepos datastore queries? #30

Open
7 tasks
snarfed opened this issue Aug 3, 2024 · 2 comments
snarfed commented Aug 3, 2024

Right now, in subscribeRepos, we query the datastore separately for each connected subscriber (client). This is fine for historical blocks, but it's duplicative for new blocks. For ongoing subscribers, ideally we should only do a given datastore query once, and then fan out the results to all subscribers.

Current design:

  • Events to emit are stored in a thread-safe ring buffer
    • Live iterator that returns new events as they happen, blocking, thread-safe
    • Rollback iterator that starts reading events from a given seq, then switches to live
  • New singleton thread that runs current code in xrpc_sync.subscribe_repos, reads blocks by seq from the datastore, assembles them into events, and stores those events in the ring buffer
    • load entire rollback window eagerly, on startup? or lazily, on demand?
    • phase two: start and stop this thread and its datastore query on demand
  • new minimal subscribeRepos handler that reads from the ring buffer
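As a rough illustration of the ring buffer in the design above, here is a minimal sketch, not arroba's actual code. The name `EventBuffer`, its methods, and the `(seq, event)` tuple shape are all hypothetical, and it assumes seqs are positive, strictly increasing integers. A single `subscribe` generator covers both iterators: with a `cursor` it replays buffered events and then switches to live, and without one it is live only.

```python
import threading
from collections import deque


class EventBuffer:
    """Thread-safe ring buffer of (seq, event) pairs. Hypothetical name."""

    def __init__(self, capacity):
        # deque with maxlen drops the oldest entry once capacity is reached
        self._events = deque(maxlen=capacity)
        self._cond = threading.Condition()

    def append(self, seq, event):
        """Called by the single producer thread; wakes all waiting readers."""
        with self._cond:
            self._events.append((seq, event))
            self._cond.notify_all()

    def _after(self, seq):
        # Caller must hold self._cond. Linear scan, fine for a sketch.
        return [(s, e) for s, e in self._events if s > seq]

    def subscribe(self, cursor=None, timeout=None):
        """Yield (seq, event) pairs with seq > cursor, then block for new ones.

        cursor=None means live only: start after the newest buffered event.
        If timeout (seconds) is set, stop when nothing new arrives in time.
        """
        with self._cond:
            if cursor is None:
                last = self._events[-1][0] if self._events else 0
            else:
                last = cursor
        while True:
            with self._cond:
                batch = self._after(last)
                if not batch:
                    self._cond.wait(timeout)
                    batch = self._after(last)
                    if not batch and timeout is not None:
                        return
            # Yield outside the lock so a slow client never blocks the producer
            for seq, event in batch:
                yield seq, event
                last = seq
```

A cursor older than the oldest buffered event silently starts from whatever is still in the buffer; a real implementation would presumably fall back to the datastore for that range.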

This would take some rearchitecting. Right now, we do all of this inside the request handler, per client:

arroba/arroba/xrpc_sync.py

Lines 199 to 206 in 351d43f

# serve new events as they happen
logger.info(f'serving new events')
while True:
    with new_events:
        new_events.wait(NEW_EVENTS_TIMEOUT.total_seconds())

    for commit_data in server.storage.read_events_by_seq(start=last_seq + 1):
        yield handle(commit_data)

We'd need to start a separate, shared thread for the realtime datastore queries, collect the resulting blocks into events in memory, and have each client's request handler read and emit from there.
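A minimal sketch of that shared thread, under stated assumptions: `SharedPoller` and its methods are hypothetical names, `read_events_by_seq` is passed in as a plain callable standing in for the storage call in the excerpt above, and events are simplified to `(seq, event)` tuples. The point is only that the datastore query runs once per wake-up regardless of how many clients are connected.

```python
import threading


class SharedPoller:
    """Runs one datastore query loop and shares results with all handlers."""

    def __init__(self, read_events_by_seq, start_seq, poll_interval=1.0):
        self._read = read_events_by_seq  # stand-in for the storage query
        self._last_seq = start_seq
        self._interval = poll_interval
        self._events = []  # unbounded here; a ring buffer in practice
        self._cond = threading.Condition()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def poll_once(self):
        """One shared query, however many subscribers are connected."""
        new = list(self._read(start=self._last_seq + 1))
        if new:
            with self._cond:
                self._events.extend(new)
                self._last_seq = new[-1][0]
                self._cond.notify_all()
        return len(new)

    def _run(self):
        while not self._stop.is_set():
            self.poll_once()
            self._stop.wait(self._interval)

    def events_after(self, seq, timeout=None):
        """Block until events newer than seq exist, then return them."""
        with self._cond:
            self._cond.wait_for(
                lambda: self._events and self._events[-1][0] > seq, timeout)
            return [(s, e) for s, e in self._events if s > seq]
```

Each client's request handler would then loop on `events_after(last_seq)` and emit what it gets, instead of calling the datastore itself.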

snarfed changed the title from "Unify subscribeRepos datastore queries" to "Unify subscribeRepos datastore queries?" on Nov 6, 2024

snarfed commented Nov 21, 2024

Added a draft design to the top description.

snarfed added the "now" label on Dec 19, 2024

snarfed commented Dec 19, 2024

This may be getting acute: Bridgy Fed's atproto-hub is capped out on CPU serving 8 subscribeRepos clients, and it's falling behind on processing Bluesky's own firehose. 😕

snarfed added a commit to snarfed/bridgy-fed that referenced this issue Dec 19, 2024