interop: Supervisor Sync Speed is Insufficient #12903

Open
axelKingsley opened this issue Nov 12, 2024 · 1 comment

axelKingsley (Contributor) commented Nov 12, 2024

Summary

The Supervisor cannot backfill data into the logsdb fast enough, and its sync speed needs to be improved.

Sync Speed

Across multiple measurements, the speed of backfilling logs into the Supervisor is very low.

A spot check while attempting to recover the MVP Devnet showed about 25 blocks per minute (during an early portion of sync).

Spot checking ArgoCD logs while writing this ticket:

  • 17:58 at block 67982 to 19:08 at block 68224: 242 blocks in 70 minutes ≈ 3.5 blocks per minute

Estimating the overall rate from the sync start time to now:

  • Nov 8, ~1:30pm: restarted the Supervisor to sync
  • Nov 12, ~12:30: at block 68224
  • ~4 days for ~68k blocks ≈ 11.8 blocks per minute
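The rates above are plain averages between two (block, time) checkpoints. A minimal sketch of that arithmetic follows; the calendar date attached to the ArgoCD spot check is an assumption (only the 70-minute difference matters), and the overall estimate treats "~4 days" as exactly 4 × 24 × 60 minutes with the sync starting from block 0.

```python
from datetime import datetime


def blocks_per_minute(start_block: int, end_block: int,
                      start: datetime, end: datetime) -> float:
    """Average backfill rate between two (block, timestamp) checkpoints."""
    minutes = (end - start).total_seconds() / 60
    return (end_block - start_block) / minutes


# Spot check from the ArgoCD logs (date assumed; 17:58 -> 19:08 is 70 minutes).
spot = blocks_per_minute(67982, 68224,
                         datetime(2024, 11, 12, 17, 58),
                         datetime(2024, 11, 12, 19, 8))

# Overall estimate: ~68k blocks over ~4 days, assuming sync started at block 0.
overall = 68224 / (4 * 24 * 60)

print(f"spot check: {spot:.2f} blocks/min")
print(f"overall:    {overall:.2f} blocks/min")
```

Comparing the two numbers is what shows the slowdown: the recent window runs at roughly a third of the long-run average.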

Taken together, these measurements suggest that sync actually slows down as it progresses, from about 25 blocks per minute early on to only a few blocks per minute recently. Worse, every measured rate is so low that the chain's normal block production will outpace the Supervisor's backfill.

Theories and Hypotheses (Potential Solutions)

The Supervisor also has minimal heuristics for how it fetches data; it simply backfills block by block. I suspect this naive behavior is triggering backoffs and throttling, because fetching the receipts for each block takes three separate calls: "Get Block Hash", "Get Block and Txs", and then "Get Block Receipts".
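To make the cost concrete, here is a minimal sketch of the block-by-block pattern described above, assuming each of the three lookups is one RPC round trip. `CountingClient` and its method names are hypothetical stand-ins, not the Supervisor's actual client; the client only counts calls and returns placeholder data.

```python
from dataclasses import dataclass


@dataclass
class CountingClient:
    """Hypothetical execution-client stand-in; each method models one RPC round trip."""
    calls: int = 0

    def block_hash(self, number: int) -> str:
        self.calls += 1
        return f"0x{number:064x}"  # placeholder hash, not a real block hash

    def block_with_txs(self, block_hash: str) -> dict:
        self.calls += 1
        return {"hash": block_hash, "txs": []}

    def block_receipts(self, block_hash: str) -> list:
        self.calls += 1
        return []


def backfill_naive(client: CountingClient, start: int, end: int) -> list:
    """Backfill [start, end) one block at a time: three sequential calls per block."""
    out = []
    for number in range(start, end):
        h = client.block_hash(number)
        block = client.block_with_txs(h)
        receipts = client.block_receipts(h)
        out.append((number, block, receipts))
    return out


client = CountingClient()
backfill_naive(client, 67982, 68224)
print(client.calls)  # 242 blocks * 3 calls = 726
```

Every one of those calls is a separate round trip, so latency and any provider rate limit are paid 3× per block.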

By batching some or all of these calls, we can greatly reduce the number of requests needed to sync a large range.
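A sketch of what batching could look like at the JSON-RPC level, assuming the execution endpoint accepts JSON-RPC 2.0 batch arrays and supports `eth_getBlockByNumber` and `eth_getBlockReceipts`. `BATCH_SIZE` and the helper names are hypothetical. Fetching by number inside a batch drops the separate "Get Block Hash" call, but it also loses the hash pinning, so a real implementation would need to check that each receipt's `blockHash` matches the block it was paired with (to catch reorgs mid-batch). Some providers also cap batch sizes.

```python
import json
from typing import Iterator

BATCH_SIZE = 50  # hypothetical; tune against provider batch limits


def chunk_range(start: int, end: int, size: int) -> Iterator[range]:
    """Split [start, end) into consecutive sub-ranges of at most `size` blocks."""
    for lo in range(start, end, size):
        yield range(lo, min(lo + size, end))


def batch_payload(blocks: range) -> list[dict]:
    """One JSON-RPC 2.0 batch: block (with full txs) plus receipts for each block."""
    reqs = []
    for n in blocks:
        tag = hex(n)
        reqs.append({"jsonrpc": "2.0", "id": len(reqs),
                     "method": "eth_getBlockByNumber", "params": [tag, True]})
        reqs.append({"jsonrpc": "2.0", "id": len(reqs),
                     "method": "eth_getBlockReceipts", "params": [tag]})
    return reqs


batches = list(chunk_range(67982, 68224, BATCH_SIZE))
body = json.dumps(batch_payload(batches[0]))  # what would be POSTed per round trip
print(len(batches), "batched round trips instead of", 242 * 3, "single calls")
```

With 50 blocks per batch, the 242-block window from the spot check becomes 5 HTTP requests.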

Fixes:

  • The Supervisor should be given more sophisticated backfilling techniques, including batch calls when it is missing a large range of blocks. Batch calls alone would probably fix this issue, since tens of blocks' worth of receipts could be fetched per API call.
  • The Supervisor could also use multi-threaded worker pools to parallelize requests. Here's an old PR that used exactly these techniques for a downloader that fetches receipt data: 5a2ac1b
  • Finally, the Supervisor could call multiple execution endpoints to spread the request load further.
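The worker-pool and multi-endpoint ideas combine naturally. Below is a minimal sketch of the general idea (not a port of 5a2ac1b): batches are assigned round-robin across endpoints and fetched concurrently by a thread pool. `fetch_batch` is a hypothetical stub that a real version would replace with a JSON-RPC batch POST; the endpoint URLs, batch size, and worker count are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def fetch_batch(endpoint: str, blocks: range) -> list[tuple[int, str]]:
    """Hypothetical fetcher stub: records which endpoint served each block."""
    return [(n, endpoint) for n in blocks]


def parallel_backfill(endpoints: list[str], start: int, end: int,
                      batch_size: int = 50, workers: int = 4) -> list[tuple[int, str]]:
    """Fetch [start, end) in batches, spread round-robin over endpoints."""
    batches = [range(lo, min(lo + batch_size, end))
               for lo in range(start, end, batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch_batch, ep, b)
                   for ep, b in zip(cycle(endpoints), batches)]
        # Batches may finish out of order; collect in submission order so the
        # caller can append to the logs DB strictly by block number.
        results = []
        for fut in futures:
            results.extend(fut.result())
    return results


rows = parallel_backfill(["http://node-a:8545", "http://node-b:8545"], 67982, 68224)
```

Collecting results in submission order keeps the write path sequential, on the assumption that the logs DB must be filled in block order, while the network fetches overlap.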

Testing

We'll have to try purging a local devnet DB after a long period of activity to observe the Supervisor's RPC behavior. The issue may not reproduce locally, because network calls are much cheaper on local infrastructure.

Priority

Critical for a stable devnet. Not blocking local devnets or other testing.

axelKingsley (Contributor, Author)

Here's a PR that adds batch functionality to the Receipt Provider:
#12992

And a stacked PR that makes the Supervisor use it:
#13004
