The Supervisor is unable to backfill data into the logsdb at a sufficient rate, and needs to be improved.
Sync Speed
Across multiple measurements, the rate at which logs are backfilled into the Supervisor is very low:
A spot check while attempting to recover the MVP Devnet showed about 25 blocks per minute (during an early portion of the sync)
Spot checking ArgoCD logs while writing this ticket:
17:58 at block 67982 to 19:08 at block 68224 = 242 blocks in 70 minutes ≈ 3.5 blocks per minute
Estimating overall rate based on the sync start time to now:
Nov 8 ~1:30pm restarted Supervisor to sync
Nov 12 ~12:30 at 68224
~4 days for ~68k blocks = 11.8 blocks per minute
These measurements suggest that sync actually slows down over time. In any case, we are syncing so slowly that natural network progression will outpace the Supervisor's backfill.
Theories and Hypotheses (Potential Solutions)
The Supervisor has minimal heuristics around fetching techniques and just tries to backfill block by block. I would guess that this naive behavior is causing backoffs and throttles, because each receipt fetch is a combination of "Get Block Hash", "Get Block and Txs", and then "Get Block Receipts".
By batching some or all of these calls, we can significantly reduce the number of RPC requests needed to cover a large range during sync.
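For illustration, a minimal sketch of a batched receipt fetch, assuming go-ethereum's rpc client and an endpoint that supports eth_getBlockReceipts (the function and package names are placeholders, not existing Supervisor code):

```go
// Sketch only: fetching a whole range of block receipts in one JSON-RPC
// batch, instead of three separate calls per block.
package backfill

import (
	"context"
	"fmt"

	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/rpc"
)

// fetchReceiptRange collects receipts for blocks [start, end] using a single
// batched request. Assumes start <= end.
func fetchReceiptRange(ctx context.Context, client *rpc.Client, start, end uint64) ([][]*types.Receipt, error) {
	n := end - start + 1
	batch := make([]rpc.BatchElem, 0, n)
	results := make([][]*types.Receipt, n)
	for i := range results {
		blockNum := start + uint64(i)
		batch = append(batch, rpc.BatchElem{
			Method: "eth_getBlockReceipts",
			Args:   []interface{}{fmt.Sprintf("0x%x", blockNum)},
			Result: &results[i],
		})
	}
	// One HTTP round trip for the whole range.
	if err := client.BatchCallContext(ctx, batch); err != nil {
		return nil, err
	}
	for _, elem := range batch {
		if elem.Error != nil {
			return nil, elem.Error
		}
	}
	return results, nil
}
```

A single round trip like this replaces roughly 3×N individual requests for an N-block gap, which is the main lever on backoffs and rate limiting.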
Fixes:
The Supervisor should be given more sophisticated backfilling techniques, including batch calls when large gaps are missing. Batch calls alone would probably fix this issue, as tens of blocks' worth of receipts could be fetched per API call (see the batching sketch above).
The Supervisor could also use multi-threaded worker pools to speed up requests; a sketch follows this list. Here's an old PR that used exactly these techniques for a downloader that fetches receipt data: 5a2ac1b
Finally, the Supervisor could be given multiple execution endpoints to call, further spreading the request load.
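To illustrate the worker-pool and multi-endpoint ideas together, a hedged sketch (the function name, chunking strategy, and configuration shape are assumptions, not existing Supervisor code; it reuses fetchReceiptRange from the batching sketch above):

```go
// Sketch only: a bounded worker pool that spreads batched receipt fetches
// across several execution endpoints.
package backfill

import (
	"context"
	"sync"

	"github.com/ethereum/go-ethereum/rpc"
)

// backfillParallel splits [start, end] into chunks and lets each worker pull
// chunks from a shared queue, rotating requests across the provided clients.
// Assumes start <= end, chunkSize > 0, workers > 0, and len(clients) > 0.
func backfillParallel(ctx context.Context, clients []*rpc.Client, start, end, chunkSize uint64, workers int) error {
	type span struct{ from, to uint64 }

	// Pre-fill a buffered queue of chunks so workers can drain it freely.
	total := (end-start)/chunkSize + 1
	queue := make(chan span, total)
	for from := start; from <= end; from += chunkSize {
		to := from + chunkSize - 1
		if to > end {
			to = end
		}
		queue <- span{from, to}
	}
	close(queue)

	var wg sync.WaitGroup
	errs := make(chan error, workers)
	for w := 0; w < workers; w++ {
		client := clients[w%len(clients)] // spread load across endpoints
		wg.Add(1)
		go func(c *rpc.Client) {
			defer wg.Done()
			for s := range queue {
				// fetchReceiptRange is the batched fetch sketched above.
				if _, err := fetchReceiptRange(ctx, c, s.from, s.to); err != nil {
					errs <- err
					return
				}
			}
		}(client)
	}
	wg.Wait()
	close(errs)
	return <-errs // nil when no worker reported an error
}
```

Chunk size and worker count would need tuning against each endpoint's rate limits, so the pool speeds up backfill rather than just triggering the same throttling in parallel.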
Testing
We'll have to try purging a local devnet DB after a long period of activity to observe the RPC behavior from the Supervisor. This issue may not reproduce in local environments, because local infrastructure makes network calls much cheaper.
Priority
Critical for a stable devnet. Not blocking local devnets or other testing.