-
Notifications
You must be signed in to change notification settings - Fork 499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
services/horizon/internal/ingest: Prevent redundant and concurrent state verification runs #4821
Conversation
e9802d5
to
1655de0
Compare
bbc91a9
to
3018fcd
Compare
3018fcd
to
bf49aaf
Compare
this is a significant feature addition, seems like qualifies for an issue/ticket to provide visibility of the effort on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work!
b2b5a5d
to
688a8fe
Compare
After some tests in staging I discovered that state verification does not work on the read replica because it keeps failing with the following error: cancelling statement due to conflict with recovery So, I have removed that feature from this PR. Apart from that everything looked good |
to specifiy how often state verificaition runs and a timeout for capping the duration of a state verification run.
688a8fe
to
fff78d8
Compare
PR Checklist
PR Structure
otherwise).
services/friendbot
, orall
ordoc
if the changes are broad or impact manypackages.
Thoroughness
.md
files, etc... affected by this change). Take a look in the
docs
folder for a given service,like this one.
Release planning
needed with deprecations, added features, breaking changes, and DB schema changes.
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.
What
Close #4833
As a (most likely unintentional) side effect of #4204 , it is possible for multiple ingesting nodes to perform state verification concurrently. This commit introduces a state verification lock which ensures that only one ingesting node can perform state verification at a time.
This commit also adds two new configuration variables:
--ingest-state-verification-frequency
which specifies the frequency in checkpoints for how often state verification is run--ingest-state-verification-timeout
which specifies a timeout on how long state verification can runAlso, state verification is now running on the read replica if there is one configured.
Why
We have noticed that State verification is taking more than 30 minutes to run. During this time state verification has an open repeatable read transaction. Maintaining multiple of these transactions for such an extended period of time may have a negative effect on ingestion performance. Not to mention that multiple state verification runs for the same checkpoint ledger is wasteful because it's performing the same check multiple times.
Also, if state verification is having a negative impact on ingestion performance it is important that we have configuration knobs to limit how frequently it is run. e.g. instead of running state verification on every checkpoint we can instead run it on every 288th checkpoint (approximately once per day).
Given that state verification is a read only operation we can also move it to run on the read replica which will minimize the impact of the rw database used by ingestion.
Known limitations
We need to verify that running state verification on read replicas will not significantly impact the performance of request serving horizon instances which also use the read replica db.