ingest: Reuse Stellar-Core on-disk DB in online mode #4471
```go
if err != nil {
	r.log.Infof("Error running offline-info: %v, removing existing storage-dir contents", err)
	removeStorageDir = true
} else if uint32(info.Info.Ledger.Num) != from {
```
Do you know how Core maintains `info.Info.Ledger.Num`, i.e., does it only bump it once it knows the meta record for that sequence was read off the pipe? I'm wondering whether `info.Info.Ledger.Num` will tend to be farther ahead than `from`, which represents the last sequence Horizon read off the pipe (and serialized to history). If it drifts asynchronously from the meta pipe reader's (Horizon's) activity, then this condition won't be hit much, right? The result being that it ends up in the same new-db/catchup routine.
To the best of my knowledge, and based on some experimenting, Stellar-Core only closes a ledger once it has been read from the meta pipe. This leaves us with two cases:

- Horizon is catching up (after a restart or state build): in this case `bufferedLedgerMetaReader` can read ledgers from the meta pipe ahead of time, which leaves Horizon behind Core. If Horizon is stopped with ledgers still in the buffer, the solution in this PR will not work because the ledger sequences will not match on restart. We could try removing `bufferedLedgerMetaReader` in online mode, but I'm not sure about the performance impact of that change. We can explore it in a separate PR.
- Horizon is ingesting the latest ledgers: in this case `bufferedLedgerMetaReader` will contain at most one ledger, and if Horizon is shut down gracefully it will process that ledger before shutting down.
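The two cases can be made concrete with a simplified, single-threaded model. It assumes, as described above, that Core closes a ledger as soon as its meta is read off the pipe, and it treats the read-ahead buffer as always refilled to capacity. `ledgerGapOnShutdown` and its parameters are hypothetical; they do not reflect how `bufferedLedgerMetaReader` is actually implemented.

```go
package main

import "fmt"

// ledgerGapOnShutdown is a hypothetical, single-threaded model. Core is
// assumed to close a ledger as soon as its meta is read off the pipe, and a
// read-ahead buffer of bufferCap ledgers (standing in for
// bufferedLedgerMetaReader) is kept full. Horizon commits `processed`
// ledgers, then stops; if drainOnStop is set, buffered ledgers are processed
// first. It returns Core's LCL and the last ledger Horizon committed.
func ledgerGapOnShutdown(start uint32, processed, bufferCap int, drainOnStop bool) (coreLCL, horizonLast uint32) {
	if bufferCap < 1 {
		bufferCap = 1 // the model needs at least one slot to hand ledgers to Horizon
	}
	coreLCL, horizonLast = start, start
	var buffer []uint32

	// fill reads ledgers off the pipe until the buffer is full; each read
	// lets Core close that ledger, advancing its LCL.
	fill := func() {
		for len(buffer) < bufferCap {
			coreLCL++
			buffer = append(buffer, coreLCL)
		}
	}

	fill()
	for i := 0; i < processed; i++ {
		horizonLast, buffer = buffer[0], buffer[1:] // Horizon commits the oldest ledger
		fill()
	}
	if drainOnStop {
		for len(buffer) > 0 {
			horizonLast, buffer = buffer[0], buffer[1:]
		}
	}
	return coreLCL, horizonLast
}

func main() {
	// Catching up: the reader has raced ahead and the buffer is abandoned.
	lcl, last := ledgerGapOnShutdown(100, 5, 10, false)
	fmt.Println(lcl, last) // 115 105

	// Ingesting latest ledgers: at most one buffered ledger, drained on stop.
	lcl, last = ledgerGapOnShutdown(100, 5, 1, true)
	fmt.Println(lcl, last) // 106 106
}
```

Assuming `from` on restart is the last ledger Horizon committed, the first run would fail the `!= from` check (Core is ten ledgers ahead) and fall back to a full re-init, while in the second run the single buffered ledger is drained and the two sequences agree.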
OK, that's interesting. It means there's only one ledger of data present in the pipe at any time; it sounds like the Core writer blocks until the pipe is drained, which is the signal that the prior ledger was read. Either way, this at least recovers from any out-of-sync case, and the worst outcome is the same as today's behaviour: remove the storage dir entirely and initialize from scratch.
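The hypothesis that the Core writer blocks until the previous ledger is read can be illustrated with Go's `io.Pipe`, whose `Write` blocks until Reads have consumed all the written data. This is only an idealized model: a real OS pipe has a kernel buffer, so whether Stellar-Core actually blocks per ledger depends on Core's implementation and is not confirmed here. `writeReturnedBeforeRead` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"io"
)

// writeReturnedBeforeRead writes one "ledger" of meta into an io.Pipe from a
// goroutine (standing in for Stellar-Core) and reports whether that Write
// returned before anything was read (standing in for Horizon). With io.Pipe
// this is always false: a Write blocks until Reads consume all its data.
func writeReturnedBeforeRead(ledgerMeta []byte) (bool, []byte, error) {
	pr, pw := io.Pipe()
	done := make(chan struct{})

	go func() {
		defer close(done)
		_, err := pw.Write(ledgerMeta)
		// A nil err closes the pipe normally, so the reader sees io.EOF.
		pw.CloseWithError(err)
	}()

	// Nothing has been read yet, so the writer must still be blocked.
	returnedEarly := false
	select {
	case <-done:
		returnedEarly = true
	default:
	}

	got, err := io.ReadAll(pr) // reads until the writer closes the pipe
	<-done
	return returnedEarly, got, err
}

func main() {
	early, got, err := writeReturnedBeforeRead([]byte("ledger 1000 meta"))
	fmt.Println(early, string(got), err) // false ledger 1000 meta <nil>
}
```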
```go
return errors.Wrap(err, "error initializing core db")
// Check if the on-disk core DB exists and what its LCL is. If it's not
// what we need, remove the storage dir and start from scratch.
removeStorageDir := false
```
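Getting the LCL out of `offline-info` means decoding its JSON output. A minimal sketch, assuming the output has the shape `{"info": {"ledger": {"num": ...}}}` implied by the `info.Info.Ledger.Num` access in the diff; the `offlineInfo` struct and `lclFromOfflineInfo` are hypothetical and not Horizon's actual types. Treating a missing or non-positive ledger number as an error is a design choice of this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// offlineInfo is a hypothetical, trimmed-down mirror of the JSON printed by
// `stellar-core offline-info`, keeping only the field the check needs. The
// shape is assumed from the info.Info.Ledger.Num access in the diff.
type offlineInfo struct {
	Info struct {
		Ledger struct {
			Num int `json:"num"`
		} `json:"ledger"`
	} `json:"info"`
}

// lclFromOfflineInfo extracts the last closed ledger from offline-info output.
func lclFromOfflineInfo(out []byte) (uint32, error) {
	var info offlineInfo
	if err := json.Unmarshal(out, &info); err != nil {
		return 0, fmt.Errorf("parsing offline-info output: %w", err)
	}
	// A missing field decodes as 0; treat it (or a negative value) as unusable.
	if info.Info.Ledger.Num <= 0 {
		return 0, fmt.Errorf("offline-info reported invalid ledger %d", info.Info.Ledger.Num)
	}
	return uint32(info.Info.Ledger.Num), nil
}

func main() {
	out := []byte(`{"info": {"ledger": {"num": 1000}}}`)
	lcl, err := lclFromOfflineInfo(out)
	fmt.Println(lcl, err) // 1000 <nil>
}
```

Any error from this step would then take the same "remove the storage dir" path as a failed `offline-info` run.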
Might it be worthwhile to add a unit test in `stellar_core_runner_test.go` to assert this new outcome?
Just a quick update on this: I'm refactoring `stellarCoreRunner` to make it easier to write good unit tests for it. I'll have a new commit ready by the end of today.
Actually, while refactoring I changed some other parts of `stellarCoreRunner` that seemed inconsistent. Would you mind 👍 this PR (if nothing else requires changes)? I'll open another PR with the refactoring and tests.
Nice solution with minimal code!
What
This commit changes the behaviour of `stellarCoreRunner` when using an on-disk DB in online mode: it checks whether the existing storage dir contains a DB in a state that allows Captive Core to start without rebuilding Stellar-Core state. In short, it uses the `stellar-core offline-info` command to check whether Stellar-Core's LCL (last closed ledger) matches the requested ledger in `startFrom`.

Close #4454.
Why
While applying state from buckets is relatively fast in Captive Core's in-memory mode, it can be extremely slow when using an on-disk DB. This change allows reusing the existing state in most cases.
Known limitations
[TODO or N/A]