ingest/ledgerbackend: Make sure Stellar-Core is not started before previous instance termination #4020
What
Add code to `CaptiveCoreBackend.startPreparingRange` that ensures the previously started Stellar-Core instance has exited: it checks whether `getProcessExitError` returns `true`, which means that the Stellar-Core process is fully terminated. This prevents a situation in which a new instance is started and clashes with the previous one.
Why
The existing code contains a bug, likely introduced in 0f2d08b. The context returned by `stellarCoreRunner.context()` is cancelled in `stellarCoreRunner.close()`, which initiates the termination process. At the same time, `CaptiveCoreBackend.PrepareRange()` internally calls `CaptiveCoreBackend.isClosed()`, whose return value depends on the `stellarCoreRunner` context being cancelled. This is wrong because Stellar-Core may not have exited yet when that context is cancelled: it can still be shutting down, so the process can still be running.
Because of this, the following chain of events can lead to two Stellar-Core instances running (briefly) at the same time:
1. `fileWatcher` calls `stellarCoreRunner.close()`, which cancels `stellarCoreRunner.context()`.
2. `CaptiveBackend.IsPrepared()` is called and returns `false` because `stellarCoreRunner.context()` is cancelled; the caller then calls `CaptiveBackend.PrepareRange()` to restart Stellar-Core.
3. `PrepareRange()` also checks whether `stellarCoreRunner.context()` is cancelled (it is, but the Stellar-Core process can still be running its shutdown procedure) and then attempts to start a new instance.
Known limitations
This commit is really a quick fix. The code before 0f2d08b was simpler because it called `Kill()` on the process, so "terminating" and "terminated" were exactly the same state. After 0f2d08b there are two separate events associated with a Stellar-Core process (as above). Because of this, the code requires a larger refactoring. We may reconsider using the `tomb` package, which I tried in #3258; that PR was later closed in favour of #3278.
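
To make the "terminating" versus "terminated" distinction concrete, here is a minimal, self-contained sketch. It is not the code in this PR: `runner`, `markExited`, `canStartNew` and `errStillRunning` are hypothetical names, and the `(bool, error)` return shape of `getProcessExitError` is an assumption that only mirrors the name used above. The point it illustrates is that a cancelled context is not enough to allow a restart; the previous process must also have reported its exit.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// runner is a hypothetical stand-in for stellarCoreRunner (not the real type).
// It tracks two separate events: termination requested (context cancelled)
// and process exited.
type runner struct {
	ctx    context.Context
	cancel context.CancelFunc

	mu      sync.Mutex
	exited  bool
	exitErr error
}

func newRunner() *runner {
	ctx, cancel := context.WithCancel(context.Background())
	return &runner{ctx: ctx, cancel: cancel}
}

// close only requests termination; the process may still be shutting down.
func (r *runner) close() { r.cancel() }

// markExited simulates the process reporting that it has finished.
func (r *runner) markExited(err error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.exited = true
	r.exitErr = err
}

// getProcessExitError reports whether the process has exited and, if so,
// the error it exited with. The signature is an assumption for this sketch.
func (r *runner) getProcessExitError() (bool, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.exited, r.exitErr
}

var errStillRunning = errors.New("previous instance is still shutting down")

// canStartNew returns nil only if there is no previous runner or the
// previous process has fully exited. A cancelled context alone is not enough.
func canStartNew(prev *runner) error {
	if prev == nil {
		return nil
	}
	if exited, _ := prev.getProcessExitError(); !exited {
		return errStillRunning
	}
	return nil
}

func main() {
	prev := newRunner()
	prev.close() // context is cancelled, process is still "terminating"
	fmt.Println(prev.ctx.Err() != nil, canStartNew(prev))

	prev.markExited(nil) // process is now "terminated"
	fmt.Println(canStartNew(prev))
}
```

In this sketch the first call to `canStartNew` refuses to start a new instance even though the context is already cancelled; only after `markExited` does it allow the restart. A larger refactoring could fold both events into a single lifecycle abstraction instead of checking them separately.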