exp/ingest/pipeline: Fix pipeline data race during shutdown #2058
What
This commit fixes a data race in `exp/ingest/pipeline` that can occur when `LiveSession` (and Horizon) is shut down.

It also removes the `updateStats` method, which was known to have a data race (see the comment in that method). It is not actively used right now but was being reported by the race detector.

Fix #2046.
Why
The previous code handling the shutdown signal in `LiveSession` can be found in `go/exp/ingest/live_session.go`, lines 196 to 217 in 5e4d247.
The problem is that when the shutdown signal is received, `Resume` returns `nil`, so Horizon starts its shutdown code, which calls `Rollback()` (using the internal `tx` object). At the same time, the pipeline keeps running until the code receiving from the `ctx.Done` channel is executed. This means that pipeline processors can still execute transactions using the `tx` transaction object in the DB session while it is being rolled back. See #2046 for examples.

To fix this:
- `select` ingest session shutdown signal when waiting for pipeline to finish processing.
- `Shutdown` on pipelines inside `LiveSession.Shutdown`.
- `Pipeline.IsRunning` method.
- `close(s.shutdown)` inside `expingest/System.Shutdown()`.
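
The first two points can be illustrated with a minimal, self-contained sketch. This is not the code from `exp/ingest`: the `pipeline` and `session` types, their fields, and `shutdownDemo` are hypothetical names chosen for illustration, and only the general pattern (select on the session shutdown signal while waiting for the pipeline, then shut the pipeline down and wait for it to stop) is taken from the description above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pipeline is a hypothetical stand-in for an ingest pipeline.
type pipeline struct {
	mu       sync.Mutex
	running  bool
	shutdown chan struct{}
	once     sync.Once
}

func newPipeline() *pipeline {
	return &pipeline{shutdown: make(chan struct{})}
}

func (p *pipeline) setRunning(v bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.running = v
}

// IsRunning reports whether the pipeline is still processing.
func (p *pipeline) IsRunning() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.running
}

// Shutdown signals the pipeline to stop. It is safe to call more than once.
func (p *pipeline) Shutdown() {
	p.once.Do(func() { close(p.shutdown) })
}

// Process simulates long-running work and returns a channel that is
// closed once the pipeline has fully stopped.
func (p *pipeline) Process() <-chan struct{} {
	done := make(chan struct{})
	p.setRunning(true)
	go func() {
		select {
		case <-time.After(5 * time.Second): // simulated processing
		case <-p.shutdown: // stop early on shutdown
		}
		p.setRunning(false)
		close(done)
	}()
	return done
}

// session is a hypothetical stand-in for an ingest session.
type session struct {
	shutdown chan struct{}
	pipeline *pipeline
}

// run waits for the pipeline to finish, but also watches the session
// shutdown signal so it never returns while the pipeline is running.
func (s *session) run() {
	done := s.pipeline.Process()
	select {
	case <-done:
	case <-s.shutdown:
		s.pipeline.Shutdown()
		<-done // wait until the pipeline has actually stopped
	}
}

// shutdownDemo reports whether the pipeline is still running after the
// session has returned following a shutdown signal.
func shutdownDemo() bool {
	s := &session{shutdown: make(chan struct{}), pipeline: newPipeline()}
	go func() {
		time.Sleep(10 * time.Millisecond)
		close(s.shutdown)
	}()
	s.run()
	return s.pipeline.IsRunning()
}

func main() {
	fmt.Println("pipeline running after session returned:", shutdownDemo())
}
```

Because `run` only returns after the pipeline's `done` channel is closed, a caller that rolls back a shared transaction after `run` returns cannot overlap with pipeline processors in this sketch.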

So the components now shut down exactly in the following order:

One comment on the `-1` change in tests. When `ingestSession.Run()` returns `nil`, we shouldn't continue to `ingestSession.Resume()`, because a `nil` value means that the session ended. I updated the comment in `LiveSession` and also fixed the Horizon code.

Known limitations
The pipeline design is very powerful, but it also makes it very easy to introduce data races like this one. We may want to refactor this as noted previously in #2050.