Fix (almost) infinite loop in Fileset writer when previous fileset encountered an error writing out index files #2058
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2058 +/- ##
==========================================
- Coverage 73.9% 58.1% -15.9%
==========================================
Files 1013 1008 -5
Lines 103993 134996 +31003
==========================================
+ Hits 76954 78553 +1599
- Misses 22198 49862 +27664
- Partials 4841 6581 +1740
Continue to review full report at Codecov.
Force-pushed from 1a11433 to 5199fda.
src/dbnode/persist/fs/write.go
Outdated
@@ -161,6 +161,11 @@ func (w *writer) Open(opts DataWriterOpenOptions) error {
	w.currIdx = 0
	w.currOffset = 0
	w.err = nil
	// This happens after writing the previous set of files index files, however, do it
mind grouping the resetting code into a new `func (w *writer) reset()` fn?
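A minimal sketch of what such a method could look like, assuming only the simplified field set visible in the diff above (the real `writer` struct has many more fields); this is illustrative, not the actual change in this PR:

```go
package fs

// indexEntry and writer are simplified stand-ins for the real types in
// src/dbnode/persist/fs/write.go; only the fields relevant to the reset are
// shown here.
type indexEntry struct{ /* fields elided */ }

type writer struct {
	currIdx      int
	currOffset   int64
	err          error
	indexEntries []indexEntry
}

// reset groups the per-Open state reset into one place, as suggested above.
// Illustrative sketch only, not the actual change in this PR.
func (w *writer) reset() {
	w.currIdx = 0
	w.currOffset = 0
	w.err = nil
	// Truncate rather than nil out the slice so the backing array is reused
	// across flushes.
	w.indexEntries = w.indexEntries[:0]
}
```

`Open()` could then call `w.reset()` instead of clearing each field inline.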
tangentially - should `indexEntries` be bounded/randomly re-allocated if it exceeds a certain size, to reduce memory usage? not for this PR, just curious myself.
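A sketch of the kind of bounding being asked about here; the cap and names are illustrative, not values or code from the m3 codebase:

```go
package fs

// indexEntry stands in for the real index entry type in write.go.
type indexEntry struct{ /* fields elided */ }

// maxRetainedIndexEntries is an arbitrary illustrative cap, not a value from
// the m3 codebase.
const maxRetainedIndexEntries = 64 * 1024

// truncateIndexEntriesForReuse truncates the slice so it can be reused, but
// drops the backing array entirely once it has grown past the cap, so one
// unusually large block/shard doesn't pin that memory for the lifetime of
// the writer. Illustrative sketch of the idea only.
func truncateIndexEntriesForReuse(entries []indexEntry) []indexEntry {
	if cap(entries) > maxRetainedIndexEntries {
		return nil
	}
	return entries[:0]
}
```

At reset time this could be used as something like `w.indexEntries = truncateIndexEntriesForReuse(w.indexEntries)`.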
@prateek sure
@vdarulis Theoretically yeah, it probably should. In practice we only keep one of these around per node (for the most part) and the size of this slice will never exceed the number of series for a given block/shard combination, so in practice it's usually in the 10s of thousands tops, and each item in the slice is not that big, so I think it's really unlikely it would become an issue. There are definitely other things like this in the code-base though where we're more paranoid because they could become an issue if left unchecked.
LGTM
This PR fixes a bug in the fileset writer that would trigger a (near) infinite loop when the writer was reused after the previous set of files encountered an error trying to write out their index files. The PR includes a regression test to verify the fix and to ensure the issue doesn't crop up again.
The bug was caused by the following sequence of events:
For some reason (root cause still pending) writing out a set of fileset files encountered an error during the call to `Close()`, most likely because duplicate series IDs had been written into the fileset file, which causes the writer to error out when it tries to write its index-related files.

The writer is then reused for writing out an entirely different set of fileset files. Normally this is fine, since the call to `Open()` performs an implicit reset of the writer's state; however, `Open()` has a bug where it does not reset the state of `w.indexEntries`, a slice of all the time series that were written into the file. So now the state is that we're writing out filesets for files X but we're still holding on to `indexEntries` from files Y.

The fileset that we're writing just so happens to not have any time series for the current block start. This is a normal scenario that can happen when the M3DB nodes are not receiving any writes, or briefly after topology changes where a flush may occur for a shard that was recently closed.
After writing 0 time series into the files, `Close()` is called on the fileset writer, which triggers the block of code that sizes the index bloom filter.
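A hedged sketch of that block's shape, assuming the m3db bloom package; apart from `EstimateFalsePositiveRate`, the names here (the constructor, the false-positive-rate option, the import path) are guesses rather than the exact code in write.go:

```go
package fs

import "github.com/m3db/bloom/v4"

// newIndexBloomFilter sketches the sizing logic in Close(): m (filter size in
// bits) and k (hash functions per Add) are derived from the number of index
// entries about to be written out. Names other than EstimateFalsePositiveRate
// are assumptions.
func newIndexBloomFilter(numIndexEntries uint, falsePositivePercent float64) *bloom.BloomFilter {
	m, k := bloom.EstimateFalsePositiveRate(numIndexEntries, falsePositivePercent)
	return bloom.NewBloomFilter(m, k)
}
```

With `numIndexEntries == 0`, this is where the absurd value of `k` described next comes from.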
Due to the implementation of `EstimateFalsePositiveRate`, passing a value of `0` for `n` will result in `9223372036854775808` being returned for the value of `k` (the number of hash functions the bloom filter will run for each value that is added to it).
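To make that concrete, here is a self-contained sketch using the textbook bloom filter sizing formulas as a stand-in for `EstimateFalsePositiveRate` (the real implementation may differ in detail, but the failure mode is the same): computing `k` divides by `n`, so `n == 0` yields NaN, and converting a NaN `float64` to an unsigned integer is implementation-dependent in Go, which is where the `9223372036854775808` (2^63) comes from.

```go
package main

import (
	"fmt"
	"math"
)

// estimateMAndK uses the textbook bloom filter sizing formulas. It is a
// stand-in for EstimateFalsePositiveRate, not the actual m3 implementation.
func estimateMAndK(n uint, p float64) (m, k float64) {
	m = math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2))
	k = math.Ceil(m / float64(n) * math.Ln2) // divides by n: NaN when n == 0
	return m, k
}

func main() {
	m, k := estimateMAndK(0, 0.02) // 0 series, 2% false positive rate
	fmt.Println(m, k)              // prints: 0 NaN

	// Converting a NaN (or +Inf) float64 to an unsigned integer is
	// implementation-dependent in Go; in the incident described here it came
	// out as 9223372036854775808, i.e. 2^63.
	fmt.Println(uint64(k))
}
```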
Normally this isn't a big deal, because when the value of `n` is 0 there are no time series IDs to add to the bloom filter anyway, so `bloomfilter.Add()` never gets called.

However, due to the aforementioned error writing out the previous set of files, plus the bug with `indexEntries` not being properly reset, the call to `writeIndexFileContents` will run the function that populates the bloom filter from `w.indexEntries` (sketched below):
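The shape of that function is essentially a loop feeding every retained index entry's ID into the bloom filter. A hedged sketch follows; the function name and entry fields are guesses, and `Add` is assumed to take the raw ID bytes — only `indexEntries` and `bloomfilter.Add` are named in the description:

```go
package fs

import "github.com/m3db/bloom/v4"

// indexEntry stands in for the real entry type; only the ID bytes matter here.
type indexEntry struct {
	id []byte
}

// populateBloomFilter sketches the function run by writeIndexFileContents:
// every entry still held in indexEntries is added to the bloom filter. With
// the stale entries from the previous (failed) fileset still present, Add is
// called even though the current fileset wrote zero series.
func populateBloomFilter(bloomFilter *bloom.BloomFilter, indexEntries []indexEntry) {
	for _, entry := range indexEntries {
		// Each Add runs k hash functions over the ID; with k == 2^63 a single
		// stale entry is enough to wedge the goroutine effectively forever.
		bloomFilter.Add(entry.id)
	}
}
```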
Since `w.indexEntries` was never properly reset, `bloomfilter.Add()` will be called, and the goroutine will get stuck in a near-infinite loop where it tries to run `9223372036854775808` hash functions.

This issue was extremely hard to debug because it manifested as the M3DB processes turning into "zombies" with 1 CPU core constantly pegged, while the nodes would not respond to any RPCs or networking in general, so standard pprof tooling could not be used.
The reason this happened is that all of the function calls within `bloomfilter.Add()` are inlined, making the entire call unpreemptible by the GC until all `9223372036854775808` hash functions had been completed.

So when a stop-the-world GC was started by the Go runtime, it stopped all of the active goroutines that could have served any network requests and then hung forever, waiting for the goroutine running `bloomfilter.Add()` to complete so it could begin garbage collection.
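At the time this relied on Go's cooperative preemption, where goroutines are only preempted at function calls; inlining removed those calls, leaving no safe point inside the loop. Below is a tiny self-contained demonstration of the same failure mode (on Go 1.14+ asynchronous preemption fixes this, so reproduce it with `GODEBUG=asyncpreemptoff=1`):

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// spin burns CPU in a loop with no function calls, so on runtimes without
// asynchronous preemption (pre Go 1.14, or GODEBUG=asyncpreemptoff=1) it can
// never be stopped at a safe point.
func spin() uint64 {
	var x uint64
	for i := uint64(0); i < 1<<62; i++ {
		x += i
	}
	return x
}

func main() {
	go spin()
	time.Sleep(100 * time.Millisecond)

	fmt.Println("requesting GC (stop-the-world)...")
	runtime.GC() // hangs here waiting for the spinning goroutine to yield
	fmt.Println("GC finished") // never printed in the non-preemptible case
}
```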
The hang is clearly visible in the output from `sudo perf top`, which shows the Go runtime stuck trying to start a stop-the-world GC, and also demonstrates that `bloomfilter.Add` is stuck in a very long loop based on how much time is being spent on the highlighted assembly instructions.

This PR likely requires several other followups, including preventing `EstimateFalsePositiveRate` from returning absurdly large values of `k` when the value of `n` is zero. However, we will get this PR merged ASAP to prevent the nodes from getting stuck in undebuggable states.
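A minimal sketch of the kind of guard that followup could add, assuming the `EstimateFalsePositiveRate(n uint, p float64) (uint, uint)` shape implied above; the wrapper name, import path, and fallback values are placeholders:

```go
package fs

import "github.com/m3db/bloom/v4"

// safeEstimateFalsePositiveRate guards against n == 0 before calling into the
// bloom filter sizing math, so k can never blow up to 2^63. The fallback
// values are arbitrary placeholders; the real followup might instead fix the
// library function itself.
func safeEstimateFalsePositiveRate(n uint, p float64) (m, k uint) {
	if n == 0 {
		// Nothing will ever be added to the filter, so any tiny sizing works.
		return 1, 1
	}
	return bloom.EstimateFalsePositiveRate(n, p)
}
```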