
Remove nested read lock to prevent deadlock #2128

Merged: 5 commits into master from juchan/deadlock-fix on Jan 31, 2020

Conversation

justinjc (Collaborator)

What this PR does / why we need it:
Fixes #2127

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

justinjc requested a review from prateek on January 30, 2020 at 16:42
prateek (Collaborator) commented on Jan 30, 2020:

notes from our chat:

  • rewrite the fix to not require multiple acquisitions of the shard read lock
  • TestShardTickWriteRace to use a property for tickBatchSize <- (0,10)
  • TestShardTickWriteRace to have tick wait interval be millisecond
  • TestShardTickWriteRace run the internal test method 100 times and mark it big (to ensure race detector)
  • can finally close [WIP] Shard Tick/Write race #506
  • file an issue to create a debug/test lock which checks for recursive gets and fails; and/or a linter that doesn't suck at this (https://github.com/gnieto/mulint ain't it)
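
For context on the class of bug being fixed, here is a standalone Go sketch (illustrative only, not M3DB code) of why a nested read lock deadlocks: sync.RWMutex stops admitting new readers once a writer is waiting, so an outer RLock, a pending Lock, and a nested RLock on the same goroutine all block each other.

```go
// Standalone sketch of the nested-RLock deadlock (illustrative, not M3DB code).
package main

import (
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex

	mu.RLock() // outer read lock, e.g. held across a shard tick

	go func() {
		mu.Lock() // a writer arrives and waits for the outer read lock
		mu.Unlock()
	}()

	time.Sleep(10 * time.Millisecond) // give the writer time to start waiting

	// Nested read lock on the same goroutine: it queues behind the pending
	// writer, and the writer is queued behind the outer read lock, so neither
	// can make progress. Running this trips Go's runtime deadlock detector.
	mu.RLock()

	mu.RUnlock()
	mu.RUnlock()
}
```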

@@ -385,8 +385,12 @@ func (s *dbShard) RetrievableBlockColdVersion(blockStart time.Time) (int, error)

// BlockStatesSnapshot implements series.QueryableBlockRetriever
func (s *dbShard) BlockStatesSnapshot() series.ShardBlockStateSnapshot {
	s.RLock()
	defer s.RUnlock()
	return s.blockStatesSnapshotWithRLock()
prateek (Collaborator) commented on this diff:
is it safe to acquire the flushState RLock while holding the shard RLock?

justinjc (Collaborator, Author) replied:
Yep, I checked usages of the flushState RLock and they don't conflict with the shard lock anywhere else in the code, so this should be good.
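
To make the ordering argument concrete, here is a minimal sketch (assumed names, not the actual dbShard code) of the pattern being relied on: the flushState read lock is only ever taken while already holding the shard lock, never the other way around, so holding both at once cannot deadlock against another goroutine.

```go
// Minimal lock-ordering sketch with assumed names (not the actual dbShard code).
package shardsketch

import "sync"

type flushStates struct {
	sync.RWMutex
	statusByTime map[int64]int // illustrative payload
}

type shard struct {
	sync.RWMutex // shard-level lock: always acquired first
	flushState   flushStates
}

// blockStatesSnapshot copies the flush states while holding the shard read
// lock. Nesting the flushState read lock inside it is safe because no code
// path acquires the shard lock while already holding the flushState lock.
func (s *shard) blockStatesSnapshot() map[int64]int {
	s.RLock()
	defer s.RUnlock()

	s.flushState.RLock()
	defer s.flushState.RUnlock()

	out := make(map[int64]int, len(s.flushState.statusByTime))
	for t, v := range s.flushState.statusByTime {
		out[t] = v
	}
	return out
}
```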

codecov bot commented Jan 30, 2020

Codecov Report

Merging #2128 into master will decrease coverage by 18.7%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #2128      +/-   ##
=========================================
- Coverage    69.9%   51.2%   -18.8%     
=========================================
  Files        1001     828     -173     
  Lines       86476   75886   -10590     
=========================================
- Hits        60478   38863   -21615     
- Misses      21715   33702   +11987     
+ Partials     4283    3321     -962
| Flag | Coverage Δ |
|------|------------|
| #aggregator | 68% <ø> (+4.7%) ⬆️ |
| #cluster | 77.3% <ø> (+1.8%) ⬆️ |
| #collector | 48.8% <ø> (-7.2%) ⬇️ |
| #dbnode | 64.5% <100%> (+0.2%) ⬆️ |
| #m3em | 44.2% <ø> (-8.3%) ⬇️ |
| #m3ninx | 56.8% <ø> (-4.9%) ⬇️ |
| #m3nsch | 100% <ø> (+71.5%) ⬆️ |
| #metrics | 17.6% <ø> (-82.4%) ⬇️ |
| #msg | 72.9% <ø> (-1.5%) ⬇️ |
| #query | 26.5% <ø> (-17.5%) ⬇️ |
| #x | ? |

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 217cfe8...7f78cd3.

prateek (Collaborator) left a comment:
LGTM, good stuff

justinjc merged commit af17524 into master on Jan 31, 2020.
justinjc deleted the juchan/deadlock-fix branch on January 31, 2020 at 00:20.
Merging this pull request closes: Rare goroutine deadlock in M3DB (#2127)