Skip to content
This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

storage: fix LazyChunkReader.join potential deadlock #1670

Merged
merged 1 commit into from
Aug 15, 2019
Merged

Conversation

janos
Copy link
Member

@janos janos commented Aug 14, 2019

A goroutine leak was observed on a node in private cluster. Full goroutine dump was attached. The problem was a wait on wait group.

goroutine 16761764 [semacquire, 254 minutes]:
sync.runtime_Semacquire(0xc00225a238)
    /usr/local/go/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc00225a230)
    /usr/local/go/src/sync/waitgroup.go:130 +0x65
github.com/ethersphere/swarm/storage.(*LazyChunkReader).join(0xc00258b180, 0x126f1a0, 0xc0042e8090, 0xc0045e6000, 0x20000, 0x20000, 0xde0000, 0xe00000, 0x2, 0x80000, ...)
    /swarm/build/_workspace/src/github.com/ethersphere/swarm/storage/chunker.go:559 +0x41f
github.com/ethersphere/swarm/storage.(*LazyChunkReader).join.func1(0xc000cf2090, 0x68, 0x70, 0xc00258b180, 0x126f1a0, 0xc0042e8090, 0xc0026e24e0, 0xde0000, 0xc001f00088, 0xc0026e2120, ...)
    /swarm/build/_workspace/src/github.com/ethersphere/swarm/storage/chunker.go:556 +0x73f
created by github.com/ethersphere/swarm/storage.(*LazyChunkReader).join
    /swarm/build/_workspace/src/github.com/ethersphere/swarm/storage/chunker.go:533 +0x366
goroutine 17801050 [semacquire, 249 minutes]:
sync.runtime_Semacquire(0xc002dcf7f8)
	/usr/local/go/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc002dcf7f0)
	/usr/local/go/src/sync/waitgroup.go:130 +0x65
github.com/ethersphere/swarm/storage.(*LazyChunkReader).ReadAt.func2(0xc002dcf7f0, 0xc002b7e180)
	/swarm/build/_workspace/src/github.com/ethersphere/swarm/storage/chunker.go:470 +0x2b
created by github.com/ethersphere/swarm/storage.(*LazyChunkReader).ReadAt
	/swarm/build/_workspace/src/github.com/ethersphere/swarm/storage/chunker.go:469 +0x47f

It is possible that in case of error in LazyChunkReader.join wait group counter is not decremented. This PR adds a simple fix.

@janos janos requested review from zelig and acud August 14, 2019 18:13
@zelig zelig requested a review from jmozah August 14, 2019 18:35
@zelig zelig added this to the 0.5.0 milestone Aug 14, 2019
Copy link
Member

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@acud acud merged commit ec471e8 into master Aug 15, 2019
@acud acud deleted the lcr-join-deadlock branch August 15, 2019 08:26
chadsr added a commit to chadsr/swarm that referenced this pull request Sep 23, 2019
* 'master' of github.com:ethersphere/swarm:
  chunk, storage: chunk.Store multiple chunk Put (ethersphere#1681)
  storage: fix pyramid chunker and hasherstore possible deadlocks (ethersphere#1679)
  pss: Use distance to determine single guaranteed recipient (ethersphere#1672)
  storage: fix possible hasherstore deadlock on waitC channel (ethersphere#1674)
  network: Add adaptive capabilities message (ethersphere#1619)
  p2p/protocols, p2p/testing; conditional propagation of context (ethersphere#1648)
  api/http: remove unnecessary conversion (ethersphere#1673)
  storage: fix LazyChunkReader.join potential deadlock (ethersphere#1670)
  HTTP API support for pinning contents (ethersphere#1658)
  pot: Add Distance methods with tests (ethersphere#1621)
  README.md: Update Vendored Dependencies section (ethersphere#1667)
  network, p2p, vendor: move vendored p2p/testing under swarm (ethersphere#1647)
  build, vendor: use go modules for vendoring (ethersphere#1532)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants