Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TBufferedServer: Avoid channel close/send race on Stop #2583

Merged

Conversation

chlunde
Copy link
Contributor

@chlunde chlunde commented Oct 22, 2020

TBufferedServer.Close() may close the data channel before
Serve detects it. Move chan close() to Serve to prevent
a panic caused by sending on a closed channel.

See also golang/go#27769 (comment)

=== RUN   TestTBufferedServer_SendReceive
==================
WARNING: DATA RACE
Write at 0x00c000074850 by goroutine 9:
  runtime.closechan()
      /home/travis/.gimme/versions/go1.15.2.linux.amd64/src/runtime/chan.go:352 +0x0
  github.com/jaegertracing/jaeger/cmd/agent/app/servers.(*TBufferedServer).Stop()
...
Previous read at 0x00c000074850 by goroutine 10:
  runtime.chansend()
      /home/travis/.gimme/versions/go1.15.2.linux.amd64/src/runtime/chan.go:158 +0x0
  github.com/jaegertracing/jaeger/cmd/agent/app/servers.(*TBufferedServer).Serve()
      /home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/servers/tbuffered_server.go:97 +0x264
...
==================
    testing.go:1042: race detected during execution of test
--- FAIL: TestTBufferedServer_SendReceive (0.01s)

Closes #2577

Signed-off-by: Carl Henrik Lunde [email protected]

TBufferedServer.Close() may close the data channel before
Serve detects it. Move chan close() to Serve to prevent
a panic caused by sending on a closed channel.

See also golang/go#27769 (comment)

	=== RUN   TestTBufferedServer_SendReceive
	==================
	WARNING: DATA RACE
	Write at 0x00c000074850 by goroutine 9:
	  runtime.closechan()
	      /home/travis/.gimme/versions/go1.15.2.linux.amd64/src/runtime/chan.go:352 +0x0
	  github.com/jaegertracing/jaeger/cmd/agent/app/servers.(*TBufferedServer).Stop()
	...
	Previous read at 0x00c000074850 by goroutine 10:
	  runtime.chansend()
	      /home/travis/.gimme/versions/go1.15.2.linux.amd64/src/runtime/chan.go:158 +0x0
	  github.com/jaegertracing/jaeger/cmd/agent/app/servers.(*TBufferedServer).Serve()
	      /home/travis/gopath/src/github.com/jaegertracing/jaeger/cmd/agent/app/servers/tbuffered_server.go:97 +0x264
	...
	==================
	    testing.go:1042: race detected during execution of test
	--- FAIL: TestTBufferedServer_SendReceive (0.01s)

Closes jaegertracing#2577

Signed-off-by: Carl Henrik Lunde <[email protected]>
@chlunde chlunde requested a review from a team as a code owner October 22, 2020 19:06
@chlunde chlunde requested a review from vprithvi October 22, 2020 19:06
@mergify mergify bot requested a review from jpkrohling October 22, 2020 19:06
@chlunde
Copy link
Contributor Author

chlunde commented Oct 22, 2020

The downside is that the channel will not be closed if Serve is never called, but I can't see any way this will happen with the current code.

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your PR is changing how Close and Read are interacting, not Close and Send.

@chlunde
Copy link
Contributor Author

chlunde commented Oct 23, 2020

Not sure if I follow, I believe this is what happens on dataChan:

Goroutine 1                                                  | Goroutine 2
-------------------------------------------------------------|------------------------------
// Serve initiates the readers and starts serving traffic    |
func (s *TBufferedServer) Serve() {                          |
  atomic.StoreUint32(&s.serving, 1)                          |
  for atomic.LoadUint32(&s.serving) == 1 { // IsServing      |
    readBuf := s....                                         |
    // ... context switch ....  ---->                        | // Stop stops the serving of traffic
                                                             | // and waits until the queue is
                                                             | // emptied by the readers
                                                             | func (s *TBufferedServer) Stop() {
                                                             |   atomic.StoreUint32(&s.serving, 0)
                                                             |   _ = s.transport.Close()
                                                             |   close(s.dataChan)
                                                             | // <----- ... context switch back ...
                                                             |
    s.dataChan <- readBuf // line 97 - panic, send on closed |
  }                                                          |
}                                                            |

After patch, 4000 runs no error

go test -test.count=4000 ./cmd/agent/app/servers/
ok  	github.com/jaegertracing/jaeger/cmd/agent/app/servers	45.526s

Before patch, typically it fails before 30 runs:

go test -test.count=30 ./cmd/agent/app/servers/
panic: send on closed channel

goroutine 9 [running]:
github.com/jaegertracing/jaeger/cmd/agent/app/servers.(*TBufferedServer).Serve(0xc000170bd0)
	.../src/jaeger/cmd/agent/app/servers/tbuffered_server.go:97 +0x129
created by github.com/jaegertracing/jaeger/cmd/agent/app/servers.TestTBufferedServer_SendReceive
	.../src/jaeger/cmd/agent/app/servers/tbuffered_server_test.go:47 +0x1bb
FAIL	github.com/jaegertracing/jaeger/cmd/agent/app/servers	0.017s
FAIL

@chlunde chlunde requested a review from yurishkuro October 23, 2020 14:02
@yurishkuro yurishkuro dismissed their stale review October 25, 2020 20:40

you're correct

@codecov
Copy link

codecov bot commented Oct 25, 2020

Codecov Report

Merging #2583 into master will decrease coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2583      +/-   ##
==========================================
- Coverage   95.31%   95.30%   -0.02%     
==========================================
  Files         208      208              
  Lines        9281     9281              
==========================================
- Hits         8846     8845       -1     
- Misses        356      357       +1     
  Partials       79       79              
Impacted Files Coverage Δ
cmd/agent/app/servers/tbuffered_server.go 100.00% <100.00%> (ø)
...lugin/sampling/strategystore/adaptive/processor.go 99.07% <0.00%> (-0.93%) ⬇️
plugin/storage/integration/integration.go 77.90% <0.00%> (+0.55%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9f954fe...0d43fff. Read the comment docs.

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@yurishkuro yurishkuro merged commit 2bce5ad into jaegertracing:master Oct 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data race in unit test
3 participants