Skip to content
This repository has been archived by the owner on Aug 30, 2019. It is now read-only.

write: fix test races and enable CI -race flag #500

Merged
merged 1 commit into from
Oct 23, 2018
Merged

write: fix test races and enable CI -race flag #500

merged 1 commit into from
Oct 23, 2018

Conversation

gbbr
Copy link
Contributor

@gbbr gbbr commented Oct 22, 2018

No description provided.

@gbbr gbbr added this to the 6.7.0 milestone Oct 22, 2018
@gbbr gbbr requested a review from AlexJF October 22, 2018 13:21
@@ -336,7 +336,7 @@ func (s *QueuablePayloadSender) flushQueue() error {
func (s *QueuablePayloadSender) removeQueuedPayload(e *list.Element) *list.Element {
next := e.Next()
payload := e.Value.(*Payload)
s.currentQueuedSize -= int64(len(payload.Bytes))
atomic.AddInt64(&s.currentQueuedSize, -int64(len(payload.Bytes)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't find a better way here to stop the race from happening. I think the "correct" way would be to refactor a bit and improve the overall testing of the package, but that seems like a rather large job for this PR alone. The scope here was simply to add the -race flag into CI. I do plan however to separately improve the testing and perhaps help get rid of the syncBarrier channel too.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's causing the race here? The idea when I implemented this was that currentQueuedSize was only accessed from the sender routine.

EDIT: Oh I see, it's because you started accessing it from the test. Could we remove the atomic and use the syncBarrier on the sender to get around the race? We'll probably also have to drop the NumQueuedPayloads method or then we'll have to add a lock to s.QueuedPayloads.

statsd/statsd.go Outdated

// Client returns the global StatsClient.
func Client() StatsClient {
mu.RLock()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of those cases where 100% race correctness bugs me. In normal operation you initialize Client during app bootstrap and never set it again so we could access it lock free everywhere.

However, because we call SetClient on tests, we now have to introduce a lock on what was before a lock-free operation. I'm not sure if the Go compiler is smart enough to realize it can ditch the lock outside of test runs but I suspect this could have a non-negligible toll on hot and loopy code paths where Client() is called often.

I wonder if there's a different way to appease -race in these cases by triggering some global coroutine sync (or something like C/C++ macros, where we could disable some locks at compile time if we detect we are not compiling for a test run).

Anyway, -race rant mode off 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's not nice to introduce locks here for the sake of tests. I should've spent more time on this. I'll try again tomorrow by using TestMain(m *testing.M) and initializing the mock statsd client only once. That should work and it should eliminate the race.

assert.Equal(t, "ping2", msg2.(string))
assert.ElementsMatch(t,
[]string{"ping1", "ping2"},
[]string{(<-multi.mch).(string), (<-multi.mch).(string)},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is execution order guaranteed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the slice expression is not evaluated until both receives complete. As for assert.ElementsMatch matches elements regardless of their order.

@@ -336,7 +336,7 @@ func (s *QueuablePayloadSender) flushQueue() error {
func (s *QueuablePayloadSender) removeQueuedPayload(e *list.Element) *list.Element {
next := e.Next()
payload := e.Value.(*Payload)
s.currentQueuedSize -= int64(len(payload.Bytes))
atomic.AddInt64(&s.currentQueuedSize, -int64(len(payload.Bytes)))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's causing the race here? The idea when I implemented this was that currentQueuedSize was only accessed from the sender routine.

EDIT: Oh I see, it's because you started accessing it from the test. Could we remove the atomic and use the syncBarrier on the sender to get around the race? We'll probably also have to drop the NumQueuedPayloads method or then we'll have to add a lock to s.QueuedPayloads.

@gbbr gbbr force-pushed the gbbr/races branch 2 times, most recently from 3192e40 to cc28b52 Compare October 23, 2018 09:24
@gbbr
Copy link
Contributor Author

gbbr commented Oct 23, 2018

@AlexJF PTAL, I've used TestMain to set up the statsd.Client, which worked. For the other race, I've checked the length of the queue after stopping the sender, which also worked. I reckon that should be fine, no?

Copy link

@AlexJF AlexJF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@gbbr gbbr merged commit d915c49 into master Oct 23, 2018
@gbbr gbbr deleted the gbbr/races branch October 23, 2018 12:31
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants