Add mutex to adaptiveSampler #124

Merged
merged 6 commits into master from granular_lock on Mar 22, 2017

Conversation

black-adder
Contributor

Fixes #69

A panic might occur when multiple threads try to create a new sampler in AdaptiveSampler.IsSampled()

@coveralls

Coverage Status

Changes Unknown when pulling d5adeb8 on granular_lock into master.

@yurishkuro
Member

yurishkuro commented Mar 22, 2017

So to restate the scenario:

  • the remote sampler takes a read lock and calls sampler.IsSampled()
  • because it's a read lock, another goroutine can do the same and also call sampler.IsSampled()
  • which internally creates a race in the adaptive sampler, since its IsSampled() is a read/write operation
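
For illustration, here is a minimal, self-contained sketch of the kind of guard such a fix needs: a read lock on the common lookup path, plus a write lock (with a re-check) around creation of a new per-operation sampler. All names below are invented for the example and are not the actual jaeger-client-go types.

    package main

    import (
        "fmt"
        "sync"
    )

    // perOpSampler is a stand-in for the real per-operation sampler type.
    type perOpSampler struct{ rate float64 }

    // adaptiveSampler sketch: an RWMutex guards the per-operation map so that
    // concurrent IsSampled calls cannot race while a new sampler is created.
    type adaptiveSampler struct {
        mu       sync.RWMutex
        samplers map[string]*perOpSampler
    }

    func (s *adaptiveSampler) IsSampled(operation string) bool {
        // Fast path: existing sampler, read lock only.
        s.mu.RLock()
        sampler, ok := s.samplers[operation]
        s.mu.RUnlock()
        if !ok {
            // Slow path: create under the write lock, re-checking the map in
            // case another goroutine created the sampler in the meantime.
            s.mu.Lock()
            if sampler, ok = s.samplers[operation]; !ok {
                sampler = &perOpSampler{rate: 0.001}
                s.samplers[operation] = sampler
            }
            s.mu.Unlock()
        }
        return sampler.rate > 0
    }

    func main() {
        s := &adaptiveSampler{samplers: make(map[string]*perOpSampler)}
        var wg sync.WaitGroup
        for g := 0; g < 4; g++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for i := 0; i < 1000; i++ {
                    s.IsSampled(fmt.Sprintf("op-%d", i%10))
                }
            }()
        }
        wg.Wait()
        fmt.Println("done; run with -race to verify there is no data race")
    }

Without the write lock, the concurrent map access above is exactly the kind of race that can crash the Go runtime.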

sampler_test.go Outdated
}
}

func updateSampler(
yurishkuro (Member)

Can you explain better what this is doing? It looks like it's constantly causing the adaptive sampler to discard half of the endpoints.

@black-adder
Contributor Author

yup, that's the gist of it

sampler_test.go Outdated
&sampling.SamplingStrategyResponse{OperationSampling: strategiesB},
)
}()
time.Sleep(10 * time.Second)
yurishkuro (Member)

Having a test that runs for 10 seconds is a very bad thing, especially since it doesn't actually prove the absence of the race condition.

black-adder (Contributor, Author)

I'm not sure how else to check for a panic. Maybe I can keep the test but use t.Skip() so that if you want to sanity-check that everything is fine, you can run it.

yurishkuro (Member)

  1. Do we need to do that check through end-to-end testing via the remote sampler? It looks like we could do it only on the adaptive sampler.
  2. If we do the latter, what if you just run those goroutines a lot faster? I.e. why sleep 10us / 100us? Can we instead call Gosched() (https://golang.org/pkg/runtime/#Gosched) to yield from each goroutine, and have a small overall wait interval of, say, 10ms to terminate the test? If you run such a test without your fix and it panics, then it's OK to keep it running permanently; 10ms isn't a big deal.
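
For illustration, a sketch of what such a test could look like under those suggestions (imports of fmt, runtime, sync, testing, and time assumed; newTestAdaptiveSampler and updateTestSampler are hypothetical stand-ins for this test file's fixtures, not code from this PR): every goroutine yields via runtime.Gosched() on each iteration, and the whole test runs for roughly 10ms.

    func TestAdaptiveSamplerRace(t *testing.T) {
        sampler := newTestAdaptiveSampler() // hypothetical constructor for this sketch

        stop := make(chan struct{})
        var wg sync.WaitGroup

        // Two goroutines hammer IsSampled concurrently.
        for g := 0; g < 2; g++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for i := 0; ; i++ {
                    select {
                    case <-stop:
                        return
                    default:
                    }
                    sampler.IsSampled(TraceID{}, fmt.Sprintf("op-%d", i%50))
                    runtime.Gosched() // yield each iteration so the goroutines interleave
                }
            }()
        }

        // A third goroutine keeps swapping sampling strategies underneath them.
        wg.Add(1)
        go func() {
            defer wg.Done()
            for i := 0; ; i++ {
                select {
                case <-stop:
                    return
                default:
                }
                updateTestSampler(sampler, i%2 == 0) // hypothetical helper
                runtime.Gosched()
            }
        }()

        // ~10ms of contention is enough to trip the panic without the fix;
        // no need to sleep for 10 seconds.
        time.Sleep(10 * time.Millisecond)
        close(stop)
        wg.Wait()
    }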

sampler_test.go Outdated
func isSampled(t *testing.T, remoteSampler *RemotelyControlledSampler, numOperations int, operationNamePrefix string) {
for i := 0; i < numOperations; i++ {
sampled, _ := remoteSampler.IsSampled(TraceID{}, generateRandomOperationName(numOperations, operationNamePrefix))
assert.True(t, sampled)
yurishkuro (Member)

You must yield in each iteration, otherwise there's no race; one goroutine will just run unopposed through the loop.

black-adder (Contributor, Author)

Interesting, when I add print statements it seems like both are running simultaneously. I'll add the yield.

yurishkuro (Member)

Also, you may want to use runtime.LockOSThread(); otherwise there's still no guarantee you'd have two goroutines running in parallel.
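
For reference, a small sketch of that suggestion (illustrative only, reusing the names from the test sketch above): each test goroutine pins itself to its own OS thread before entering the loop.

    go func() {
        defer wg.Done()
        runtime.LockOSThread()         // pin this goroutine to a dedicated OS thread
        defer runtime.UnlockOSThread() // release it when the goroutine exits
        for i := 0; ; i++ {
            select {
            case <-stop:
                return
            default:
            }
            sampler.IsSampled(TraceID{}, fmt.Sprintf("op-%d", i%50))
            runtime.Gosched()
        }
    }()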

Have you been able to reproduce the panic with this test?

@coveralls

Coverage Status

Changes Unknown when pulling 69738a4 on granular_lock into master.

@coveralls

Coverage Status

Changes Unknown when pulling ec6dc65 on granular_lock into master.

black-adder merged commit e569aa6 into master on Mar 22, 2017
black-adder deleted the granular_lock branch on March 22, 2017 20:45