-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
randutil: adds NewTestRand to provide more deterministic random number generation #71350
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is NewTestPseudoRand
not sufficient?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @rharding6373, and @shermanCRL)
pkg/util/randutil/rand.go, line 62 at r1 (raw file):
// rand. This rand.Rand is useful in testing for deterministic behavior. func NewTestRand() (*rand.Rand, int64) { seed := rand.Int63()
What's the logic behind using the global rand? Even if the global rand is deterministically seeded (via SeedForTests()
), the seed we get here will depend on how many times the global rand was used (which would depend on which tests in that package ran before). It would be impossible to reproduce a single test if the failure occurred when the entire package was tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is discussion about the motivation behind this and various options in this thread: https://cockroachlabs.slack.com/archives/C0HM2DZ34/p1632944127270600
NewTestPseudoRand
relies on NewPseudoRand
which gets its seed from crypto_rand.Reader
, which doesn't have deterministic behavior across test runs. NewTestPseudoRand
logs the seed it uses, but many tests create multiple RNGs, as well as using the global rand
with its own seed, so it's not useful in producing deterministic randomized tests.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @RaduBerinde, and @shermanCRL)
pkg/util/randutil/rand.go, line 62 at r1 (raw file):
The rationale behind using the global rand is that a test could be reproduced using a single seed: the one seeding global rand. It allows RNGs in the same test to have different seeds.
It would be impossible to reproduce a single test if the failure occurred when the entire package was tested.
That's definitely a downside to this, and I'm open to suggestions to make it better! (e.g., being disciplined about one RNG per test could be helpful, but doesn't help in the cases where a test has to use multiple RNGs or the test also relies on global rand). This change doesn't solve the deterministic test problem for all situations, but provides the possibility of easily reproducing test conditions that wasn't possible before this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @rharding6373, and @shermanCRL)
pkg/util/randutil/rand.go, line 62 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
The rationale behind using the global rand is that a test could be reproduced using a single seed: the one seeding global rand. It allows RNGs in the same test to have different seeds.
It would be impossible to reproduce a single test if the failure occurred when the entire package was tested.
That's definitely a downside to this, and I'm open to suggestions to make it better! (e.g., being disciplined about one RNG per test could be helpful, but doesn't help in the cases where a test has to use multiple RNGs or the test also relies on global rand). This change doesn't solve the deterministic test problem for all situations, but provides the possibility of easily reproducing test conditions that wasn't possible before this change.
I see.. How about setting up a separate, internal*rand.Rand
instance that is only used for generating the seeds for NewTestRand()
? That way at least it won't matter if tests use the global rand.
We could add a function that you can call at the beginning of your test and it reports the current state of this rand instance, with a way to set it to be the same in the future if you want to repro just that one test. We could even do this automatically in NewTestRand
- we can inspect the stack (runtime.Callers
) to find the test name and print it out the first time we see a test.
One question is if it's worth having both this and NewTestPseudoRand()
. I didn't even know there were two variants, I just used whatever I saw in use elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @knz, and @RaduBerinde)
pkg/util/randutil/rand.go, line 62 at r1 (raw file):
These are interesting ideas. I changed the implementation to try to improve reproducibility at the individual test level, incorporating these ideas. The seeds for NewTestRand
are produced by an RNG that is reseeded using the global seed every time a new test is detected. We use a global seed because in some packages an RNG is used in TestMain as well as the actual test, and it also means we only need to apply one seed with COCKROACH_RANDOM_SEED
.
This approach has some limitations: 1) Tests that use t.Parallel()
won't be deterministic (this only appears to affect TestFastIntSet and TestFastIntMap; otherwise, t.Parallel()
is not commonly used in the code base). 2) RNGs are reset at the test function level, so if a test function has multiple tests in it (e.g., TestDefaultAggregateFunc/hash/StringAgg vs TestDefaultAggregateFunc/ordered/StringAgg), running an individual test with the same seed as the entire test or at the package level may not produce the same results.
One question is if it's worth having both this and NewTestPseudoRand()
I had removed this in a separate commit in the same PR, but decided to squash the commits so it's easier to see all the changes together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @dt, @knz, and @rharding6373)
pkg/util/randutil/rand.go, line 62 at r1 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
These are interesting ideas. I changed the implementation to try to improve reproducibility at the individual test level, incorporating these ideas. The seeds for
NewTestRand
are produced by an RNG that is reseeded using the global seed every time a new test is detected. We use a global seed because in some packages an RNG is used in TestMain as well as the actual test, and it also means we only need to apply one seed withCOCKROACH_RANDOM_SEED
.This approach has some limitations: 1) Tests that use
t.Parallel()
won't be deterministic (this only appears to affect TestFastIntSet and TestFastIntMap; otherwise,t.Parallel()
is not commonly used in the code base). 2) RNGs are reset at the test function level, so if a test function has multiple tests in it (e.g., TestDefaultAggregateFunc/hash/StringAgg vs TestDefaultAggregateFunc/ordered/StringAgg), running an individual test with the same seed as the entire test or at the package level may not produce the same results.One question is if it's worth having both this and NewTestPseudoRand()
I had removed this in a separate commit in the same PR, but decided to squash the commits so it's easier to see all the changes together.
Very cool!!
pkg/util/randutil/rand.go, line 69 at r4 (raw file):
func NewTestRand() (*rand.Rand, int64) { fxn := getTestName() if fxn != "" && lastTestName != fxn {
This is racy, rng and lastTestName need to be protected by a mutex. (I realize that if called concurrently, any hope of determinisim goes out of the window anyway, but the race detector will complain)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NewTestPseudoRand
NewTestPseudoRand
should be synchronized as per Radu's comment; otherwise, great work!
Reviewed all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @dt, @knz, and @rharding6373)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r1, 115 of 117 files at r2, 7 of 8 files at r3, 2 of 2 files at r4.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @dt, @knz, and @rharding6373)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs! I'll wait until Monday to merge in case there's any other feedback.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt, @knz, @rharding6373, and @srosenberg)
pkg/util/randutil/rand.go, line 69 at r4 (raw file):
Previously, RaduBerinde wrote…
This is racy, rng and lastTestName need to be protected by a mutex. (I realize that if called concurrently, any hope of determinisim goes out of the window anyway, but the race detector will complain)
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt, @knz, and @rharding6373)
pkg/util/randutil/rand.go, line 43 at r5 (raw file):
init
is never racy, and its execution happens-before any other (non-init) function in the same package [1],
Package initialization—variable initialization and the invocation of init functions—happens in a single goroutine, sequentially, one package at a time. An init function may launch other goroutines, which can run concurrently with the initialization code. However, initialization always sequences the init functions: it will not invoke the next one until the previous one has returned.
Thus, mutex is never needed inside init
.
[1] https://golang.org/ref/spec#Package_initialization
pkg/util/randutil/rand.go, line 74 at r5 (raw file):
// reproducible behavior. func NewTestRand() (*rand.Rand, int64) { mtx.Lock()
Yep. Theoretically, this yields a partial order on the sequence of seeds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @dt, @knz, @rharding6373, and @srosenberg)
pkg/util/randutil/rand.go, line 43 at r5 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
init
is never racy, and its execution happens-before any other (non-init) function in the same package [1],Package initialization—variable initialization and the invocation of init functions—happens in a single goroutine, sequentially, one package at a time. An init function may launch other goroutines, which can run concurrently with the initialization code. However, initialization always sequences the init functions: it will not invoke the next one until the previous one has returned.
Thus, mutex is never needed inside
init
.
Thanks for the reference!
a0c8154
to
89d6086
Compare
generator option for testing Previously, random number generators were created using either a pseudo random seed from `crypto_rand.Reader` or were seeded using the same global seed as the `rand` package. The former approach does not allow for deterministic test results, since the seed is not reproducible across runs. The latter approach, while it provides more determinism, does not provide as much randomness since all random number generators would derive from the same seed. This change adds a new random number generator creator to `randutil`, `NewTestRand`, which seeds its random number generators using a random int from a dedicated instance of `rand.Rand`. The seed-generating rand is re-seeded with the global seed whenever `NewTestRand` is called by a test with a different name than the previous caller. This allows for more determinism since the sequence of seeds is reproducible from a a global rand, provided no other sources of non-determinism during testing. It also allows for more randomness in a given test, since each random number generator returned by `NewTestRand` during the same test uses a different seed. We also migrate test use cases of NewPseudoRand and NewTestPseudoRand to NewTestRand. Release note: None
bors r+ |
Build succeeded: |
This PR adds NewTestRand, a random number generator seeded by a random number
taken from the global rand. This provides more determinism for a test than the existing
NewPseudoRand by using a seed that is reproducible, provided the test has no other
sources of non-determinism.
As part of this PR we also migrate test use cases of NewPseudoRand and all uses of
NewTestPseudoRand to NewTestRand.