Various Gen refactors for better performance. #575
Conversation
There are five changes which I've made to help ScalaCheck generate values faster:

* Use a custom `buildableOfN` implementation instead of `sequence`
* Use `Gen.stringOf` and `Gen.stringOfN` instead of generic builders
* Rewrite `Gen[Char]` instances to be faster
* Remove as much indirection as possible
* Stop using `sieve`/`sieveCopy` internally

There are also some changes that clean things up:

* Add `Gen.unicodeChar` and `Gen.unicodeStr`
* Add a `Buildable[T, Seq[T]]` instance
* Add type annotations on some methods that lacked them
* A few indentation and formatting changes

The first four optimizations should not change any behavior users see (except possibly balancing out some unbalanced character distributions we had previously). The last change is a bit more controversial, and is discussed below.

Here are our benchmarks before the changes:

```
Benchmark                   (genSize)  (seedCount)  Mode  Cnt     Score      Error  Units
GenBench.asciiPrintableStr        100          100  avgt    3  1707.101 ± 2575.431  us/op
GenBench.const_                   100          100  avgt    3     3.431 ±    1.417  us/op
GenBench.double_                  100          100  avgt    3    13.103 ±   42.670  us/op
GenBench.dynamicFrequency         100          100  avgt    3  1804.406 ±  753.188  us/op
GenBench.eitherIntInt             100          100  avgt    3    42.499 ±   14.959  us/op
GenBench.identifier               100          100  avgt    3  3493.749 ± 1272.181  us/op
GenBench.int_                     100          100  avgt    3    13.364 ±    6.165  us/op
GenBench.listOfInt                100          100  avgt    3  1536.995 ±  370.880  us/op
GenBench.mapOfIntInt              100          100  avgt    3  3375.948 ±  588.872  us/op
GenBench.oneOf                    100          100  avgt    3    19.542 ±   14.573  us/op
GenBench.optionInt                100          100  avgt    3    52.750 ±    2.814  us/op
GenBench.sequence                 100          100  avgt    3   215.349 ±    0.736  us/op
GenBench.staticFrequency          100          100  avgt    3  1472.478 ±  655.671  us/op
GenBench.zipIntInt                100          100  avgt    3    26.897 ±    5.523  us/op
```

And after:

```
Benchmark                   (genSize)  (seedCount)  Mode  Cnt     Score      Error  Units
GenBench.asciiPrintableStr        100          100  avgt    3   204.613 ±  208.688  us/op
GenBench.const_                   100          100  avgt    3     2.856 ±    6.079  us/op
GenBench.double_                  100          100  avgt    3    11.058 ±   17.209  us/op
GenBench.dynamicFrequency         100          100  avgt    3   498.426 ±  306.193  us/op
GenBench.eitherIntInt             100          100  avgt    3    33.225 ±   10.462  us/op
GenBench.identifier               100          100  avgt    3    96.309 ±    4.903  us/op
GenBench.int_                     100          100  avgt    3     9.213 ±    2.990  us/op
GenBench.listOfInt                100          100  avgt    3   448.545 ±  228.448  us/op
GenBench.mapOfIntInt              100          100  avgt    3  1778.107 ± 1532.473  us/op
GenBench.oneOf                    100          100  avgt    3    18.858 ±   24.069  us/op
GenBench.optionInt                100          100  avgt    3    45.695 ±   18.507  us/op
GenBench.sequence                 100          100  avgt    3   198.399 ±  164.050  us/op
GenBench.staticFrequency          100          100  avgt    3   449.571 ±   97.772  us/op
GenBench.zipIntInt                100          100  avgt    3    24.025 ±   27.260  us/op
```

Some notable speed increases include:

* asciiPrintableStr (8.5x faster)
* dynamicFrequency (3.6x faster)
* identifier (36.4x faster)
* listOfInt (3.4x faster)
* mapOfIntInt (1.9x faster)
* staticFrequency (3.3x faster)

The speed improvements mostly affect collections, particularly strings. Since these represent a significant portion of the values ScalaCheck users are likely to generate, this will probably help shorten test runtimes for our users. For example, running the Paiges tests on this branch resulted in a roughly 2x speed-up (as measured by SBT).

The one optimization we might choose to forgo is deprecating sieves. Sieves were introduced to allow filtering predicates (introduced by calling `.suchThat` or `.filter` on `Gen[T]` values) to be preserved in the generated results. One strange consequence of that is that `Gen.R` holds onto values which might be filtered out -- the actual filtering is done when `.retrieve` is called (see the sketch below). Another is that many of the collection combinators include `.forall` checks to try to verify that their elements are legitimate.

The reason sieves were added was to support shrinking. The shrinking code uses the sieve to filter the stream of shrunken values, to try to ensure that only "valid" values are produced. Since users tend to avoid using `filter` (because of the flaky test failures caused by too many discarded cases), and since most actual `Gen` instances fail to shrink properly anyway, these sieves have likely not benefited many users. However, there's a risk that for some users removing sieves will exacerbate shrinking issues they already have.

I'd like to consider removing sieves, since as it stands I'm not sure they are consistently used, and using them doesn't solve the problems we have with shrinking. However, I'll also benchmark this branch with sieves put back in, and if the difference in performance is minor we may want to leave sieves alone for now.
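To make the sieve mechanism concrete, here is a minimal sketch of the idea (simplified, hypothetical types, not ScalaCheck's actual internals): a result holds both the raw value and the accumulated predicate, and filtering only happens at retrieval time.

```scala
// Minimal sketch of the sieve idea (hypothetical, simplified types).
// A result retains both a raw value and the accumulated filter predicate;
// the predicate is only applied when the value is retrieved.
final case class R[T](value: Option[T], sieve: T => Boolean) {
  // retrieve applies the sieve, so R can hold a value that is later filtered out
  def retrieve: Option[T] = value.filter(sieve)

  // .suchThat/.filter compose another predicate into the sieve
  def suchThat(p: T => Boolean): R[T] =
    copy(sieve = t => sieve(t) && p(t))
}

// The raw value 3 is held, but retrieve drops it once a filter rejects it.
val r = R[Int](Some(3), _ => true).suchThat(_ % 2 == 0)
assert(r.retrieve.isEmpty)
```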
```diff
-implicit lazy val arbShort: Arbitrary[Short] = Arbitrary(
-  Gen.chooseNum(Short.MinValue, Short.MaxValue)
-)
+implicit lazy val arbShort: Arbitrary[Short] =
```
why are these lazy?
The existing code was lazy and I didn't change it.
It's possible there are initialization issues? If we don't need lazy
I'm fine removing it.
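For context, the usual initialization-order hazard that `lazy` guards against looks something like this (a hypothetical example, not code from ScalaCheck):

```scala
// Hypothetical illustration of the initialization-order problem `lazy` avoids.
trait Arbs {
  // Fields initialize top to bottom, so this runs before arbInt below is set.
  // If arbInt were a strict val it would still be null here, and ._1 would NPE.
  val arbPair: (Int, Int) = (arbInt._1, arbInt._2)

  // lazy defers evaluation to first access, sidestepping the ordering issue.
  lazy val arbInt: (Int, Int) = (1, 2)
}

object Arbs extends Arbs
// Arbs.arbPair == (1, 2); with a strict `val arbInt` it would throw an NPE.
```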
```diff
 /** Absolutely, totally, 100% arbitrarily chosen Unit. */
-implicit lazy val arbUnit: Arbitrary[Unit] = Arbitrary(const(()))
+implicit lazy val arbUnit: Arbitrary[Unit] =
```
why lazy?
```scala
      r(None, seed).copy(l = labels)
    case Some(t) =>
      val r = f(t)
      r.copy(l = labels ++ r.labels, sd = r.seed)
```
would `labels | r.labels` be clearer that this is set union (and possibly faster)?
Yes and possibly -- seems like a good idea.
```scala
gen { (p, seed0) =>
  var seed: Seed = seed0
  val bldr = evb.builder
  val allowedFailures = Integer.max(10, n / 10)
```
kind of a bummer that this is totally ad hoc. I wonder if it should be in `p`? Maybe as some kind of filter/retry strategy? Might require more thought, but maybe add a TODO. It would be interesting to think about filter failure strategies in the context of flatMap/zip. Large flatMaps/zips cause an exponential decrease in filter success rates if you retry the whole chain, but if you retry after each step, you can keep the success rate constant.

E.g.: if you zip N things, each successful with probability `p`, the full zip has success probability `p^N`, so the expected number of retries to get a single item is `(1/p)^N = exp(N ln(1/p))`. But if you retry each step in the chain `K` times for a success, you get `p_step = 1 - (1 - p)^K`, and `(1/p_step)^N = (1/(1 - (1 - p)^K))^N ~ exp(N (1 - p)^K)`.

If you choose `K ~ log N`, then `exp(N (1 - p)^K) ~ exp(N exp(-pK)) = exp(N (1/N)^p) ~ 1`.

So, what this means is: if you have a zip chain `N` long, and you retry each step `O(log N)` times, you get an `O(1)` total success rate, which means we only do `N log N` work.

For flatMap we don't know how deep we are going to go (indeed, it can be dynamic), but for zip we statically know, so we could just do `log_2(zip width)` retries per step.

By the way, this analysis suggests that if you know you have `N` items, like we do in this combinator, we want to retry `O(log N)` times, not `c*N` times. I think linear retries, which you have here, should make failure of the entire generator exponentially unlikely. Maybe linear is actually fine: since you have to do linear work already, the total work here is still linear.
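To make the arithmetic above concrete, here is a small illustrative calculation (the success probability and zip width are assumed numbers, not measurements):

```scala
// Illustrative numbers for the retry analysis above (assumed, not measured).
val p = 0.9  // per-step filter success probability
val n = 50   // zip width

// Retrying the whole chain: success probability p^n, expected attempts (1/p)^n.
val wholeChainAttempts = math.pow(1 / p, n)       // ~ 194 attempts

// Retrying each step k ~ log2(n) times: pStep = 1 - (1 - p)^k per step.
val k = (math.log(n) / math.log(2)).ceil.toInt    // 6 retries per step
val pStep = 1 - math.pow(1 - p, k)                // ~ 0.999999
val perStepChainAttempts = math.pow(1 / pStep, n) // ~ 1.00005 attempts

// Per-step retries make the whole chain succeed essentially on the first pass,
// at a cost of at most n * k = O(n log n) underlying draws.
```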
Yeah I would love to be threading through something like a total retries parameter and then a used retries count. In the absence of that I've sort of erred on the side of minimizing filtering in the library itself while also adding retries.
I'm a bit nervous about adding retrying to flatMap/zip in this PR, so I'd rather wait and think about how best to do that. I do like the idea you had for zip of apportioning the retries across all the underlying generators, for example.
```diff
 /** Generates a map with at most the given number of elements. This method
  *  is equal to calling <code>containerOfN[Map,(T,U)](n,g)</code>. */
-def mapOfN[T,U](n: Int, g: Gen[(T,U)]) = buildableOfN[Map[T,U],(T,U)](n,g)
+def mapOfN[T,U](n: Int, g: Gen[(T, U)]) = buildableOfN[Map[T, U],(T, U)](n, g)
```
```scala
/** Generates an infinite stream. */
```
isn't this a lie? Doesn't it stop at the first failure?

Again, here is a place for a retry strategy: we may try to retry some number of times. Or we could just call `unfold(p, r.seed)` on the `None` branch: presumably we will hit more items... It isn't clear to me why a single filter failure should kill the infinite stream.
Yeah, the comment should have said "Generates a potentially infinite stream".
Retrying seems fine to me. I didn't change the existing behavior.
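A retrying unfold in that spirit might look like the following sketch (the `genOne` signature is a hypothetical stand-in for running the element generator once; this is not the actual Gen internals):

```scala
// Sketch of "retry on filter failure" for an infinite stream (hypothetical API).
// genOne runs the element generator once: it returns the (possibly filtered-out)
// value together with the next seed.
def unfoldRetrying[T](genOne: Long => (Option[T], Long),
                      seed0: Long,
                      maxRetries: Int = 100): LazyList[T] = {
  def next(seed: Long, retries: Int): LazyList[T] =
    if (retries <= 0) LazyList.empty // give up instead of looping forever
    else genOne(seed) match {
      case (Some(t), s) => t #:: next(s, maxRetries) // success: reset the budget
      case (None, s)    => next(s, retries - 1)      // filtered out: retry, don't stop
    }
  next(seed0, maxRetries)
}
```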
(Sorry, Github's layout confused me and I left some comments here that didn't make sense, which I've since deleted.)
```scala
new Gen[Char] {
  def doApply(p: P, seed0: Seed): Gen.R[Char] = {
    val (x, seed1) = seed0.long
    val i = ((x & Long.MaxValue) % cs.length).toInt
```
this is biased. Do you care? It won't produce evenly over `cs.length` since `x` is uniform over longs.
The bias is very slight, since we have a 31-bit `Int` denominator and a 63-bit `Long` numerator. In the worst case we'll produce some values with one billionth more frequency than others. I can add a comment, but it seemed more straightforward not to add retry logic for this case (as opposed to `Int % Int`, where the bias can be significant in the worst case).
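For reference, a quick bound on the size of that bias (illustrative arithmetic, not code from the PR):

```scala
// Each index receives either floor(2^63 / len) or floor(2^63 / len) + 1 of the
// 2^63 possible numerators, so the relative over-representation is ~ len / 2^63.
val worstLen = Int.MaxValue.toDouble  // worst case: a 31-bit array length
val bias = worstLen / math.pow(2, 63) // ~ 2.3e-10, i.e. the "one billionth" above
```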
```diff
 /** Generates a ASCII character, with extra weighting for printable characters */
-def asciiChar: Gen[Char] = chooseNum(0, 127, 32 to 126:_*).map(_.toChar)
+val asciiChar: Gen[Char] =
+  choose(0, 127).map(_.toChar)
```
why a different strategy here than, say, `charSample`?
I think `charSample` is best when you need to cover a bunch of non-contiguous `Char` ranges that would require a lot of branching to support. For `asciiChar` and `asciiPrintableChar` there's a single contiguous range.

That said, since each only uses around 100 characters, there's no real problem if we did want to allocate a static array to sample from.
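Such an array-backed variant might look like this sketch (hypothetical code mirroring the described approach; `charSample` itself is internal to the PR):

```scala
import org.scalacheck.Gen

// Hypothetical array-backed version of asciiPrintableChar: precompute the
// characters once, then sample a uniform index into the array.
val asciiPrintableChars: Array[Char] = (32 to 126).map(_.toChar).toArray

val asciiPrintableCharSampled: Gen[Char] =
  Gen.choose(0, asciiPrintableChars.length - 1).map(asciiPrintableChars(_))
```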
```diff
 /** Generates a ASCII printable character */
-def asciiPrintableChar: Gen[Char] = choose(32.toChar, 126.toChar)
+val asciiPrintableChar: Gen[Char] =
+  choose(32.toChar, 126.toChar)
```
why different from `charSample`?
```scala
case Some(c) =>
  sb += c
case None =>
  ()
```
this is going to do a string of at most `n`, not exactly `n`, right? That's a change, isn't it? Again, here is a place for a retry strategy.
Yeah, that's a good point. In practice our built-in character generators don't fail, but it's good to be defensive about user-provided generators.
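A defensive loop in that spirit might look like this sketch (hypothetical helper and signature; it reuses the allowedFailures budget rule quoted above from buildableOfN):

```scala
// Sketch: build exactly n chars, tolerating a bounded number of filter
// failures from the element generator (hypothetical genOne signature).
def stringOfNRetrying(n: Int,
                      genOne: Long => (Option[Char], Long),
                      seed0: Long): Option[String] = {
  val allowedFailures = Integer.max(10, n / 10) // same budget rule as buildableOfN
  val sb = new StringBuilder
  var seed = seed0
  var failures = 0
  while (sb.length < n && failures <= allowedFailures) {
    genOne(seed) match {
      case (Some(c), s) => sb += c; seed = s
      case (None, s)    => failures += 1; seed = s // retry rather than shorten
    }
  }
  if (sb.length == n) Some(sb.toString) else None // fail loudly if budget exhausted
}
```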
```scala
  def builder = new ArrayListBuilder[T]
}

implicit def buildableSeq[T]: Buildable[T, Seq[T]] =
```
can't we do something like `implicit def canBuild[A, B](implicit cbf: CanBuildFrom[Nothing, A, B]): Buildable[A, B]` and let CanBuildFrom do the work?
Maybe? I just chose to do the simplest thing in line with the existing code. It's worth experimenting with.
After looking into this for a second, I think the 2.13 collections make relying on `CanBuildFrom` for this a bit dicey. We only need to support `Seq`, and this is really only for internal use, so I'd rather not try to do something more general until we need it.
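For context, the dedicated instance stays small. Here is a sketch of its likely shape (Vector is an assumed concrete Seq here; the PR's actual choice may differ):

```scala
import scala.collection.mutable
import org.scalacheck.util.Buildable

object SeqBuildable {
  // Sketch: a dedicated Buildable[T, Seq[T]] instance backed by Vector
  // (assumed; any concrete Seq builder would do).
  implicit def buildableSeq[T]: Buildable[T, Seq[T]] =
    new Buildable[T, Seq[T]] {
      def builder: mutable.Builder[T, Seq[T]] = Vector.newBuilder[T]
    }
}
```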
In particular, this adds retrying to the infiniteStream, stringOf, and stringOfN combinators. It also slightly optimizes the loop conditions on buildableOfN, removes `lazy` from most arbitrary definitions, and cleans up a few other things.

This commit does appear to have made some benchmarks slower, although it's possible my machine is just busier than it was. I've also added a few more benchmarks.

```
Benchmark                   (genSize)  (seedCount)  Mode  Cnt     Score      Error  Units
GenBench.arbitraryString          100          100  avgt    3   600.706 ±  569.014  us/op
GenBench.asciiPrintableStr        100          100  avgt    3   432.235 ±  229.533  us/op
GenBench.const_                   100          100  avgt    3     2.775 ±    8.017  us/op
GenBench.double_                  100          100  avgt    3     9.941 ±    3.020  us/op
GenBench.dynamicFrequency         100          100  avgt    3   481.478 ±  253.262  us/op
GenBench.eitherIntInt             100          100  avgt    3    30.911 ±   13.071  us/op
GenBench.identifier               100          100  avgt    3   186.688 ±  327.920  us/op
GenBench.int_                     100          100  avgt    3    11.266 ±    8.500  us/op
GenBench.listOfInt                100          100  avgt    3   445.506 ±  403.799  us/op
GenBench.mapOfIntInt              100          100  avgt    3  1910.653 ± 2974.722  us/op
GenBench.oneOf                    100          100  avgt    3    15.945 ±   10.462  us/op
GenBench.optionInt                100          100  avgt    3    42.815 ±   18.030  us/op
GenBench.sequence                 100          100  avgt    3   205.571 ±   42.976  us/op
GenBench.staticFrequency          100          100  avgt    3   510.956 ±  111.016  us/op
GenBench.testFilter               100          100  avgt    3  1081.890 ±  607.106  us/op
GenBench.zipIntInt                100          100  avgt    3    27.987 ±   22.614  us/op
```
@johnynek Thanks for the thorough review! I just pushed some code that addresses many of your comments, but not all of them. I do seem to have made some combinators a bit slower, so I want to consider how to proceed next.
Apparently changing
I ran the benchmark for the first commit, 126168f. Before (current master):

After (126168f):

This confirms the improvements Erik observed above, differing only by normal variation:
The outstanding work on this branch is:
I'd also like to try out this branch on a few other repositories to see what the total wall-clock time improvement from using it is.
I'm going to work on cherry-picking this apart so that it can get closer to crossing the finish line. A lot of the improvements here are not very controversial, and isolating them and merging them separately will make that case clearer. This exercise also helps me study the changes, since I don't have good intuition about this aspect of the library; Erik is likely the singular expert on the seed implementation he contributed to ScalaCheck. Merging the changes separately will also make it easier to troubleshoot any potential regressions. Any bugs are most likely to come from the challenge of improving an old but popular property-testing library, not from sloppy coding. And since the Travis build publishes snapshots, merging the individual changes also makes each one easy to verify, should we need to.
I think all of these changes have now been incorporated. Feel free to reopen if this isn't the case.