Add choose[BigInt] #636

Merged: 8 commits into typelevel:master, May 26, 2020

Conversation

dmurvihill
Contributor

This is my partial implementation of Choose[BigInt], as we discussed on Friday (#631). I'd like some feedback before I proceed with BigDecimal.

One major issue: the new property check does not seem to be executed by sbt test. This can be verified by replacing the chooseBigInt.choose() method with ??? and observing that all of the tests still pass. Any ideas?
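
For context, here is a minimal sketch (the spec name and bounds are hypothetical, not code from this PR) of how such a bound-check property is typically registered in a Properties object so that sbt test discovers it:

    import org.scalacheck.{Gen, Prop, Properties}

    // Hypothetical spec; assumes the Choose[BigInt] instance under review is
    // in implicit scope so that Gen.choose accepts BigInt bounds.
    object ChooseBigIntSpec extends Properties("Gen.choose[BigInt]") {
      property("stays within bounds") = {
        val (l, h) = (BigInt(-1000), BigInt(1000))
        Prop.forAll(Gen.choose(l, h)) { x =>
          l <= x && x <= h
        }
      }
    }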

dmurvihill changed the title from "Add choose[BigInt] (incomplete)" to "Add choose[BigInt]" on Mar 3, 2020
@dmurvihill
Contributor Author

Thanks for the debugging help! I've just pushed the finished implementation. Look for a separate branch with BigDecimal. Anything else to change?

dmurvihill mentioned this pull request on Mar 3, 2020
@dmurvihill
Contributor Author

dmurvihill commented Mar 4, 2020

Travis seems to have failed due to java.lang.NoClassDefFoundError: sbt/internal/util/ConsoleOut$. Configuration error in the CI server? I don't think I added any console output in this PR.

@ashawley
Contributor

ashawley commented Mar 4, 2020

Yeah, Travis has intermittent failures it seems.

@ashawley
Contributor

ashawley commented Mar 4, 2020

I see you reached into the seed generator to add BigInt support. Is that necessary?

Would this internal implementation of chBigInt be sufficient (it's missing bound checks)?

    // Sketch: scale a random Double in [0, 1) across the range. Note that a
    // Double carries only ~53 bits of precision, so this cannot reach every
    // value in a wide range.
    private def chBigInt(l: BigInt, h: BigInt)(p: P, seed: Seed): R[BigInt] = {
      val (n, s) = seed.double
      val x = (BigDecimal(h - l) * BigDecimal(n)).toBigInt + l
      r(Some(x), s)
    }

@dmurvihill
Contributor Author

The code that's in the seed generator can definitely be moved out of it. Stand by.

@dmurvihill
Contributor Author

dmurvihill commented Mar 5, 2020

OK, BigInt generation has been moved out of Seed. Since more than one helper function was added to Gen, I have made them all private members of the chooseBigInt object. Let me know if you'd prefer a different organization.

I chose this implementation instead of the one offered above because it can generate all values between l and h. Unfortunately, we can't choose a 2,147,483,647 bit number with a single call to a 64-bit RNG.

@ashawley
Contributor

ashawley commented Mar 6, 2020

Thanks for doing that. Most types can have their values built out of existing generators and one or more rng draws of Long or Double. There's usually no need to add to the rng implementation.
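
As an illustration of that pattern (an aside, not related to this PR, and using only the public API), a composite value can be assembled from existing Long generators without touching the rng; the uuidGen name is made up:

    import org.scalacheck.Gen

    // Build a java.util.UUID from two Long draws; no new rng primitive needed.
    val uuidGen: Gen[java.util.UUID] =
      for {
        hi <- Gen.choose(Long.MinValue, Long.MaxValue)
        lo <- Gen.choose(Long.MinValue, Long.MaxValue)
      } yield new java.util.UUID(hi, lo)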

@ashawley
Contributor

ashawley commented Mar 6, 2020

Indeed, 64 bits of entropy can't cover the range of a value that may be 2^31 bits long. However, there's a cost to full coverage: we need to weigh it against the need for property tests to run fast. Values need to be generated in constant or linear time.

I haven't tried to grok your implementation for BigInt, but it does seem to take a long time to generate values. I don't know a lot about BigInt, and I haven't profiled whether the time is because BigInt is slow for large values, or if it's your implementation. On my computer, this test takes 15 to 20s:

property("Gen.choose(BigInt( 2^(2^18 - 1)), BigInt(-2^(2^18 - 1)))") = {
  val (l, h) = (BigInt(-2).pow(262143),
                BigInt( 2).pow(262143))
  Prop.forAllNoShrink(Gen.choose(l, h)) { x =>
    l <= x && x <= h
  }
}

Upping it to a 2^19-bit number takes over a minute. Can we improve this? Is there an upper limit to the scale of values we can reasonably expect to support?

@dmurvihill
Contributor Author

dmurvihill commented Mar 6, 2020

This implementation is linear in the range's bit length (or logarithmic in its size) but it creates rather a lot of Java objects in each iteration. I'll see if I can save us some time.

Could you share the workflow you used to run your test property? Did you create a separate project or do you have a command that can be run from the SBT console?

@ashawley
Contributor

ashawley commented Mar 6, 2020

Sure, I'll send some pointers to my workflow in a private email. Our build is complex and isn't very well documented. Apologies for that.

@ashawley
Contributor

ashawley commented Mar 6, 2020

Yeah, it looks like the implementation is linear in bit size. Maybe we can sacrifice a little bit of complete coverage for better performance.

@dmurvihill
Contributor Author

The example you sent me, with l=BigInt(-2).pow(524287) and h=BigInt(2).pow(524287), took about 20s on my machine.

A new implementation has been pushed that feeds containerOfN[Array, Long] into a Java ByteBuffer, instantiating a constant number of Java objects. Here are the numbers from my machine:

l=BigInt(-2).pow(524287), h=BigInt(2).pow(524287) (2^19 bit numbers): 0.88s
l=BigInt(-2).pow(1073741825), h=BigInt(2).pow(1073741824) (2^30 bit numbers): 1,157 s (19m 14s)
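
A rough sketch of this kind of containerOfN-plus-ByteBuffer approach (a reconstruction for illustration, not the exact code in this commit; the helper name is made up, and the final modulo step introduces a small bias):

    import java.nio.ByteBuffer
    import org.scalacheck.Gen

    // Sketch only: draw enough Longs to cover the bit length of the range,
    // pack them into a byte array, and fold the resulting BigInt into [l, h].
    def chooseBigIntSketch(l: BigInt, h: BigInt): Gen[BigInt] = {
      val span = h - l + 1
      val numLongs = (span.bitLength + 63) / 64
      val longGen = Gen.choose(Long.MinValue, Long.MaxValue)
      Gen.containerOfN[Array, Long](numLongs, longGen).map { longs =>
        val buf = ByteBuffer.allocate(numLongs * 8)
        longs.foreach(x => buf.putLong(x))
        val big = BigInt(new java.math.BigInteger(1, buf.array())) // non-negative
        l + (big mod span) // modulo fold-back, slightly biased near the ends
      }
    }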

@ashawley
Contributor

ashawley commented Mar 7, 2020

That's encouraging. Is it possible to get performance at 2^30 bits comparable to what you see at 2^19 bits?

@ashawley
Contributor

ashawley commented Mar 7, 2020

It looks like there is an intermittent failure of one of the tests.

[info] ! Gen.choose-big-int: Exception raised on property evaluation.
[info] > ARG_0: 0
[info] > ARG_1: 0
[info] > Exception: java.lang.ArrayIndexOutOfBoundsException: 0

You may want to disable shrinking by using forAllNoShrink for this test, since shrinking may be hiding the underlying issue. Disabling it means the huge BigInts get printed on failure, though, so a failing run will take a long time to report.

@dmurvihill
Contributor Author

Thanks for catching that. The test failure occurs when the input numbers satisfy low == high. Since they are produced by nonEmptyContainerOf[Array, Byte](arbitrary[Byte]).map(BigInt(_)), it's relatively rare for them to be equal (and when they are, it's mostly because both are 0).
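
A hypothetical regression property pinning down that edge case directly (a sketch, assuming it sits in the existing spec where arbitrary, Gen, and Prop are already in scope):

    property("choose-big-int-equal-bounds") =
      Prop.forAll(arbitrary[BigInt]) { n =>
        Prop.forAll(Gen.choose(n, n)) { x =>
          x == n
        }
      }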

@dmurvihill
Contributor Author

dmurvihill commented Mar 9, 2020

Here are some performance test results.

Test:

  property("choose(BigInt( 2^(2^18 - 1)), BigInt(-2^(2^18 - 1)))") = {
    val numBitsOfBits = 21
    val l = BigInt(-2).pow((2 << numBitsOfBits) - 1)
    val h = BigInt(2).pow((2 << numBitsOfBits) - 1)
    Prop.forAllNoShrink(Gen.choose(l, h)) { x =>
      true
//      l <= x && x <= h
    }
  }

The current implementation has two phases:

  1. Use containerOfN to generate the appropriate number of Longs.
  2. Write the Longs to a Java ByteBuffer and feed that to BigInt.apply.

Mean timing for 5 runs of 100 tests each:
Current version (commit ae25519): 4.4s
Phase 1 only (phase 2 always returns min): 2.3s
Phase 2 only (phase 1 fills only with Long.MinValue): 2.3s

For comparison, reading 262,144 random bytes straight from the OS is nearly free:

time head -c 262144 /dev/urandom > /dev/null

real	0m0.009s
user 0m0.001s
sys	0m0.004s

So I think we are running up against significant JVM overhead here.

In any case, going from 2^19 bits to 2^30 bits will take about 2,048 times as long no matter what we do. Note that:

  1. People who need generators for numbers exceeding 2^20 bits are extremely rare, if they even exist, and are probably willing to pay the price for a high-quality generator
  2. Unless we do something very clever, any reduction in resolution will remove important cases like odd numbers and most primes.

How do you want to proceed?

@ashawley
Contributor

ashawley commented Mar 9, 2020

I agree, there is a limit to how quickly large values can be made. There is a lot of allocation happening in BigInt (or Java's BigInteger).

Here's a property test with no assertions and just a constant generator of the largest BigInt.
FWIW, it takes about 40 seconds on my computer (for 100 evaluations).

property("const(BigInt.MaxValue)") = {
  Prop.forAll(Gen.const(BigInt(2).pow(Int.MaxValue - 1))) {
    true
  }
}

There should be a middle ground between coverage and performance. I'm not sure what BigInt operations to use to optimize this and get a reasonable amount of variability. I agree that naturally occurring odds and primes shouldn't be excluded.

@dmurvihill
Contributor Author

What if we use Seed.long to seed a scala.util.Random object and then use BigInt.apply(numbits, random) to generate the BigInt? I haven't profiled it yet, but it relies much more heavily on optimized APIs that run closer to the metal, so it might get closer to the 9 ms figure above.
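
A minimal sketch of that idea (a hypothetical helper, not code from this PR):

    import org.scalacheck.rng.Seed
    import scala.util.Random

    // Pull one Long from the Seed, use it to seed scala.util.Random, and let
    // BigInt's own constructor produce numBits random bits.
    def bigIntViaRandom(numBits: Int, seed: Seed): (BigInt, Seed) = {
      val (l, next) = seed.long
      val rnd = new Random(l)
      (BigInt(numBits, rnd), next) // uniform over [0, 2^numBits)
    }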

@non
Contributor

non commented May 6, 2020

Hey folks, I saw this thread. I'd written code elsewhere to generate random big integers so I whipped this together. I've done some testing by hand but haven't convinced myself that I've got all the bugs shaken out yet, although the basic approach is sound.

I sort of geeked out on performance; here's some ad hoc benchmarking (in milliseconds) of the mean time to generate a value (across 10 runs):

lower              upper            mean time to gen
(-2).pow(7)        2.pow(7)         0.01 ms
(-2).pow(255)      2.pow(255)       0.01 ms
(-2).pow(8191)     2.pow(8191)      0.06 ms
(-2).pow(262143)   2.pow(262143)    1.19 ms
(-2).pow(2097151)  2.pow(2097151)   6.22 ms

(It's possible the real times would be faster with proper warmup. These numbers just came from the REPL after I did a few runs to warm up the code.)

Here's a code excerpt:

    /**
     * Generate a random BigInt within [lower, lower + span).
     *
     * Note that unlike the choose method, whose bounds are inclusive,
     * this method's upper bound is exclusive. We determine how many
     * random bits we need (bitLen), and then round up to the nearest
     * number of bytes (byteLen). We generate the bytes, possibly
     * truncating the most significant byte (bytes(0)) if bitLen is
     * not evenly-divisible by 8.
     *
     * Finally, we check to see if the BigInt we ended up with is in
     * our range. If it is not, we restart this method. The likelihood
     * of needing to restart depends on span. In the worst case we
     * have almost a 50% chance of this (which occurs when span is a
     * power of 2 + 1) and in the best case we never restart (which
     * occurs when span is a power of 2).
     */
    private def genBig(lower: BigInt, span: BigInt, seed0: Seed): (BigInt, Seed) = {
      val bitLen = span.bitLength
      val byteLen = (bitLen + 7) / 8
      val bytes = new Array[Byte](byteLen)
      var seed = seed0
      var i = 0
      while (i < bytes.length) {

        // generate a random long value (i.e. 8 random bytes)
        val (x0, seed1) = seed.long
        var x = x0
        seed = seed1

        // extract each byte in turn and add them to our byte array
        var j = 0
        while (j < 8 && i < bytes.length) {
          val b = (x & 0xff).toByte
          bytes(i) = b
          x = x >>> 8
          i += 1
          j += 1
        }
      }

      // we may not need all 8 bits of our most significant byte. if
      // not, mask off any unneeded upper bits.
      val bitRem = bitLen & 7
      if (bitRem != 0) {
        val mask = 0xff >>> (8 - bitRem)
        bytes(0) = (bytes(0) & mask).toByte
      }

      // construct a BigInteger and see if it's valid. if so, we're
      // done. otherwise, we need to restart using our new seed.
      val big = new BigInteger(1, bytes)
      if (big.compareTo(span) < 0) {
        (BigInt(big) + lower, seed)
      } else {
        genBig(lower, span, seed)
      }
    }

    implicit val chooseBigInt: Choose[BigInt] =
      new Choose[BigInt] {
        def choose(low: BigInt, high: BigInt): Gen[BigInt] = {
          val span = high - low + 1
          gen { (p, seed0) =>
            val (big, seed1) = genBig(low, span, seed0)
            r(Some(big), seed1)
          }
        }
      }
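
With a Choose[BigInt] instance like this in implicit scope, call sites read like any other choose; the bounds below are just an example:

    val bigGen: Gen[BigInt] = Gen.choose(BigInt(-2).pow(255), BigInt(2).pow(255))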

@non
Contributor

non commented May 6, 2020

@dmurvihill @ashawley Apologies for jumping in here out of the blue after a long absence. Would you be happier if I continued developing this code and submitted an independent PR, or should I let you all take over? I think the approach I'm using here is going to be fast enough and able to support any reasonable range of BigInt values. (The values I tested with are so big that if you accidentally print them or call .toString, the time to render them massively dominates any actual work.)

ashawley mentioned this pull request on May 6, 2020
@ashawley
Contributor

ashawley commented May 6, 2020

Sorry that I lost track of this pull request. I'm happy you're here to weigh in, @non. One possibility for moving forward is for @dmurvihill to get the code here to compile and make the build green; then we could merge it. This code contains an implementation and tests that could then be further refined.

This PR had a lot of good investigation and discussion; further improvements could be done in a follow-up PR (or PRs).

I'll defer to @dmurvihill's wishes.

@ashawley
Contributor

We haven't heard back from @dmurvihill. I'm going to try to fix the compiler error...

ashawley mentioned this pull request on May 26, 2020
@ashawley
Contributor

We'll take up the improvements in #664.

ashawley merged commit 433b7a2 into typelevel:master on May 26, 2020
@dmurvihill
Contributor Author

Hey folks, thanks for carrying this through.

dmurvihill deleted the choose-big-int branch on June 8, 2020 at 22:18