Add choose[BigInt] #636

Merged: 8 commits into typelevel:master, May 26, 2020

Conversation

dmurvihill
Contributor

This is my partial implementation of Choose[BigInt], as we discussed on Friday (#631). I'd like some feedback before I proceed with BigDecimal.

One major issue: the new property check does not seem to be executed by sbt test. This can be verified by replacing the chooseBigInt.choose() method with ??? and observing that all of the tests still pass. Any ideas?
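
For context, here is a minimal sketch (the spec name and bounds are hypothetical, not code from this PR) of how such a bound-check property is typically registered in a Properties object so that sbt test discovers it:

    import org.scalacheck.{Gen, Prop, Properties}

    // Hypothetical spec; assumes the Choose[BigInt] instance under review is
    // in implicit scope so that Gen.choose accepts BigInt bounds.
    object ChooseBigIntSpec extends Properties("Gen.choose[BigInt]") {
      property("stays within bounds") = {
        val (l, h) = (BigInt(-1000), BigInt(1000))
        Prop.forAll(Gen.choose(l, h)) { x =>
          l <= x && x <= h
        }
      }
    }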

dmurvihill changed the title from "Add choose[BigInt] (incomplete)" to "Add choose[BigInt]" on Mar 3, 2020
@dmurvihill
Contributor Author

Thanks for the debugging help! I've just pushed the finished implementation. Look for a separate branch with BigDecimal. Anything else to change?

dmurvihill mentioned this pull request on Mar 3, 2020
@dmurvihill
Contributor Author

dmurvihill commented Mar 4, 2020

Travis seems to have failed due to java.lang.NoClassDefFoundError: sbt/internal/util/ConsoleOut$. Configuration error in the CI server? I don't think I added any console output in this PR.

@ashawley
Contributor

ashawley commented Mar 4, 2020

Yeah, Travis has intermittent failures it seems.

@ashawley
Contributor

ashawley commented Mar 4, 2020

I see you reached into the seed generator to add BigInt support. Is that necessary?

Would this internal implementation of chBigInt be sufficient (it's missing bound checks)?

    // Sketch: scale a random Double in [0, 1) across the range. Note that a
    // Double carries only ~53 bits of precision, so this cannot reach every
    // value in a wide range.
    private def chBigInt(l: BigInt, h: BigInt)(p: P, seed: Seed): R[BigInt] = {
      val (n, s) = seed.double
      val x = (BigDecimal(h - l) * BigDecimal(n)).toBigInt + l
      r(Some(x), s)
    }

@dmurvihill
Contributor Author

The code that's in the seed generator can definitely be moved out of it. Stand by.

@dmurvihill
Contributor Author

dmurvihill commented Mar 5, 2020

OK, BigInt generation has been moved out of Seed. Since more than one helper function was added to Gen, I have made them all private members of the chooseBigInt object. Let me know if you'd prefer a different organization.

I chose this implementation instead of the one offered above because it can generate all values between l and h. Unfortunately, we can't choose a 2,147,483,647 bit number with a single call to a 64-bit RNG.

@ashawley
Contributor

ashawley commented Mar 6, 2020

Thanks for doing that. Most types can have their values built out of existing generators and one or more rng draws of Long or Double. There's usually no need to add to the rng implementation.
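
As an illustration of that pattern (an aside, not related to this PR, and using only the public API), a composite value can be assembled from existing Long generators without touching the rng; the uuidGen name is made up:

    import org.scalacheck.Gen

    // Build a java.util.UUID from two Long draws; no new rng primitive needed.
    val uuidGen: Gen[java.util.UUID] =
      for {
        hi <- Gen.choose(Long.MinValue, Long.MaxValue)
        lo <- Gen.choose(Long.MinValue, Long.MaxValue)
      } yield new java.util.UUID(hi, lo)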

@ashawley
Contributor

ashawley commented Mar 6, 2020

Indeed, 64 bits of entropy can't cover the range of a value that may be 2^31 bits long. However, there's a cost to full coverage: we need to weigh it against the need for property tests to run fast. Values need to be generated in constant or linear time.

I haven't tried to grok your implementation for BigInt, but it does seem to take a long time to generate values. I don't know a lot about BigInt, and I haven't profiled whether the time is because BigInt is slow for large values, or if it's your implementation. On my computer, this test takes 15 to 20s:

property("Gen.choose(BigInt( 2^(2^18 - 1)), BigInt(-2^(2^18 - 1)))") = {
  val (l, h) = (BigInt(-2).pow(262143),
                BigInt( 2).pow(262143))
  Prop.forAllNoShrink(Gen.choose(l, h)) { x =>
    l <= x && x <= h
  }
}

Upping it to a 2^19-bit number takes over a minute. Can we improve this? Is there an upper limit to the scale of values we can reasonably expect to support?

@dmurvihill
Contributor Author

dmurvihill commented Mar 6, 2020

This implementation is linear in the range's bit length (or logarithmic in its size) but it creates rather a lot of Java objects in each iteration. I'll see if I can save us some time.

Could you share the workflow you used to run your test property? Did you create a separate project or do you have a command that can be run from the SBT console?

@ashawley
Contributor

ashawley commented Mar 6, 2020

Sure, I'll send some pointers to my workflow in a private email. Our build is complex and isn't very well documented. Apologies for that.

@ashawley
Contributor

ashawley commented Mar 6, 2020

Yeah, it looks like the implementation is linear in bit size. Maybe we can sacrifice a little bit of complete coverage for better performance.

@dmurvihill
Contributor Author

The example you sent me, with l=BigInt(-2).pow(524287) and h=BigInt(2).pow(524287), took about 20s on my machine.

A new implementation has been pushed that feeds containerOfN[Array, Long] into a Java ByteBuffer, instantiating a constant number of Java objects. Here are the numbers from my machine:

l=BigInt(-2).pow(524287), h=BigInt(2).pow(524287) (2^19 bit numbers): 0.88s
l=BigInt(-2).pow(1073741825), h=BigInt(2).pow(1073741824) (2^30 bit numbers): 1,157 s (19m 14s)
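
A rough sketch of this kind of containerOfN-plus-ByteBuffer approach (a reconstruction for illustration, not the exact code in this commit; the helper name is made up, and the final modulo step introduces a small bias):

    import java.nio.ByteBuffer
    import org.scalacheck.Gen

    // Sketch only: draw enough Longs to cover the bit length of the range,
    // pack them into a byte array, and fold the resulting BigInt into [l, h].
    def chooseBigIntSketch(l: BigInt, h: BigInt): Gen[BigInt] = {
      val span = h - l + 1
      val numLongs = (span.bitLength + 63) / 64
      val longGen = Gen.choose(Long.MinValue, Long.MaxValue)
      Gen.containerOfN[Array, Long](numLongs, longGen).map { longs =>
        val buf = ByteBuffer.allocate(numLongs * 8)
        longs.foreach(x => buf.putLong(x))
        val big = BigInt(new java.math.BigInteger(1, buf.array())) // non-negative
        l + (big mod span) // modulo fold-back, slightly biased near the ends
      }
    }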

@ashawley
Contributor

ashawley commented Mar 7, 2020

That's encouraging. Is it possible to get performance at 2^30 bits comparable to what you see at 2^19 bits?

@ashawley
Contributor

ashawley commented Mar 7, 2020

It looks like there is an intermittent failure of one of the tests.

[info] ! Gen.choose-big-int: Exception raised on property evaluation.
[info] > ARG_0: 0
[info] > ARG_1: 0
[info] > Exception: java.lang.ArrayIndexOutOfBoundsException: 0

You may want to disable shrinking by using forAllNoShrink for this test, since shrinking may be hiding the underlying issue. Disabling it means the huge BigInts get printed on failure, though, so a failing run will take a long time to report.

@dmurvihill
Contributor Author

Thanks for catching that. The test failure occurs when the input numbers satisfy low == high. Since they are produced by nonEmptyContainerOf[Array, Byte](arbitrary[Byte]).map(BigInt(_)), it's relatively rare for them to be equal (and when they are, it's mostly because both are 0).
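
A hypothetical regression property pinning down that edge case directly (a sketch, assuming it sits in the existing spec where arbitrary, Gen, and Prop are already in scope):

    property("choose-big-int-equal-bounds") =
      Prop.forAll(arbitrary[BigInt]) { n =>
        Prop.forAll(Gen.choose(n, n)) { x =>
          x == n
        }
      }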

@dmurvihill
Contributor Author

dmurvihill commented Mar 9, 2020

Here are some performance test results.

Test:

  property("choose(BigInt( 2^(2^18 - 1)), BigInt(-2^(2^18 - 1)))") = {
    val numBitsOfBits = 21
    val l = BigInt(-2).pow((2 << numBitsOfBits) - 1)
    val h = BigInt(2).pow((2 << numBitsOfBits) - 1)
    Prop.forAllNoShrink(Gen.choose(l, h)) { x =>
      true
//      l <= x && x <= h
    }
  }

The current implementation has two phases:

  1. Use containerOfN to generate the appropriate number of Longs.
  2. Write the Longs to a Java ByteBuffer and feed that to BigInt.apply.

Mean timing for 5 runs of 100 tests each:
Current version (commit ae25519): 4.4s
Phase 1 only (phase 2 always returns min): 2.3s
Phase 2 only (phase 1 fills only with Long.MinValue): 2.3s

For comparison, reading 262,144 random bytes straight from the OS is nearly free:

time head -c 262144 /dev/urandom > /dev/null

real	0m0.009s
user 0m0.001s
sys	0m0.004s

So I think we are running up against significant JVM overhead here.

In any case, going from 2^19 bits to 2^30 bits will take about 2,048 times as long no matter what we do. Note that:

  1. People who need generators for numbers exceeding 2^20 bits are extremely rare, if they even exist, and are probably willing to pay the price for a high-quality generator
  2. Unless we do something very clever, any reduction in resolution will remove important cases like odd numbers and most primes.

How do you want to proceed?

@ashawley
Contributor

ashawley commented Mar 9, 2020

I agree, there is a limit to how quickly large values can be made. There is a lot of allocation happening in BigInt (or Java's BigInteger).

Here's a property test with no assertions and just a constant generator of the largest BigInt.
FWIW, it takes about 40 seconds on my computer (for 100 evaluations).

property("const(BigInt.MaxValue)") = {
  Prop.forAll(Gen.const(BigInt(2).pow(Int.MaxValue - 1))) {
    true
  }
}

There should be a middle ground between coverage and performance. I'm not sure what BigInt operations to use to optimize this and get a reasonable amount of variability. I agree that naturally occurring odds and primes shouldn't be excluded.

@dmurvihill
Contributor Author

What if we use Seed.long to seed a scala.util.Random object and then use BigInt.apply(numbits, random) to generate the BigInt? I haven't profiled it yet, but it relies much more heavily on optimized APIs that run closer to the metal, so it might get closer to the 9 ms figure above.
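
A minimal sketch of that idea (a hypothetical helper, not code from this PR):

    import org.scalacheck.rng.Seed
    import scala.util.Random

    // Pull one Long from the Seed, use it to seed scala.util.Random, and let
    // BigInt's own constructor produce numBits random bits.
    def bigIntViaRandom(numBits: Int, seed: Seed): (BigInt, Seed) = {
      val (l, next) = seed.long
      val rnd = new Random(l)
      (BigInt(numBits, rnd), next) // uniform over [0, 2^numBits)
    }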

@non
Contributor

non commented May 6, 2020

Hey folks, I saw this thread. I'd written code elsewhere to generate random big integers so I whipped this together. I've done some testing by hand but haven't convinced myself that I've got all the bugs shaken out yet, although the basic approach is sound.

I sort of geeked out on performance; here's some ad hoc benchmarking (in milliseconds) of the mean time to generate a value (across 10 runs):

lower              upper            mean time to gen
(-2).pow(7)        2.pow(7)         0.01 ms
(-2).pow(255)      2.pow(255)       0.01 ms
(-2).pow(8191)     2.pow(8191)      0.06 ms
(-2).pow(262143)   2.pow(262143)    1.19 ms
(-2).pow(2097151)  2.pow(2097151)   6.22 ms

(It's possible the real times would be faster with proper warmup. These numbers just came from the REPL after I did a few runs to warm up the code.)

Here's a code excerpt:

    /**
     * Generate a random BigInt within [lower, lower + span).
     *
     * Note that unlike the choose method, whose bounds are inclusive,
     * this method's upper bound is exclusive. We determine how many
     * random bits we need (bitLen), and then round up to the nearest
     * number of bytes (byteLen). We generate the bytes, possibly
     * truncating the most significant byte (bytes(0)) if bitLen is
     * not evenly-divisible by 8.
     *
     * Finally, we check to see if the BigInt we ended up with is in
     * our range. If it is not, we restart this method. The likelihood
     * of needing to restart depends on span. In the worst case we
     * have almost a 50% chance of this (which occurs when span is a
     * power of 2 + 1) and in the best case we never restart (which
     * occurs when span is a power of 2).
     */
    private def genBig(lower: BigInt, span: BigInt, seed0: Seed): (BigInt, Seed) = {
      val bitLen = span.bitLength
      val byteLen = (bitLen + 7) / 8
      val bytes = new Array[Byte](byteLen)
      var seed = seed0
      var i = 0
      while (i < bytes.length) {

        // generate a random long value (i.e. 8 random bytes)
        val (x0, seed1) = seed.long
        var x = x0
        seed = seed1

        // extract each byte in turn and add them to our byte array
        var j = 0
        while (j < 8 && i < bytes.length) {
          val b = (x & 0xff).toByte
          bytes(i) = b
          x = x >>> 8
          i += 1
          j += 1
        }
      }

      // we may not need all 8 bits of our most significant byte. if
      // not, mask off any unneeded upper bits.
      val bitRem = bitLen & 7
      if (bitRem != 0) {
        val mask = 0xff >>> (8 - bitRem)
        bytes(0) = (bytes(0) & mask).toByte
      }

      // construct a BigInteger and see if it's valid. if so, we're
      // done. otherwise, we need to restart using our new seed.
      val big = new BigInteger(1, bytes)
      if (big.compareTo(span) < 0) {
        (BigInt(big) + lower, seed)
      } else {
        genBig(lower, span, seed)
      }
    }

    implicit val chooseBigInt: Choose[BigInt] =
      new Choose[BigInt] {
        def choose(low: BigInt, high: BigInt): Gen[BigInt] = {
          val span = high - low + 1
          gen { (p, seed0) =>
            val (big, seed1) = genBig(low, span, seed0)
            r(Some(big), seed1)
          }
        }
      }
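
With a Choose[BigInt] instance like this in implicit scope, call sites read like any other choose; the bounds below are just an example:

    val bigGen: Gen[BigInt] = Gen.choose(BigInt(-2).pow(255), BigInt(2).pow(255))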

@non
Contributor

non commented May 6, 2020

@dmurvihill @ashawley Apologies for jumping in here out of the blue after a long absence. Would you be happier if I continued developing this code and submitted an independent PR, or should I let you all take over? I think the approach I'm using here is going to be fast enough and able to support any reasonable range of BigInt values. (The values I tested with are so big that if you accidentally print them or call .toString, the time to render them massively dominates any actual work.)

ashawley mentioned this pull request on May 6, 2020
@ashawley
Contributor

ashawley commented May 6, 2020

Sorry that I lost track of this pull request. I'm happy you're here to weigh in, @non. One possibility for moving forward is for @dmurvihill to get the code here to compile and make the build green; then we could merge it. This code contains an implementation and tests that could then be further refined.

This PR had a lot of good investigation and discussion; further improvements could be done in a follow-up PR (or PRs).

I'll defer to @dmurvihill's wishes.

@ashawley
Contributor

We haven't heard back from @dmurvihill. I'm going to try to fix the compiler error...

ashawley mentioned this pull request on May 26, 2020
@ashawley
Contributor

We'll take up the improvements in #664.

ashawley merged commit 433b7a2 into typelevel:master on May 26, 2020
@dmurvihill
Contributor Author

Hey folks, thanks for carrying this through.

dmurvihill deleted the choose-big-int branch on June 8, 2020 at 22:18