Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigInt Fixes for Issues in Violet Tests #262

Open
wants to merge 7 commits into
base: biginteger
Choose a base branch
from

Conversation

mgriebling
Copy link

@mgriebling mgriebling commented Apr 19, 2023

This update fixes all known issues published by @LiarPrincess (see various pull requests for details)
Improves the LosslessStringConvertible init() run time by 35X (radix 10) and 310X (radix 16).
Improves the CustomStringConvertible toString() run time by 20X (radix 10) and 90X (radix 16).
Test suite has been updated to include tests for all known issues.
Latest update removes Foundation import and fixes additional problems noted by @LiarPrincess.

…de faster performance.

Removed unnecessary _digits and _digitRadix.
Fixed a cut/paste problem in "^=" function.
Added test cases for string conversion performance and logical operation tests.
…d into a negative.

Fixed division/remainder signs after a multi-word division.
…ersions by 35X to 310X times over original code.

New implementation for CustomStringConvertible toString(); speeds conversions by 20X to 90X times over original code.
@@ -9,6 +9,8 @@
//
//===----------------------------------------------------------------------===//

import Foundation
Copy link
Member

@stephentyrone stephentyrone Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numerics can't have a dependency on Foundation for layering reasons. Is it actually used anywhere in this change, or did it just sneak in on accident?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed the log() function to do a quick digit count.
I'll see if this can be done another way.

Copy link

@LiarPrincess LiarPrincess Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Violet does it differently (without log). It uses magic numbers from V8 (google javascrtipt thinge). It is a really good approximation for the upper bound.

Unfortunately it is an upper bound, so sometimes it has a tiny overallocation. This is why after the filling of the buffer there is a if index != -1 check (with comment 'Check for invalid capacity estimation'). If we overallocate then we have to move the whole result (because we are filling from the back). Overallocation is not a problem (because it is tiny), the shift may be because of performance.

Btw. just so we are clear: I confirm that for radix 10 (most common case) this overallocation happens, which means that sometime we have to shift. But even with this penalty the performance is much better than the current main branch.

Btw2. There is a way to avoid this shifting. In Violet if your platform does not have String(unsafeUninitializedCapacity: capacity, initializingUTF8With: initializer) then we implement our own shim. At the end of this shim we call String(cString: ptr) // Borrows 'ptr' to create owned copy.. To avoid shifting we could just increment the ptr so that the buffer starts exactly where the digits start. Though, obviously if your platform has String(unsafeUninitializedCapacity: capacity, initializingUTF8With: initializer) then you have to correctly fill the provided buffer. Anyway it does not really matter...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @LiarPrincess for all your comments. Hope you get better soon.
@stephentyrone, I've removed the Foundation dependency by using a table.

@LiarPrincess
Copy link

LiarPrincess commented Apr 26, 2023

Just for reference, original Violet code:

This is XsProMax version (the one without the inlined Int32), according to #256 [BigInt tests][No merge] rabbit How NOT to write performance tests it is slightly faster than standard Violet.

Violet versions have more comments than this PR, so it may be easier to read.


I don't want to start a formal review, because I have not compiled your code, but:

  • in public func toString(radix: Int, uppercase: Bool) -> String

    What if UInt.bitWidth = 64 and radix = 8?
    Then bitsPerChar = 3, so a single word contains 64/3=21 full groups. What happens to the remaining bit? It should be 'remembered' and used with high bits of the next word. I think that the current implementation just forgets it, but I may be wrong.

    See the radixIsPowerOfTwo_charCanUseMultipleWords in original Violet implementation.

  • in public func toString(radix: Int, uppercase: Bool) -> String

    The if #available(macOS 11.0, *) checks could be a little bit more robust like in Violet. Is there anything wrong with Violet version? I think that at least iOS should be also covered with fast path.

  • in public init?(_ description: String, radix: Int = 10)

    There is also string.withUTF8 which would allow you to use non-generic code.
    In Violet there is a comment why generic version is chosen and what is the performance trade-off.
    Is this just a copy-paste from Violet or conscious decision?

  • You report that string parsing when radix is a power of 2 is 310x faster than the old version. According to my performance tests Violet goes up to 345x, so it is basically the same.

    On the other hand, decimal parsing can be 190x faster and you report 35x. Is that because addition and multiplication in Violet are faster (4x and 1.4x respectively) or is there some other reason? You can always drop into UnsafeBuffer for performance. Or even write specialized AddMulBuffer, similar how Violet has a DivBuffer in toString.

    Btw. Violet performance tests are for both positive and negative numbers. You only test positive and with 2 complement negation means another pass. Though it is not that bad, because in the same operation we do division which is even more expensive.

It would be nice to run all of the Violet tests from all of the PRs on this code to check if there is something else hiding. I am still under covid, and with my mental-potato-state I would do more harm than good. Even writing this comment took me >1h. I may be able to do this during the weekend.

@mgriebling
Copy link
Author

mgriebling commented Apr 26, 2023

  • in public func toString(radix: Int, uppercase: Bool) -> String
    What if UInt.bitWidth = 64 and radix = 8?
    Then bitsPerChar = 3, so a single word contains 64/3=21 full groups. What happens to the remaining bit? It should be 'remembered' and used with high bits of the next word. I think that the current implementation just forgets it, but I may be wrong.
    See the radixIsPowerOfTwo_charCanUseMultipleWords in original Violet implementation.

Yes, mea culpa, I thought I reviewed the different binary cases and all divided evenly into 64.
As you point out, radix 8 (and 32) are oddballs with 3 (and 5) bits.
I don't have covid as an excuse -- just getting old. 😉 I'll fix the code.

  • in public func toString(radix: Int, uppercase: Bool) -> String
    The if #available(macOS 11.0, *) checks could be a little bit more robust like in Violet. Is there anything wrong with Violet version? I think that at least iOS should be also covered with fast path.

No, I was focused on the performance. I'm sure Apple has a tool to do the availability checks for us.

  • in public init?(_ description: String, radix: Int = 10)
    There is also string.withUTF8 which would allow you to use non-generic code.
    In Violet there is a comment why generic version is chosen and what is the performance trade-off.
    Is this just a copy-paste from Violet or conscious decision?

The most common radix in use is 10, so, while there are performance improvements in binary radices, it didn't seem worth all the code.

  • You report that string parsing when radix is a power of 2 is 310x faster than the old version. According to my performance tests Violet goes up to 345x, so it is basically the same.
    On the other hand, decimal parsing can be 190x faster and you report 35x. Is that because addition and multiplication in Violet are faster (4x and 1.4x respectively) or is there some other reason? You can always drop into UnsafeBuffer for performance. Or even write specialized AddMulBuffer, similar how Violet has a DivBuffer in toString.
    Btw. Violet performance tests are for both positive and negative numbers. You only test positive and with 2 complement negation means another pass. Though it is not that bad, because in the same operation we do division which is even more expensive.

I agree, I would love to spent all my days making this faster, but this seems quite good compared to what it was.

@LiarPrincess
Copy link

LiarPrincess commented Apr 27, 2023

Hmm... I can't compile the code from this PR:

/mnt/Storage/Programming/TEST/swift-numerics/Sources/BigIntModule/BigInt.swift:387:20: error: value of type 'UnsafeMutablePointer<UInt8>' has no member 'update'
            dstPtr.update(from: srcPtr, count: count)
            ~~~~~~ ^~~~~~

More or less here:

if index != -1 {
  count = stringMaxSize - index - 1
  let dstPtr = buffer.baseAddress!
  let srcPtr = dstPtr.advanced(by: index+1)
  dstPtr.update(from: srcPtr, count: count) // <-----
  dstPtr[count] = 0
}

In Violet this code looks like this:

// Check for invalid capacity estimation
if index != -1 {
  count = capacity - index - 1
  let dstPtr = ptr.baseAddress!
  let srcPtr = dstPtr.advanced(by: index + 1)
  dstPtr.assign(from: srcPtr, count: count) // <-----
  // For some reason we have to do this on macOS,
  // otherwise the tests will fail (Swift 5.5.2).
  dstPtr[count] = 0
}

So, the assign(from:count:) was changed to update(from:count:). From what I see assign(from:count:) is now depreciated, and renamed to update(from:count:). Internally it is the same thing:

@inlinable
@_alwaysEmitIntoClient
@available(*, deprecated, renamed: "update(from:count:)")
@_silgen_name("_swift_se0370_UnsafeMutablePointer_assign_from_count")
public func assign(from source: UnsafePointer<Pointee>, count: Int) {
  update(from: source, count: count)
}

Platforms:

  • Ubuntu 22.04.1 LTS 64-bit + Swift 5.7.3 (swift-5.7.3-RELEASE) from Swift.org

  • macOS BigSur (11.7.4) + Xcode 13.2.1

    swift --version
    swift-driver version: 1.26.21 Apple Swift version 5.5.2 (swiftlang-1300.0.47.5 clang-1300.0.29.30)
    Target: x86_64-apple-macosx11.0
    

In both cases I am using swift test --enable-test-discovery to run.


Btw. If you want to have swift-numerics with all of the Violet PRs merged then you can use this ultra sophisticated script. Note that those PRs only add tests and do not modify the sources of the actual BigInt, so you can just copy the test files to your build without any conflicts. Or we can just wait, I may find some time over the weekend to look more closely.

git clone https://github.com/LiarPrincess/swift-numerics.git
cd swift-numerics

git fetch origin biginteger:biginteger
git fetch origin 0-Node:0-Node
git fetch origin 1-Init-from-int:1-Init-from-int
git fetch origin 2-String:2-String
git fetch origin 3-Comparable-equatable-hashable:3-Comparable-equatable-hashable
git fetch origin 4-Unary-PLUS-MINUS-INVERT:4-Unary-PLUS-MINUS-INVERT
git fetch origin 5-Binary-ADD-SUB:5-Binary-ADD-SUB
git fetch origin 6-Binary-MUL-DIV-MOD:6-Binary-MUL-DIV-MOD
git fetch origin 7-Binary-AND-OR-XOR:7-Binary-AND-OR-XOR
git fetch origin 8-Binary-SHIFTS:8-Binary-SHIFTS
git fetch origin 9-Int-properties:9-Int-properties
git fetch origin 10-Copy-on-write:10-Copy-on-write
git fetch origin 11-Property-based-tests:11-Property-based-tests
git fetch origin 12-Apple:12-Apple
git fetch origin 13-Init-from-float:13-Init-from-float
git fetch origin 13-Performance:13-Performance
git fetch origin 15-Codable:15-Codable

git checkout biginteger

git merge --commit --no-edit 0-Node
git merge --commit --no-edit 1-Init-from-int
git merge --commit --no-edit 2-String
git merge --commit --no-edit 3-Comparable-equatable-hashable
git merge --commit --no-edit 4-Unary-PLUS-MINUS-INVERT
git merge --commit --no-edit 5-Binary-ADD-SUB
git merge --commit --no-edit 6-Binary-MUL-DIV-MOD
git merge --commit --no-edit 7-Binary-AND-OR-XOR
git merge --commit --no-edit 8-Binary-SHIFTS
git merge --commit --no-edit 9-Int-properties
git merge --commit --no-edit 10-Copy-on-write
git merge --commit --no-edit 11-Property-based-tests
git merge --commit --no-edit 12-Apple
git merge --commit --no-edit 13-Init-from-float
git merge --commit --no-edit 13-Performance
git merge --commit --no-edit 15-Codable

@LiarPrincess
Copy link

LiarPrincess commented Apr 27, 2023

Oki, when I change update(from:count:) to assign(from:count:) I can run the tests. Not sure what is going on...

Then if we run the full Violet test suite we will get a bunch of failures in BitWidthTests and TrailingZeroBitCountTests. Those are expected, I marked them as a 'rabbit Discussion' in #252 [BigInt tests] rabbit Int properties.

The only other failure is in InitFromBinaryFloatingPoint754Tests:

  // swift test --enable-test-discovery --filter=BigIntTests.InitFromBinaryFloatingPoint754Tests/test_float80_spiderman
  // Skipping the: #if (arch(i386) || arch(x86_64)) && !os(Windows) && !os(Android)
  func test_float80_spiderman() {
    let d: Float80 = 9223372036854775807.5
    let expected: BigInt = BigInt("9223372036854775807")!

    let result = BigInt(d)
    if result != expected {
      print("d:        \(d)")
      print("expected: \(expected)")
      print("result:   \(result)")
    }
  }

This will print:

d:        9223372036854775807.5
expected: 9223372036854775807
result:   9223372036854775807

spidey-meme

Btw. 9223372036854775807.5 is a special number for Float80, because UInt64(1) << Float80.significandBitCount = 9223372036854775808 which is the 1st number where spacing between numbers (away from 0) is 1. In other words 9223372036854775807.5 is the highest number where the next number is 0.5 away.

print(Float80.significandBitCount) // 63 - well *teknikly* there is a hidden bit, but whatever...
let int = UInt64(1) << Float80.significandBitCount
let d = Float80(int)
print("int:             ", int)        // 9223372036854775808
print("Float80:         ", d)          // 9223372036854775808.0
print("Float80.nextDown:", d.nextDown) // 9223372036854775807.5 - goes down by 0.5 - our test case
print("Float80.nextUp:  ", d.nextUp)   // 9223372036854775809.0 - goes up by 1

I do not think this matters... it is just a fancy floating point trick. This does not happen for Float and Double.

Though, I don't think it has to be fixed in this PR. @mgriebling already did a lot of work. It can wait until we merge Violet tests (*if ever*), then we will deal with this failure.

Added an additional check to isValidPowerOfTwoRadix so odd numbers of bits (except radix 2) don't use the fast path in toString.
Now use a table lookup to determine the radix number bits.
Added a check for iOS for StringLiteralType.  Tested code under iOS and macOS under simulator and x86
@stephentyrone
Copy link
Member

// well teknikly there is a hidden bit, but whatever...

A digression: Float80 doesn't have a hidden significand bit; the leading bit is stored explicitly, unlike all other common IEEE 754 formats (https://en.wikipedia.org/wiki/Extended_precision).

significandBitCount is defined to be the number of fractional bits in the significand (i.e., it does not include the leading bit, which is implicit in every other format)--its the number of bits that are actually stored in the representation of Float16, Float, and Double (https://developer.apple.com/documentation/swift/binaryfloatingpoint/significandbitcount).

For fixed-width floating-point types, this is the actual number of fractional significand bits.

And we follow this pattern for Float80 even though the extra bit is stored in that format:

Note that Float80.significandBitCount is 63, even though 64 bits are used to store the significand in the memory representation of a Float80 (unlike other floating-point types, Float80 explicitly stores the leading integral significand bit, but the BinaryFloatingPoint APIs provide an abstraction so that users don’t need to be aware of this detail).

@LiarPrincess
Copy link

LiarPrincess commented Apr 27, 2023

Yep, I know.

This is the reason why we can just be generic over 'floating point' and not 'floating point with a special case for Float80'.

In the comment I wanted to point that 'there is a hidden bit' unlike in other types, though I may have worded that incorrectly/imprecisely. Maybe // well *teknikly* there IS a hidden bit, but whatever... would be better (uppercase IS).

Also, there are usual checks for the existence of Float80. This code snippet does not contain them, because I wanted to simplify. Obviously, the Violet PR (#258 [BigInt tests] skull Init from float 754) contains the #if thingie.


Anyway, the reason why this code snippet does not do what I would expect is:

result.words:   [9223372036854775807, 0]
expected.words: [9223372036854775807]

Then == checks if the words are equal -> they are not -> we enter if body. I may be wrong somewhere, but if the String representations are equal then I would expect the == to hold (for BigInt).


Btw. if we change == definition to:

  @inlinable
  public static func == (lhs: BigInt, rhs: BigInt) -> Bool {
    print("lhs: \(lhs.words)")
    print("rhs: \(rhs.words)")
    return lhs.words == rhs.words
  }

We will get (just look at the last line):

/mnt/Storage/Programming/TEST/swift-numerics (biginteger*) swift test --enable-test-discovery --filter=BigIntTests.InitFromBinaryFloatingPoint754Tests/test_initFromBinaryFloatingPoint754_Float80 > out.txt
warning: 'swift-numerics': found 4 file(s) which are unhandled; explicitly declare them as resources or exclude from the target
    /mnt/Storage/Programming/TEST/swift-numerics/Tests/BigIntTests/Helpers/StringTestCases.generated.py
    /mnt/Storage/Programming/TEST/swift-numerics/Tests/BigIntTests/Performance/PerformanceTests.generated.py
    /mnt/Storage/Programming/TEST/swift-numerics/Tests/BigIntTests/Nodejs/NodeTests.generated.js
    /mnt/Storage/Programming/TEST/swift-numerics/Tests/BigIntTests/Nodejs/README.md

Building for debugging...
[12/12] Linking swift-numericsPackageTests.xctest
Build complete! (6.22s)
error: malformed

I have not tested this extensively, but any print statement (for example print("ala")) would do this. I did clean and rm -rf .build. I don't think this matters that much for BigInt, it is not like we *actually* want to print those values on every ==.

Ubuntu platform that I mentioned above.

@LiarPrincess
Copy link

Ugh...

I forgot to mention that the compilation failure (rename from UnsafeMutablePointer.assign(from:count:) to UnsafeMutablePointer.update(from:count:)) was caused by SE-0370: Pointer Family Initialization Improvements and Better Buffer Slices. This change was specifically mentioned in 'Source compatibility' and 'Alternatives considered'.

I can't find in the proposal the exact version of Swift/stdlib where change was implemented (so that we can #if on it). It is also not mentioned in the code:

@inlinable
@_alwaysEmitIntoClient
@available(*, deprecated, renamed: "update(from:count:)")
@_silgen_name("_swift_se0370_UnsafeMutablePointer_assign_from_count")
public func assign(from source: UnsafePointer<Pointee>, count: Int) {
  update(from: source, count: count)
}

I can confirm that:

  • ✅ compilation succeeds:

    • Ubuntu 22.04.1 LTS 64-bit + Swift 5.8 (swift-5.8-RELEASE-ubuntu22.04) from Swift.org
  • ❌ compilation fails:

    • Ubuntu 22.04.1 LTS 64-bit + Swift 5.7.3 (swift-5.7.3-RELEASE) from Swift.org
    • macOS BigSur (11.7.4) + Xcode 13.2.1

Swift 5.8 Release notes mention SE-0370 in 'Swift Evolution Appendix'. Though, I can't 100% confirm that #if swift(>=5.8) is all we need.

@LiarPrincess
Copy link

The whole Vapor/Fibonacci/BigInt thingie on the forum reminded me of this PR. Just so that we are clear: some parts of this PR are uncomfortably similar to Violet code, and I’m not sure if I’m fine with this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants