intops: core integer primitives #187

Open
wants to merge 7 commits into base: master
133 changes: 133 additions & 0 deletions stew/intops.nim
Original file line number Diff line number Diff line change
@@ -0,0 +1,133 @@
## Core integer primitives suitable as building blocks for higher-level
## functionality such as bigints, saturating integer types etc - where
## applicable, these use compiler builtins - otherwise, they fall back on native
## Nim code that may be less efficient.
##
## In using these functions, it is recommended that you always call the function
## that returns the least information needed - for example, `mulOverflow` may
## be implemented more efficiently than `mulWiden`, meaning that if overflow
## detection is all that is needed, use the former.
##
## The API strives to map functions to platform-specific CPU instructions
## via compiler intrinsics or other compiler/target-specific implementations.
## Where this is not possible, the API instead emulates the instructions - such
## emulation may result in the loss of properties important to some applications
## such as constant-timeness, atomicity or performance.

# Implementation notes:
#
# * `uintN` is assumed to be wrapping
# * "*Overflow" functions perform wrapping arithmetic and return a bool for overflow
# * "*Widen" functions return the full result in multiple words
# * overloads with carry/borrow are exposed for chaining limbs
#
# TODO
# * use compiler intrinsics
# * signed ops
# * saturating ops
# * more primitives commonly available on CPUs / as intrinsics (pow / divmod / etc)
# * discovery mechanism to determine implementation quality
#
# References:
# https://llvm.org/docs/LangRef.html#arithmetic-with-overflow-intrinsics
# https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
# https://doc.rust-lang.org/std/primitive.u32.html#implementations

func addOverflow*(x, y: SomeUnsignedInt):
    tuple[result: SomeUnsignedInt, overflow: bool] =
Contributor @zah, May 10, 2023

Does this really simplify the implementation of BigInt libraries? If I try to imagine the loop that will be used there, it seems to me that it will be more complicated and less performant when based on this helper function.

In particular, the reliance on a tuple that gets translated to a C struct is what makes me nervous. If the carry is communicated with an output parameter, the compiler is a bit more free to perform register allocations in more optimal ways.

Also, ultimately, the carry should probably be obtained from the CPU itself, but I guess your plan is to replace the bodies of these functions in the future?

Member Author @arnetheduck, May 10, 2023

> If the carry is communicated with an output parameter, the compiler is a bit more free to perform register allocations in more optimal ways.

Modern compilers typically handle this well. Also, because the carry is a return value rather than a pointer, the compiler has some freedoms it otherwise would not have: in LLVM, for example, this is typically handled by the SROA pass, which decomposes structs into individual elements and then assigns registers based on the lifetimes of the fields themselves (which in this case are trivial).

See https://gcc.godbolt.org/z/Ex8P76fWr for an example of how it works with a struct return type.

These implementations are meant mainly for the VM. The actual (future) implementations would use compiler builtins, which unfortunately differ in their API between platforms and compilers. But yes, the ideal is that the compiler maps a function like this to its ADC instruction, which performs a 3-operand addition and returns the carry in a flag.

For bigints, the 3-parameter carry form in particular is interesting; for saturating arithmetic, the 2-parameter version without carry is more appropriate.
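
The two use cases mentioned here could look roughly like the sketch below. It is only an illustration under stated assumptions: the two `addOverflow` bodies are copied from this PR's diff (specialized to `uint64` so the block compiles on its own), while `addLimbs` and `addSaturating` are hypothetical helper names that do not exist in the PR.

```nim
# Local copies of the `addOverflow` overloads from this PR, specialized to
# uint64 so that the sketch is self-contained.
func addOverflow(x, y: uint64): tuple[result: uint64, overflow: bool] =
  let r = x + y
  (r, r < x)

func addOverflow(x, y: uint64, carry: bool):
    tuple[result: uint64, overflow: bool] =
  let
    (a, b) = addOverflow(x, y)
    (c, d) = addOverflow(a, uint64(ord(carry)))
  (c, b or d)

# Hypothetical helper (not part of the PR): add two little-endian limb
# sequences of equal length, returning the carry out of the top limb.
func addLimbs(a, b: openArray[uint64], res: var openArray[uint64]): bool =
  doAssert a.len == b.len and a.len == res.len
  var carry = false
  for i in 0 ..< a.len:
    # The overflow flag of one limb becomes the carry of the next one
    let (limb, overflow) = addOverflow(a[i], b[i], carry)
    res[i] = limb
    carry = overflow
  carry

# Hypothetical helper (not part of the PR): saturating addition built on the
# 2-parameter form, clamping to the maximum value on overflow.
func addSaturating(x, y: uint64): uint64 =
  let (r, overflow) = addOverflow(x, y)
  if overflow: uint64.high else: r
```

Whether a loop like `addLimbs` compiles down to a chain of ADC instructions depends on the compiler and backend; the sketch only shows how the flags are threaded through.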

Member Author

Fixed a bug and improved the example: the code now shows that using the builtin has no advantage over the no-builtin code in this particular case. Both end up using the right ADC instruction, for example for a 192-bit integer:

https://gcc.godbolt.org/z/bvW6aTr5a
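
For readers who want a Nim-side picture of the 192-bit case, a minimal sketch follows. It is not a reproduction of the linked example: `U192`, `add192` and `sub192` are hypothetical names, and the `addOverflow` / `subOverflow` bodies are local `uint64` copies of the ones in this PR so that the block compiles on its own.

```nim
# Hypothetical 192-bit integer: three 64-bit limbs, least significant first.
type U192 = array[3, uint64]

# Local uint64 copies of the primitives from this PR.
func addOverflow(x, y: uint64): tuple[result: uint64, overflow: bool] =
  let r = x + y
  (r, r < x)

func addOverflow(x, y: uint64, carry: bool):
    tuple[result: uint64, overflow: bool] =
  let
    (a, b) = addOverflow(x, y)
    (c, d) = addOverflow(a, uint64(ord(carry)))
  (c, b or d)

func subOverflow(x, y: uint64): tuple[result: uint64, overflow: bool] =
  let r = x - y
  (r, y > x)

func subOverflow(x, y: uint64, borrow: bool):
    tuple[result: uint64, overflow: bool] =
  let
    (a, b) = subOverflow(x, y)
    (c, d) = subOverflow(a, uint64(ord(borrow)))
  (c, b or d)

# Unrolled limb chain: each overflow flag feeds the next limb as carry.
func add192(a, b: U192): tuple[value: U192, overflow: bool] =
  let
    (r0, c0) = addOverflow(a[0], b[0])
    (r1, c1) = addOverflow(a[1], b[1], c0)
    (r2, c2) = addOverflow(a[2], b[2], c1)
  ([r0, r1, r2], c2)

# Same chain for subtraction, with the flag acting as borrow.
func sub192(a, b: U192): tuple[value: U192, overflow: bool] =
  let
    (r0, c0) = subOverflow(a[0], b[0])
    (r1, c1) = subOverflow(a[1], b[1], c0)
    (r2, c2) = subOverflow(a[2], b[2], c1)
  ([r0, r1, r2], c2)
```

Returning the final flag lets callers detect a 192-bit overflow or underflow; which instructions are actually emitted is up to the C compiler and target.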

  ## Add the two integers using wrapping arithmetic, returning the result and a
  ## boolean indicating that overflow happened.
  ##
  ## When used to construct bigint arithmetic, the overflow flag can be passed
  ## as carry to the next more significant word.

  let r = x + y
  (r, r < x)

func addOverflow*(x, y: SomeUnsignedInt, carry: bool):
    tuple[result: SomeUnsignedInt, overflow: bool] =
  ## Add two integers and carry using wrapping arithmetic, returning the
  ## result and a boolean indicating that overflow happened.
  ##
  ## When used to construct bigint arithmetic, the overflow flag can be passed
  ## as carry to the next more significant word.

  let
    (a, b) = addOverflow(x, y)
    (c, d) = addOverflow(a, typeof(a)(carry))
  (c, b or d)

func subOverflow*(x, y: SomeUnsignedInt):
    tuple[result: SomeUnsignedInt, overflow: bool] =
  ## Subtract y from x using wrapping arithmetic, returning the result and a
  ## boolean indicating whether overflow happened.

  let r = x - y
  (r, y > x)

func subOverflow*(x, y: SomeUnsignedInt, borrow: bool):
    tuple[result: SomeUnsignedInt, overflow: bool] =
  ## Subtract y and borrow from x using wrapping arithmetic, returning the
  ## result and a boolean indicating whether overflow happened.
  ##
  ## When used to construct bigint arithmetic, the overflow flag can be passed
  ## as borrow to the next more significant word.

  let
    (a, b) = subOverflow(x, y)
    (c, d) = subOverflow(a, typeof(a)(borrow))
  (c, b or d)

func mulWiden*(x, y: uint64): tuple[lo, hi: uint64] =
  # Schoolbook multiplication on 32-bit halves:
  # x = x1 * 2^32 + x0 and y = y1 * 2^32 + y0
  let
    x0 = x and uint32.high
    x1 = x shr 32
    y0 = y and uint32.high
    y1 = y shr 32
    p11 = x1 * y1
    p01 = x0 * y1
    p10 = x1 * y0
    p00 = x0 * y0
    # Cannot overflow: p10 <= (2^32 - 1)^2 and the other two terms are < 2^32
    middle = p10 + (p00 shr 32) + (p01 and uint32.high)
    rhi = p11 + (middle shr 32) + (p01 shr 32)
    rlo = (middle shl 32) or (p00 and uint32.high)

  (rlo, rhi)

func mulWiden*(x, y: uint32): tuple[lo, hi: uint32] =
  let r = x.uint64 * y.uint64
  (cast[uint32](r and uint32.high), cast[uint32](r shr 32))

func mulWiden*(x, y: uint16): tuple[lo, hi: uint16] =
  let r = x.uint32 * y.uint32
  (cast[uint16](r and uint16.high), cast[uint16](r shr 16))

func mulWiden*(x, y: uint8): tuple[lo, hi: uint8] =
  let r = x.uint16 * y.uint16
  (cast[uint8](r and uint8.high), cast[uint8](r shr 8))

func mulWiden*(x, y: uint): tuple[lo, hi: uint] =
  ## Perform `(x * y)` as if the computation had been carried out in twice as
  ## wide a type, returning the low and high words.
  when sizeof(uint) == sizeof(uint64):
    let (a, b) = mulWiden(uint64(x), uint64(y))
  else:
    let (a, b) = mulWiden(uint32(x), uint32(y))
  (uint(a), uint(b))

func mulWiden*(x, y, carry: SomeUnsignedInt): tuple[lo, hi: SomeUnsignedInt] =
  ## Perform `((x * y) + carry)` as if the computation had been carried out in
  ## twice as wide a type, returning the low and high words.
  let
    (lo, hi) = mulWiden(x, y)
    (a, b) = addOverflow(lo, carry)
    # The carry from this overflowing add can be ignored: even
    # `high * high + high` still fits in the double-width result
    (c, _) = addOverflow(hi, typeof(hi)(0), b)

  (a, c)

func mulOverflow*(x, y: SomeUnsignedInt):
    tuple[result: SomeUnsignedInt, overflow: bool] =
  ## Perform `(x * y)` using wrapping arithmetic, returning the result and a
  ## boolean indicating that overflow happened.
  let
    (a, b) = mulWiden(x, y)
  (a, b > 0)
1 change: 1 addition & 0 deletions tests/all_tests.nim
Original file line number Diff line number Diff line change
@@ -25,6 +25,7 @@ import
  test_keyed_queue,
  test_sorted_set,
  test_interval_set,
  test_intops,
  test_macros,
  test_objects,
  test_ptrops,
68 changes: 68 additions & 0 deletions tests/test_intops.nim
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
import unittest2

import ../stew/intops

template testAddOverflow[T: SomeUnsignedInt]() =
  doAssert addOverflow(T.low, T.low) == (T.low, false)
  doAssert addOverflow(T.high, T.low) == (T.high, false)
  doAssert addOverflow(T.low, T.high) == (T.high, false)

  doAssert addOverflow(T.high, T.high) == (T.high - 1, true)

  doAssert addOverflow(T.high, T(0), false) == (T.high, false)
  doAssert addOverflow(T.high, T(0), true) == (T(0), true)
  doAssert addOverflow(T.high, T.high, true) == (T.high, true)

template testSubOverflow[T: SomeUnsignedInt]() =
  doAssert subOverflow(T.low, T.low) == (T.low, false)
  doAssert subOverflow(T.high, T.low) == (T.high, false)
  doAssert subOverflow(T.high, T.high) == (T.low, false)

  doAssert subOverflow(T.low, T.high) == (T(1), true)

  doAssert subOverflow(T.high, T.high, false) == (T(0), false)
  doAssert subOverflow(T.high, T.high, true) == (T.high, true)

template testAddOverflow() =
  testAddOverflow[uint8]()
  testAddOverflow[uint16]()
  testAddOverflow[uint32]()
  testAddOverflow[uint64]()
  testAddOverflow[uint]()

template testSubOverflow() =
  testSubOverflow[uint8]()
  testSubOverflow[uint16]()
  testSubOverflow[uint32]()
  testSubOverflow[uint64]()
  testSubOverflow[uint]()

template testMulWiden[T: SomeUnsignedInt]() =
  doAssert mulWiden(T.low, T.low) == (T.low, T.low)
  doAssert mulWiden(T(2), T(2)) == (T(4), T(0))
  doAssert mulWiden(T.high, T(1)) == (T.high, T(0))
  doAssert mulWiden(T(1), T.high) == (T.high, T(0))
  doAssert mulWiden(T.high, T.high) == (T(1), T.high - 1)

  doAssert mulWiden(T.high, T.high, T(0)) == (T(1), T.high - 1)
  doAssert mulWiden(T.high, T.high, T.high) == (T(0), T.high)

# TODO testMulOverflow

template testMulWiden() =
  testMulWiden[uint8]()
  testMulWiden[uint16]()
  testMulWiden[uint32]()
  testMulWiden[uint64]()
  testMulWiden[uint]()

template test() =
  testAddOverflow()
  testSubOverflow()
  testMulWiden()

static: test()

suite "intops":
  test "test":
    test()