intops: core integer primitives #187
Open: arnetheduck wants to merge 7 commits into master from intops
391c2e2 intops: core integer primitives (arnetheduck)
9ff33ed add references (arnetheduck)
403b5de docs (arnetheduck)
72d5338 Merge branch 'master' into intops (arnetheduck)
1e2fbe1 test (arnetheduck)
422b154 Merge branch 'intops' of github.com:status-im/nim-stew into intops (arnetheduck)
e9b2277 fix 32-bit (arnetheduck)
@@ -0,0 +1,133 @@
## Core integer primitives suitable as building blocks for higher-level
## functionality such as bigints, saturating integer types etc - where
## applicable, these use compiler builtins - otherwise, they fall back on native
## Nim code that may be less efficient.
##
## In using these functions, it is recommended that you always call the function
## that returns the least information needed - for example, `mulOverflow` may
## be implemented more efficiently than `mulWiden`, meaning that if overflow
## detection is all that is needed, use the former.
##
## The API strives to map functions to platform-specific CPU instructions
## via compiler intrinsics or other compiler/target-specific implementations.
## Where this is not possible, the API instead emulates the instructions - such
## emulation may result in the loss of properties important to some applications
## such as constant-timeness, atomicity or performance.

# Implementation notes:
#
# * `uintN` is assumed to be wrapping
# * "*Overflow" perform wrapping arithmetic while returning a bool for overflow
# * "*Widen" return full result in multiple words
# * overloads with carry/borrow exposed for chaining limbs
#
# TODO
# * use compiler intrinsics
# * signed ops
# * saturating ops
# * more primitives commonly available on CPUs / intrinsics (pow / divmod / etc)
# * discovery mechanism to determine implementation quality
#
# References:
# https://llvm.org/docs/LangRef.html#arithmetic-with-overflow-intrinsics
# https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
# https://doc.rust-lang.org/std/primitive.u32.html#implementations

func addOverflow*(x, y: SomeUnsignedInt):
    tuple[result: SomeUnsignedInt, overflow: bool] =
  ## Add the two integers using wrapping arithmetic, returning the result and a
  ## boolean indicating that overflow happened.
  ##
  ## When used to construct bigint arithmetic, the overflow flag can be passed
  ## as carry to the next more significant word.

  let r = x + y
  (r, r < x)

func addOverflow*(x, y: SomeUnsignedInt, carry: bool):
    tuple[result: SomeUnsignedInt, overflow: bool] =
  ## Add two integers and carry using wrapping arithmetic, returning the
  ## result and a boolean indicating that overflow happened.
  ##
  ## When used to construct bigint arithmetic, the overflow flag can be passed
  ## as carry to the next more significant word.

  let
    (a, b) = addOverflow(x, y)
    (c, d) = addOverflow(a, typeof(a)(carry))
  (c, b or d)

func subOverflow*(x, y: SomeUnsignedInt):
    tuple[result: SomeUnsignedInt, overflow: bool] =
  ## Subtract y from x using wrapping arithmetic, returning the result and a
  ## boolean indicating whether overflow happened.

  let r = x - y
  (r, y > x)

func subOverflow*(x, y: SomeUnsignedInt, borrow: bool):
    tuple[result: SomeUnsignedInt, overflow: bool] =
  ## Subtract y and borrow from x using wrapping arithmetic, returning the
  ## result and a boolean indicating whether overflow happened.
  ##
  ## When used to construct bigint arithmetic, the overflow flag can be passed
  ## as borrow to the next more significant word.

  let
    (a, b) = subOverflow(x, y)
    (c, d) = subOverflow(a, typeof(a)(borrow))
  (c, b or d)

func mulWiden*(x, y: uint64): tuple[lo, hi: uint64] =
  # Schoolbook multiplication on 32-bit halves - none of the sums below can
  # overflow a uint64
  let
    x0 = x and uint32.high
    x1 = x shr 32
    y0 = y and uint32.high
    y1 = y shr 32
    p11 = x1 * y1
    p01 = x0 * y1
    p10 = x1 * y0
    p00 = x0 * y0
    middle = p10 + (p00 shr 32) + (p01 and uint32.high)
    rhi = p11 + (middle shr 32) + (p01 shr 32)
    rlo = (middle shl 32) or (p00 and uint32.high)

  (rlo, rhi)

func mulWiden*(x, y: uint32): tuple[lo, hi: uint32] =
  let r = x.uint64 * y.uint64
  (cast[uint32](r and uint32.high), cast[uint32](r shr 32))
func mulWiden*(x, y: uint16): tuple[lo, hi: uint16] =
  let r = x.uint32 * y.uint32
  (cast[uint16](r and uint16.high), cast[uint16](r shr 16))
func mulWiden*(x, y: uint8): tuple[lo, hi: uint8] =
  let r = x.uint16 * y.uint16
  (cast[uint8](r and uint8.high), cast[uint8](r shr 8))
func mulWiden*(x, y: uint): tuple[lo, hi: uint] =
  ## Perform `(x * y)` as if the computation had been carried out in twice as
  ## wide a type, returning the low and high words.
  when sizeof(uint) == sizeof(uint64):
    let (a, b) = mulWiden(uint64(x), uint64(y))
  else:
    let (a, b) = mulWiden(uint32(x), uint32(y))
  (uint(a), uint(b))

func mulWiden*(x, y, carry: SomeUnsignedInt): tuple[lo, hi: SomeUnsignedInt] =
  ## Perform `((x * y) + carry)` as if the computation had been carried out in
  ## twice as wide a type, returning the low and high words.
  let
    (lo, hi) = mulWiden(x, y)
    (a, b) = addOverflow(lo, carry)
    # The carry from this overflowing add can be ignored since the result of
    # a multiplication always leaves room for adding one more `high`
    (c, _) = addOverflow(hi, typeof(hi)(0), b)

  (a, c)

func mulOverflow*(x, y: SomeUnsignedInt):
    tuple[result: SomeUnsignedInt, overflow: bool] =
  ## Perform `(x * y)` using wrapping arithmetic, returning the result and a
  ## boolean indicating that overflow happened.
  let
    (a, b) = mulWiden(x, y)
  (a, b > 0)
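The carry overload of `mulWiden` is the form intended for chaining limbs. As a rough sketch of how that might look, the block below multiplies a little-endian array of 32-bit limbs by a single word. `mulLimbs` is a hypothetical helper that is not part of this diff, and `mulWiden` is re-implemented locally for `uint32` only, so that the block compiles without the module under review.

```nim
# Local stand-in for the carry overload in the diff, specialised to uint32.
# (x * y) + carry is at most 2^64 - 2^32, so it always fits in a uint64.
func mulWiden(x, y, carry: uint32): tuple[lo, hi: uint32] =
  let r = x.uint64 * y.uint64 + carry.uint64
  (uint32(r and 0xFFFF_FFFF'u64), uint32(r shr 32))

# Hypothetical helper (an assumption, not part of the PR): multiply a
# little-endian limb array by one word, returning the limbs and the final carry.
func mulLimbs(a: openArray[uint32], m: uint32):
    tuple[limbs: seq[uint32], carry: uint32] =
  result.limbs = newSeq[uint32](a.len)
  var carry = 0'u32
  for i in 0 ..< a.len:
    # The high word of each partial product is the carry into the next limb
    let (lo, hi) = mulWiden(a[i], m, carry)
    result.limbs[i] = lo
    carry = hi
  result.carry = carry

let product = mulLimbs([uint32.high, uint32.high], uint32.high)
doAssert product.limbs == @[1'u32, uint32.high]
doAssert product.carry == uint32.high - 1
```

The high word never needs a separate overflow check here, which is the property the comment inside the carry overload relies on.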
@@ -0,0 +1,68 @@
import unittest2

import ../stew/intops

template testAddOverflow[T: SomeUnsignedInt]() =
  doAssert addOverflow(T.low, T.low) == (T.low, false)
  doAssert addOverflow(T.high, T.low) == (T.high, false)
  doAssert addOverflow(T.low, T.high) == (T.high, false)

  doAssert addOverflow(T.high, T.high) == (T.high - 1, true)

  doAssert addOverflow(T.high, T(0), false) == (T.high, false)
  doAssert addOverflow(T.high, T(0), true) == (T(0), true)
  doAssert addOverflow(T.high, T.high, true) == (T.high, true)

template testSubOverflow[T: SomeUnsignedInt]() =
  doAssert subOverflow(T.low, T.low) == (T.low, false)
  doAssert subOverflow(T.high, T.low) == (T.high, false)
  doAssert subOverflow(T.high, T.high) == (T.low, false)

  doAssert subOverflow(T.low, T.high) == (T(1), true)

  doAssert subOverflow(T.high, T.high, false) == (T(0), false)
  doAssert subOverflow(T.high, T.high, true) == (T.high, true)

template testAddOverflow() =
  testAddOverflow[uint8]()
  testAddOverflow[uint16]()
  testAddOverflow[uint32]()
  testAddOverflow[uint64]()
  testAddOverflow[uint]()

template testSubOverflow() =
  testSubOverflow[uint8]()
  testSubOverflow[uint16]()
  testSubOverflow[uint32]()
  testSubOverflow[uint64]()
  testSubOverflow[uint]()

template testMulWiden[T: SomeUnsignedInt]() =
  doAssert mulWiden(T.low, T.low) == (T.low, T.low)
  doAssert mulWiden(T(2), T(2)) == (T(4), T(0))
  doAssert mulWiden(T.high, T(1)) == (T.high, T(0))
  doAssert mulWiden(T(1), T.high) == (T.high, T(0))
  doAssert mulWiden(T.high, T.high) == (T(1), T.high - 1)

  doAssert mulWiden(T.high, T.high, T(0)) == (T(1), T.high - 1)
  doAssert mulWiden(T.high, T.high, T.high) == (T(0), T.high)

# TODO testMulOverflow

template testMulWiden() =
  testMulWiden[uint8]()
  testMulWiden[uint16]()
  testMulWiden[uint32]()
  testMulWiden[uint64]()
  testMulWiden[uint]()

template test() =
  testAddOverflow()
  testSubOverflow()
  testMulWiden()

static: test()

suite "intops":
  test "test":
    test()
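The test file leaves `testMulOverflow` as a TODO. One possible shape for it is sketched below, following the pattern of the other test templates. To keep the block compilable on its own, `mulWiden` and `mulOverflow` are re-implemented locally for `uint32` only, and the tuple field is named `res` rather than `result`. These local definitions are stand-ins and an assumption, not the module's generic versions.

```nim
# Local stand-ins for the functions under test, specialised to uint32.
func mulWiden(x, y: uint32): tuple[lo, hi: uint32] =
  let r = x.uint64 * y.uint64
  (uint32(r and 0xFFFF_FFFF'u64), uint32(r shr 32))

func mulOverflow(x, y: uint32): tuple[res: uint32, overflow: bool] =
  # Overflow happened exactly when the high word is non-zero
  let (lo, hi) = mulWiden(x, y)
  (lo, hi > 0)

template testMulOverflow() =
  doAssert mulOverflow(0'u32, 0'u32) == (0'u32, false)
  doAssert mulOverflow(uint32.high, 1'u32) == (uint32.high, false)
  doAssert mulOverflow(1'u32, uint32.high) == (uint32.high, false)

  # high * 2 wraps to high - 1, high * high wraps to 1
  doAssert mulOverflow(uint32.high, 2'u32) == (uint32.high - 1, true)
  doAssert mulOverflow(uint32.high, uint32.high) == (1'u32, true)
  doAssert mulOverflow(0x1_0000'u32, 0x1_0000'u32) == (0'u32, true)

testMulOverflow()
```

In the real test file the template would presumably take `T: SomeUnsignedInt` and use `T.high` and `T(…)` like its siblings, so that every width is covered.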
Does this really simplify the implementation of BigInt libraries? If I try to imagine the loop that will be used there, it seems to me that it will be more complicated and less performant when based on this helper function.
In particular, the reliance on a tuple that gets translated to a C struct is what makes me nervous. If the carry is communicated with an output parameter, the compiler is a bit more free to perform register allocations in more optimal ways.
Also, ultimately, the carry should probably be obtained from the CPU itself, but I guess your plan is to replace the bodies of these functions in the future?
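For reference, the loop in question might look roughly like the sketch below when written on top of the carry form of `addOverflow`. `addLimbs` is a hypothetical helper that is not part of this PR, and the two `addOverflow` overloads are local copies specialised to `uint64` (with the tuple field renamed to `res`), so that the block compiles without the module under review. Whether the tuple return costs anything compared to an output parameter depends on the backend compiler and is not something this sketch measures.

```nim
# Local copies of the two overloads from the diff, specialised to uint64.
func addOverflow(x, y: uint64): tuple[res: uint64, overflow: bool] =
  let r = x + y # unsigned addition wraps
  (r, r < x)

func addOverflow(x, y: uint64, carry: bool):
    tuple[res: uint64, overflow: bool] =
  let
    (a, b) = addOverflow(x, y)
    (c, d) = addOverflow(a, uint64(carry))
  (c, b or d)

# Hypothetical helper (an assumption, not part of the PR): add two
# little-endian limb arrays of equal length, returning the final carry.
func addLimbs(a, b: openArray[uint64]):
    tuple[limbs: seq[uint64], carry: bool] =
  doAssert a.len == b.len
  result.limbs = newSeq[uint64](a.len)
  var carry = false
  for i in 0 ..< a.len:
    # The overflow flag of each limb becomes the carry into the next one
    let (limb, overflow) = addOverflow(a[i], b[i], carry)
    result.limbs[i] = limb
    carry = overflow
  result.carry = carry

# 192-bit example: (2^128 - 1) + 1 == 2^128
let total = addLimbs([uint64.high, uint64.high, 0'u64],
                     [1'u64, 0'u64, 0'u64])
doAssert total.limbs == @[0'u64, 0'u64, 1'u64]
doAssert not total.carry
```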
Typically modern compilers are able to deal with this - also, the fact that it's a return value and not a pointer gives the compiler some freedoms it otherwise doesn't have. In LLVM for example, this is typically handled by the SROA pass, which decomposes structs into individual elements and then assigns registers based on the lifetimes of the fields themselves (which in this case are trivial).
See https://gcc.godbolt.org/z/Ex8P76fWr for an example of how it works with a struct return type.
These implementations are meant mainly for the VM - the actual (future) implementations would use compiler builtins, which unfortunately differ in their API between platforms and compilers. But yes, the ideal is that the compiler maps a function like this to its ADC instruction, which does a 3-operand addition and returns the carry in a flag.
For bigints, the 3-parameter carry form in particular is interesting - for saturating arithmetic, the 2-parameter version without carry is more appropriate.
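To illustrate the saturating case, here is a minimal sketch built on the 2-parameter form. `saturatingAdd` is a hypothetical name that is not part of this PR (saturating ops are still listed under TODO in the module), and `addOverflow` is a local copy specialised to `uint32` so that the block compiles on its own.

```nim
# Local copy of the 2-parameter form from the diff, specialised to uint32.
func addOverflow(x, y: uint32): tuple[res: uint32, overflow: bool] =
  let r = x + y # unsigned addition wraps
  (r, r < x)

# Hypothetical helper (an assumption, not part of the PR): clamp to the
# maximum value instead of wrapping around.
func saturatingAdd(x, y: uint32): uint32 =
  let (r, overflow) = addOverflow(x, y)
  if overflow: uint32.high else: r

doAssert saturatingAdd(1'u32, 2'u32) == 3'u32
doAssert saturatingAdd(uint32.high, 1'u32) == uint32.high
```

No carry input is needed here because there is no more significant word to propagate into; only the overflow flag matters.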
Fixed a bug and improved the example: the code now shows that using the builtin actually has no advantage over the no-builtin code in this particular case - both end up using the right ADC instruction for a 192-bit integer, for example: https://gcc.godbolt.org/z/bvW6aTr5a