Add intrinsics for bigint helper methods #131566
Conversation
*(force-pushed from c8d0720 to 537c448)*
Not familiar with LLVM to the extent necessary. I think this is T-compiler land given that it's an intrinsic? r? compiler

Sorry, I forgot to respond to this, but: yes. The goal here is to add the intrinsics without worrying about the libs side, since everything is unstable anyway.
*(force-pushed from 537c448 to e7fd242)*
Gonna be honest: I managed to figure out how the LLVM code works, I managed to figure out how the consteval code works, and I will probably be able to figure out how the cranelift code works. I have absolutely no idea how the GCC code is supposed to work, especially with the various ABI calling conventions in the code that look like they're statically determined, but somehow also not. I initially wanted to offer fallback implementations for these intrinsics, but that's not really possible, since there's a lot of maths that can't be done in a type-generic way without adding way too many bounds on the intrinsics. I'm just going to leave these unimplemented for now, and someone with way more experience on these backends can pick up the slack. Or, if someone here wants to give me a bit more advice on how this would even be implementable, I can maybe try to do it myself.

I also don't know the best way of testing the consteval code, because the ability to run tests in const context is basically nonexistent. I could probably be more creative and figure a way around this, like I have for library code, but for now I'm just going to leave it and hand-wave it as an "unstable feature".

It's also worth commenting on how much commenting is… not present in a lot of this code. The Rust code is fine, but since most of these backends are used over FFI, there are a lot of weird conventions that have to be followed, and I have absolutely no idea what those are. I filed #131562 to cover part of this, but it feels like lack of documentation in the compiler is a much bigger issue than I perceived it to be. I know that a lot of the rustc dev guide says you can message people on Zulip for advice, but "message someone in chat" is not documentation, obviously. And I hate bothering people.

That's my rambling for now. Will try and make sure I can get at least everything to compile and the tests to pass.

I know that the LLVM backend code works for sure, since I got it to pass all the library tests, but everything else is a massive shrug emoji for now.
*(force-pushed from e7fd242 to f7294c6)*
*(force-pushed from f7294c6 to b08b546)*
*(force-pushed from b08b546 to 14d8338)*
Some changes occurred to the CTFE / Miri interpreter. cc @rust-lang/miri

Some changes occurred in compiler/rustc_codegen_gcc.
Thanks for adding the Miri implementation! :)
However, it does not seem to be tested?
```rust
        (lo, hi)
    }
} else {
    let prod = l * r + c1 + c2;
```
Please add a comment explaining why this does not cause overflow. Also, please use `strict_` operations.
Will convert to strict, although I thought the lack of overflow was evident from the fact that the 128-bit case was covered separately: all other integers are at most 64 bits, where this operation cannot overflow.
You are multiplying two 64-bit numbers here, which results in a 128-bit number. Then you add more stuff. `u64::MAX * u64::MAX` is fairly close to `u128::MAX`, and then we add stuff... why can't this overflow? I have no intuition for this, so it definitely needs comments.
It turns out that `u64::MAX * u64::MAX + u64::MAX + u64::MAX` is actually exactly `u128::MAX`, which is the principle behind allowing up to two carries for double-wide multiplication.

But yes, I'll add comments.
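The identity is easy to check directly: with b = 2⁶⁴, the sum is (b − 1)² + 2(b − 1) = b² − 1. A standalone sketch confirming it:

```rust
// Check that u64::MAX * u64::MAX + u64::MAX + u64::MAX == u128::MAX,
// i.e. (b - 1)^2 + 2 * (b - 1) == b^2 - 1 for b = 2^64.
fn main() {
    let m = u64::MAX as u128;
    let sum = m * m + m + m;
    assert_eq!(sum, u128::MAX);
}
```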
```rust
}
#[cfg(not(bootstrap))]
{
    let (lo, hi) = l.carrying2_mul(r, c1, c2);
```
This is unfortunate... for basic arithmetic like this, it'd be better if we had our own implementation of them rather than having to use the host operation. How hard would that be?
So, my thought process here is that the ideal solution would be to replace the current hard-coded `i128`/`u128` version of the code with a bigint implementation and do the multiplication directly in all cases. That would also support a potential future with integers larger than 128 bits, and it would likely use methods like this one to perform the bigint multiplication.

However, without that, my thought process was that I could either manually code a version of `carrying2_mul` here that would perform worse and require extra scrutiny, or just use the version that's already been implemented and tested.
I'll defer to whatever you think is the better option, but that at least explains my reasoning.
I don't know how hard this is to implement directly, is there some code somewhere that would give me an idea?
It's also not great to have a completely different codepath for `u128` and the rest; that makes proper testing more tricky.
So, just sharing a few of the implementations mentioned on the tracking issue:
- Tracking Issue for bigint helper methods #85532 (comment)
- Tracking Issue for bigint helper methods #85532 (comment)
- Tracking Issue for bigint helper methods #85532 (comment)
- Tracking Issue for bigint helper methods #85532 (comment)
Note that also, regardless of what we do, the result of double-wide multiplication of 128-bit integers is going to be two 128-bit integers, and it's only going to be the case of 128-bit integers where we need to scoop out the extra data from the more-significant 128 bits. So, effectively, even if I had the same path for all integers using the 128-bit double-wide mul, we'd still be special-casing 128 bits by only looking at the higher-order word in the 128-bit case.
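For reference, the 128-bit case can be done by splitting each operand into 64-bit halves and summing the four partial products. This is my own sketch in the spirit of the implementations linked above (names and structure are mine, not the PR's code):

```rust
// Sketch: 128-bit widening multiplication via 64-bit limbs (schoolbook).
// This is an illustration, not the PR's actual implementation.
fn widening_mul_u128(a: u128, b: u128) -> (u128, u128) {
    const MASK: u128 = u64::MAX as u128;
    let (a_lo, a_hi) = (a & MASK, a >> 64);
    let (b_lo, b_hi) = (b & MASK, b >> 64);

    // Four 64x64 -> 128 partial products; none of these can overflow.
    let ll = a_lo * b_lo;
    let lh = a_lo * b_hi;
    let hl = a_hi * b_lo;
    let hh = a_hi * b_hi;

    // Middle column: at most 3 * (2^64 - 1), which fits comfortably in u128.
    let mid = (ll >> 64) + (lh & MASK) + (hl & MASK);

    let lo = ((mid & MASK) << 64) | (ll & MASK);
    // The high word is bounded by 2^128 - 1, so this sum cannot overflow.
    let hi = hh + (lh >> 64) + (hl >> 64) + (mid >> 64);
    (lo, hi)
}

fn main() {
    assert_eq!(widening_mul_u128(2, 3), (6, 0));
    // (2^128 - 1)^2 = 2^256 - 2^129 + 1: high word 2^128 - 2, low word 1.
    assert_eq!(widening_mul_u128(u128::MAX, u128::MAX), (1, u128::MAX - 1));
}
```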
> with a bigint implementation

FWIW, I'd love that for all our arithmetic.^^ It's probably too slow, though. And using it only sometimes seems odd.
> However, without that, my thought process was that I could either manually code a version of `carrying2_mul` here that would perform worse and require extra scrutiny, or just use the version that's already been implemented and tested.
So, my thought process here is that we typically want to be independent from possible bugs in the standard library, and provide our own reference implementation. But we haven't done that in `numeric_intrinsic`, so it'd be odd to apply a higher standard here.
So fair, please stick with the current implementation, just with more comments.
```rust
let overflowed = Scalar::from_bool(if l.layout.abi.is_signed() {
    overflowed1 != overflowed2
} else {
    overflowed1 | overflowed2
});
```
Please add a comment explaining the logic here
I definitely should just update the intrinsic docs to clarify the behaviour here, although the standard library has a bit of a weird relationship with how it documents intrinsics. Most of them rely on having stabilised versions that they can just point to, since those will have the proper docs. The documentation here is split between `iN::carrying_add`, `uN::carrying_add`, `iN::borrowing_sub`, and `uN::borrowing_sub`, but the bottom line is that signed methods are merely checking for overflow, whereas unsigned methods want to actually return a new carry bit that can be chained along. That's what we're testing for in the methods, and I'm just duplicating that here.
Not sure what the best solution for documentation would be here; open to ideas. I could just link those docs here, for now.
I just don't understand why the signed version overflowed if exactly one of the sub-operations overflowed. I could probably think about it for a minute and figure it out, but there should really be comments explaining that.

Your answer confused me even more: in what sense is the result different for signed vs unsigned? That should also be documented...
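For what it's worth, the signed rule can be illustrated like this (my own sketch built from stable `overflowing_add`, not the PR's code): each of the two additions may overflow, but when both do, they overflow in opposite directions and the errors cancel, which is why the signed case XORs the two flags while the unsigned case ORs them:

```rust
// Sketch: carrying add on a signed type. The final overflow flag is
// o1 XOR o2: if both steps overflow, the second cancels the first.
fn carrying_add_i8(a: i8, b: i8, carry: bool) -> (i8, bool) {
    let (s1, o1) = a.overflowing_add(b);
    let (s2, o2) = s1.overflowing_add(carry as i8);
    (s2, o1 != o2)
}

fn main() {
    // 127 + 0 + 1 = 128: only the second step overflows -> real overflow.
    assert_eq!(carrying_add_i8(127, 0, true), (-128, true));
    // -128 + -1 + 1 = -128: the first step overflows, the second cancels it,
    // and the true sum fits in i8 -> no overflow reported.
    assert_eq!(carrying_add_i8(-128, -1, true), (-128, false));
}
```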
```rust
@@ -573,6 +601,106 @@ impl<'tcx, M: Machine<'tcx>> InterpCx<'tcx, M> {
        })
    }

    pub fn carrying_arith(
```
Please add a doc comment to both new methods
```rust
@@ -626,6 +626,17 @@ where
        self.write_immediate(Immediate::Scalar(val.into()), dest)
    }

    /// Write a scalar pair to a place
    #[inline(always)]
    pub fn write_scalar_pair(
```
I don't think we want this helper. We have immediates specifically to represent these pairs. Also, this encourages having scalar pairs without having their type, which is dangerous. I think we actually want to change `write_immediate` to take an `ImmTy` instead of an `Immediate`, but that's a larger change... but this helper moves us in the wrong direction IMO.
I like the idea of having typed immediates everywhere; I just know that this isn't what the code is doing right now, and that's why I added this method, since it's either this or import `Immediate` directly into the intrinsics module and construct one myself.
> or import Immediate directly into the intrinsics module and construct one myself.

Yes, please do that.
I had assumed that the presence of `write_scalar` was to avoid that, but I'll keep that in mind. Perhaps `write_scalar` should also be removed if the goal is to have typed immediates everywhere?
Scalars are much less at risk of odd effects due to bad types, since there are no field offsets or padding being computed.
Ah, that makes sense. I wasn't aware that this padding/offset was even meaningful since I thought that the reason for having a "pair" primitive was explicitly to avoid this.
I have no idea what this refers to, nor what it means. But the docs don't seem to talk about it at all, so that seems bad? If this pertains to the behavior of the intrinsics, then please add it to the intrinsic docs.
We have two ways of testing them. One is to add some functions in

So, my main issue with testing these is that I was running into lots of headaches with the common testing faculties being unusable in const context. I will admit that when I was trying to get it working, I got frustrated and gave up instead of remembering some of the alternatives I've used in the past, but it was still frustrating. The biggest issue is that tuple equality does not work in const context, and so I have to resort to other methods. What seems most reasonable is to use

Yeah, either

So, I'm going to close this, not because I want to stop working on it, but because I feel like it's going to be easier to work on this in smaller chunks, like just

So, it's not over, but, gonna be split up a bit.
Rollup merge of rust-lang#131707 - clarfonthey:constify-core-tests, r=thomcc Run most `core::num` tests in const context too This adds some infrastructure for something I was going to use in rust-lang#131566, but it felt worthwhile enough on its own to merge/discuss separately. Essentially, right now we tend to rely on UI tests to ensure that things work in const context, rather than just using library tests. This uses a few simple macro tricks to make it *relatively* painless to execute tests in both runtime and compile-time context. And this only applies to the numeric tests, and not anything else. Recommended to review without whitespace in the diff. cc `@RalfJung`
Tracking issue: #85532
This adds the following intrinsics:

- `add_with_carry`, the implementation for `carrying_add`
- `sub_with_carry`, the implementation for `borrowing_sub`
- `mul_double`, the implementation for `widening_mul`
- `mul_double_add`, the implementation for `carrying_mul`
- `mul_double_add2`, the implementation for a new method, `carrying2_mul`, which is like `carrying_mul` but accepts two arguments to add instead of one

Right now, there is no significant advantage to the `add_with_carry` and `sub_with_carry` intrinsics on LLVM over the existing implementation, since the intrinsic version is effectively identical in implementation. However, the GCC backend does have dedicated intrinsics for these, and LLVM and other backends may eventually support them.

The largest advantage comes from the multiplication intrinsics, since they can internally cast up to 256-bit integer multiplication for `i128` and `u128`, which is not currently possible in Rust alone. The result will very likely optimize better than if it were written by hand.

A few additional notes:
I'm marking this PR as WIP because I have only done the LLVM implementation of the intrinsics, and it feels appropriate to do the other backends as well, since these feel like pretty fundamental operations that people will want to use. In particular, miri needs to also know how to execute these intrinsics as well, and I haven't implemented that either.
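To illustrate what `mul_double_add2` (and the new `carrying2_mul` method) computes, here is a sketch in terms of stable Rust for the 64-bit case, not the intrinsic itself: the whole operation can be expressed by casting up to `u128`, and by the `u64::MAX * u64::MAX + u64::MAX + u64::MAX == u128::MAX` identity discussed in the review, the result always fits in exactly two words:

```rust
// Sketch: double-wide multiplication with two carries, for u64, done by
// casting up to u128. The full result a*b + c1 + c2 never exceeds
// u128::MAX, so splitting it into (low, high) words is lossless.
fn carrying2_mul_u64(a: u64, b: u64, c1: u64, c2: u64) -> (u64, u64) {
    let wide = a as u128 * b as u128 + c1 as u128 + c2 as u128;
    (wide as u64, (wide >> 64) as u64) // (low word, high word)
}

fn main() {
    // Worst case: every input maximal, result exactly u128::MAX.
    let (lo, hi) = carrying2_mul_u64(u64::MAX, u64::MAX, u64::MAX, u64::MAX);
    assert_eq!((lo, hi), (u64::MAX, u64::MAX));
    // Degenerate case: no multiplication, just the two carries.
    assert_eq!(carrying2_mul_u64(0, 0, 5, 7), (12, 0));
}
```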