-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement readNatural
plus readInt
and readWord
for 8, 16, 32, 64 bit and native machine bit sizes
#438
Conversation
Could you check whether the |
I don't think that's necessary. It is IMHO sufficient to observe that they did something really kludgey and didn't even check for overflows. While seeing the So I think the new refactored code is justified on its own merits, but of course opinions may differ, so I'm open to further discussion... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to accept these additions – it does seem likely that they will be useful.
@Bodigrim What do you think?
I haven't looked at the implementation yet.
I'm on board with these changes, looks like a good idea. Have not looked at implementation details yet. |
Thanks! In terms of implementation, it is basically a small refactor With that out of the way, the rest of the API is then just wrappers that read a Word64 and if it is in range return the requested data type (Int, Word, Int64), and if not then Don't know whether there's demand for |
Data/ByteString/Char8.hs
Outdated
| w <= wordMaxAsWord64 | ||
, let !i = fromIntegral w | ||
, i >= 0 = Just (i, str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| w <= wordMaxAsWord64 | |
, let !i = fromIntegral w | |
, i >= 0 = Just (i, str) | |
| w <= intMaxAsWord64 = Just (fromIntegral w, str) |
Seems a bit simpler, no?
I'd also suggest to introduce monomorphic helpers word64ToInt = fromIntegral
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to be sure to force fromIntegral w
, so that ideally GHC would elide the thunk. Hence !i = ...
Otherwise, you're right, your bounds check is equivalent, so I can drop one conditional. So it would be:
| w <= intMaxAsWord64 = let !i = fromIntegral w in Just (i, str)
I just pushed a fixup commit, that addresses the int bounds check, and tidies up some comments, cosmetic issues. Whoever decides to merge should squash first (I can do that as a force push once all is approved if you like). |
Data/ByteString/Char8.hs
Outdated
cvtneg (Just (w, str)) | ||
| w <= wordMaxAsWord64 | ||
, let !i = negate $ fromIntegral w | ||
, i <= 0 = Just (i, str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could probably also use w <= negatedIntMinAsWord64
to save a comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you cleanly express negatedIntMinAsWord64
? The naive fromIntegral (minBound :: Int) :: Word64
isn't it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My less naive version is: fromIntegral (fromIntegral (minBound :: Int) :: Word) :: Word64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which, withTypeApplications
I guess becomes:
intMinAsWord64 = fromIntegral @Word @Word64 $ fromIntegral @Int @Word minBound
Do you want to force "neg" into the name, or is it OK to leave it implicit that this is an absolute value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to you.
Oops, forgot to add the monomorphic helpers. I guess another fixup commit? |
Readability (it is easier to validate correctness of casts when types are explicit) and type-guided refactoring (it's harder to introduce a mistake after changing some types). Since we target GHC 8.0+, alternatively you can use |
OK, we're on the same page then. Just wanted to know I wasn't missing something deeper... |
I think that's everything noted so far... |
What we don't have is a 32-bit test environment to really make sure that 32-bit Ints are handled correctly, we presently only know this by code inspection, but perhaps 32-bit systems are no longer a concern? In any case, don't know how to specify that in CI, or find GHC builds for 32-bit systems, ... |
I guess if we actually add the corresponding |
We do have 32-bit CI, both arm and intel: https://cloud.drone.io/haskell/bytestring/132 |
I didn't know that bytestring has 32-bit CI tests. No worries. About
Any preference for that over the non-negating previous version? |
|
Perhaps so, but I do like not calling negate on unsigned quantities. I'll do whatever Simon says. :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TypeApplications
is a blessing!
Two more commits (ultimately to be squashed I think). These should address all outstanding issues. |
Data/ByteString/Char8.hs
Outdated
@@ -1,7 +1,12 @@ | |||
{-# LANGUAGE AllowAmbiguousTypes #-} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'll hate myself for this suggestion, when I'll need to backport something to bytestring-0.11
, but...)
Let's move read{Word,Int}
-related functionality into a separate internal module, so that the main one remains unpolluted by new language extensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, works for me. I don't think it needs to be visible to users, so it should be no problem...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move
read{Word,Int}
-related functionality into a separate internal module, so that the main one remains unpolluted by new language extensions.
I don't mind how this has played out for this PR, but what's the motivation for not using TypeApplications
in D.B.Char8
?
@Bodigrim's suggestion to factor out the code into a separate module turned out to be a major win. This pretty much eliminated all the code duplication, and made it easier to mimic the more efficient code path from the overflow-checked functions to similarly simplify and rework I like the result. Sorry it is basically a clean slate now, but actually should be easy to review... |
Thanks for the prompt review, pushed a fixup. |
And we comparing both
Either way there are n-1 products, but the size distribution of operations is different. |
Any eps > 0 will do.
There is no difference in native backend, because it takes O(n*m) time to multiply n bits by m bits. But in libgmp backend, which is used by default on the majority of platforms, the former approach is faster. See https://github.com/Bodigrim/fast-digits/blob/aef0438dba67d49b814857663c24e283ea95c2ed/src/Data/FastDigits.hs#L114-L119. It deals with a reverse scenario, when you split a large number into digits, but asymptotics are the same. It is faster to divide-and-conquer than chip digits one by one. Same for reconstructing a number from digits. |
So it sounds like I'll be reverting the most recent fixup, but I'm not sure what to say about asymptotic performance benefits of divide-and-conquer in the code commentary. I think it is difficult to be accurate... |
You can refer https://gmplib.org/manual/Multiplication-Algorithms in a comment, it lists all subquadratic algorithms. For reasonably-long integers I'd quote Toom-4 estimate, O(n^1.4). |
Reverted to divide and conquer and addressed comment nits. |
87a5277
to
17aba92
Compare
This might be it? Any further nits? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me, these are the last nitpicks from my side.
a2a5813
to
fa2a08f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Overall count of changes in Data
is 568 insertions, 313 deletions, and the delta of 250 lines is mostly due to comments. And we provide a lot of new safe and fast (on par with bytestring-lexing
) routines.
rather than:
the latter looks simpler but involves more underlying conversions to actually perform the negation (NCG backend, GHC 9.2). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments regarding the library.
In the future I hope that we can avoid huge PRs like this one. I'd much prefer to review a series of smallish PRs that incrementally builds the same functionality.
I am surprised to see this called a large PR... It is just 100 lines of code or so in each of two modules. It is hard to see how this would happen incrementally, it is a rewrite that makes That said, I don't have more such things in the pipeline, unless you'd welcome a follow for analogous functions for hex, perhaps even octal, ... (full range of functions from bytestring-lexing, but without uncaught overflows). |
Some applications want to read either unsigned or explicitly 64-bit integers (e.g. warp). Provide all the missing overflow-checked interfaces. * readInt8, readInt16, readInt32, readInt64 * readWord, readWord8, readWord16, readWord32, readWord64 * readNatural Cleaned up the code and improved tests. Uses Word as the accumular for all types other than Int64 and Word64, which use Word64. When words are 64 bit uses base 10^19 rather than 10^9 when assembling Natural and Integer values.
It's not just the number of lines changed. It's also the huge number of review comments and corresponding changes. I think the repeated squashing of commits also made it very hard to review this PR incrementally. IIRC the scope of the PR was also increased by adding This isn't meant to criticize you, @vdukhovni. I wish I had realized my problem earlier and communicated it. In the future I'll try to request limiting the size of similar PRs at an early stage. For example, it would be possible to add naive implementations combined with tests and benchmarks in a first PR and do optimizations in a second one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
prop_readIntBoundsCC = rdWordBounds @Word | ||
&& rdWordBounds @Word8 | ||
&& rdWordBounds @Word16 | ||
&& rdWordBounds @Word32 | ||
&& rdWordBounds @Word64 | ||
&& rdIntBounds @Int | ||
&& rdIntBounds @Int8 | ||
&& rdIntBounds @Int16 | ||
&& rdIntBounds @Int32 | ||
&& rdIntBounds @Int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that combining properties with &&
will make them a bit hard to debug if they do end up failing. Combining them with (.&&.)
should make that easier, I think.
At this stage, it's probably not worth changing though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not know about ".&&.", improving the tests without touching the main code may be reasonable? Your call...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to get this PR finished as soon as possible. Feel free to send more improvements in follow-up PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to get this PR finished as soon as possible. Feel free to send more improvements in follow-up PRs.
Makes sense. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a good use case for conjoin
.
Thanks! |
+1, please do not squash any sizeable branches during review process. We'll squash when merging. (I remember that I fat-fingered rebase instead of squash in #309, sorry for that.)
+1, to avoid possible disappointment, I'd prefer a simple PR to agree on API first and elaborate implementation in subsequent PRs. I personally do not have use cases for octal / hexadecimal parsing, but I assume their implementation could be much simpler than for decimals?.. Providing it out of the box would be nice. @sjakobi what do you think? |
For reading hexadecimal numbers, I'm aware of a use case in What's the use case for reading octal numbers? If we aren't aware of one, I think it would be better to wait until one comes up – API size does affect maintainability after all. Also, let's please move this discussion to a proper issue. A closed PR is a bad place for it. |
Some applications want to read either unsigned or explicitly 64-bit integers
(e.g. warp). Provide the missing interfaces. Based on a suggestion by @Bodigrim the code has been moved to a single module enabling simpler maintenance and reduced duplication. A very helpful nudge...