Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added upper case hex builder support #117

Closed
wants to merge 3 commits into from

Conversation

hth313
Copy link

@hth313 hth313 commented Mar 3, 2017

I fixed a bug in Data.ByteString.Builder.Prim.primMapByteStringBounded which caused an infinite loop due to incorrect test for buffer overflow. There are two other locations in this module that also make this kind of test, but they already do it right (using <=).

Support for upper-case hexadecimal builders is the main thing here. I went for doing it at C level and retired the Base16 module, which makes it more consistent.

There are also some new builders for writing a specified number of hexadecimal digits (suffix Width). These allows for writing arbitrary number of hex digits based on values either in 32 or 64 bit types. I needed them myself, as I have odd formatted hex data that does not follow the built-in Haskell types.

The test suite has been updated with more tests for the new hex builders.

hth313 added 2 commits March 3, 2017 11:34
primMapByteStringBounded would refuse to write to a buffer that had
just enough space left to fit an item, incorrectly thinking it had a
buffer overflow. Then asking for a new buffer of at least the size to
cover the item could result in getting a buffer of that size, causing
an infinite loop.
cbits/itoa.c Outdated
};

// unsigned long ints (64 bit words)
char* _hs_bytestring_long_long_uint_hex_upper (long long unsigned int x, char* buf) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be safer to use uint64_t from <stdint.h>.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint64_t means that it requires C99 (or later) and a byte oriented target that has a type that is exactly 64 bits, otherwise it may not exist.

Then there is the issue that it should match the Haskell side, and while there is a CLLong type (which represents a C long long type), there seems to be nothing in Foreign.C.Types that corresponds to a uint64_t.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth GHC itself requires C99. I believe the appropriate type to use on the Haskell side would be Word64. IFAIK, the Foreign.C.Types types exist simply to mirror the rather vague ANSI C types.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the Haskell wiki and it seems you are right, it should be able to marshal a Word64. I will update the PR.

Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very clean although some of it goes well over my head.

Since this PR includes a bug fix, it's a bit unfortunate that it has been lying around for such a long time. It would be great to release this fix soon!

Regarding the build failures with 7.0 and 7.2, are these fundamental incompatibilities or does this need just a bit more CPP or the like?

@bgamari Are you happy with the C code in itoa.c?

Data/ByteString/Builder/Prim.hs Show resolved Hide resolved
Copy link
Contributor

@vdukhovni vdukhovni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I too could use the combinators with the user supplied widths, so that e.g. I could get the low 16 bits of a 32-bit word without making a copy (though aggressive inlining often eliminates the copies, it can be tricky to get right). To that end I'd also like to a see shift count, so that I could get the upper-16 bits of a 32-bit word, or similar. If we're generalizing bit-range extraction, why just the low bits?

Otherwise, and overall, I think the PR is useful. I just think that perhaps the Int width is likely better a Word8, and perhaps <stdint.h> is still not sufficiently portable...

>$< pairF word32HexFixed word32HexFixed
{-# INLINE encodeWordHexFixedWidth #-}
encodeWordHexFixedWidth :: forall a. (Storable a, Integral a) => Int -> FixedPrim a
encodeWordHexFixedWidth width = fixedPrim width $ c_uint32_fixed_hex (CInt (fromIntegral width)) . fromIntegral
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the width here be unsigned? (e.g. Word8 rather than Int). Nasty surprises likely await anyone who passes a negative value, and also fixed widths above 255 bytes don't seem to have much utility. So I think there's a case for the width being a Word8, but a Word would also be OK(ish).

@@ -4,13 +4,15 @@
// inspired by: http://www.jb.man.ac.uk/~slowe/cpp/itoa.html //
///////////////////////////////////////////////////////////////

#include <stdio.h>
#include <stdint.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, sufficiently old, but still extant versions of Windows (really Visual Studio) don't have stdint.h. This may warrant confirmation wrt. to the supported Windows releases and toolchains for Haskell.

@Bodigrim Bodigrim added the blocked: info-needed somebody needs to provide more information or investigate label May 11, 2020
@hth313
Copy link
Author

hth313 commented May 22, 2020

Just a couple of comments. The reason why I needed these upper case hex characters is that I emit intel-hex and Motorola S-records. These are old formats. At least intel-hex demands upper case hex characters, some readers will not accept lower case hex digits.

Now that I also generate code for PDP-11, I get to think that maybe I want octal numbers as well. Those old people (even older than me) probably want to see list files in octal, LOL

So my question is, if I add another PR which handles octal numbers, will it get accepted or warehoused?

@vdukhovni
Copy link
Contributor

Just a couple of comments. The reason why I needed these upper case hex characters is that I emit intel-hex and Motorola S-records. These are old formats. At least intel-hex demands upper case hex characters, some readers will not accept lower case hex digits.

Now that I also generate code for PDP-11, I get to think that maybe I want octal numbers as well. Those old people (even older than me) probably want to see list files in octal, LOL

So my question is, if I add another PR which handles octal numbers, will it get accepted or warehoused?

Well, you can of course just use either of

fixedPrim :: Int -> (a -> Ptr Word8 -> IO ()) -> FixedPrim a
boudedPrim :: Int -> (a -> Ptr Word8 -> IO (Ptr Word8)) -> BoundedPrim a -- [sic]

to craft octal encoders for your own needs, without adding them to this library. And while I'm not in principle opposed to seeing octal encodings merged, perhaps the benefit (few use-cases) does not justify the cost (more code to maintain and larger library code size).

As for the stdint.h question, I'd like to see it used if available on all supported platforms, I was merely asking whether at this time all the supported Windows toolchains actually have stdint.h. I am not sure they do, but perhaps that's no longer an issue.

@vdukhovni
Copy link
Contributor

Can you fix the CI issues? It appears that CInt needs to be imported into Prim/ASCII.hs for early 7.x releases.

If this is to be merged at this stage, it would probably have to go into in without new features (e.g. octal).

@sjakobi sjakobi linked an issue Jun 25, 2020 that may be closed by this pull request
@Bodigrim Bodigrim added the blocked: patch-needed somebody needs to write a patch or contribute code label Jul 2, 2020
@Bodigrim
Copy link
Contributor

As I wrote in #112, I am not keen to add 25 new public functions for a feature with a narrow audience. If this is still wanted, I encourage to come up with a more concise API, which I'll be happy to review in a timely manner.

@Bodigrim
Copy link
Contributor

Bodigrim commented Jan 9, 2021

Closing as stalled and outdated. Feel free to reopen, if you come up with a solution, adding less new public functions.

@Bodigrim Bodigrim closed this Jan 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked: info-needed somebody needs to provide more information or investigate blocked: patch-needed somebody needs to write a patch or contribute code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upper case hex builders missing
6 participants