Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose fromStrict/toStrict directly from Data.ByteString #281

Merged
merged 1 commit into from
Sep 15, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions Data/ByteString.hs
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,8 @@ module Data.ByteString (
singleton, -- :: Word8 -> ByteString
pack, -- :: [Word8] -> ByteString
unpack, -- :: ByteString -> [Word8]
fromStrict, -- :: ByteString -> Lazy.ByteString
toStrict, -- :: Lazy.ByteString -> ByteString
Comment on lines +53 to +54
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, in the context of the Data.ByteString module, these seem like better names to me:

Suggested change
fromStrict, -- :: ByteString -> Lazy.ByteString
toStrict, -- :: Lazy.ByteString -> ByteString
toLazy, -- :: ByteString -> Lazy.ByteString
fromLazy, -- :: Lazy.ByteString -> ByteString

Data.ByteString.Lazy should keep the old names though.

Thoughts?!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the new names better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit weird to have the very same function under two different names. Is there any prior art?

Imagine having both modules in scope:

import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

If fromStrict is the same entity, just re-exported, GHC suggests only one option for a hole _ :: BS.ByteString -> BL.ByteString - I do not even have to search for this function in haddocks! But if we define toLazy = fromStrict in Data.ByteString, GHC would be obliged to suggest both options. Now this is really confusing for a user: he does not know that it is just synonyms, so needs to check haddocks for both modules and painstakingly compare semantics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, that means that folks who just want to trim the import lists don't win. They have to change their code, and if they want a simple migration path to the new bytestring releases that make the imports optional, they've more work to do.

One might even provide both sets of names. fromStrict == toLazy and fromLazy == toStrict, and re-export both forms from the Internal module. So you can have either name from either source.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reason for proposing these names was that I think that with Data.ByteString.toStrict and Data.ByteString.fromStrict it's not obvious which type we're respectively converting from or to. The other type could plausibly be [Word8] or String.

However, that means that folks who just want to trim the import lists don't win. They have to change their code, and if they want a simple migration path to the new bytestring releases that make the imports optional, they've more work to do.

@vdukhovni Can you expand on the scenario that you're describing here?

I don't understand how anyone could profit from the new {from,to}Strict re-exports while retaining compatibility with older bytestring versions.


-- * Basic interface
cons, -- :: Word8 -> ByteString -> ByteString
Expand Down Expand Up @@ -233,6 +235,7 @@ import Data.Bits (bitSize, shiftL, (.|.), (.&.))
#endif

import Data.ByteString.Internal
import Data.ByteString.Lazy.Internal (fromStrict, toStrict)
import Data.ByteString.Unsafe

import qualified Data.List as List
Expand Down
2 changes: 2 additions & 0 deletions Data/ByteString/Char8.hs
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,8 @@ module Data.ByteString.Char8 (
singleton, -- :: Char -> ByteString
pack, -- :: String -> ByteString
unpack, -- :: ByteString -> String
B.fromStrict, -- :: ByteString -> Lazy.ByteString
B.toStrict, -- :: Lazy.ByteString -> ByteString

-- * Basic interface
cons, -- :: Char -> ByteString -> ByteString
Expand Down
47 changes: 0 additions & 47 deletions Data/ByteString/Lazy.hs
Original file line number Diff line number Diff line change
Expand Up @@ -272,53 +272,6 @@ fromChunks cs = L.foldr chunk Empty cs
toChunks :: ByteString -> [P.ByteString]
toChunks cs = foldrChunks (:) [] cs

-- |/O(1)/ Convert a strict 'ByteString' into a lazy 'ByteString'.
fromStrict :: P.ByteString -> ByteString
fromStrict bs | S.null bs = Empty
| otherwise = Chunk bs Empty

-- |/O(n)/ Convert a lazy 'ByteString' into a strict 'ByteString'.
--
-- Note that this is an /expensive/ operation that forces the whole lazy
-- ByteString into memory and then copies all the data. If possible, try to
-- avoid converting back and forth between strict and lazy bytestrings.
--
toStrict :: ByteString -> S.ByteString
toStrict = \cs -> goLen0 cs cs
-- We pass the original [ByteString] (bss0) through as an argument through
-- goLen0, goLen1, and goLen since we will need it again in goCopy. Passing
-- it as an explicit argument avoids capturing it in these functions'
-- closures which would result in unnecessary closure allocation.
where
-- It's still possible that the result is empty
goLen0 _ Empty = S.empty
goLen0 cs0 (Chunk c cs) | S.null c = goLen0 cs0 cs
goLen0 cs0 (Chunk c cs) = goLen1 cs0 c cs

-- It's still possible that the result is a single chunk
goLen1 _ bs Empty = bs
goLen1 cs0 bs (Chunk c cs)
| S.null c = goLen1 cs0 bs cs
| otherwise =
goLen cs0 (S.checkedAdd "Lazy.concat" (S.length bs) (S.length c)) cs

-- General case, just find the total length we'll need
goLen cs0 !total (Chunk c cs) = goLen cs0 total' cs
where
total' = S.checkedAdd "Lazy.concat" total (S.length c)
goLen cs0 total Empty =
S.unsafeCreate total $ \ptr -> goCopy cs0 ptr

-- Copy the data
goCopy Empty !_ = return ()
goCopy (Chunk (S.BS _ 0 ) cs) !ptr = goCopy cs ptr
goCopy (Chunk (S.BS fp len) cs) !ptr = do
withForeignPtr fp $ \p -> do
S.memcpy ptr p len
goCopy cs (ptr `plusPtr` len)
-- See the comment on Data.ByteString.Internal.concat for some background on
-- this implementation.

------------------------------------------------------------------------

{-
Expand Down
2 changes: 1 addition & 1 deletion Data/ByteString/Lazy/Char8.hs
Original file line number Diff line number Diff line change
Expand Up @@ -199,7 +199,7 @@ module Data.ByteString.Lazy.Char8 (

-- Functions transparently exported
import Data.ByteString.Lazy
(fromChunks, toChunks, fromStrict, toStrict
(fromChunks, toChunks
,empty,null,length,tail,init,append,reverse,transpose,cycle
,concat,take,drop,splitAt,intercalate
,isPrefixOf,isSuffixOf,group,inits,tails,copy
Expand Down
78 changes: 65 additions & 13 deletions Data/ByteString/Lazy/Internal.hs
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
{-# LANGUAGE CPP #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE TypeFamilies #-}
#if __GLASGOW_HASKELL__ >= 703
Expand Down Expand Up @@ -41,15 +42,18 @@ module Data.ByteString.Lazy.Internal (
-- * Conversion with lists: packing and unpacking
packBytes, packChars,
unpackBytes, unpackChars,
-- * Conversions with strict ByteString
fromStrict, toStrict,

) where

import Prelude hiding (concat)

import qualified Data.ByteString.Internal as S
import qualified Data.ByteString as S (length, take, drop)

import Data.Word (Word8)
import Data.Word (Word8)
import Foreign.ForeignPtr (withForeignPtr)
import Foreign.Ptr (plusPtr)
import Foreign.Storable (Storable(sizeOf))

#if MIN_VERSION_base(4,13,0)
Expand Down Expand Up @@ -236,26 +240,26 @@ eq :: ByteString -> ByteString -> Bool
eq Empty Empty = True
eq Empty _ = False
eq _ Empty = False
eq (Chunk a as) (Chunk b bs) =
case compare (S.length a) (S.length b) of
LT -> a == S.take (S.length a) b && eq as (Chunk (S.drop (S.length a) b) bs)
EQ -> a == b && eq as bs
GT -> S.take (S.length b) a == b && eq (Chunk (S.drop (S.length b) a) as) bs
eq (Chunk a@(S.BS ap al) as) (Chunk b@(S.BS bp bl) bs) =
case compare al bl of
LT -> a == S.BS bp al && eq as (Chunk (S.BS (S.plusForeignPtr bp al) (bl - al)) bs)
EQ -> a == b && eq as bs
GT -> S.BS ap bl == b && eq (Chunk (S.BS (S.plusForeignPtr ap bl) (al - bl)) as) bs

cmp :: ByteString -> ByteString -> Ordering
cmp Empty Empty = EQ
cmp Empty _ = LT
cmp _ Empty = GT
cmp (Chunk a as) (Chunk b bs) =
case compare (S.length a) (S.length b) of
LT -> case compare a (S.take (S.length a) b) of
EQ -> cmp as (Chunk (S.drop (S.length a) b) bs)
cmp (Chunk a@(S.BS ap al) as) (Chunk b@(S.BS bp bl) bs) =
case compare al bl of
LT -> case compare a (S.BS bp al) of
EQ -> cmp as (Chunk (S.BS (S.plusForeignPtr bp al) (bl - al)) bs)
result -> result
EQ -> case compare a b of
EQ -> cmp as bs
result -> result
GT -> case compare (S.take (S.length b) a) b of
EQ -> cmp (Chunk (S.drop (S.length b) a) as) bs
GT -> case compare (S.BS ap bl) b of
EQ -> cmp (Chunk (S.BS (S.plusForeignPtr ap bl) (al - bl)) as) bs
result -> result

append :: ByteString -> ByteString -> ByteString
Expand All @@ -268,3 +272,51 @@ concat css0 = to css0
go (Chunk c cs) css = Chunk c (go cs css)
to [] = Empty
to (cs:css) = go cs css

------------------------------------------------------------------------
-- Conversions

-- |/O(1)/ Convert a strict 'ByteString' into a lazy 'ByteString'.
fromStrict :: S.ByteString -> ByteString
fromStrict (S.BS _ 0) = Empty
fromStrict bs = Chunk bs Empty

-- |/O(n)/ Convert a lazy 'ByteString' into a strict 'ByteString'.
--
-- Note that this is an /expensive/ operation that forces the whole lazy
-- ByteString into memory and then copies all the data. If possible, try to
-- avoid converting back and forth between strict and lazy bytestrings.
--
toStrict :: ByteString -> S.ByteString
toStrict = \cs -> goLen0 cs cs
-- We pass the original [ByteString] (bss0) through as an argument through
-- goLen0, goLen1, and goLen since we will need it again in goCopy. Passing
-- it as an explicit argument avoids capturing it in these functions'
-- closures which would result in unnecessary closure allocation.
where
-- It's still possible that the result is empty
goLen0 _ Empty = S.BS S.nullForeignPtr 0
goLen0 cs0 (Chunk (S.BS _ 0) cs) = goLen0 cs0 cs
goLen0 cs0 (Chunk c cs) = goLen1 cs0 c cs

-- It's still possible that the result is a single chunk
goLen1 _ bs Empty = bs
goLen1 cs0 bs (Chunk (S.BS _ 0) cs) = goLen1 cs0 bs cs
goLen1 cs0 (S.BS _ bl) (Chunk (S.BS _ cl) cs) =
goLen cs0 (S.checkedAdd "Lazy.concat" bl cl) cs

-- General case, just find the total length we'll need
goLen cs0 !total (Chunk (S.BS _ cl) cs) =
goLen cs0 (S.checkedAdd "Lazy.concat" total cl) cs
goLen cs0 total Empty =
S.unsafeCreate total $ \ptr -> goCopy cs0 ptr

-- Copy the data
goCopy Empty !_ = return ()
goCopy (Chunk (S.BS _ 0 ) cs) !ptr = goCopy cs ptr
goCopy (Chunk (S.BS fp len) cs) !ptr = do
withForeignPtr fp $ \p -> do
S.memcpy ptr p len
goCopy cs (ptr `plusPtr` len)
-- See the comment on Data.ByteString.Internal.concat for some background on
-- this implementation.