Try optimizing loops in Data.ByteString.findIndex[End] in the style of #273 (#338)
Comments
Just tried this out on `findIndexEnd`:

```haskell
findIndexEnd :: (Word8 -> Bool) -> ByteString -> Maybe Int
findIndexEnd k (BS x l) = accursedUnutterablePerformIO $ withForeignPtr x $ \fp ->
  let
    go !n
      | n < 0 = return Nothing
      | otherwise = do
          w <- peekByteOff fp n
          if k w
            then return (Just n)
            else go (n - 1)
  in
    go (l - 1)
{-# INLINE findIndexEnd #-}
```

On the …

Newer: …

Haven't yet experimented with a similar technique for …
@Boarders if you run benchmarks with … Feel free to throw new benchmarks into … A PR would be much appreciated.
In the above I had my performance numbers backwards. I tried out

```haskell
findIndexEnd :: (Word8 -> Bool) -> ByteString -> Maybe Int
findIndexEnd k (BS x l) = accursedUnutterablePerformIO $ withForeignPtr x $ \fp ->
  let
    go !n
      | n < 0 = return Nothing
      | otherwise = do
          w <- peekByteOff fp n
          if k w
            then return (Just n)
            else go (n - 1)
  in
    go (l - 1)
{-# INLINE findIndexEnd #-}
```

and

```haskell
findIndexEnd :: (Word8 -> Bool) -> ByteString -> Maybe Int
findIndexEnd k (BS x l) = accursedUnutterablePerformIO $ withForeignPtr x $ \fp ->
  let
    start = fp `plusPtr` (l - 1)
    end   = fp `plusPtr` (- 1)
    go !ptr
      | ptr == end = return Nothing
      | otherwise = do
          w <- peek ptr
          if k w
            then return (Just ((ptr `minusPtr` end) - 1))
            else go (ptr `plusPtr` (- 1))
  in
    go start
{-# INLINE findIndexEnd #-}
```

and the same sort of variants on …
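To try the pointer-walking variant outside the library, here is a minimal sketch. It uses only the public `unsafeUseAsCStringLen` from `Data.ByteString.Unsafe` instead of the internal `BS` constructor and `accursedUnutterablePerformIO`, and the name `findIndexEndPtr` is hypothetical. Like the second variant above, it stops when the pointer reaches the address one byte before the buffer. That means the empty-string case returns `Nothing` without reading any memory.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as BS
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word8)
import Foreign.Ptr (Ptr, castPtr, minusPtr, plusPtr)
import Foreign.Storable (peek)
import System.IO.Unsafe (unsafeDupablePerformIO)

-- Hypothetical pointer-walking findIndexEnd: the loop carries a single
-- pointer that moves from the last byte back towards the first one.
findIndexEndPtr :: (Word8 -> Bool) -> BS.ByteString -> Maybe Int
findIndexEndPtr k bs = unsafeDupablePerformIO $
  unsafeUseAsCStringLen bs $ \(cstr, len) -> do
    let base = castPtr cstr :: Ptr Word8
        -- One byte before the buffer; only compared against, never read.
        end = base `plusPtr` (-1) :: Ptr Word8
        go :: Ptr Word8 -> IO (Maybe Int)
        go !ptr
          | ptr == end = return Nothing
          | otherwise = do
              w <- peek ptr
              if k w
                then return (Just (ptr `minusPtr` base))
                else go (ptr `plusPtr` (-1))
    -- For an empty string the start pointer already equals end.
    go (base `plusPtr` (len - 1))
```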
@Boarders Could it be that the benchmarks you're looking at simply don't exercise the …

Ah OK, I am dumb ( :D ). Let me check again!

No worries, @Boarders! The naming is definitely confusing. We could consider changing the …

I wrote some (not excellent) benchmarks, and this change does improve things:

new: …

I'll see if there are any other obvious functions like this and then make a PR.
I just discovered that even `map` benefits from this sort of transformation! Changing

```haskell
map :: (Word8 -> Word8) -> ByteString -> ByteString
map f (BS fp len) = unsafeDupablePerformIO $ withForeignPtr fp $ \a ->
    create len $ map_ 0 a
  where
    map_ :: Int -> Ptr Word8 -> Ptr Word8 -> IO ()
    map_ !n !p1 !p2
      | n >= len = return ()
      | otherwise = do
          x <- peekByteOff p1 n
          pokeByteOff p2 n (f x)
          map_ (n+1) p1 p2
{-# INLINE map #-}
```

to

```haskell
map :: (Word8 -> Word8) -> ByteString -> ByteString
map f (BS fp len) = unsafeDupablePerformIO $ unsafeWithForeignPtr fp $ \ptr1 ->
    create len $ \ptr2 -> m ptr1 ptr2
  where
    m p1 p2 = map_ 0
      where
        map_ :: Int -> IO ()
        map_ !n
          | n >= len = return ()
          | otherwise = do
              x <- peekByteOff p1 n
              pokeByteOff p2 n (f x)
              map_ (n+1)
{-# INLINE map #-}
```

gave me this on a benchmark: …
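The same floating can be reproduced outside the library. This is a minimal sketch that assumes `create` from `Data.ByteString.Internal` and `unsafeUseAsCStringLen` from `Data.ByteString.Unsafe`; `mapBytes` is a hypothetical name. As in the rewritten `map` above, the source and destination pointers are bound once, and the recursive worker carries only the index.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as BS
import Data.ByteString.Internal (create)
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word8)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (peekByteOff, pokeByteOff)
import System.IO.Unsafe (unsafeDupablePerformIO)

-- Hypothetical stand-alone map: src and dst are bound once outside the
-- loop, so the recursive worker only takes the index n.
mapBytes :: (Word8 -> Word8) -> BS.ByteString -> BS.ByteString
mapBytes f bs = unsafeDupablePerformIO $
  unsafeUseAsCStringLen bs $ \(cstr, len) ->
    create len $ \dst -> do
      let src = castPtr cstr :: Ptr Word8
          loop :: Int -> IO ()
          loop !n
            | n >= len = return ()
            | otherwise = do
                x <- peekByteOff src n
                pokeByteOff dst n (f x)
                loop (n + 1)
      loop 0
{-# INLINE mapBytes #-}
```

Whether this beats a worker that takes the pointers as arguments depends on the code GHC generates for each version. The benchmarks and Core dumps requested in this thread are what settle that.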
@Boarders Awesome! Please check that there is no regression for shorter strings. This kind of optimization can potentially lead to a (constant) increase in heap allocations. If you could accompany your PR with Core dumps of the modified functions before and after, it would help the review a lot. If the new Core contains …

Just quickly checked: it is faster (by a reasonable margin) on every example I have tried. I'll see what I can do about getting some Core; if you have a recommended place to put it, that would help.
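One common way to produce such Core dumps is to pass GHC's debugging flags in an `OPTIONS_GHC` pragma (or on the command line). This sketch assumes `-O2` and uses the following flags:

* `-ddump-simpl` emits the optimized Core.
* `-ddump-to-file` writes the dump to a file instead of stderr.
* `-dsuppress-all` and `-dsuppress-uniques` remove noise so that before and after dumps diff more cleanly.

`countByte` is a hypothetical function included only so there is a loop to inspect.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -O2 -ddump-simpl -ddump-to-file -dsuppress-all -dsuppress-uniques #-}
-- The pragma above optimises the module and writes its final Core to a
-- .dump-simpl file; the -dsuppress-* flags drop type applications,
-- coercions, module prefixes and uniques so dumps diff more cleanly.
import qualified Data.ByteString as BS
import Data.ByteString.Unsafe (unsafeIndex)
import Data.Word (Word8)

-- Hypothetical function whose inner loop can be inspected in the dump:
-- find the binding that corresponds to go and count its arguments.
countByte :: Word8 -> BS.ByteString -> Int
countByte w bs = go 0 0
  where
    len = BS.length bs
    go :: Int -> Int -> Int
    go !acc !i
      | i >= len = acc
      | unsafeIndex bs i == w = go (acc + 1) (i + 1)
      | otherwise = go acc (i + 1)
```

GHC's name for the worker in the dump varies. The useful comparison is between the dumps from the two versions: how many arguments the recursive binding takes, and whether any new allocation appears inside it.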
#348 implements the renaming.

@Boarders, for completeness, did you check that there are no more inner loops left where we might be able to reduce the number of arguments?
@sjakobi: The only other example I came across is the new `packZipWith`:

```haskell
packZipWith :: (Word8 -> Word8 -> Word8) -> ByteString -> ByteString -> ByteString
packZipWith f (BS fp l) (BS fq m) = unsafeDupablePerformIO $
    withForeignPtr fp $ \a ->
    withForeignPtr fq $ \b ->
    create len $ go a b
  where
    go p1 p2 = zipWith_ 0
      where
        zipWith_ :: Int -> Ptr Word8 -> IO ()
        zipWith_ !n !r
          | n >= len = return ()
          | otherwise = do
              x <- peekByteOff p1 n
              y <- peekByteOff p2 n
              pokeByteOff r n (f x y)
              zipWith_ (n+1) r
    len = min l m
{-# INLINE packZipWith #-}
```

That could be rewritten as:

```haskell
packZipWith :: (Word8 -> Word8 -> Word8) -> ByteString -> ByteString -> ByteString
packZipWith f (BS fp l) (BS fq m) = unsafeDupablePerformIO $
    withForeignPtr fp $ \srcPtr1 ->
    withForeignPtr fq $ \srcPtr2 ->
    create len $ \destPtr -> go srcPtr1 srcPtr2 destPtr
  where
    go p1 p2 dest = zipWith_ 0
      where
        zipWith_ :: Int -> IO ()
        zipWith_ !n
          | n >= len = return ()
          | otherwise = do
              x <- peekByteOff p1 n
              y <- peekByteOff p2 n
              pokeByteOff dest n (f x y)
              zipWith_ (n+1)
    len = min l m
{-# INLINE packZipWith #-}
```

This one leads to better performance when not inlined (26.63 μs vs 37.06 μs) but worse performance when inlined (9.011 μs vs 5.940 μs). That is mysterious to me, but I thought it better to leave it out of the issue. If someone wants to investigate further, it might be interesting to do so.
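To reproduce the inlined versus non-inlined comparison outside the library, the rewritten loop can be written against the public API. This is a minimal sketch. `zipWithBytes` is a hypothetical name, `create` comes from `Data.ByteString.Internal`, and replacing the `INLINE` pragma with `NOINLINE` gives the non-inlined variant. The sketch does not explain the timing difference; it only makes the difference easy to measure.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as BS
import Data.ByteString.Internal (create)
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word8)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (peekByteOff, pokeByteOff)
import System.IO.Unsafe (unsafeDupablePerformIO)

-- Hypothetical stand-alone version of the rewritten packZipWith: both
-- source pointers and the destination are fixed before the loop starts,
-- so the worker only carries the index n.
zipWithBytes
  :: (Word8 -> Word8 -> Word8) -> BS.ByteString -> BS.ByteString -> BS.ByteString
zipWithBytes f s1 s2 = unsafeDupablePerformIO $
  unsafeUseAsCStringLen s1 $ \(c1, l1) ->
  unsafeUseAsCStringLen s2 $ \(c2, l2) -> do
    let len = min l1 l2
        p1 = castPtr c1 :: Ptr Word8
        p2 = castPtr c2 :: Ptr Word8
    create len $ \dest ->
      let loop :: Int -> IO ()
          loop !n
            | n >= len = return ()
            | otherwise = do
                x <- peekByteOff p1 n
                y <- peekByteOff p2 n
                pokeByteOff dest n (f x y)
                loop (n + 1)
      in loop 0
-- Swap for NOINLINE to measure the non-inlined variant.
{-# INLINE zipWithBytes #-}
```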
In #273 we optimized the inner loops of several functions by floating out a static argument.

`findIndex` and `findIndexEnd` have a structure similar to the optimized functions, so it would be good to check whether they can be improved in the same way. `findIndexOrEnd` is probably the most similar of the already optimized functions (see bytestring/Data/ByteString.hs, lines 1340 to 1366 in 8c631df).
There might be even more functions that could be optimized in a similar way.
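The transformation from #273 can be illustrated in isolation. This is a sketch, not bytestring's code. `findIndexEndThreaded` and `findIndexEndFloated` are hypothetical names, and both use the public `unsafeIndex` instead of peeking at the raw pointer. In the first, the predicate and the `ByteString` are passed to `go` on every iteration even though they never change. In the second, they are floated out and captured by the local worker, so the loop carries only the index.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as BS
import Data.ByteString.Unsafe (unsafeIndex)
import Data.Word (Word8)

-- Before (hypothetical): the predicate k and the ByteString bs are static,
-- yet they are passed to go again on every iteration.
findIndexEndThreaded :: (Word8 -> Bool) -> BS.ByteString -> Maybe Int
findIndexEndThreaded k0 bs0 = go k0 bs0 (BS.length bs0 - 1)
  where
    go :: (Word8 -> Bool) -> BS.ByteString -> Int -> Maybe Int
    go k bs !n
      | n < 0 = Nothing
      | k (unsafeIndex bs n) = Just n
      | otherwise = go k bs (n - 1)

-- After (hypothetical): the static arguments are floated out; go closes
-- over k and bs, so only the changing index n is passed each time.
findIndexEndFloated :: (Word8 -> Bool) -> BS.ByteString -> Maybe Int
findIndexEndFloated k bs = go (BS.length bs - 1)
  where
    go :: Int -> Maybe Int
    go !n
      | n < 0 = Nothing
      | k (unsafeIndex bs n) = Just n
      | otherwise = go (n - 1)
{-# INLINE findIndexEndFloated #-}
```

Both functions return the same results. Whether the floated version is actually faster depends on how GHC compiles the worker, which is what the benchmarks and Core comparisons in this thread check.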