Optimize isSpace functions #315

Merged on Oct 29, 2020 (4 commits)

18 changes: 9 additions & 9 deletions Data/ByteString/Internal.hs
@@ -671,16 +671,16 @@ c2w = fromIntegral . ord
 {-# INLINE c2w #-}
 
 -- | Selects words corresponding to white-space characters in the Latin-1 range
--- ordered by frequency.
 isSpaceWord8 :: Word8 -> Bool
-isSpaceWord8 w =
-    w == 0x20 ||
-    w == 0x0A || -- LF, \n
-    w == 0x09 || -- HT, \t
-    w == 0x0C || -- FF, \f
-    w == 0x0D || -- CR, \r
-    w == 0x0B || -- VT, \v
-    w == 0xA0    -- spotted by QC..
+isSpaceWord8 w8 =
+    -- Avoid the cost of narrowing arithmetic results to Word8,
+    -- the conversion from Word8 to Word is free.
+    let w :: Word
+        !w = fromIntegral w8
+     in w - 0x21 > 0x7e -- not [0x21..0x9f]

Contributor:

This condition discriminates 127 out of 256 possibilities. Could you please benchmark w .&. 0x50 == 0, which discriminates 192 values?
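
Both counts can be reproduced by enumerating all 256 byte values. The following is a minimal sketch for illustration only; `rangeFilter`, `maskFilter` and `rejectedBy` are names made up here and are not part of bytestring. A filter returning `False` means the byte has been ruled out as whitespace.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- The PR's range test. Widening to Word mirrors the PR; the subtraction
-- wraps around for bytes below 0x21, so those pass the test as well.
rangeFilter :: Word8 -> Bool
rangeFilter w8 =
    let w = fromIntegral w8 :: Word
     in w - 0x21 > 0x7e

-- The suggested mask test: passes only bytes with bits 4 and 6 clear.
maskFilter :: Word8 -> Bool
maskFilter w8 = w8 .&. 0x50 == 0

-- Number of byte values a filter rules out as non-whitespace.
rejectedBy :: (Word8 -> Bool) -> Int
rejectedBy p = length (filter (not . p) [minBound .. maxBound])
```

In GHCi, `rejectedBy rangeFilter` and `rejectedBy maskFilter` should give the 127 and 192 quoted in the comment.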

Contributor:

With the .&. 0x50 test I get noticeably better results, which also outperform the proposed PR on all the test cases.

benchmarked words/lots of words
time                 221.2 μs   (220.8 μs .. 222.0 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 220.9 μs   (220.9 μs .. 221.1 μs)
std dev              338.0 ns   (159.2 ns .. 669.3 ns)

benchmarked words/one huge word
time                 18.07 μs   (17.95 μs .. 18.23 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 17.96 μs   (17.94 μs .. 18.01 μs)
std dev              104.3 ns   (68.50 ns .. 195.1 ns)

benchmarked words/paragraphs
time                 6.243 μs   (6.225 μs .. 6.267 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 6.236 μs   (6.231 μs .. 6.240 μs)
std dev              15.92 ns   (11.74 ns .. 25.17 ns)

Contributor:

But I get even better results combining both filters:

isSpaceWord8 :: Word8 -> Bool
isSpaceWord8 = \ !w8 ->
    -- Avoid the cost of narrowing arithmetic results to Word8,
    -- the conversion from Word8 to Word is free.
    let w :: Word
        !w = fromIntegral w8
     in w .&. 0x50 == 0   -- Quick non-whitespace filter
        && w - 0x21 > 0x7e -- Second non-whitespace filter
        && ( w == 0x20    -- SP
          || w - 0x09 < 5 -- HT, NL, VT, FF, CR
          || w == 0xa0 )  -- NBSP

benchmarked words/lots of words
time                 216.5 μs   (215.6 μs .. 218.3 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 215.8 μs   (215.7 μs .. 216.3 μs)
std dev              720.2 ns   (154.3 ns .. 1.500 μs)

benchmarked words/one huge word
time                 16.77 μs   (16.61 μs .. 16.94 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 16.87 μs   (16.82 μs .. 16.91 μs)
std dev              138.9 ns   (97.75 ns .. 183.6 ns)

benchmarked words/paragraphs
time                 6.060 μs   (6.040 μs .. 6.083 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 6.048 μs   (6.044 μs .. 6.054 μs)
std dev              16.40 ns   (12.03 ns .. 23.45 ns)

I just reran the PR as-is, as a sanity check that nothing changed in the meantime, and I get:

benchmarked words/lots of words
time                 228.5 μs   (227.6 μs .. 229.0 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 229.8 μs   (229.1 μs .. 232.5 μs)
std dev              3.588 μs   (216.7 ns .. 7.228 μs)

benchmarked words/one huge word
time                 22.45 μs   (22.37 μs .. 22.52 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 22.62 μs   (22.54 μs .. 22.80 μs)
std dev              374.5 ns   (181.8 ns .. 612.1 ns)

benchmarked words/paragraphs
time                 6.746 μs   (6.727 μs .. 6.760 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 6.776 μs   (6.760 μs .. 6.841 μs)
std dev              87.14 ns   (23.18 ns .. 190.9 ns)
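
Since the input space is only 256 values, the combined definition can be checked exhaustively against the original seven-comparison one. This is a hedged sketch, not code from the PR: `isSpaceRef`, `isSpaceFast` and `disagreements` are illustrative names, and the strictness annotations are left out because they affect performance only, not the result.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Reference: the byte values accepted by the original definition.
isSpaceRef :: Word8 -> Bool
isSpaceRef w = w `elem` [0x20, 0x0A, 0x09, 0x0C, 0x0D, 0x0B, 0xA0]

-- Candidate: the two-filter definition quoted in this thread.
isSpaceFast :: Word8 -> Bool
isSpaceFast w8 =
    let w = fromIntegral w8 :: Word
     in w .&. 0x50 == 0     -- quick non-whitespace filter
        && w - 0x21 > 0x7e  -- second non-whitespace filter
        && ( w == 0x20      -- SP
          || w - 0x09 < 5   -- HT, NL, VT, FF, CR (wraps for w < 0x09)
          || w == 0xa0 )    -- NBSP

-- Byte values on which the two definitions disagree.
disagreements :: [Word8]
disagreements =
    filter (\w -> isSpaceRef w /= isSpaceFast w) [minBound .. maxBound]
```

An empty `disagreements` list means the rewrite preserves behaviour for every possible byte, so the benchmark differences above reflect speed alone.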

Contributor:

Between the two filters only 33 candidate characters are left:

λ> length $ filter (\w -> w .&. 0x50 == 0 && w - 0x21 > 0x7e) [0..255 :: Word8]
33
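
Note that the GHCi check runs the filters at type Word8, while the proposed isSpaceWord8 widens to Word first. Both versions rely on unsigned wrap-around in `w - 0x21`, so they should select the same bytes. A small sketch to confirm that, with `survivorsWord8` and `survivorsWord` as made-up names:

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Filters evaluated directly at Word8: subtraction wraps modulo 256.
survivorsWord8 :: [Word8]
survivorsWord8 =
    filter (\w -> w .&. 0x50 == 0 && w - 0x21 > 0x7e) [minBound .. maxBound]

-- Filters evaluated after widening to Word: subtraction wraps at the
-- machine word size instead.
survivorsWord :: [Word8]
survivorsWord = filter p [minBound .. maxBound]
  where
    p w8 =
        let w = fromIntegral w8 :: Word
         in w .&. 0x50 == 0 && w - 0x21 > 0x7e
```

The two lists come out identical because, in either width, a byte below 0x21 wraps to a value far above 0x7e and a byte from 0xa0 upwards yields at least 0x7f.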

Contributor (Author):

Cool stuff.

Contributor:

What's more, apart from the whitespace itself, almost all of the remaining candidates are infrequent in text strings (as opposed to binary data):

[0,1,2,3,4,5,6,7,8 -- controls
,9,10,11,12,13 -- whitespace
,14,15 -- controls
,32,160 -- whitespace
,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175 -- ¡¢£¤¥¦§¨©ª«¬­®¯
]

Contributor @vdukhovni, Oct 26, 2020:

@ethercrow I see you've switched to the implementation I was testing. Are you seeing similar benchmark improvements on your hardware?

Contributor (Author):

Yes, the version with two filters is the fastest for me as well.

+        && ( w == 0x20    -- SP
+          || w - 0x09 < 5 -- HT, NL, VT, FF, CR
+          || w == 0xa0 )  -- NBSP
 {-# INLINE isSpaceWord8 #-}
 
 -- | Selects white-space characters in the Latin-1 range
7 changes: 7 additions & 0 deletions bench/BenchAll.hs
@@ -101,6 +101,9 @@ byteStringChunksData = map (S.pack . replicate (4 ) . fromIntegral) intData
 oldByteStringChunksData :: [OldS.ByteString]
 oldByteStringChunksData = map (OldS.pack . replicate (4 ) . fromIntegral) intData
 
+{-# NOINLINE loremIpsum #-}
+loremIpsum :: S.ByteString
+loremIpsum = S8.pack "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\nSed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?\n"

Contributor:

FWIW, I would fold this across multiple lines. The version I used was:

paragraphs :: S.ByteString
paragraphs = S8.pack $
   "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor\n\
   \incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis\n\
   \nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.\n\
   \Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu\n\
   \fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in\n\
   \culpa qui officia deserunt mollit anim id est laborum.\n\
   \\n\
   \Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium\n\
   \doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore\n\
   \veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim\n\
   \ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia\n\
   \consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque\n\
   \porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur,\n\
   \adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et\n\
   \dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis\n\
   \nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid\n\
   \ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea\n\
   \voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem\n\
   \eum fugiat quo voluptas nulla pariatur?"

Though one paragraph is likely sufficient...


 -- benchmark wrappers
 ---------------------
@@ -397,6 +400,10 @@ main = do
       ]
     ]
   , bgroup "sort" $ map (\s -> bench (S8.unpack s) $ nf S.sort s) sortInputs
+  , bgroup "words"
+    [ bench "lorem ipsum" $ nf S8.words loremIpsum
+    , bench "one huge word" $ nf S8.words byteStringData
+    ]
   , bgroup "folds"
     [ bgroup "foldl'" $ map (\s -> bench (show $ S.length s) $
         nf (S.foldl' (\acc x -> acc + fromIntegral x) (0 :: Int)) s) foldInputs