Optimize isSpace functions #315
For the curious, here is how GCC compiles |
GCC implementation is equivalent to
import Data.Bits
isSpace :: Word8 -> Bool
isSpace x = x - 0x09 <= 4 || x .&. 0x7f == 0x20
and one can use
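A minimal sketch for checking the GCC-style predicate quoted above against a plain enumeration of the seven whitespace bytes. The names isSpaceNaive, isSpaceGcc and disagreements are hypothetical and used only for illustration; they are not part of bytestring.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Straightforward definition (hypothetical name): HT, NL, VT, FF, CR,
-- SP, and the Latin-1 NBSP.
isSpaceNaive :: Word8 -> Bool
isSpaceNaive x = x `elem` [0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x20, 0xa0]

-- The GCC-style form quoted above. Word8 subtraction wraps around, so
-- x - 0x09 <= 4 holds only for 0x09..0x0d; masking with 0x7f maps both
-- 0x20 and 0xa0 to 0x20.
isSpaceGcc :: Word8 -> Bool
isSpaceGcc x = x - 0x09 <= 4 || x .&. 0x7f == 0x20

-- Bytes on which the two definitions disagree.
disagreements :: [Word8]
disagreements =
  filter (\x -> isSpaceNaive x /= isSpaceGcc x) [minBound .. maxBound]

main :: IO ()
main = print (null disagreements)  -- prints True
```

Because the input domain has only 256 values, an exhaustive comparison like this is cheap and removes any doubt about the bit tricks.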
Thanks for the hint, I just tried this:
It slowed down If I add the 0x20 check to this version, it performs identically to what is already in PR. So it's probably better not to add all those magic hashes. |
Well, your benchmark is obviously biased in favor of the short-circuiting version, but I guess it is not different in this aspect from the real-world data. Recently @vdukhovni and I discussed a similar function in haskell-streaming/streaming-bytestring#31 (comment)
Indeed the proposed function is almost identical to the one in streaming-bytestring, except that I optimise for most characters being non-whitespace ASCII characters, by first ruling out most of those:
-- Predicate to test whether a 'Word8' value is either ASCII whitespace,
-- or a unicode NBSP (U+00A0). Optimised for ASCII text, with spaces
-- as the most frequent whitespace characters.
w8IsSpace :: Word8 -> Bool
w8IsSpace = \ !w8 ->
    -- Avoid the cost of narrowing arithmetic results to Word8,
    -- the conversion from Word8 to Word is free.
    let w :: Word
        !w = fromIntegral w8
     in w - 0x21 > 0x7e       -- not [0x21..0x9f]
        && ( w == 0x20        -- SP
          || w - 0x09 < 5     -- HT, NL, VT, FF, CR
          || w == 0xa0 )      -- NBSP
{-# INLINE w8IsSpace #-}
I am curious how the above compares with this PR on "real world" test data... [ EDIT: I'm not convinced that the This PR:
The alternative from streaming-bytestring:
]
Changed the test case from BTW how do you run only a subset of benchmarks with |
You can pass in options via
Try passing in (vincenthz/hs-gauge#97 is related) |
8c6bdde
to
c4e44bd
Compare
Viktor's version is the winner on lorem ipsum on my machine as well. So let's adopt that.
Data/ByteString/Internal.hs
Outdated
    -- the conversion from Word8 to Word is free.
    let w :: Word
        !w = fromIntegral w8
     in w - 0x21 > 0x7e -- not [x21..0x9f]
This condition discriminates 127 out of 256 possibilities. Could you please benchmark w .&. 0x50 == 0, which discriminates 192 values?
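The two counts in this comment can be reproduced by brute force. The sketch below is illustrative only: the helper names passesRange, passesMask and rejectedBy are made up, and it works directly on Word8 (where subtraction wraps modulo 256) rather than on the widened Word used in the PR.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Filter currently in the diff: passes everything outside [0x21..0x9f].
passesRange :: Word8 -> Bool
passesRange w = w - 0x21 > 0x7e

-- Suggested filter: passes only bytes with bits 4 and 6 both clear.
passesMask :: Word8 -> Bool
passesMask w = w .&. 0x50 == 0

-- Number of byte values a filter rules out immediately.
rejectedBy :: (Word8 -> Bool) -> Int
rejectedBy p = length (filter (not . p) [minBound .. maxBound])

main :: IO ()
main = do
  print (rejectedBy passesRange)  -- prints 127
  print (rejectedBy passesMask)   -- prints 192
```

How much each filter helps in practice depends on the byte distribution of the input, which is why the benchmarks matter more than the raw counts.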
With the & 0x50 test, I get noticeably better results, which also outperform the proposed PR on all the test cases.
benchmarked words/lots of words
time 221.2 μs (220.8 μs .. 222.0 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 220.9 μs (220.9 μs .. 221.1 μs)
std dev 338.0 ns (159.2 ns .. 669.3 ns)
benchmarked words/one huge word
time 18.07 μs (17.95 μs .. 18.23 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 17.96 μs (17.94 μs .. 18.01 μs)
std dev 104.3 ns (68.50 ns .. 195.1 ns)
benchmarked words/paragraphs
time 6.243 μs (6.225 μs .. 6.267 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 6.236 μs (6.231 μs .. 6.240 μs)
std dev 15.92 ns (11.74 ns .. 25.17 ns)
But I get even better results combining both filters:
isSpaceWord8 :: Word8 -> Bool
isSpaceWord8 = \ !w8 ->
    -- Avoid the cost of narrowing arithmetic results to Word8,
    -- the conversion from Word8 to Word is free.
    let w :: Word
        !w = fromIntegral w8
     in w .&. 0x50 == 0       -- Quick non-whitespace filter
        && w - 0x21 > 0x7e    -- Second non-whitespace filter
        && ( w == 0x20        -- SP
          || w - 0x09 < 5     -- HT, NL, VT, FF, CR
          || w == 0xa0 )      -- NBSP
benchmarked words/lots of words
time 216.5 μs (215.6 μs .. 218.3 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 215.8 μs (215.7 μs .. 216.3 μs)
std dev 720.2 ns (154.3 ns .. 1.500 μs)
benchmarked words/one huge word
time 16.77 μs (16.61 μs .. 16.94 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 16.87 μs (16.82 μs .. 16.91 μs)
std dev 138.9 ns (97.75 ns .. 183.6 ns)
benchmarked words/paragraphs
time 6.060 μs (6.040 μs .. 6.083 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 6.048 μs (6.044 μs .. 6.054 μs)
std dev 16.40 ns (12.03 ns .. 23.45 ns)
Just reran the PR as-is as a sanity check that nothing changed in the meantime and I get:
benchmarked words/lots of words
time 228.5 μs (227.6 μs .. 229.0 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 229.8 μs (229.1 μs .. 232.5 μs)
std dev 3.588 μs (216.7 ns .. 7.228 μs)
benchmarked words/one huge word
time 22.45 μs (22.37 μs .. 22.52 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 22.62 μs (22.54 μs .. 22.80 μs)
std dev 374.5 ns (181.8 ns .. 612.1 ns)
benchmarked words/paragraphs
time 6.746 μs (6.727 μs .. 6.760 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 6.776 μs (6.760 μs .. 6.841 μs)
std dev 87.14 ns (23.18 ns .. 190.9 ns)
Between the two filters only 33 candidate characters are left:
λ> length $ filter (\w -> w .&. 0x50 == 0 && w - 0x21 > 0x7e) [0..255 :: Word8]
33
Cool stuff.
What's more, other than whitespace, almost all are infrequent in text strings (rather than binary data):
[0,1,2,3,4,5,6,7,8 -- controls
,9,10,11,12,13 -- whitespace
,14,15 -- controls
,32,160 -- whitespace
,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175 -- ¡¢£¤¥¦§¨©ª«¬®¯
]
@ethercrow I see you've switched to the implementation I was testing, are you seeing similar benchmark improvements on your hardware?
Yes, the version with two filters is the fastest for me as well.
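As a correctness check to go alongside the timings, the two-filter predicate can be compared exhaustively with Data.Char.isSpace on the Latin-1 range. This is only a sketch: isSpaceRef and mismatches are hypothetical helper names, and the reference assumes each byte is interpreted as the code point of the same value.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Bits ((.&.))
import Data.Char (chr, isSpace)
import Data.Word (Word8)

-- The two-filter version discussed in this thread.
isSpaceWord8 :: Word8 -> Bool
isSpaceWord8 = \ !w8 ->
    let w :: Word
        !w = fromIntegral w8
     in w .&. 0x50 == 0       -- quick non-whitespace filter
        && w - 0x21 > 0x7e    -- second non-whitespace filter
        && ( w == 0x20        -- SP
          || w - 0x09 < 5     -- HT, NL, VT, FF, CR
          || w == 0xa0 )      -- NBSP

-- Reference (hypothetical helper): treat the byte as a Latin-1 code point.
isSpaceRef :: Word8 -> Bool
isSpaceRef = isSpace . chr . fromIntegral

-- Bytes where the optimised predicate and the reference disagree.
mismatches :: [Word8]
mismatches =
  filter (\w -> isSpaceWord8 w /= isSpaceRef w) [minBound .. maxBound]

main :: IO ()
main = print (null mismatches)  -- prints True
```

The filters only change how quickly non-whitespace bytes are rejected; the set of accepted bytes stays the same seven values.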
8e96da8
to
9127d6e
Compare
bench/BenchAll.hs
Outdated
@@ -101,6 +101,9 @@ byteStringChunksData = map (S.pack . replicate (4 ) . fromIntegral) intData
oldByteStringChunksData :: [OldS.ByteString]
oldByteStringChunksData = map (OldS.pack . replicate (4 ) . fromIntegral) intData

{-# NOINLINE loremIpsum #-}
loremIpsum :: S.ByteString
loremIpsum = S8.pack "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\nSed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?\n" |
FWIW, I would fold this across multiple lines. The version I used was:
paragraphs :: S.ByteString
paragraphs = S8.pack $
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor\n\
    \incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis\n\
    \nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.\n\
    \Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu\n\
    \fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in\n\
    \culpa qui officia deserunt mollit anim id est laborum.\n\
    \\n\
    \Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium\n\
    \doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore\n\
    \veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim\n\
    \ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia\n\
    \consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque\n\
    \porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur,\n\
    \adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et\n\
    \dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis\n\
    \nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid\n\
    \ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea\n\
    \voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem\n\
    \eum fugiat quo voluptas nulla pariatur?"
Though one paragraph is likely sufficient...
Looks good overall. The only nit (already noted) is folding the long string across multiple lines.
@ethercrow Tests for GHC < 7.10 are failing.
For compatibility with older GHC, you'll need to import Word explicitly, since it is not exported by the Prelude there.
old: import Data.Word (Word8)
new: import Data.Word (Word8, Word)
With that, the CI tests should pass.
@ethercrow while we are waiting for @sjakobi to review, do you want me to label this as "hacktoberfest-accepted"?
I started looking at bytestring when you posted a hacktoberfest call to arms, but that hacktoberfest context was not important to me, so don't worry about it. Thank you for caring though!
This reverts commit 0055867. # Conflicts: # Data/ByteString/Internal.hs # bench/BenchAll.hs
This reverts commit cc2287b.
GHC, unlike GCC, does not optimize expressions like
x == 10 || x == 11 || x == 12 || x == 13
into
x >= 10 && x <= 13
and further into
x - 10 <= 3
so I did it here manually. Turns out this optimization was already applied years ago to Data.Char.isSpace: https://hackage.haskell.org/package/base-4.12.0.0/docs/src/GHC.Unicode.html#isSpace
I chose the
w == 0x20 || w == 0xA0 || w - 0x09 <= 4
order of terms instead of
w == 0x20 || w - 0x09 <= 4 || w == 0xA0
because it was faster on my machine. It also uses one fewer register according to the Compiler Explorer. It might be beneficial to adopt this order of terms in Data.Char.isSpace as well.
I also added a benchmark for words that uses isSpaceWord8 a lot.
Before:
After:
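For reference, the manual rewrite described in this PR description relies on unsigned wraparound. The following sketch, with made-up names viaEquals, viaRange and viaWrap, checks that the three forms agree on every Word8 value; it assumes the operand is an unsigned type, since the last form is not valid for signed integers.

```haskell
import Data.Word (Word8)

-- Form 1: explicit comparison against each value.
viaEquals :: Word8 -> Bool
viaEquals x = x == 10 || x == 11 || x == 12 || x == 13

-- Form 2: range check with two comparisons.
viaRange :: Word8 -> Bool
viaRange x = x >= 10 && x <= 13

-- Form 3: one subtraction and one unsigned comparison. For x < 10 the
-- subtraction wraps around to a value >= 246, which fails the test.
viaWrap :: Word8 -> Bool
viaWrap x = x - 10 <= 3

-- True when all three forms agree on every possible byte.
allAgree :: Bool
allAgree =
  all (\x -> viaEquals x == viaRange x && viaRange x == viaWrap x)
      [minBound .. maxBound]

main :: IO ()
main = print allAgree  -- prints True
```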