Isolate reverse in preparation for a pure-Haskell implementation #536
General comment: isn't
already possible. Isn't there a way to test whether the backend is JS? Wasn't it if arch(js) || flag(pure-haskell), with the flag existing mostly for testing purposes?
Similarly, you probably want (AFAIK it's disabled for in-GHC-tree builds so you don't see it, but you can very easily see it outside - as it's enabled by default).
Oh dear, so
Sad. Weren't their initial versions added to GHC at about the same time?
Scratch previous: there is
Thanks for comments @phadej.
Yes indeed, I just hadn't done it yet for this PR. I'll also look at the simdutf stuff!
Ok, on a hunch I looked in the readme and found the bit about cloning test data separately.
Hi @chreekat!
Yes that's exactly what I think we should do. Thanks!
This is now almost completely ready.
- The pure-haskell flag documents itself as "not fully implemented"
- All flags are manual
- Nothing is ready for consumption, so no change to the changelog
- I know the JS team is committed to the JS backend, so I don't feel bad merging a partially-implemented feature even though I have little time to continue working on this (but I do still have time!)
There's just one thing left I don't know how to do! I don't want people to accidentally find Data.Text.Internal.Reverse.reverse on Hoogle and try to use it. They should continue using Data.Text.reverse. I don't know enough about Haddocks to do that. Does someone know how to do it?
Good work!
Perhaps
Yes, thanks, that sounds familiar. I'll have a look.
The
All finished now.
Will respond to other comments tomorrow.
cabal-docspec doesn't appear to be available on my OS, so I'm just taking a stab at fixing it blindly.
Just waiting on the decision to push the module into
6da38e0 to 7266ad4
By the way, according to the microbenchmarks, the pure Haskell version is 2-100x slower than the C version:
(@chreekat you can use
Nice, I realize I wasn't reading the numbers properly last night. The actual speeddowns look like:
Thanks @hsyl20! I'll have more time to look at this again next week. Will follow up then.
7266ad4 to 739871f
I'm gonna mark this as draft again while I figure out the doctest and try out some of the speedups.
Turns out moving the module into other-modules solved the docspec problem... nice.
src/Data/Text/Internal/Reverse.hs
Outdated
  let pointLength = utf8LengthByLeader (A.unsafeIndex ba p_in)
  in do
    A.copyI pointLength dest (p_out - pointLength + 1) ba p_in
    reversePoints ba (p_in + pointLength) dest (p_out - pointLength)
I didn't benchmark it, but a general rule is that passing unchanged arguments through recursive calls is suboptimal for performance. A usual pattern is
reversePoints ba x dest y = go x y
  where
    go pIn pOut = .... go pIn' pOut'
I tried this, and all the microbenchmarks surprisingly fared a lot worse.
diff --git a/src/Data/Text/Internal/Reverse.hs b/src/Data/Text/Internal/Reverse.hs
index 153611f..3c0f95c 100644
--- a/src/Data/Text/Internal/Reverse.hs
+++ b/src/Data/Text/Internal/Reverse.hs
@@ -46,12 +46,13 @@ reversePoints
-> A.MArray s -- ^ Output array
-> Int -- ^ Output index
-> ST s ()
-reversePoints _ _ _ p_out | p_out < 0 = pure ()
-reversePoints ba p_in dest p_out =
- let pointLength = utf8LengthByLeader (A.unsafeIndex ba p_in)
- in do
- A.copyI pointLength dest (p_out - pointLength + 1) ba p_in
- reversePoints ba (p_in + pointLength) dest (p_out - pointLength)
+reversePoints src x dest y = go x y where
+ go _ pOut | pOut < 0 = pure ()
+ go pIn pOut =
+ let pLen = utf8LengthByLeader (A.unsafeIndex src pIn)
+ in do
+ A.copyI pLen dest (pOut - pLen + 1) src pIn
+ go (pIn + pLen) (pOut - pLen)
#else
reverse (Text (A.ByteArray ba) off len) = runST $ do
marr@(A.MutableByteArray mba) <- A.new len
Benchmark text-benchmarks: RUNNING...
All
Pure
tiny
reverse
Text: OK (0.28s)
63.8 ns ± 5.3 ns, 79% more than baseline
LazyText: OK (0.22s)
50.2 ns ± 2.8 ns, 10% more than baseline
ascii-small
reverse
Text: OK (0.19s)
732 μs ± 54 μs, 101% more than baseline
LazyText: OK (0.12s)
472 μs ± 43 μs, 29% more than baseline
ascii
reverse
Text: OK (2.02s)
637 ms ± 19 ms, 110% more than baseline
LazyText: OK (1.26s)
395 ms ± 5.5 ms, 32% more than baseline
english
reverse
Text: OK (0.33s)
41.9 ms ± 1.5 ms, 98% more than baseline
LazyText: OK (0.19s)
26.4 ms ± 1.5 ms, 32% more than baseline
russian
reverse
Text: OK (0.32s)
75.4 μs ± 6.3 μs, 83% more than baseline
LazyText: OK (0.21s)
51.4 μs ± 3.0 μs, 25% more than baseline
japanese
reverse
Text: OK (0.21s)
49.4 μs ± 2.8 μs, 103% more than baseline
LazyText: OK (0.29s)
34.6 μs ± 2.9 μs, 42% more than baseline
All 12 tests passed (6.42s)
Benchmark text-benchmarks: FINISH
Surprisingly, I can reproduce your measurements, but, looking at Core, I'm fairly certain that this is some sort of benchmarking quirk. You can compare yourself by putting
{-# OPTIONS_GHC -ddump-to-file -ddump-simpl -dsuppress-all -dno-suppress-type-signatures #-}
- Let's go with go pIn pOut, Core output is much nicer.
- You most certainly want go !_ pOut, not just go _ pOut, otherwise GHC believes that this function is lazy in pIn and would not unbox it.
- Common subexpression elimination pass is not powerful enough to share pOut - pLen between two expressions, please do it manually.
- Another microoptimisation is to pass reversePoints ba off dest len, check pOut <= 0, and then you shave off +1 in A.copyI pLen dest (pOut - pLen) src pIn.
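The suggestions in this list can be combined in a small standalone sketch. It deliberately does not use text's internal A.* array API; it assumes the array package's UArray and STUArray as stand-ins, and utf8Len and fromBytes are hypothetical helpers written only for this example (utf8Len mimics utf8LengthByLeader for valid UTF-8 leader bytes). It shows the shape of the loop, and says nothing about how it would benchmark.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import Data.Array.ST (newArray_, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, bounds, elems, listArray, (!))
import Data.Word (Word8)

-- | Byte length of a UTF-8 code point, judged by its leading byte.
-- Hypothetical stand-in for utf8LengthByLeader; assumes valid UTF-8.
utf8Len :: Word8 -> Int
utf8Len w
  | w < 0x80  = 1
  | w < 0xE0  = 2
  | w < 0xF0  = 3
  | otherwise = 4

-- | Hypothetical helper: build a zero-based byte array from a list.
fromBytes :: [Word8] -> UArray Int Word8
fromBytes bs = listArray (0, length bs - 1) bs

-- | Reverse the code points of a zero-based, valid UTF-8 byte array.
reverseUtf8 :: UArray Int Word8 -> UArray Int Word8
reverseUtf8 src = runSTUArray $ do
  dest <- newArray_ (0, len - 1)
  -- The worker closes over src and dest, so only the two indices change
  -- between iterations. Both are marked strict with bang patterns.
  let go !pIn !pOut
        | pOut <= 0 = pure ()  -- pOut counts remaining bytes, so no "+ 1"
        | otherwise = do
            let pLen  = utf8Len (src ! pIn)
                pOut' = pOut - pLen  -- shared by the copy and the recursion
            -- Copy one code point, byte by byte, to its mirrored position.
            mapM_ (\i -> writeArray dest (pOut' + i) (src ! (pIn + i)))
                  [0 .. pLen - 1]
            go (pIn + pLen) pOut'
  go 0 len
  pure dest
  where
    len = let (lo, hi) = bounds src in hi - lo + 1

main :: IO ()
main = do
  -- UTF-8 bytes of 'a' (1 byte), U+0431 (2 bytes) and U+20AC (3 bytes).
  let input    = fromBytes [0x61, 0xD0, 0xB1, 0xE2, 0x82, 0xAC]
      expected = [0xE2, 0x82, 0xAC, 0xD0, 0xB1, 0x61]
  print (elems (reverseUtf8 input) == expected)
```

Starting pOut at the total length and stopping at pOut <= 0 is what removes the + 1 from the destination offset, and binding pOut' once does by hand the sharing that CSE is said not to do here.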
Overall looks reasonable, thanks for starting JS preparations.
I believe I've addressed all feedback now.
src/Data/Text/Internal/Reverse.hs
Outdated
reversePoints src pIn dest pOut =
  let pointLength = utf8LengthByLeader (A.unsafeIndex src pIn)
      pOut' = pOut - pointLength + 1
  -- Repeated unsafeWrite is faster than copyI
Do you know which is faster with JS backend?
I mean, unless we know JS backend behaviour, I'd strive for simpler Core, which should give GHC more opportunities to do backend-dependent optimizations.
I'll double check. TBH, in many cases I'd rather have simpler code even if it's a little slower.
@hsyl20 had suggested this change for backends other than JavaScript, by the way. tbh I'd rather use copyI. If copyI (a primop) is truly slower than repeated calls to unsafeWrite (another primop) maybe it's the primops that need to change.
I've just used copyI because it seems like the right thing to do. I've squashed all commits as well.
6528755 to c6128e4
@chreekat could you please fix https://github.com/haskell/text/actions/runs/6597744008/job/17925891514?pr=536#step:6:106? Other two CI failures are spurious, but this one is genuine.
c6128e4 to 6204458
@Bodigrim whoops! Fixed now.
Thanks!
Thanks to everyone who participated and patiently waited for my progress! I'll start on the next C function soon.
CC @hsyl20, @luite , and others whose handles I don't know (Jeff, Josh, ...)
To write programs with the JS backend, text's C bits need to be replaced, either with JavaScript bits or with pure Haskell. Sylvain suggested pure Haskell.
I'm opening this draft just to check if this is the direction you all expected a pure-Haskell implementation to go.
pure-haskell
PURE_HASKELL that is defined if pure-haskell is true.
pure-haskell default to True when the target platform is JS.
This method allows one to test the pure-Haskell implementation on the platform of their choice.
As a proof of concept, I have introduced a buggy version of reverse with a pure Haskell implementation. Run with cabal test -f pure-haskell to see it "in action"!
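For readers unfamiliar with switching implementations through a preprocessor macro, here is a minimal sketch of the idea. It assumes PURE_HASKELL is supplied by the build when the pure-haskell flag is on; how the cabal file actually wires the flag to the macro is not shown in this thread, and backendName is a hypothetical name used only for illustration.

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

-- PURE_HASKELL is assumed to be defined by the build system when the
-- pure-haskell flag is enabled. backendName is a hypothetical
-- placeholder for the code that would differ between the two builds.
backendName :: String
#if defined(PURE_HASKELL)
backendName = "pure Haskell"
#else
backendName = "C (FFI)"
#endif

main :: IO ()
main = putStrLn ("reverse would use the " ++ backendName ++ " implementation")
```

Compiling this file with ghc -DPURE_HASKELL selects the first branch and compiling it without that option selects the second, which is why a single flag is enough to exercise the pure-Haskell path on any platform.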