Isolate reverse in preparation for a pure-Haskell implementation #536

chreekat · 2023-08-24T15:09:46Z

CC @hsyl20, @luite , and others whose handles I don't know (Jeff, Josh, ...)

To write programs with the JS backend, text's C bits need to be replaced, either with JavaScript bits or with pure Haskell. Sylvain suggested pure Haskell.

I'm opening this draft just to check if this is the direction you all expected a pure-Haskell implementation to go.

Introduce a Cabal flag pure-haskell
Introduce a CPP guard macro PURE_HASKELL that is defined if pure-haskell is true.
Use the guard to decide whether to use the C bits or write a Haskell implementation.
(not implemented) make pure-haskell default to True when the target platform is JS.

This method allows one to test the pure-Haskell implementation on the platform of their choice.

As a proof of concept, I have introduced a buggy version of reverse with a pure Haskell implementation. Run with cabal test -f pure-haskell to see it "in action"!

cbits/measure_off.c

phadej · 2023-08-24T16:55:25Z

General comment:

isn't

(not implemented) make pure-haskell default to True when the target platform is JS.

already possible. Isn't there a way to test whether the backend is JS? wasn't it arch(js)? I'd expect to see

if arch(js) || flag(pure-haskell)

with flag existing mostly for testing purposes.

phadej · 2023-08-24T17:01:04Z

Similarly, you probably want arch(js) || flag(pure-haskell) to make simdutf flag a no-op. There are already conditionals for a lot of archs.

(AFAIK it's disabled for in-GHC-tree builds so you don't see it, but you can very easily see it outside - as it's enabled by default).

phadej · 2023-08-24T17:07:16Z

Oh dear, so cabal-install-3.10.1.0recognizes arch(wasm32) but doesn't recognize arch(js). EDIT: it recognizes arch(javascript).

Warning: Unknown architecture name 'js'

Sad. Weren't their initial version added to GHC at about the same time?

phadej · 2023-08-24T17:15:55Z

Scratch previous: there is arch(javascript) and it works in even older Cabals.

chreekat · 2023-08-25T05:47:11Z

Thanks for comments @phadej .

isn't

(not implemented) make pure-haskell default to True when the target platform is JS.
already possible.

Yes indeed, I just hadn't done it yet for this PR.

I'll also look at the simdutf stuff!

chreekat · 2023-08-27T12:21:22Z

@phadej is the new commit (fd2d9e1) ok?

chreekat · 2023-08-27T13:48:25Z

reverse is now implemented. Looking forward to seeing the benchmarks... hm! cabal bench fails because of missing files. The directory benchmarks/text-test-data is missing.

chreekat · 2023-08-27T13:49:17Z

Ok, on a hunch I looked in the readme and found the bit about cloning test data separately.

text.cabal

hsyl20

Hi @chreekat!
Yes that's exactly what I think we should do. Thanks!

text.cabal

chreekat · 2023-09-01T16:23:29Z

@phadej is fd03e49 what you had in mind when you said to make simdutf a no-op?

I have to say I don't like it because it silently ignores user input, but it's not a hill I care to die on.

chreekat

This is now almost completely ready.

The pure-haskell flag documents itself as "not fully implemented"
All flags are manual
Nothing is ready for consumption, so no change to the changelog
I know the JS team is committed to the JS backend, so I don't feel bad merging a partially-implemented feature even though I have little time to continue working on this (but I do still have time!)

There's just one thing left I don't know how to do! I don't want people to accidentally find Data.Text.Internal.Reverse.reverse on Hoogle and try to use it. They should continue using Data.Text.reverse. I don't know enough about Haddocks to do that. Does someone know how to do it?

cbits/measure_off.c

hsyl20 · 2023-09-08T09:53:11Z

Good work!

There's just one thing left I don't know how to do! I don't want people to accidentally find Data.Text.Internal.Reverse.reverse on Hoogle and try to use it. They should continue using Data.Text.reverse. I don't know enough about Haddocks to do that. Does someone know how to do it?

Perhaps {-# OPTIONS_HADDOCK hide, prune #-} (cf https://haskell-haddock.readthedocs.io/en/latest/markup.html#module-attributes) but I've never tried it.

chreekat · 2023-09-08T12:00:29Z

Yes, thanks, that sounds familiar. I'll have a look.

phadej · 2023-09-08T13:16:42Z

{-# OPTIONS_HADDOCK not-home #-} that's what other internal modules in text use.

The {-# OPTIONS_HADDOCK hide #-} is IMHO consumer-hostile. If you want to hide a module, hide it by having it in other-modules (which also have a benefit of not changing the external API).

chreekat · 2023-09-08T13:37:28Z

All finished now.

text.cabal

src/Data/Text/Internal/Reverse.hs

chreekat · 2023-09-09T15:23:12Z

Will respond to other comments tomorrow.

chreekat · 2023-09-10T18:43:30Z

cabal-docspec doesn't appear to be available on my OS, so I'm just taking a stab at fixing it blindly.

chreekat · 2023-09-10T19:11:41Z

Just waiting on the decision to push the module into other-modules and the result of the doctest now.

chreekat · 2023-09-10T19:26:17Z

By the way, according to the microbenchmarks, the pure Haskell version is 2-100x slower than the C version:

--- fast.result	2023-09-10 22:24:18.873802814 +0300
+++ pure.result	2023-09-10 22:24:49.305166781 +0300
@@ -1,4 +1,4 @@
-cabal -j bench --benchmark-options="-p /reverse/"
+cabal --builddir=dist-pure -fpure-haskell -j bench --benchmark-options="-p /reverse/"
 Build profile: -w ghc-9.2.8 -O1
 In order, the following will be built (use -v for more details):
  - text-2.0.2 (bench:text-benchmarks) (first run)
@@ -10,40 +10,40 @@
   Pure
     tiny
       reverse
-        Text:     OK (0.28s)
-          15.7 ns ± 890 ps
-        LazyText: OK (0.12s)
-          26.8 ns ± 2.6 ns
+        Text:     OK (0.30s)
+          34.3 ns ± 1.9 ns
+        LazyText: OK (0.36s)
+          43.5 ns ± 1.7 ns
     ascii-small
       reverse
-        Text:     OK (0.65s)
-          36.4 μs ± 3.2 μs
-        LazyText: OK (0.65s)
-          37.3 μs ± 3.4 μs
+        Text:     OK (0.18s)
+          349  μs ±  24 μs
+        LazyText: OK (0.17s)
+          349  μs ±  29 μs
     ascii
       reverse
-        Text:     OK (0.62s)
-          29.6 ms ± 732 μs
-        LazyText: OK (0.17s)
-          31.3 ms ± 3.0 ms
+        Text:     OK (0.96s)
+          289  ms ±  17 ms
+        LazyText: OK (0.93s)
+          285  ms ± 3.6 ms
     english
       reverse
-        Text:     OK (0.60s)
-          2.06 ms ± 133 μs
-        LazyText: OK (0.16s)
-          2.14 ms ± 204 μs
+        Text:     OK (0.17s)
+          20.2 ms ± 1.5 ms
+        LazyText: OK (0.14s)
+          19.1 ms ± 1.6 ms
     russian
       reverse
-        Text:     OK (0.36s)
-          5.03 μs ± 195 ns
-        LazyText: OK (0.35s)
-          5.04 μs ± 209 ns
+        Text:     OK (0.18s)
+          44.2 μs ± 2.8 μs
+        LazyText: OK (0.18s)
+          44.3 μs ± 2.7 μs
     japanese
       reverse
-        Text:     OK (0.84s)
-          5.90 μs ± 414 ns
-        LazyText: OK (1.66s)
-          6.06 μs ±  69 ns
+        Text:     OK (0.88s)
+          25.8 μs ± 1.7 μs
+        LazyText: OK (0.89s)
+          25.9 μs ± 1.6 μs
 
-All 12 tests passed (7.03s)
+All 12 tests passed (6.15s)
 Benchmark text-benchmarks: FINISH

Bodigrim · 2023-09-10T19:34:26Z

(@chreekat you can use --baseline to compare benchmarks, or compare_benches.sh)

chreekat · 2023-09-11T07:49:42Z

Nice, I realize I wasn't reading the numbers properly last night. The actual speeddowns look like:

cabal --builddir=dist-pure -fpure-haskell -j bench --benchmark-options="-p /reverse/ --baseline fast.baseline"
Build profile: -w ghc-9.2.8 -O1
In order, the following will be built (use -v for more details):
 - text-2.0.2 (bench:text-benchmarks) (first run)
Preprocessing benchmark 'text-benchmarks' for text-2.0.2..
Building benchmark 'text-benchmarks' for text-2.0.2..
Running 1 benchmarks...
Benchmark text-benchmarks: RUNNING...
All
  Pure
    tiny
      reverse
        Text:     OK (0.16s)
          33.8 ns ± 3.2 ns, 117% more than baseline
        LazyText: OK (0.20s)
          43.5 ns ± 3.6 ns, 62% more than baseline
    ascii-small
      reverse
        Text:     OK (0.18s)
          346  μs ±  22 μs, 860% more than baseline
        LazyText: OK (0.17s)
          345  μs ±  29 μs, 834% more than baseline
    ascii
      reverse
        Text:     OK (0.95s)
          288  ms ±  17 ms, 876% more than baseline
        LazyText: OK (0.93s)
          284  ms ± 2.7 ms, 802% more than baseline
    english
      reverse
        Text:     OK (0.17s)
          20.1 ms ± 1.5 ms, 878% more than baseline
        LazyText: OK (0.14s)
          18.9 ms ± 1.5 ms, 788% more than baseline
    russian
      reverse
        Text:     OK (0.18s)
          43.9 μs ± 2.9 μs, 767% more than baseline
        LazyText: OK (0.18s)
          43.8 μs ± 2.7 μs, 793% more than baseline
    japanese
      reverse
        Text:     OK (0.87s)
          25.6 μs ± 1.5 μs, 317% more than baseline
        LazyText: OK (0.87s)
          25.6 μs ± 1.5 μs, 314% more than baseline

src/Data/Text/Internal/Reverse.hs

chreekat · 2023-09-13T10:58:41Z

Thanks @hsyl20 ! I'll have more time to look at this again next week. Will follow up then.

chreekat · 2023-09-16T08:43:44Z

I'm gonna mark this as draft again while I figure out the doctest and try out some of the speedups.

chreekat · 2023-09-21T17:40:35Z

Turns out moving the module into other-modules solved the docspec problem... nice.

src/Data/Text/Internal/Reverse.hs

Bodigrim · 2023-09-24T12:07:33Z

src/Data/Text/Internal/Reverse.hs

+    let pointLength = utf8LengthByLeader (A.unsafeIndex ba p_in)
+    in do
+        A.copyI pointLength dest (p_out - pointLength + 1) ba p_in
+        reversePoints ba (p_in + pointLength) dest (p_out - pointLength)


I didn't benchmark it, but a general rule is that passing unchanged arguments through recursive calls is suboptimal for performance. A usual pattern is

reversePoints ba x dest y = go x y where go pIn pOut = .... go pInt' pOut'

I tried this, and all the microbenchmarks surprisingly fared a lot worse.

diff --git a/src/Data/Text/Internal/Reverse.hs b/src/Data/Text/Internal/Reverse.hs index 153611f..3c0f95c 100644 --- a/src/Data/Text/Internal/Reverse.hs +++ b/src/Data/Text/Internal/Reverse.hs @@ -46,12 +46,13 @@ reversePoints -> A.MArray s -- ^ Output array -> Int -- ^ Output index -> ST s () -reversePoints _ _ _ p_out | p_out < 0 = pure () -reversePoints ba p_in dest p_out = - let pointLength = utf8LengthByLeader (A.unsafeIndex ba p_in) - in do - A.copyI pointLength dest (p_out - pointLength + 1) ba p_in - reversePoints ba (p_in + pointLength) dest (p_out - pointLength) +reversePoints src x dest y = go x y where + go _ pOut | pOut < 0 = pure () + go pIn pOut = + let pLen = utf8LengthByLeader (A.unsafeIndex src pIn) + in do + A.copyI pLen dest (pOut - pLen + 1) src pIn + go (pIn + pLen) (pOut - pLen) #else reverse (Text (A.ByteArray ba) off len) = runST $ do marr@(A.MutableByteArray mba) <- A.new len

Benchmark text-benchmarks: RUNNING... All Pure tiny reverse Text: OK (0.28s) 63.8 ns ± 5.3 ns, 79% more than baseline LazyText: OK (0.22s) 50.2 ns ± 2.8 ns, 10% more than baseline ascii-small reverse Text: OK (0.19s) 732 μs ± 54 μs, 101% more than baseline LazyText: OK (0.12s) 472 μs ± 43 μs, 29% more than baseline ascii reverse Text: OK (2.02s) 637 ms ± 19 ms, 110% more than baseline LazyText: OK (1.26s) 395 ms ± 5.5 ms, 32% more than baseline english reverse Text: OK (0.33s) 41.9 ms ± 1.5 ms, 98% more than baseline LazyText: OK (0.19s) 26.4 ms ± 1.5 ms, 32% more than baseline russian reverse Text: OK (0.32s) 75.4 μs ± 6.3 μs, 83% more than baseline LazyText: OK (0.21s) 51.4 μs ± 3.0 μs, 25% more than baseline japanese reverse Text: OK (0.21s) 49.4 μs ± 2.8 μs, 103% more than baseline LazyText: OK (0.29s) 34.6 μs ± 2.9 μs, 42% more than baseline All 12 tests passed (6.42s) Benchmark text-benchmarks: FINISH

Surprisingly, I can reproduce your measurements, but, looking at Core, I'm fairly certain that this is some sort of benchmarking quirk. You can compare yourself by putting

{-# OPTIONS_GHC -ddump-to-file -ddump-simpl -dsuppress-all -dno-suppress-type-signatures #-}

Let's go with go pIn pOut, Core output is much nicer.

You most certainly want go !_ pOut, not just go _ pOut, otherwise GHC believes that this function is lazy in pIn and would not unbox it.

Common subexpression elimination pass is not powerful enough to share pOut - pLen between two expressions, please do it manually.

Another microoptimisation is to pass reversePoints ba off dest len, check pOut <= 0, and then you shave off +1 in A.copyI pLen dest (pOut - pLen) src pIn.

Bodigrim

Overall looks reasonable, thanks for starting JS preparations.

chreekat · 2023-09-30T19:17:27Z

I believe I've addressed all feedback now.

Bodigrim · 2023-09-30T19:25:46Z

src/Data/Text/Internal/Reverse.hs

+reversePoints src pIn dest pOut =
+    let pointLength = utf8LengthByLeader (A.unsafeIndex src pIn)
+        pOut' = pOut - pointLength + 1
+        -- Repeated unsafeWrite is faster than copyI


Do you know which is faster with JS backend?

I mean, unless we know JS backend behaviour, I'd strive for simpler Core, which should give GHC more opportunities to do backend-dependent optimizations.

I'll double check. TBH, in many cases I'd rather have simpler code even if it's a little slower.

@hsyl20 had suggested this change for backends other than JavaScript, by the way. tbh I'd rather use copyI. If copyI (a primop) is truly slower than repeated calls to unsafeWrite (another primop) maybe it's the primops that need to change.

I've just used copyI because it seems like the right thing to do. I've squashed all commits as well.

Bodigrim · 2023-10-21T16:54:31Z

@chreekat could you please fix https://github.com/haskell/text/actions/runs/6597744008/job/17925891514?pr=536#step:6:106? Other two CI failures are spurious, but this one is genuine.

chreekat · 2023-10-22T08:26:14Z

@chreekat could you please fix https://github.com/haskell/text/actions/runs/6597744008/job/17925891514?pr=536#step:6:106? Other two CI failures are spurious, but this one is genuine.

@Bodigrim whoops! Fixed now.

Bodigrim · 2023-10-28T04:29:49Z

Thanks!

chreekat · 2023-11-01T08:42:37Z

Thanks to everyone who participated and patiently waited for my progress!

I'll start on the next C function soon.

chreekat commented Aug 24, 2023

View reviewed changes

cbits/measure_off.c Show resolved Hide resolved

hsyl20 reviewed Aug 28, 2023

View reviewed changes

text.cabal Outdated Show resolved Hide resolved

hsyl20 reviewed Aug 28, 2023

View reviewed changes

text.cabal Show resolved Hide resolved

phadej suggested changes Aug 28, 2023

View reviewed changes

text.cabal Outdated Show resolved Hide resolved

phadej reviewed Aug 28, 2023

View reviewed changes

text.cabal Show resolved Hide resolved

chreekat commented Sep 7, 2023

View reviewed changes

cbits/measure_off.c Show resolved Hide resolved

chreekat changed the title ~~Draft: Isolate reverse in preparation for a pure-Haskell implementation~~ Isolate reverse in preparation for a pure-Haskell implementation Sep 8, 2023

Bodigrim reviewed Sep 8, 2023

View reviewed changes

text.cabal Outdated Show resolved Hide resolved

Bodigrim reviewed Sep 8, 2023

View reviewed changes

src/Data/Text/Internal/Reverse.hs Outdated Show resolved Hide resolved

Bodigrim reviewed Sep 8, 2023

View reviewed changes

src/Data/Text/Internal/Reverse.hs Outdated Show resolved Hide resolved

chreekat force-pushed the b/pure-haskell branch from 6da38e0 to 7266ad4 Compare September 10, 2023 19:19

hsyl20 reviewed Sep 11, 2023

View reviewed changes

src/Data/Text/Internal/Reverse.hs Outdated Show resolved Hide resolved

src/Data/Text/Internal/Reverse.hs Outdated Show resolved Hide resolved

src/Data/Text/Internal/Reverse.hs Outdated Show resolved Hide resolved

src/Data/Text/Internal/Reverse.hs Outdated Show resolved Hide resolved

chreekat force-pushed the b/pure-haskell branch from 7266ad4 to 739871f Compare September 13, 2023 17:14

chreekat changed the title ~~Isolate reverse in preparation for a pure-Haskell implementation~~ Draft: Isolate reverse in preparation for a pure-Haskell implementation Sep 16, 2023

Bodigrim reviewed Sep 24, 2023

View reviewed changes

src/Data/Text/Internal/Reverse.hs Outdated Show resolved Hide resolved

Bodigrim reviewed Sep 24, 2023

View reviewed changes

chreekat changed the title ~~Draft: Isolate reverse in preparation for a pure-Haskell implementation~~ Isolate reverse in preparation for a pure-Haskell implementation Sep 30, 2023

Bodigrim reviewed Sep 30, 2023

View reviewed changes

chreekat force-pushed the b/pure-haskell branch from 6528755 to c6128e4 Compare October 21, 2023 13:48

alaendle mentioned this pull request Oct 22, 2023

Draft: Fix build with JS backend ghcjs/ghcjs-base#134

Closed

Implement pure-Haskell reverse

6204458

chreekat force-pushed the b/pure-haskell branch from c6128e4 to 6204458 Compare October 22, 2023 08:23

Bodigrim mentioned this pull request Oct 22, 2023

Fix CI #541

Merged

Bodigrim approved these changes Oct 23, 2023

View reviewed changes

Bodigrim requested a review from Lysxia October 23, 2023 21:00

hsyl20 approved these changes Oct 24, 2023

View reviewed changes

Lysxia approved these changes Oct 25, 2023

View reviewed changes

Bodigrim merged commit 1ae86be into haskell:master Oct 28, 2023
25 of 27 checks passed

chreekat deleted the b/pure-haskell branch November 10, 2023 16:55

TeofilC mentioned this pull request Nov 18, 2023

Remove accidentally committed dump pragmas #546

Merged

Isolate reverse in preparation for a pure-Haskell implementation #536

Isolate reverse in preparation for a pure-Haskell implementation #536

Conversation

chreekat commented Aug 24, 2023

phadej commented Aug 24, 2023

phadej commented Aug 24, 2023 • edited Loading

phadej commented Aug 24, 2023 • edited Loading

phadej commented Aug 24, 2023

chreekat commented Aug 25, 2023

chreekat commented Aug 27, 2023

chreekat commented Aug 27, 2023

chreekat commented Aug 27, 2023

hsyl20 left a comment

Choose a reason for hiding this comment

chreekat commented Sep 1, 2023

chreekat left a comment

Choose a reason for hiding this comment

hsyl20 commented Sep 8, 2023

chreekat commented Sep 8, 2023

phadej commented Sep 8, 2023 • edited Loading

chreekat commented Sep 8, 2023

chreekat commented Sep 9, 2023

chreekat commented Sep 10, 2023

chreekat commented Sep 10, 2023

chreekat commented Sep 10, 2023 • edited Loading

Bodigrim commented Sep 10, 2023

chreekat commented Sep 11, 2023

chreekat commented Sep 13, 2023

chreekat commented Sep 16, 2023

chreekat commented Sep 21, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Bodigrim left a comment

Choose a reason for hiding this comment

chreekat commented Sep 30, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Bodigrim commented Oct 21, 2023

chreekat commented Oct 22, 2023

Bodigrim commented Oct 28, 2023

chreekat commented Nov 1, 2023

phadej commented Aug 24, 2023 •

edited

Loading

phadej commented Aug 24, 2023 •

edited

Loading

phadej commented Sep 8, 2023 •

edited

Loading

chreekat commented Sep 10, 2023 •

edited

Loading