Disable implicit fusion rules #348
Conversation
I think this is the right thing to do. Implicit fusion is unpredictable and, as you explain, doesn't even work in simple cases.
So we have the
which we only ever want to do when
Could we not instead make the rule something like:
It seems like if you could do something like this for all the O(1) slice functions, you'd end up only fusing when there was at least one non-problematic function in the composition. (Maybe that doesn't make sense; I don't have a good understanding of fusion.) I also wonder whether you've tried to measure the impact of this? Coming up with a benchmark set that seems realistic and reasonable would also be work towards the UTF-8 rewrite.
We may also wish to trigger this rule when

The thing is that other functions have a choice between materialized and streaming versions as well. E.g., concatenation can just physically copy bytearrays, or convert them to streams and sequence the results. The choice of fusion strategy is a non-local optimization. I think that nowadays one could get pretty good mileage from a compiler plugin for fusion, able to perform global analysis. But rewrite rules are unsuitable for this.
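To illustrate the materialized route for concatenation mentioned above, here is a small sketch using the text package's `Data.Text.Lazy.Builder` API (`assemble` is a hypothetical helper name, not something from the library):

```haskell
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import Data.Text.Lazy.Builder (fromText, toLazyText)

-- Materialized concatenation: each piece is copied once into a
-- growing buffer, with no decoding to streams and re-encoding.
-- `assemble` is a hypothetical helper for illustration.
assemble :: [T.Text] -> TL.Text
assemble = toLazyText . foldMap fromText
```

For example, `assemble (map T.pack ["foo", "bar", "baz"])` produces the lazy text `"foobarbaz"` with a single copy per piece.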
One can run
I think rewrite rules should probably never be the cause of worse-than-advertised performance, and if the only way to achieve that in this case is to remove fusion, that makes sense to me. It sounds like you're going further, though, and making an argument that applies just as well to rewrite-rule-based optimizations generally, and even to most of GHC's optimization passes, right?
I would not go that far. I'm quite happy when things magically become better than advertised, but not when they become worse and I'm out of control. Rewrite rules are a very brittle tool: they are a pain to debug and a pain to test. Rewrite rules with phase control are exponentially worse. There could still be rare cases where they pull their weight, but I don't think
Yeah, rewrite rules are pretty non-compositional to engineer, and this is a great example. At some point I hope GHC or a successor to it works out some sort of equality-saturation-based scheme, because then a lot of the complexity properties should be better. But that's not today. Predictable characteristics are step -1 of good quality engineering, and a library like text deserves to have predictably good performance.
I like this, though I do think we should note in the CHANGELOG, in a relatively loud way, that such a thing is happening. I would expect a lot of naive uses of `Text` to become slow from this change. We would ideally point folks to a means of using the streaming interface directly.

Right now, a Ctrl+F on the Hackage docs for `Data.Text` only pulls up the lazy text type as an option. But even the `Data.Text.Lazy` docs say that a lazy `Text` computation will allocate two lazy `Text` values. Searching for streaming again in this module only pulls up the fusion rules. Searching streaming pulls up nothing on the main page. To use the fusion framework directly, you need to look at `Data.Text.Internal.Fusion`, which carries a Warning:, minimal documentation, and no examples.

Perhaps we could avoid the documentation/de-internalization effort on the `Fusion` modules by instead linking to one of the many streaming frameworks, like `conduit`, `pipes`, `streaming`, etc.
I'm approving the PR based on technical merits and overall impact. But I would like to see affected users given a clear warning and a well-documented and easily discoverable migration path to fixing these issues.
I see no code problems @Bodigrim, but I think @parsonsmatt has a point about documenting these changes as we go along. Unless you want to do one bigger push at a later time, it'd probably be good to record them as we go.
@parsonsmatt thanks, I mentioned the change in the haddocks. Sorry for the brevity; I'll elaborate on the migration path later, after the UTF-8 switch. I hope that is acceptable for now. IMHO streaming and fusion are not quite interchangeable: streaming is about not materializing data in full, while fusion is about not materializing data at all. So strictly speaking
Thanks for updating the docs @Bodigrim - approved.
Very good detective work @Bodigrim! Given the nature of how people use
@parsonsmatt would you like me to improve anything else?
looks great, thanks!
For the record, it looks like this change nicely improves performance on the Hasura benchmark suite. No insight as to whether that's from our own code or deeper in a library somewhere. Also this is comparing

Results copied below; let me know if you want any interpretation.

The regression report below shows, for each benchmark, the percent change for different metrics between the merge base (the changes from PR 2186) and this PR. For advice on interpreting benchmarks, please see benchmarks/README.md. More significant regressions or improvements will be colored with

You can view graphs of the full reports here:
Click here for a detailed report.

# ┌────────────────┐
# │ chinook.json │
# └────────────────┘
#
# ᐉ Memory Residency (RTS-reported):
# live_bytes : -0.1 (BEFORE benchmarks ran; baseline for schema)
# live_bytes : -0.1 (AFTER benchmarks ran)
* mem_in_use : 2.2 (BEFORE benchmarks ran; baseline for schema)
# mem_in_use : -1.6 (AFTER benchmarks ran)
#
# ᐅ simple query low load:
* bytes_alloc_per_req : -2.1
* min : 2.3
# p50 : 2.0
#
# ᐅ simple query high load:
* bytes_alloc_per_req : -2.1
# min : -1.1
# p50 : 0.1
#
# ᐅ complex query low load small result:
+ bytes_alloc_per_req : -14.3
+ min : -6.3
+ p50 : -5.0
#
# ᐅ complex query high load small result:
+ bytes_alloc_per_req : -14.3
+ min : -5.0
+ p50 : -6.5
#
# ᐅ complex query low load large result:
+ bytes_alloc_per_req : -13.9
+ min : -4.7
+ p50 : -4.7
#
# ᐅ complex query high load large result:
+ bytes_alloc_per_req : -13.9
# min : -0.2
# p50 : -0.8
#
# ┌────────────────────┐
# │ huge_schema.json │
# └────────────────────┘
#
# ᐉ Memory Residency (RTS-reported):
# live_bytes : -0.0 (BEFORE benchmarks ran; baseline for schema)
# live_bytes : -0.0 (AFTER benchmarks ran)
# mem_in_use : 0.1 (BEFORE benchmarks ran; baseline for schema)
* mem_in_use : -3.2 (AFTER benchmarks ran)
#
# ᐅ small query low load:
* bytes_alloc_per_req : -2.4
# min : -1.3
* p50 : 2.0
#
# ᐅ huge query low load:
+++ bytes_alloc_per_req : -31.0
++ min : -17.8
++ p50 : -18.6
#
To add to @jberryman's results, the Hasura code has a lot of interleavings of

However, through the use of

Before this PR,
A cherry-pick of haskell#348 onto 1.2.3.2, with some quick conflict fixes
A cherry-pick of haskell#348 onto 1.2.5.0
A tale of two tails
What is the asymptotic complexity of `Data.Text.tail`? Internally `Text` is a byte array with offset and length counters, so taking its tail involves just bumping the offset by 1 or 2 code units, depending on the UTF-16 encoding of the first character. This should be O(1) time/memory, and indeed it is.

text/src/Data/Text.hs, lines 532 to 536 in f1b4fc0
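The bookkeeping can be sketched with a toy model (a hypothetical `Text'` type over a `Word16` list, standing in for the real packed array):

```haskell
import Data.Word (Word16)

-- Toy model for illustration only: the real Text wraps a packed
-- array of UTF-16 code units; a plain list stands in for it here.
data Text' = Text' [Word16] Int Int  -- buffer, offset, length

-- O(1) tail: bump the offset past the first character.
-- A lead surrogate (0xD800..0xDBFF) means the character occupies
-- two code units; otherwise it occupies one.
tail' :: Text' -> Text'
tail' (Text' buf off len) = Text' buf (off + d) (len - d)
  where
    d = if w >= 0xD800 && w <= 0xDBFF then 2 else 1
    w = buf !! off
```

(`!!` on a list is not O(1), unlike real array indexing; the point here is only the constant-size offset/length update.)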
But what is the complexity of `Data.Text.tail . Data.Text.tail`? O(1), isn't it? Lo and behold: it is O(n) time and O(n) memory! Here is why:

text/src/Data/Text.hs, lines 539 to 544 in f1b4fc0
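The effect of the `stream . unstream = id` rule can be modeled with a toy sketch (hypothetical names; plain lists stand in for both the materialized buffer and the fusion `Stream` type):

```haskell
-- Toy model for illustration: a Stream is just a wrapped list here.
newtype Stream a = Stream [a]

-- stream walks the whole materialized input: O(n) time.
stream :: [a] -> Stream a
stream = Stream

-- unstream materializes the whole stream: O(n) memory.
unstream :: Stream a -> [a]
unstream (Stream xs) = xs

-- The stream-side tail just skips one element: O(1).
streamTail :: Stream a -> Stream a
streamTail (Stream xs) = Stream (drop 1 xs)

-- After the "stream/unstream" rule fires, tail . tail rewrites to
-- one big decode/re-encode pipeline instead of two O(1) offset bumps:
fusedTailTail :: [a] -> [a]
fusedTailTail = unstream . streamTail . streamTail . stream
```

The composed pipeline is correct, but it pays the O(n) cost of `stream` and `unstream` once, where two direct tails would each have been O(1).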
When we compile a single `tail`, these rules cancel each other and we end up with the usual definition, cited above, which has O(1) complexity. However, if there is `tail . tail`, things get interesting because of the `stream . unstream = id` rewrite rule:

Here `S.tail` comes from `Data.Text.Internal.Fusion.Common` and just skips the first element of a stream, so it is O(1). But nevertheless `stream` must process the materialized `Text` in whole, so O(n) time, and `unstream` materializes the whole stream into a `Text`, so O(n) memory.

This fusion issue plagues not only `tail . tail` but any other combination of slicing functions. See Bottlenecked on Haskell's text library for a similar issue about `take . drop`. That particular one was patched in #301 with one more rewrite rule to prevent fusion of `take . drop`. But we cannot reasonably add rules for every possible combination of `take`, `drop`, `tail`, `init`, `head`, `last`!

How important is fusion?
Fusion frameworks shine for polymorphic containers such as `vector`, where long chains of transformations are ubiquitous. They are much less useful for monomorphic containers. That's basically why `ByteString` abandoned its fusion framework 13 years ago. Chains of operations like `map` of `filter` are even less likely to occur for `Text` than for `ByteString`: processing Unicode data in such a way is almost certainly unsound.

Instead there are two operations which dominate `Text` applications:

Slicing. Once we receive a `Text` input, we need to parse it one way or another, and parsing means slicing. As explained above, the fusion framework in `Text` has every chance to make your performance unpredictably worse and no chance to make it better.

Concatenation. To produce a `Text` output we usually concatenate smaller pieces of `Text`. There is a robust, fast and predictable mechanism to do so via `Builder`. Relying on the fusion framework for concatenation is an antipattern, because it leads to slow compilation and unpredictable runtime: instead of neatly copying two bytearrays together (as `Builder` does), the fusion framework decodes both of them to streams and then encodes them back.

I'm not, however, arguing here for removing the stream fusion framework entirely. Partly because there could still be legitimate cases for using it. Mostly because many functions in `text` do not have a materialized counterpart and are expressed in terms of stream transformations; rewriting them to operate on byte arrays is a non-trivial enterprise.

However, I do argue for scrapping the implicit fusion rules. They are exactly what gets us into trouble with `tail` above. If users have a particularly long sequence of transformations, they can reach out to the stream API explicitly. But imposing this choice on them implicitly, in an unmitigable way, is unacceptable.

This PR removes the rewrite rules for implicit fusion. It also removes the haddock annotations: some functions may still happen to fuse by chance (if they have no materialized implementation at all), but this is no longer promised. There are no visible API changes, so in a certain sense this is not even a major change. Anyway, we are heading towards switching the internal representation to UTF-8, so the time is right to make all major changes in one go.
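As a sketch of what reaching for the stream API explicitly might look like, using the internal `Data.Text.Internal.Fusion` modules (whose exact exports may vary between text versions, and which carry no stability guarantees):

```haskell
import Data.Char (toUpper)
import qualified Data.Text as T
import qualified Data.Text.Internal.Fusion as F
import qualified Data.Text.Internal.Fusion.Common as S

-- One explicit stream/unstream round trip: the input is decoded once
-- and the result materialized once, regardless of how many stream
-- transformers are composed in between.
shout :: T.Text -> T.Text
shout = F.unstream . S.map toUpper . S.filter (/= ' ') . F.stream
```

Here the long chain costs a single O(n) pass by construction, rather than depending on rewrite rules firing.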
Further reading