improve printing algorithm performance for many nested callbacks #576

leeor · 2022-06-23T11:06:22Z

Suggested implementation, with a caveat of me know being familiar with the code at all :-)

I have completely removed propagateForcedBreaks from the code as it looked to me to be doing nothing while forcing evaluation of everything. Did I miss something?

See #261.

time ./_build/install/default/bin/rescript -print res <test file>
./_build/install/default/bin/rescript -print res   0.65s user 0.03s system 33% cpu 2.009 total

refactor `CustomLayout t list` to `CustomLayout (t Lazy.t) list`

leeor · 2022-06-23T11:09:14Z

Removing the propagate... function did break a bunch of things :-)

leeor · 2022-06-23T12:25:26Z

The problem is that the way propagateForceBreaks is implemented forces evaluation of all elements of the document, the handling of the Concat case is a fold_left so no early exit condition. It is called from toString before the findGroupThatFits are evaluated, so it processes elements that will be later discarded.

Can the two cases that propagate... handles be handled in the recursive process function within toString?

cristianoc · 2022-06-23T13:39:19Z

Prettier:

let foo = function (x, y, z) {
  foo(x, y, z);
};

let foo = function (x, y, z) {
  foo(x, y, function (x, y, z) {
    foo(x, y, z);
  });
};

let foo = function (x, y, z) {
  foo(x, y, function (x, y, z) {
    foo(x, y, function (x, y, z) {
      foo(x, y, z);
    });
  });
};

let foo = function (x, y, z) {
  foo(x, y, function (x, y, z) {
    foo(x, y, function (x, y, z) {
      foo(x, y, function (x, y, z) {
        foo(x, y, z);
      });
    });
  });
};

let foo1 = (x, y, z) => foo(x, y, z);

let foo2 = (x, y, z) => foo1(x, y, (x, y, z) => foo(x, y, z));

let foo3 = (x, y, z) =>
  foo2(x, y, (x, y, z) => foo1(x, y, (x, y, z) => foo(x, y, z)));

let foo4 = (x, y, z) =>
  foo3(x, y, (x, y, z) =>
    foo2(x, y, (x, y, z) => foo1(x, y, (x, y, z) => foo(x, y, z)))
  );

let foo5 = (x, y, z) =>
  foo4(x, y, (x, y, z) =>
    foo3(x, y, (x, y, z) =>
      foo2(x, y, (x, y, z) => foo1(x, y, (x, y, z) => foo(x, y, z)))
    )
  );

let foo1 = (x, y, z) => {
  foo(x, y, z);
};

let foo2 = (x, y, z) => {
  foo1(x, y, (x, y, z) => {
    foo(x, y, z);
  });
};

let foo3 = (x, y, z) => {
  foo2(x, y, (x, y, z) => {
    foo1(x, y, (x, y, z) => {
      foo(x, y, z);
    });
  });
};

let foo4 = (x, y, z) => {
  foo3(x, y, (x, y, z) => {
    foo2(x, y, (x, y, z) => {
      foo1(x, y, (x, y, z) => {
        foo(x, y, z);
      });
    });
  });
};

Rescript:

let foo1 = (x, y, z) => foo(x, y, z)

let foo2 = (x, y, z) => foo1(x, y, (x, y, z) => foo(x, y, z))

let foo3 = (x, y, z) => foo2(x, y, (x, y, z) => foo1(x, y, (x, y, z) => foo(x, y, z)))

let foo4 = (x, y, z) =>
  foo3(x, y, (x, y, z) => foo2(x, y, (x, y, z) => foo1(x, y, (x, y, z) => foo(x, y, z))))

let foo5 = (x, y, z) =>
  foo4(x, y, (x, y, z) =>
    foo3(x, y, (x, y, z) => foo2(x, y, (x, y, z) => foo1(x, y, (x, y, z) => foo(x, y, z))))
  )

let foo1 = (x, y, z) => {
  foo(x, y, z)
}

let foo2 = (x, y, z) => {
  foo1(x, y, (x, y, z) => {
    foo(x, y, z)
  })
}

let foo3 = (x, y, z) => {
  foo2(x, y, (x, y, z) => {
    foo1(x, y, (x, y, z) => {
      foo(x, y, z)
    })
  })
}

let foo4 = (x, y, z) => {
  foo3(x, y, (x, y, z) => {
    foo2(x, y, (x, y, z) => {
      foo1(x, y, (x, y, z) => {
        foo(x, y, z)
      })
    })
  })
}

cristianoc · 2022-06-23T13:40:58Z

On the example

Prettier:

let foo = () =>
  bar((x) =>
    bar((x) =>
      bar((x) =>
        bar((x) =>
          bar((x) =>
            bar((x) =>
              bar((x) =>
                bar((x) => bar((x) => bar((x) => bar((x) => bar((x) => x)))))
              )
            )
          )
        )
      )
    )
  );

ReScript:

let foo = () =>
  bar(x =>
    bar(x =>
      bar(x =>
        bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => x)))))))))
      )
    )
  )

cristianoc · 2022-06-23T13:48:26Z

So first observation is: prettier might be using 2 things not 3? Possible.

Second observation: the state exploration does not seem to need to be complete (i.e. 2 or 3, to the power of n). In the following sense:

If the nested one further down breaks, the current one breaks too.

So you don't get e.g. a stack of (breaks, doesn't, breaks, doesn't).
So really it should not be exponential explosion.

In that sense, the semantics of Doc.customLayout would change in that it would not try all the possibilities.

How does that sound?

cristianoc · 2022-06-23T13:55:50Z

To be precise, customLayout([case1, case2]) at nested level 3 would not be:

[case1, case1, case1]
[case1, case1, case2]
[case1, case2, case1]
[case1, case2, case2]
[case2, case1, case1]
[case2, case1, case2]
[case2, case2, case1]
[case2, case2, case2]

but:

[case1, case1, case1]
[case2, case1, case1]
[case2, case2, case1]
[case2, case2, case2]

leeor · 2022-06-23T14:16:17Z

Sounds good to me. Is that a complex change?

cristianoc · 2022-06-23T14:41:32Z

Sounds good to me. Is that a complex change?

Don't know, one needs to try.
I think the printing functions will likely need to pass an extra parameter to know whether it's an immediately-nested case.
But to test the idea, one can just set a global reference in the file. And add a coupe of lines that mutate it.

cristianoc · 2022-06-23T14:47:35Z

I think likely only the willBreak and fits functions in res_doc. ml need to change.

cristianoc · 2022-06-23T17:49:59Z

This is what I would try.

Permitted sequences have the form 1 1 1 1 1 2 2 2 1.

Suppose there are at most 2 cases. Keep the current level, a number between 1 and 2, as parameter to the functions.

If the current level is 1 then do the following:

Take case1 and try to format the rest at level 1. If it fits, choose case1.
If it does not fit, choose case2 and format the rest at level 2.

If the current level is 2 then:

Choose case2 and format the rest at level 2.

This generalises to level n by always trying case n and switch to case n+1 and level n+1 when it does not fit.

There can be different instances of customLayout with different cases, and different numbers of cases. One can treat them all the same. Start from level 1, and if the current level is, say 4, and the customLayout being considered only has 2 cases,
assume by convention that we're talking about case2.

leeor · 2022-06-26T06:02:48Z

How does your suggestion solve the performance hit incurred by scanning the whole tree in propagateForceBreaks immediately when toString starts?

Maybe toString can be refactored to incorporate that logic but only for the actual nodes that are chosen?

cristianoc · 2022-06-26T06:11:11Z

How does your suggestion solve the performance hit incurred by scanning the whole tree in propagateForceBreaks immediately when toString starts?

Maybe toString can be refactored to incorporate that logic but only for the actual nodes that are chosen?

I had a vague memory that was not an issue. Just checked the code:

    | CustomLayout children ->
      (* When using CustomLayout, we don't want to propagate forced breaks
       * from the children up. By definition it picks the first layout that fits
       * otherwise it takes the last of the list.
       * However we do want to propagate forced breaks in the sublayouts. They
       * might need to be broken. We just don't propagate them any higher here *)
      let _ = walk (Concat children) in
      false

can't see any slowdown there.

cristianoc · 2022-06-26T06:12:26Z

But I'm not sure I understood your question.
The main point of the suggestion is: replace exponential traversal with linear traversal.

cristianoc · 2022-06-26T06:14:13Z

Take it with a grain of salt, done a quick back-of-the-envelope calculation. But the difference, given exactly the same grammar we have today, is from 3 ^ n to 3 * n.

…ustomLayout variations

leeor · 2022-06-27T20:59:18Z

Walking the children using Content means a fold_left that traverses all variations. Unless I am missing something here, again, not very familiar with this code base. From my previous tests with the lazy evaluated variations, skipping the call in toString make it almost instantaneous.

I now brought back the propagateForceBreaks function and the call to it from toString but without processing CustomLayout variations. Instead, I call it from the process helper in toString after a CustomeLayout variation is chosen. This greatly impacts the overall performance, with only a minimal impact on formatting.

Performance:

./_build/install/default/bin/rescript -print res   0.65s user 0.02s system 46% cpu 1.452 total

Compared to ~22 seconds before lazy evaluation and the propagateForceBreaks changes.

The formatting difference (hence failed tests):

diff --git a/tests/printer/other/expected/nesting.res.txt b/tests/printer/other/expected/nesting.res.txt
index 0cafca836cfc..b25b2d1724cd 100644
--- a/tests/printer/other/expected/nesting.res.txt
+++ b/tests/printer/other/expected/nesting.res.txt
@@ -1,7 +1,5 @@
-let unitsCommands = state.units->Js.Array2.mapi((
-  {unit: targetUnit, coordinates: targetCoordinates},
-  i,
-) => {
+let unitsCommands =
+  state.units->Js.Array2.mapi(({unit: targetUnit, coordinates: targetCoordinates}, i) => {
     // n^2
     let res = []
     state.units->Js.Array2.forEachi((
@@ -50,4 +48,4 @@ let unitsCommands = state.units->Js.Array2.mapi((
       }
     })
     res
-})
+  })

The diff ignores whitespace changes, the whole function body is a single indention deeper than before.

WDYT?

cristianoc · 2022-06-28T01:52:50Z

Tried the latest version here.

I've made the example slightly bigger:

% dune exec -- time rescript tst.txt
let foo = () =>                 
  bar(x =>
    bar(x =>
      bar(x =>
        bar(x =>
          bar(x =>
            bar(x =>
              bar(x =>
                bar(x =>
                  bar(x =>
                    bar(x =>
                      bar(x =>
                        bar(x =>
                          bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => x)))))))
                        )
                      )
                    )
                  )
                )
              )
            )
          )
        )
      )
    )
  )
        4.78 real         3.88 user         0.58 sys

cristianoc · 2022-06-28T01:53:45Z

With just another 3 more nested callbacks, it uses up nearly all the memory on my machine and takes 57 seconds.
Can't optimise against exponentials, one needs to remove them.

cristianoc · 2022-06-28T03:17:44Z

The first issue is that it generates an exponential number of docs, even before trying to format them.

cristianoc · 2022-06-28T03:40:55Z

Turns out parsing also takes time on a bigger nested example. So that time should be factored out of the measurement.

cristianoc · 2022-06-28T05:26:34Z

The 2 cases produced in printArgumentsWithCallbackInLastPosition alone creates an exponential number of docs.

cristianoc · 2022-06-28T06:07:20Z

To summarise my understanding.
There's doc construction and doc printing.
This separation makes it difficult to express interesting logic when several layout cases need to be considered out of which one is chosen.
The only construct supporting logic is Customlayout, which only supports the logic "try this, else try that, else that".

The current PR puts the real logic behind lazy, so that printing can do more work. It has gotten rid of a 3^n explosion but there's still a 2^n explosion left at doc construction time.

Perhaps it's possible to patch this PR to remove the 2^n.
The alternative is to put more logic into the doc so that linearly size, non-lazy, docs can be contructed.

Follows #576

cristianoc · 2022-06-28T07:42:36Z

One step forward, in the lazy approach, here: #590
The output is still not the desired one but it's fast.

Another approach consists of not generating at all an exponential number of docs (whether hidden behind lazy or not), but only generate those corresponding to the sequences: #576 (comment)

leeor · 2022-06-28T07:45:30Z

My thoughts:

We're already seeing a formatting change, is that acceptable?
I don't mind laziness, it is an easy solution to reduce upfront costs of any values we won't really need later on, so can provide value whatever the final process looks like.
The current state of this PR, given we're ok with (1) above, is a significant improvement in runtime. Maybe that's good enough to merge and release so that some benefit will already be made available? (also see the next point)
I admit I have not fully understood the change you're looking to do in fits and willBreak. Adding a level argument to willBreak and process all the way down to calculate in fits and selecting the second CustomLayout case whenever it's higher than 1, did not make any change to the runtime.

WDYT?

leeor · 2022-06-28T07:49:53Z

By output is still not the desired one, do you mean the final formatting being printed, or timing/code structure?

cristianoc · 2022-06-28T07:50:47Z

It's fast but it does not print what we want (e.g. the example motivating this, but also changes to the tests, need to check if they are good or not).
So if it can be made to print what we need , then we can keep it.

More experimentation needed.

cristianoc · 2022-06-28T07:55:41Z

This prints what we need:

    | CustomLayout children ->
      let _ = List.fold_left
        (fun forceBreak child ->
          let childForcesBreak = walk (Lazy.force child) in
          forceBreak || childForcesBreak)
        false children in
      false

but it is slow, as it undoes all what lazy does.

Btw this is the output:

let foo = () => bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => bar(x =>
                    bar(x => bar(x => bar(x => bar(x => bar(x => x)))))
                  )))))))))

It corresponds to what TS does. Except for all the closing parens in one place.

cristianoc · 2022-06-28T08:13:28Z

Now #590 propagates the force break later on.
The output on the motivating example is the one we want.
There are still a few changes in the tests that need to be investigated.

cristianoc · 2022-06-28T09:11:29Z

I admit I have not fully understood the change you're looking to do in fits and willBreak. Adding a level argument to willBreak and process all the way down to calculate in fits and selecting the second CustomLayout case whenever it's higher than 1, did not make any change to the runtime.

The idea is to pass an extra parameter to all the functions that produce a doc. So all the printing functions.
And: don't print a call-back that breaks, inside another callback printed as non-breaking.
So you literally generate a smaller doc, with fewer cases.

cristianoc · 2022-06-28T12:03:55Z

And here it is in the simplest version:
#591
just stop enumerating after a nesting threshold.

improve printing algorithm to performance for many nested callbacks

25aea45

refactor `CustomLayout t list` to `CustomLayout (t Lazy.t) list`

fix missing call to propagateForcedBreaks with lazy processing of C…

4dbfbab

…ustomLayout variations

leeor changed the title ~~improve printing algorithm to performance for many nested callbacks~~ improve printing algorithm performance for many nested callbacks Jun 27, 2022

cristianoc added a commit that referenced this pull request Jun 28, 2022

Callbacks without exponential explosion.

e28e31c

Follows #576

cristianoc mentioned this pull request Jun 28, 2022

Callbacks without exponential explosion. #590

Closed

fix formatting

f8a32e1

fix formatting

9513039

cristianoc closed this Jun 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

improve printing algorithm performance for many nested callbacks #576

improve printing algorithm performance for many nested callbacks #576

leeor commented Jun 23, 2022

leeor commented Jun 23, 2022

leeor commented Jun 23, 2022

cristianoc commented Jun 23, 2022

cristianoc commented Jun 23, 2022

cristianoc commented Jun 23, 2022

cristianoc commented Jun 23, 2022

leeor commented Jun 23, 2022

cristianoc commented Jun 23, 2022

cristianoc commented Jun 23, 2022 •

edited

Loading

cristianoc commented Jun 23, 2022 •

edited

Loading

leeor commented Jun 26, 2022

cristianoc commented Jun 26, 2022

cristianoc commented Jun 26, 2022

cristianoc commented Jun 26, 2022

leeor commented Jun 27, 2022 •

edited

Loading

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022 •

edited

Loading

leeor commented Jun 28, 2022

leeor commented Jun 28, 2022

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022 •

edited

Loading

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022

improve printing algorithm performance for many nested callbacks #576

improve printing algorithm performance for many nested callbacks #576

Conversation

leeor commented Jun 23, 2022

leeor commented Jun 23, 2022

leeor commented Jun 23, 2022

cristianoc commented Jun 23, 2022

cristianoc commented Jun 23, 2022

cristianoc commented Jun 23, 2022

cristianoc commented Jun 23, 2022

leeor commented Jun 23, 2022

cristianoc commented Jun 23, 2022

cristianoc commented Jun 23, 2022 • edited Loading

cristianoc commented Jun 23, 2022 • edited Loading

leeor commented Jun 26, 2022

cristianoc commented Jun 26, 2022

cristianoc commented Jun 26, 2022

cristianoc commented Jun 26, 2022

leeor commented Jun 27, 2022 • edited Loading

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022 • edited Loading

leeor commented Jun 28, 2022

leeor commented Jun 28, 2022

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022 • edited Loading

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022

cristianoc commented Jun 28, 2022

cristianoc commented Jun 23, 2022 •

edited

Loading

cristianoc commented Jun 23, 2022 •

edited

Loading

leeor commented Jun 27, 2022 •

edited

Loading

cristianoc commented Jun 28, 2022 •

edited

Loading

cristianoc commented Jun 28, 2022 •

edited

Loading