Skip to content
This repository has been archived by the owner on Jun 15, 2023. It is now read-only.

improve printing algorithm performance for many nested callbacks #576

Closed
wants to merge 4 commits into from

Conversation

leeor
Copy link

@leeor leeor commented Jun 23, 2022

Suggested implementation, with a caveat of me know being familiar with the code at all :-)

I have completely removed propagateForcedBreaks from the code as it looked to me to be doing nothing while forcing evaluation of everything. Did I miss something?

See #261.

time ./_build/install/default/bin/rescript -print res <test file>
./_build/install/default/bin/rescript -print res   0.65s user 0.03s system 33% cpu 2.009 total

refactor `CustomLayout t list` to `CustomLayout (t Lazy.t) list`
@leeor
Copy link
Author

leeor commented Jun 23, 2022

Removing the propagate... function did break a bunch of things :-)

@leeor
Copy link
Author

leeor commented Jun 23, 2022

The problem is that the way propagateForceBreaks is implemented forces evaluation of all elements of the document, the handling of the Concat case is a fold_left so no early exit condition. It is called from toString before the findGroupThatFits are evaluated, so it processes elements that will be later discarded.

Can the two cases that propagate... handles be handled in the recursive process function within toString?

@cristianoc
Copy link
Contributor

Prettier:

let foo = function (x, y, z) {
  foo(x, y, z);
};

let foo = function (x, y, z) {
  foo(x, y, function (x, y, z) {
    foo(x, y, z);
  });
};

let foo = function (x, y, z) {
  foo(x, y, function (x, y, z) {
    foo(x, y, function (x, y, z) {
      foo(x, y, z);
    });
  });
};

let foo = function (x, y, z) {
  foo(x, y, function (x, y, z) {
    foo(x, y, function (x, y, z) {
      foo(x, y, function (x, y, z) {
        foo(x, y, z);
      });
    });
  });
};

let foo1 = (x, y, z) => foo(x, y, z);

let foo2 = (x, y, z) => foo1(x, y, (x, y, z) => foo(x, y, z));

let foo3 = (x, y, z) =>
  foo2(x, y, (x, y, z) => foo1(x, y, (x, y, z) => foo(x, y, z)));

let foo4 = (x, y, z) =>
  foo3(x, y, (x, y, z) =>
    foo2(x, y, (x, y, z) => foo1(x, y, (x, y, z) => foo(x, y, z)))
  );

let foo5 = (x, y, z) =>
  foo4(x, y, (x, y, z) =>
    foo3(x, y, (x, y, z) =>
      foo2(x, y, (x, y, z) => foo1(x, y, (x, y, z) => foo(x, y, z)))
    )
  );

let foo1 = (x, y, z) => {
  foo(x, y, z);
};

let foo2 = (x, y, z) => {
  foo1(x, y, (x, y, z) => {
    foo(x, y, z);
  });
};

let foo3 = (x, y, z) => {
  foo2(x, y, (x, y, z) => {
    foo1(x, y, (x, y, z) => {
      foo(x, y, z);
    });
  });
};

let foo4 = (x, y, z) => {
  foo3(x, y, (x, y, z) => {
    foo2(x, y, (x, y, z) => {
      foo1(x, y, (x, y, z) => {
        foo(x, y, z);
      });
    });
  });
};

Rescript:

let foo1 = (x, y, z) => foo(x, y, z)

let foo2 = (x, y, z) => foo1(x, y, (x, y, z) => foo(x, y, z))

let foo3 = (x, y, z) => foo2(x, y, (x, y, z) => foo1(x, y, (x, y, z) => foo(x, y, z)))

let foo4 = (x, y, z) =>
  foo3(x, y, (x, y, z) => foo2(x, y, (x, y, z) => foo1(x, y, (x, y, z) => foo(x, y, z))))

let foo5 = (x, y, z) =>
  foo4(x, y, (x, y, z) =>
    foo3(x, y, (x, y, z) => foo2(x, y, (x, y, z) => foo1(x, y, (x, y, z) => foo(x, y, z))))
  )

let foo1 = (x, y, z) => {
  foo(x, y, z)
}

let foo2 = (x, y, z) => {
  foo1(x, y, (x, y, z) => {
    foo(x, y, z)
  })
}

let foo3 = (x, y, z) => {
  foo2(x, y, (x, y, z) => {
    foo1(x, y, (x, y, z) => {
      foo(x, y, z)
    })
  })
}

let foo4 = (x, y, z) => {
  foo3(x, y, (x, y, z) => {
    foo2(x, y, (x, y, z) => {
      foo1(x, y, (x, y, z) => {
        foo(x, y, z)
      })
    })
  })
}

@cristianoc
Copy link
Contributor

On the example

Prettier:

let foo = () =>
  bar((x) =>
    bar((x) =>
      bar((x) =>
        bar((x) =>
          bar((x) =>
            bar((x) =>
              bar((x) =>
                bar((x) => bar((x) => bar((x) => bar((x) => bar((x) => x)))))
              )
            )
          )
        )
      )
    )
  );

ReScript:

let foo = () =>
  bar(x =>
    bar(x =>
      bar(x =>
        bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => x)))))))))
      )
    )
  )

@cristianoc
Copy link
Contributor

So first observation is: prettier might be using 2 things not 3? Possible.

Second observation: the state exploration does not seem to need to be complete (i.e. 2 or 3, to the power of n). In the following sense:

  • If the nested one further down breaks, the current one breaks too.

So you don't get e.g. a stack of (breaks, doesn't, breaks, doesn't).
So really it should not be exponential explosion.

In that sense, the semantics of Doc.customLayout would change in that it would not try all the possibilities.

How does that sound?

@cristianoc
Copy link
Contributor

To be precise, customLayout([case1, case2]) at nested level 3 would not be:

[case1, case1, case1]
[case1, case1, case2]
[case1, case2, case1]
[case1, case2, case2]
[case2, case1, case1]
[case2, case1, case2]
[case2, case2, case1]
[case2, case2, case2]

but:

[case1, case1, case1]
[case2, case1, case1]
[case2, case2, case1]
[case2, case2, case2]

@leeor
Copy link
Author

leeor commented Jun 23, 2022

Sounds good to me. Is that a complex change?

@cristianoc
Copy link
Contributor

Sounds good to me. Is that a complex change?

Don't know, one needs to try.
I think the printing functions will likely need to pass an extra parameter to know whether it's an immediately-nested case.
But to test the idea, one can just set a global reference in the file. And add a coupe of lines that mutate it.

@cristianoc
Copy link
Contributor

cristianoc commented Jun 23, 2022

I think likely only the willBreak and fits functions in res_doc. ml need to change.

@cristianoc
Copy link
Contributor

cristianoc commented Jun 23, 2022

This is what I would try.

Permitted sequences have the form 1 1 1 1 1 2 2 2 1.

Suppose there are at most 2 cases. Keep the current level, a number between 1 and 2, as parameter to the functions.

If the current level is 1 then do the following:

  • Take case1 and try to format the rest at level 1. If it fits, choose case1.
  • If it does not fit, choose case2 and format the rest at level 2.

If the current level is 2 then:

  • Choose case2 and format the rest at level 2.

This generalises to level n by always trying case n and switch to case n+1 and level n+1 when it does not fit.

There can be different instances of customLayout with different cases, and different numbers of cases. One can treat them all the same. Start from level 1, and if the current level is, say 4, and the customLayout being considered only has 2 cases,
assume by convention that we're talking about case2.

@leeor
Copy link
Author

leeor commented Jun 26, 2022

How does your suggestion solve the performance hit incurred by scanning the whole tree in propagateForceBreaks immediately when toString starts?

Maybe toString can be refactored to incorporate that logic but only for the actual nodes that are chosen?

@cristianoc
Copy link
Contributor

How does your suggestion solve the performance hit incurred by scanning the whole tree in propagateForceBreaks immediately when toString starts?

Maybe toString can be refactored to incorporate that logic but only for the actual nodes that are chosen?

I had a vague memory that was not an issue. Just checked the code:

    | CustomLayout children ->
      (* When using CustomLayout, we don't want to propagate forced breaks
       * from the children up. By definition it picks the first layout that fits
       * otherwise it takes the last of the list.
       * However we do want to propagate forced breaks in the sublayouts. They
       * might need to be broken. We just don't propagate them any higher here *)
      let _ = walk (Concat children) in
      false

can't see any slowdown there.

@cristianoc
Copy link
Contributor

But I'm not sure I understood your question.
The main point of the suggestion is: replace exponential traversal with linear traversal.

@cristianoc
Copy link
Contributor

Take it with a grain of salt, done a quick back-of-the-envelope calculation. But the difference, given exactly the same grammar we have today, is from 3 ^ n to 3 * n.

@leeor
Copy link
Author

leeor commented Jun 27, 2022

Walking the children using Content means a fold_left that traverses all variations. Unless I am missing something here, again, not very familiar with this code base. From my previous tests with the lazy evaluated variations, skipping the call in toString make it almost instantaneous.

I now brought back the propagateForceBreaks function and the call to it from toString but without processing CustomLayout variations. Instead, I call it from the process helper in toString after a CustomeLayout variation is chosen. This greatly impacts the overall performance, with only a minimal impact on formatting.

Performance:

./_build/install/default/bin/rescript -print res   0.65s user 0.02s system 46% cpu 1.452 total

Compared to ~22 seconds before lazy evaluation and the propagateForceBreaks changes.

The formatting difference (hence failed tests):

diff --git a/tests/printer/other/expected/nesting.res.txt b/tests/printer/other/expected/nesting.res.txt
index 0cafca836cfc..b25b2d1724cd 100644
--- a/tests/printer/other/expected/nesting.res.txt
+++ b/tests/printer/other/expected/nesting.res.txt
@@ -1,7 +1,5 @@
-let unitsCommands = state.units->Js.Array2.mapi((
-  {unit: targetUnit, coordinates: targetCoordinates},
-  i,
-) => {
+let unitsCommands =
+  state.units->Js.Array2.mapi(({unit: targetUnit, coordinates: targetCoordinates}, i) => {
     // n^2
     let res = []
     state.units->Js.Array2.forEachi((
@@ -50,4 +48,4 @@ let unitsCommands = state.units->Js.Array2.mapi((
       }
     })
     res
-})
+  })

The diff ignores whitespace changes, the whole function body is a single indention deeper than before.

WDYT?

@leeor leeor changed the title improve printing algorithm to performance for many nested callbacks improve printing algorithm performance for many nested callbacks Jun 27, 2022
@cristianoc
Copy link
Contributor

Tried the latest version here.

I've made the example slightly bigger:

% dune exec -- time rescript tst.txt
let foo = () =>                 
  bar(x =>
    bar(x =>
      bar(x =>
        bar(x =>
          bar(x =>
            bar(x =>
              bar(x =>
                bar(x =>
                  bar(x =>
                    bar(x =>
                      bar(x =>
                        bar(x =>
                          bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => x)))))))
                        )
                      )
                    )
                  )
                )
              )
            )
          )
        )
      )
    )
  )
        4.78 real         3.88 user         0.58 sys

@cristianoc
Copy link
Contributor

With just another 3 more nested callbacks, it uses up nearly all the memory on my machine and takes 57 seconds.
Can't optimise against exponentials, one needs to remove them.

@cristianoc
Copy link
Contributor

The first issue is that it generates an exponential number of docs, even before trying to format them.

@cristianoc
Copy link
Contributor

Turns out parsing also takes time on a bigger nested example. So that time should be factored out of the measurement.

@cristianoc
Copy link
Contributor

The 2 cases produced in printArgumentsWithCallbackInLastPosition alone creates an exponential number of docs.

@cristianoc
Copy link
Contributor

To summarise my understanding.
There's doc construction and doc printing.
This separation makes it difficult to express interesting logic when several layout cases need to be considered out of which one is chosen.
The only construct supporting logic is Customlayout, which only supports the logic "try this, else try that, else that".

The current PR puts the real logic behind lazy, so that printing can do more work. It has gotten rid of a 3^n explosion but there's still a 2^n explosion left at doc construction time.

Perhaps it's possible to patch this PR to remove the 2^n.
The alternative is to put more logic into the doc so that linearly size, non-lazy, docs can be contructed.

cristianoc added a commit that referenced this pull request Jun 28, 2022
@cristianoc
Copy link
Contributor

cristianoc commented Jun 28, 2022

One step forward, in the lazy approach, here: #590
The output is still not the desired one but it's fast.

Another approach consists of not generating at all an exponential number of docs (whether hidden behind lazy or not), but only generate those corresponding to the sequences: #576 (comment)

@leeor
Copy link
Author

leeor commented Jun 28, 2022

My thoughts:

  1. We're already seeing a formatting change, is that acceptable?
  2. I don't mind laziness, it is an easy solution to reduce upfront costs of any values we won't really need later on, so can provide value whatever the final process looks like.
  3. The current state of this PR, given we're ok with (1) above, is a significant improvement in runtime. Maybe that's good enough to merge and release so that some benefit will already be made available? (also see the next point)
  4. I admit I have not fully understood the change you're looking to do in fits and willBreak. Adding a level argument to willBreak and process all the way down to calculate in fits and selecting the second CustomLayout case whenever it's higher than 1, did not make any change to the runtime.

WDYT?

@leeor
Copy link
Author

leeor commented Jun 28, 2022

By output is still not the desired one, do you mean the final formatting being printed, or timing/code structure?

@cristianoc
Copy link
Contributor

It's fast but it does not print what we want (e.g. the example motivating this, but also changes to the tests, need to check if they are good or not).
So if it can be made to print what we need , then we can keep it.

More experimentation needed.

@cristianoc
Copy link
Contributor

cristianoc commented Jun 28, 2022

This prints what we need:

    | CustomLayout children ->
      let _ = List.fold_left
        (fun forceBreak child ->
          let childForcesBreak = walk (Lazy.force child) in
          forceBreak || childForcesBreak)
        false children in
      false

but it is slow, as it undoes all what lazy does.

Btw this is the output:

let foo = () => bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => bar(x => bar(x =>
                    bar(x => bar(x => bar(x => bar(x => bar(x => x)))))
                  )))))))))

It corresponds to what TS does. Except for all the closing parens in one place.

@cristianoc
Copy link
Contributor

Now #590 propagates the force break later on.
The output on the motivating example is the one we want.
There are still a few changes in the tests that need to be investigated.

@cristianoc
Copy link
Contributor

  1. I admit I have not fully understood the change you're looking to do in fits and willBreak. Adding a level argument to willBreak and process all the way down to calculate in fits and selecting the second CustomLayout case whenever it's higher than 1, did not make any change to the runtime.

The idea is to pass an extra parameter to all the functions that produce a doc. So all the printing functions.
And: don't print a call-back that breaks, inside another callback printed as non-breaking.
So you literally generate a smaller doc, with fewer cases.

@cristianoc
Copy link
Contributor

And here it is in the simplest version:
#591
just stop enumerating after a nesting threshold.

@cristianoc cristianoc closed this Jun 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants