Skip to content
This repository has been archived by the owner on Jun 15, 2023. It is now read-only.

Formatter hangs on deeply nested function calls #261

Closed
nkrkv opened this issue Feb 12, 2021 · 14 comments
Closed

Formatter hangs on deeply nested function calls #261

nkrkv opened this issue Feb 12, 2021 · 14 comments

Comments

@nkrkv
Copy link
Contributor

nkrkv commented Feb 12, 2021

Steps to reproduce

Put the following in a file (say, Hang.res):

let foo = () =>
  bar(x =>
  bar(x =>
  bar(x =>
  bar(x =>
  bar(x =>
  bar(x =>
  bar(x =>
  bar(x =>
  bar(x =>
  bar(x =>
  bar(x =>
  bar(x =>
  bar(x =>
  bar(x =>
  bar(x => x)
  ))))))))))))))

Run:

bsc -format Hang.res

Expected result

The source is indented properly and printed to stdout

Actual result

The process hangs, loads CPU to 100%, eats one GB of RAM after another, then after 1m30s or so crashes with out-of-memory.

The source (if put back into the context), compiles fine. Just formatting causes the problem.

Machine info

$ node --version                                               
v12.16.3

$ bsc -version        
BuckleScript 9.0.0 ( Using OCaml:4.06.1+BS )

$ uname -a            
Linux archxx 5.10.6-arch1-1 #1 SMP PREEMPT Sat, 09 Jan 2021 18:22:35 +0000 x86_64 GNU/Linux
@IwanKaramazow
Copy link
Contributor

Thanks for the report. I will investigate this over the weekend. This code is probably generated by a ppx?

@cristianoc
Copy link
Contributor

Part involved: https://github.com/rescript-lang/syntax/blob/master/src/res_printer.ml#L4059

@nkrkv
Copy link
Contributor Author

nkrkv commented Feb 12, 2021

This code is probably generated by a ppx?

Nope, it’s hand-written (simplified version) to convert 15-tuple of Result.t to Result.t of 15-tuple

@bobzhang
Copy link
Member

might be there is some exponential complexity iN the algorithm

@cristianoc
Copy link
Contributor

We already found the issue some time back.
It is an exponential blowup.
There are 3 possible formatting s for callbacks, and the first one that fits line width is chosen.

With n nested callbacks it's 3^n cases. The cases are generated statically, and the explosion happens at case generation time.

One needs to be a little creative here. By Eg prioritize one case when there are nested callbacks. Or use some other creative solution.

@leeor
Copy link

leeor commented Jun 21, 2022

Started running into this issue more now that I'm removing let-syntax from our code-base. It hits tests files pretty hard because they always involve at least two callbacks before any "real" code (describe + test boilerplate). Some of which I have to break into multiple files just because they can't be formatted.

Out of interest, how is this different from how prettier works for JS? From my experience, formatting there is super fast.

Also, I'm assuming that the algorithm formats depth-last, and if so, can it be simplified to make a decision based on the specific "call-site" instead of how it affects deeper callbacks? In other words, each callback will be formatted based on it's own location, ignoring any additional nested callbacks?

@cristianoc
Copy link
Contributor

Found back where the relevant code is:

  (* Sometimes one of the non-callback arguments will break.
   * There might be a single line comment in there, or a multiline string etc.
   * showDialog(
   *   `
   *   Do you really want to leave this workspace?
   *   Some more text with detailed explanations...
   *   `,
   *   ~danger=true,
   *   // comment   --> here a single line comment
   *   ~confirmText="Yes, I am sure!",
   *   ~onConfirm={() => ()},
   *  )
   * In this case, we always want the arguments broken over multiple lines,
   * like a normal function call.
   *)
  if Doc.willBreak printedArgs then breakAllArgs
  else Doc.customLayout [fitsOnOneLine; arugmentsFitOnOneLine; breakAllArgs]

This will print the first case out of the 3 cases [fitsOnOneLine; arugmentsFitOnOneLine; breakAllArgs] that fits.
This is exponential on directly nested callbacks.

This is where fit is computed:

      | CustomLayout docs ->
        let rec findGroupThatFits groups =
          match groups with
          | [] -> Nil
          | [lastGroup] -> lastGroup
          | doc :: docs ->
            if fits (width - pos) ((ind, Flat, doc) :: rest) then doc
            else findGroupThatFits docs
        in
        let doc = findGroupThatFits docs in
        process ~pos lineSuffices ((ind, Flat, doc) :: rest))

@cristianoc
Copy link
Contributor

cristianoc commented Jun 21, 2022

Happy to get PRs if someone is interested in contributing.

One obvious fix is to do Doc.customLayout [fitsOnOneLine] which will, very quickly, return the solution that probably does not look very good.

Another fix is to preserve the original heuristic as much as possible, but cut down space space exploration in the nested calls (say level 2 or 3).

Another fix is to come up with a different heuristic that does not require trying so many things. Guess that corresponds to what prettier does.

@leeor
Copy link

leeor commented Jun 23, 2022

Results of some tinkering with the formatter:

Vanilla:

time ./_build/install/default/bin/rescript -print res <test file>
./_build/install/default/bin/rescript -print res   23.13s user 0.75s system 95% cpu 24.929 total

Using only arugmentsFitOnOneLine:

index a2780c79e3dc..befe8d9870f8 100644
--- a/src/res_printer.ml
+++ b/src/res_printer.ml
@@ -4011,18 +4011,7 @@ and printArgumentsWithCallbackInLastPosition ~uncurried args cmtTbl =
       let argDoc = printArgument arg cmtTbl in
       loop (Doc.line :: Doc.comma :: argDoc :: acc) args
   in
-  let printedArgs, callback, callback2 = loop [] args in
-
-  (* Thing.map(foo, (arg1, arg2) => MyModuleBlah.toList(argument)) *)
-  let fitsOnOneLine =
-    Doc.concat
-      [
-        (if uncurried then Doc.text "(." else Doc.lparen);
-        printedArgs;
-        callback;
-        Doc.rparen;
-      ]
-  in
+  let printedArgs, _callback, callback2 = loop [] args in
 
   (* Thing.map(longArgumet, veryLooooongArgument, (arg1, arg2) =>
    *   MyModuleBlah.toList(argument)
@@ -4063,7 +4052,7 @@ and printArgumentsWithCallbackInLastPosition ~uncurried args cmtTbl =
    * like a normal function call.
    *)
   if Doc.willBreak printedArgs then breakAllArgs
-  else Doc.customLayout [fitsOnOneLine; arugmentsFitOnOneLine; breakAllArgs]
+  else Doc.customLayout [arugmentsFitOnOneLine]
 
 and printArguments ~uncurried
     (args : (Asttypes.arg_label * Parsetree.expression) list) cmtTbl =

Result:

time ./_build/install/default/bin/rescript -print res <test file>
./_build/install/default/bin/rescript -print res   17.10s user 0.11s system 95% cpu 17.952 total

Refactor breakAllArgs to only be computed when needed:

index a2780c79e3dc..4eaefbb9f698 100644
--- a/src/res_printer.ml
+++ b/src/res_printer.ml
@@ -4011,18 +4011,7 @@ and printArgumentsWithCallbackInLastPosition ~uncurried args cmtTbl =
       let argDoc = printArgument arg cmtTbl in
       loop (Doc.line :: Doc.comma :: argDoc :: acc) args
   in
-  let printedArgs, callback, callback2 = loop [] args in
-
-  (* Thing.map(foo, (arg1, arg2) => MyModuleBlah.toList(argument)) *)
-  let fitsOnOneLine =
-    Doc.concat
-      [
-        (if uncurried then Doc.text "(." else Doc.lparen);
-        printedArgs;
-        callback;
-        Doc.rparen;
-      ]
-  in
+  let printedArgs, _callback, callback2 = loop [] args in
 
   (* Thing.map(longArgumet, veryLooooongArgument, (arg1, arg2) =>
    *   MyModuleBlah.toList(argument)
@@ -4045,7 +4034,6 @@ and printArgumentsWithCallbackInLastPosition ~uncurried args cmtTbl =
    *   (param1, parm2) => doStuff(param1, parm2)
    * )
    *)
-  let breakAllArgs = printArguments ~uncurried args cmtTblCopy2 in
 
   (* Sometimes one of the non-callback arguments will break.
    * There might be a single line comment in there, or a multiline string etc.
@@ -4062,8 +4050,8 @@ and printArgumentsWithCallbackInLastPosition ~uncurried args cmtTbl =
    * In this case, we always want the arguments broken over multiple lines,
    * like a normal function call.
    *)
-  if Doc.willBreak printedArgs then breakAllArgs
-  else Doc.customLayout [fitsOnOneLine; arugmentsFitOnOneLine; breakAllArgs]
+  if Doc.willBreak printedArgs then printArguments ~uncurried args cmtTblCopy2
+  else Doc.customLayout [arugmentsFitOnOneLine]
 
 and printArguments ~uncurried
     (args : (Asttypes.arg_label * Parsetree.expression) list) cmtTbl =
time ./_build/install/default/bin/rescript -print res <test file>
./_build/install/default/bin/rescript -print res   0.53s user 0.01s system 42% cpu 1.262 total

Maybe what's needed here to to refactor the custom layout algorithm to be lazy?

@cristianoc
Copy link
Contributor

That would solve the perf issue.
Another question is about design: what should the output look like. Perhaps this is a good time to re-explore a bit, make sure all the choices are intentional.
E.g. what does prettier do? And if the behaviour is different, is the difference in language explaining it?
Based on the answer, there are follow-on questions.

@cristianoc
Copy link
Contributor

Sorry should have put these comment on the PR. Now deleted and move them there now.

@cristianoc
Copy link
Contributor

cristianoc commented Jun 28, 2022

Will go with this one and clean up the various PRs #591

A more ambitious change for the future involves making the doc language more expressive, so that the logic used to say "if X fits then do Y else do Z" can be expressed directly.

@leeor
Copy link

leeor commented Jun 28, 2022

@cristianoc any chance of seeing this in a new release soon? Would be very helpful :-)

@cristianoc
Copy link
Contributor

It's merged so, yes in the upcoming release.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants