Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lwt_unix.fork not canceling pending promises #789

Closed
picojulien opened this issue May 29, 2020 · 4 comments
Closed

Lwt_unix.fork not canceling pending promises #789

picojulien opened this issue May 29, 2020 · 4 comments
Labels

Comments

@picojulien
Copy link

The documentation Lwt_unix.fork claims that

in the child process all pending jobs are canceled

However it seems it is not entirely the case, as demonstrated by the example bellow.
In this example I fork processes in a parallel iteration over a list. I expect that only the main process forks new jobs, however some forked processes do fork again while they obviously shouldn't.

With sequential iteration on the list, the problematic behavior does not happen.

Warning: Don't increase the size of the list, it rapidly leads to a fork bomb that exhausts available pids which crashes the system.

Tested with Lwt 4.2.1(ocaml 4.07.1) and 5.3.0 (ocaml 4.10.0).

open Lwt

let ( -- ) i j = List.init (j - i + 1) (fun x -> x + i)

let main_pid = Unix.getpid ()

let fork_one i =
  Lwt_io.flush_all () >>= fun () ->
  if Unix.getpid () <> main_pid then
    Format.printf
      "Error: Forking from non-main process %d (%d in the list) child of %s@."
      (Unix.getpid ())
      i
      (let ppid = Unix.getppid () in
       if ppid = main_pid then "main" else string_of_int ppid) ;
  match Lwt_unix.fork () with
  | 0 -> Lwt_unix.sleep 0.1 >>= fun _ -> exit 0
  | pid ->
      Format.printf "%d forked from %d@." pid (Unix.getpid ()) ;
      Lwt_unix.waitpid [] pid >>= fun _ -> Lwt.return_unit

let fork_many i = Lwt_list.map_p fork_one (0 -- i)

let _ =
  Format.printf "Main process pid is %d@." (Unix.getpid ()) ;
  Lwt_main.run (fork_many 3)
@aantron
Copy link
Collaborator

aantron commented Jun 2, 2020

However it seems it is not entirely the case,

The "jobs" that are canceled are the Unix system call jobs being run by Lwt_engine, i.e. the child process does not inherit or duplicate any outstanding system calls that are in progress in the parent, e.g. individual calls to Lwt_unix.stat that are being handled at the moment fork is called.

Iterations over lists, and any other operations on promises, lists, or other pure-OCaml data structures aren't individual system calls and aren't considered "jobs" in the sense meant here by the docs, and they are not canceled or in any way affected by fork.

In the program in the comment, the child inherits the list iteration in both the map_p and map_s cases, and would fork again. The only difference in the map_s case is that the child executes exit 0 in the course of resolving the promise that would advance the iteration, so it never reaches the pending call to the next fork. In the map_p case, of course, the children are able to proceed to the pending forks because the iteration starts them without waiting for each element's promise to resolve.

Does this help to explain the behavior?

The documentation of Lwt_unix.fork needs to be changed.

@aantron aantron added the docs label Jun 2, 2020
@picojulien
Copy link
Author

Thank you for your clear answer.
Is there a way to "unschedule" all pending promises ?
In my case, when I fork, I don't want that yielding points in the child process leads to scheduling of all promises that where pending in the main thread, but only the newly created ones.

In the legacy code I'm working on, we were using nested Lwt_main.run to do that, but as it is impossible in Lwt 5 we would need something to do that.

@aantron
Copy link
Collaborator

aantron commented Jun 8, 2020

There is no way to unschedule pending promises (in fact, Lwt is not aware of all such promises in general, as it has no central list of pending promises. They are just retained in memory by the various pointers between objects in the heap, and are found in the user's stack and data structures.

We could add an API to "cancel" (or just abandon) all pending promises that the results of Lwt_unix.yield (or Lwt.pause), because the resolving functions for those are tracked by Lwt in a list:

let yielded = Lwt_sequence.create ()

and we would just need to expose a function to clear that list when the user wants. That should achieve the same effect as a nested Lwt_main.run previously would have (note that also did not abandon all pending promises, but only pending promises resulting from yield). Note that this behavior in the past was accidental and not reliable, it was probably working only because the callback in which you eventually called the nested Lwt_main.run was triggered by this code in Lwt_main:

lwt/src/unix/lwt_main.ml

Lines 37 to 41 in 4340664

if not (Lwt_sequence.is_empty yielded) then begin
let tmp = Lwt_sequence.create () in
Lwt_sequence.transfer_r yielded tmp;
Lwt_sequence.iter_l (fun wakener -> Lwt.wakeup wakener ()) tmp
end;

transfer_r transfers the contents of yielded to tmp and clears yielded.

So if the same callback had been triggered from a different place (such as an ordinary callback triggered by I/O resolution), without this in the recent history of the stack, or if earlier processing inside this iteration over yielded promises had added new yielded promises, the nested Lwt_main.run would have inherited those.

Personally, when an Lwt user, I tend to avoid using Lwt after fork unless I first do exec. It seems difficult to be sure about the exact behavior of the child process, unless it is okay for it to behave the same as the parent. If using fork with Lwt anyway, I would tend to fork only during initialization, when the set of outstanding promises is very easy to be clear about, especially before calling into any libraries that might set up complex networks of promises in memory.

@picojulien
Copy link
Author

Indeed I was thinking about abandoning yielded promises.
However, I guess that it must be carefully documented to not give a false sense of security.

aantron added a commit that referenced this issue Aug 2, 2020
Part of #789.

[skip ci]
@aantron aantron closed this as completed in d6fcacf Aug 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants