refactor: replace deprecated runloop in fsevents #8304

kevinji · 2023-07-31T00:14:52Z

Replace the use of the deprecated FSEventStreamScheduleWithRunLoop() with FSEventStreamSetDispatchQueue().

Since a dispatch queue spawns new threads, we need to call caml_c_thread_register() and caml_c_thread_unregister() for these threads. This is done indirectly via a pthread_key_t, so that the functions are called again only if the dispatch queue switches the thread that runs the callback.

We replicate the blocking nature of the existing code using a mutex and condition variable.

See git/git@b022600 for more details about this approach.

Fixes #7352.

kevinji · 2023-07-31T02:14:32Z

This is my first time interfacing with C from OCaml and I'm getting the following error in CI:

+  ../watching/helpers.sh: line 3: 85234 Illegal instruction: 4  ( dune build "$@" --passive-watch-mode > .#dune-output 2>&1 )

Some guidance around how I can reproduce this error locally and how I could go about debugging this would be helpful!

Alizter · 2023-07-31T10:55:14Z

@kevinji That is a very strange error. What is your OS / arch? What kind of file system are you using?

kevinji · 2023-07-31T17:53:53Z

@Alizter This is the macOS CI build error. I’m having trouble replicating it on a local M1 machine, so I’m also asking for some help setting up my dev environment: I’ve run make bootstrap, make dev, and then make test, but I’m getting other errors that aren’t present in CI.

I’ve modified the C bindings to fsevents so I’m wondering if I need to pass a flag for pthread support, or if I messed something up when doing FFI between C and OCaml.

Alizter · 2023-07-31T18:02:00Z

I wonder if the C optimizer is doing some funky things. What C toolchain versions are you using?

kevinji · 2023-07-31T19:02:04Z

On my machine both gcc and clang refer to Apple's clang:

~
❯ gcc --version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin22.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

~
❯ clang --version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin22.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

kevinji · 2023-07-31T21:12:47Z

For reference, on my local machine, I get warnings that look like

File "test/blackbox-tests/test-cases/virtual-libraries/incorrect-archive-7027.t", line 1, characters 0-0:
diff --git a/_build/.sandbox/8a8c802dbf0064c575d0050dcd663d21/default/test/blackbox-tests/test-cases/virtual-libraries/incorrect-archive-7027.t b/_build/.sandbox/8a8c802dbf0064c575d0050dcd663d21/default/test/blackbox-tests/test-cases/virtual-libraries/incorrect-archive-7027.t.corrected
index bfdebcb23..fcc113691 100644
--- a/_build/.sandbox/8a8c802dbf0064c575d0050dcd663d21/default/test/blackbox-tests/test-cases/virtual-libraries/incorrect-archive-7027.t
+++ b/_build/.sandbox/8a8c802dbf0064c575d0050dcd663d21/default/test/blackbox-tests/test-cases/virtual-libraries/incorrect-archive-7027.t.corrected
@@ -48,3 +48,10 @@ https://github.com/ocaml/dune/issues/7027
   > EOF

   $ dune exec ./foo.exe
+  /var/folders/5d/8_6n7qx13nnbr5nmmr096gqw0000gn/T/build_1858d6_dune/build_874b91_dune/camlobj3924a9.c:1525:14: warning: a function declaration without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
+  extern value caml_get_public_method();
+               ^
+  /var/folders/5d/8_6n7qx13nnbr5nmmr096gqw0000gn/T/build_1858d6_dune/build_874b91_dune/camlobj3924a9.c:1727:14: warning: a function declaration without a prototype is deprecated in all versions of C and is not supported in C2x [-Wdeprecated-non-prototype]
+  extern value caml_set_oo_id();
+               ^
+  2 warnings generated.

and failed expect tests like

File "test/expect-tests/persistent_tests.ml", line 1, characters 0-0:
diff --git a/_build/default/test/expect-tests/persistent_tests.ml b/_build/.sandbox/11d2f2745866f56ea24aef378f7a7bc4/default/test/expect-tests/persistent_tests.ml.corrected
index c87263c42..3042d3c43 100644
--- a/_build/default/test/expect-tests/persistent_tests.ml
+++ b/_build/.sandbox/11d2f2745866f56ea24aef378f7a7bc4/default/test/expect-tests/persistent_tests.ml.corrected
@@ -28,7 +28,7 @@ let%expect_test "persistent digests" =
     ---

     DIGEST-DB version 6
-    a4ae8e07cf52a9fb38c47c32b6d59fa6
+    a6df9e528c50debc9264b7a95489392e
     ---

     INSTALL-COOKIE version 1
@@ -40,7 +40,7 @@ let%expect_test "persistent digests" =
     ---

     COPY-LINE-DIRECTIVE-MAP version 1
-    7e311b06ebde9ff1708e4c3a1d3f5633
+    7dac5b11f6f654bb6f230392493b363f
     ---

     merlin-conf version 4
@@ -48,5 +48,5 @@ let%expect_test "persistent digests" =
     ---

     INCREMENTAL-DB version 5
-    fa67cc9b60c9f3a1b9b1ad93a56df691
+    1cc656a4502ef88e70adab1f3c9a868e
     --- |}]

as well as an error that seems relevant:

File "test/blackbox-tests/test-cases/watching/path-pwd.t", line 1, characters 0-0:
diff --git a/_build/.sandbox/5edf3dab400d052bedb3a5f7236b8b4e/default/test/blackbox-tests/test-cases/watching/path-pwd.t b/_build/.sandbox/5edf3dab400d052bedb3a5f7236b8b4e/default/test/blackbox-tests/test-cases/watching/path-pwd.t.corrected
index 2152be750..6c993817e 100644
--- a/_build/.sandbox/5edf3dab400d052bedb3a5f7236b8b4e/default/test/blackbox-tests/test-cases/watching/path-pwd.t
+++ b/_build/.sandbox/5edf3dab400d052bedb3a5f7236b8b4e/default/test/blackbox-tests/test-cases/watching/path-pwd.t.corrected
@@ -9,6 +9,7 @@ Reproduce #6907
   $ echo "(lang dune 2.0)" > dune-project

   $ start_dune
+  ./helpers.sh: line 3: 41395 Trace/BPT trap: 5       ( dune build "$@" --passive-watch-mode > .#dune-output 2>&1 )

   $ cat > x <<EOF
   > original-contents
@@ -29,3 +30,4 @@ Reproduce #6907
   $ stop_dune
   Success, waiting for filesystem changes...
   Success, waiting for filesystem changes...
+  exit 133

Alizter · 2023-07-31T21:49:32Z

Just to confirm, the test is OK before this PR?

anmonteiro · 2023-07-31T21:55:12Z

I ran ./dune.exe build @test/blackbox-tests/test-cases/watching/path-pwd, and it only fails with this PR (it passes on main).

EDIT: it only fails with ./dune.exe build @test/blackbox-tests/test-cases/watching/runtest, but the failure still isn't present on main.

anmonteiro · 2023-07-31T22:01:08Z

Actually it doesn't repro anymore after the latest force-push.

kevinji · 2023-07-31T22:47:40Z

I pushed a new version that fixes the C logic. The original code freed the dispatch queue fields in the wrong place (dune_fsevents_stop); the updated commit moves that cleanup to the end of dune_fsevents_dispatch_queue_run instead. I also removed some comments in fsevents.mli and some unused functions in fsevents_stubs.c so that they reflect what the current code actually does.

emillon · 2023-08-01T09:44:21Z

This area had some issues with memory safety before (#6151). We were not convinced of the soundness of the bindings so it's not completely unexpected that touching this will reveal issues.

kevinji · 2023-08-01T22:51:24Z

That's helpful to know. I pushed a new commit with the following small changes:

Dispatch queue cleanup (including the mutex/condvar) has moved to the finalize function.
The custom_operations identifiers are now prefixed with build. to follow the Java-style naming recommended by the OCaml docs and to be consistent with the dispatch queue name.
dune_fsevents_dispatch_queue_current is renamed to dune_fsevents_dispatch_queue_create to better reflect what it does: unlike the original runloop code, it doesn't get the current thread's runloop but creates a new dispatch queue.

src/fsevents/fsevents_stubs.c

kevinji · 2023-08-02T18:59:01Z

New updates:

The function name dune_fsevents_dispatch_queue_run is now dune_fsevents_dispatch_queue_wait_until_stopped to better reflect that it no longer "runs" anything directly.
pthread_cond_broadcast(&t->dq->dq_finished); is now also called in dune_fsevents_stop so dune_fsevents_dispatch_queue_wait_until_stopped can be stopped gracefully.

Alizter · 2023-08-05T10:56:10Z

When does the dispatch queue spawn new threads?

kevinji · 2023-08-05T20:32:31Z

According to the macOS documentation: "Work submitted to dispatch queues executes on a pool of threads managed by the system." I'm not sure at which specific point the system actually creates the threads though.

rgrinberg · 2023-08-06T21:59:16Z

src/fsevents/fsevents_stubs.c

  }
  CAMLdrop;
  caml_release_runtime_system();
+  caml_c_thread_unregister();


We are calling unregister in every single callback invocation. Are you sure this call is cheap enough for that?

kevinji · 2023-08-08

Is there a good metric for what you mean here? From what I can tell, caml_c_thread_register creates a thread info block and attaches it to an existing linked list of thread info blocks, and caml_c_thread_unregister reverses those changes. However, I'm not sure how expensive that is.

rgrinberg · 2023-08-08

I'm not so concerned about the linked list or the block, but rather about the lock that has to be acquired to do this. In our case this callback doesn't do much work (it just notifies which memoization nodes are out of date, hence invalidating the build), so acquiring this lock on every invocation can easily add up and affect latency in watch mode.

I'd be more eager if this PR added some feature in exchange for the worse performance. Are there any concrete benefits to using dispatch queues that I'm unaware of?

kevinji · 2023-08-08

The original reason is that FSEventStreamScheduleWithRunLoop is deprecated, with a suggestion to use FSEventStreamSetDispatchQueue instead. In practice, since we're using a serial queue, I think the background thread running the callback should remain the same, but I couldn't find an easy way to manually manage the threads used by the queue, which would let us call caml_c_thread_register only as needed.

Alternatively, we could use dispatch_get_main_queue, which returns the serial dispatch queue bound to the main thread. This would behave roughly like the original code that used CFRunLoopGetCurrent, assuming there was originally only one thread, but I think we would then need to run the main dispatch queue somehow, with either dispatch_main or CFRunLoopRun.

kevinji · 2023-08-17

As an alternative, I've introduced some thread-local storage that tracks whether caml_c_thread_register has already been called by the current thread, and only unregisters once the thread exits. Since a serial dispatch queue often reuses the same thread, this should in principle reduce the number of register/unregister calls, but more testing is probably needed.

rgrinberg · 2023-08-28

Yeah, that's a bit better. If you could do some testing to confirm that we aren't spamming caml_c_thread_register/unregister, then this should be good to go.

Another alternative that's a little more bulletproof but requires a bit more code is to have the code in the dispatch queue populate some sort of queue with events, and require the OCaml side to poll this queue for events in a loop. This will require some synchronization and the use of condition variables to avoid busy polling, but it's probably the best we can do here.

cc @patricoferris who discussed this issue with me.

kevinji · 2023-10-13

@rgrinberg Sorry, I just got some time to come back to this. Running the test/blackbox-tests/test-cases/watching tests, I can confirm that the same background thread is being reused and caml_c_thread_register runs only once. Are there some longer workflows I can test to make sure the behavior is also as expected?

rgrinberg · 2023-10-15

You could try running longer builds on more serious projects, but your test seems enough to me.

If you have some spare cycles, I would suggest looking at the queue-based workaround I suggested in previous comments. I think that should guarantee this is well behaved.

Signed-off-by: Kevin Ji

kevinji changed the title from "feat: replace deprecate runloop in fsevents" to "feat: replace deprecated runloop in fsevents" on Jul 31, 2023

kevinji marked this pull request as ready for review July 31, 2023 23:08

rgrinberg requested a review from gridbugs August 1, 2023 09:24

rgrinberg reviewed Aug 2, 2023

src/fsevents/fsevents_stubs.c

gridbugs reviewed Aug 2, 2023

src/fsevents/fsevents_stubs.c (outdated)

rgrinberg approved these changes Aug 3, 2023

rgrinberg reviewed Aug 6, 2023

rgrinberg changed the title from "feat: replace deprecated runloop in fsevents" to "refactor: replace deprecated runloop in fsevents" on Oct 15, 2023

rgrinberg merged commit e6a5199 into ocaml:main Oct 15, 2023
20 checks passed

voodoos mentioned this pull request on Oct 26, 2023 in "Watch mode crashes on macos on main branch" #9004 (closed)

kevinji deleted the replace-runloop-with-dispatch-queue branch December 5, 2023 00:12
