-
Notifications
You must be signed in to change notification settings - Fork 410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dune corrupts ~10400 bytes of binary during artifact version substitution causing segfault #6668
Comments
(note that the binary installed by 'dune install' is not corrupted, just the executable in the build directory, but if there are any bugs with artifact substitution it'd be good to find out why the one in the build dir is getting corrupted) |
Normally the binaries in the |
We've been using dune-build-info since ~Oct 4 and looking through our internal testsuite it hasn't found any crashes, but it might be that it is just extremely rare to hit the codepath that happened to be corrupted during normal usage (all the unit tests run prior to artifact substitution usually, and our system-level tests which use installed binaries don't have that much coverage). I'll try to see whether I can modify the testcase to detect corruption without promotion, e.g. diff (the hex transformed) binary pre/post promotion and assert that the only change is really just around the version number placeholders |
Perhaps no need to test it, just does "just the executable in the build directory" was talking about binaries in |
The corruption is for the binary in the source directory (the promoted one). I have a suspicion this is due to the use of a single global 'buf', and multiple fibers corrupting/overwriting each-other's data, which is why with a single binary's 'dune install' it doesn't reproduce, but it does with promote until clean which promotes both the .bc and .exe in quick succession, quick enough that they're probably both active at the same time. |
Fixes ocaml#6668 When performing multiple promotions in parallel (e.g. for a .bc and an .exe) then sharing a single global buf means that one of them could get corrupted if the buffer is overwritten by the other build. Do not use global variables: allocate a new per-file buffer in 'parse' instead! `cd test && dune build @unit-tests/artifact_substitution/runtest @blackbox-tests/test-cases/versioncorruption` passes now Signed-off-by: Edwin Török <[email protected]>
Yes, moving the place that 'buf' is allocated seems to fix the testcase (lets see what the CI says about the rest of the tests): 48cc447 In general it is probably a bad idea to use global state if you can run multiple builds in parallel (I haven't looked in detail how fibers work in dune or when they're allowed to switch from one fiber to another, but it is probbaly safe to assume that it can switch to another fiber and global mutable state should be avoided). |
However if I'm looking at this correctly this means that Dune can corrupt any binaries that are being promoted/installed in parallel, not just those that use artifact substitution (since there is no way to turn subsitution off, even if unused it'll still copy everything through that shared global 'buf', which may or may not corrupt other binaries too?). |
The only 'scheduling point' I see is |
Fixes ocaml#6668 When performing multiple promotions in parallel (e.g. for a .bc and an .exe) then sharing a single global buf means that one of them could get corrupted if the buffer is overwritten by the other build. Do not use global variables: allocate a new per-file buffer in 'parse' instead! `cd test && dune build @unit-tests/artifact_substitution/runtest @blackbox-tests/test-cases/versioncorruption` passes now Signed-off-by: Edwin Török <[email protected]>
I've added a few more testcases, I can't make it corrupt the binaries on 'dune install -j16', but it reliably corrupts the binaries with 'promote until-clean'. I've also added a hex diff of the binaries to check that the diff is constrained to the expected small area, and in the buggy case it can be seen that the diff is much bigger (this should make the test more robust against future changes to the compiler/runtime that might cause the testcase to not segfault anymore) |
@edwintorok would you like to submit a PR to unshare the buffer? |
Yes, see the testcase PR , I've updated it to also contain the fix, not just the testcase. |
Fixes ocaml#6668 When performing multiple promotions in parallel (e.g. for a .bc and an .exe) then sharing a single global buf means that one of them could get corrupted if the buffer is overwritten by the other build. Do not use global variables: allocate a new per-file buffer in 'parse' instead! `cd test && dune build @unit-tests/artifact_substitution/runtest @blackbox-tests/test-cases/versioncorruption` passes now Signed-off-by: Edwin Török <[email protected]>
Fixes ocaml#6668 When performing multiple promotions in parallel (e.g. for a .bc and an .exe) then sharing a single global buf means that one of them could get corrupted if the buffer is overwritten by the other build. Do not use global variables: allocate a new per-file buffer in 'parse' instead! `cd test && dune build @unit-tests/artifact_substitution/runtest @blackbox-tests/test-cases/versioncorruption` passes now Signed-off-by: Edwin Török <[email protected]>
Fixes ocaml#6668 When performing multiple promotions in parallel (e.g. for a .bc and an .exe) then sharing a single global buf means that one of them could get corrupted if the buffer is overwritten by the other build. Do not use global variables: allocate a new per-file buffer in 'parse' instead! `cd test && dune build @unit-tests/artifact_substitution/runtest @blackbox-tests/test-cases/versioncorruption` passes now Signed-off-by: Edwin Török <[email protected]>
Fixes #6668 When performing multiple promotions in parallel (e.g. for a .bc and an .exe) then sharing a single global buf means that one of them could get corrupted if the buffer is overwritten by the other build. Do not use global variables: allocate a new per-file buffer in 'parse' instead! `cd test && dune build @unit-tests/artifact_substitution/runtest @blackbox-tests/test-cases/versioncorruption` passes now Signed-off-by: Edwin Török <[email protected]>
…, dune-rpc, dune-rpc-lwt, dune-private-libs, dune-glob, dune-configurator, dune-build-info, dune-action-plugin and chrome-trace (3.7.0~alpha1) CHANGES: - Fix parsing of OCaml errors that contain code excerpts with `...` in them. (ocaml/dune#7008, @rgrinberg) - Pre-emptively clear screen in watch mode (ocaml/dune#6987, fixes ocaml/dune#6884, @rgrinberg) - Fix cross compilation configuration when a context with targets is itself a host of another context (ocaml/dune#6958, fixes ocaml/dune#6843, @rgrinberg) - Fix parsing of the `<=` operator in *blang* expressions of `dune` files. Previously, the operator would be interpreted as `,`. (ocaml/dune#6928, @tatchi) - Fix `--trace-file` output. Dune now emits a single *complete* event for every executed process. Unterminated *async* events are no longer written. (ocaml/dune#6892, @rgrinberg) - Fix preprocessing with `staged_pps` (ocaml/dune#6748, fixes ocaml/dune#6644, @rgrinberg) - Use colored output with MDX when Dune colors are enabled. (ocaml/dune#6462, @MisterDA) - Make `dune describe workspace` return consistent dependencies for executables and for libraries. By default, compile-time dependencies towards PPX-rewriters are from now not taken into account (but runtime dependencies always are). Compile-time dependencies towards PPX-rewriters can be taken into account by providing the `--with-pps` flag. (ocaml/dune#6727, fixes ocaml/dune#6486, @esope) - Print missing newline after `$ dune exec`. (ocaml/dune#6821, fixes ocaml/dune#6700, @rgrinberg, @Alizter) - Fix binary corruption when installing or promoting in parallel (ocaml/dune#6669, fixes ocaml/dune#6668, @edwintorok) - Use colored output with GCC and Clang when compiling C stubs. The flag `-fdiagnostics-color=always` is added to the `:standard` set of flags. (ocaml/dune#4083, @MisterDA) - Fix the parsing of decimal and hexadecimal escape literals in `dune`, `dune-package`, and other dune s-expression based files (ocaml/dune#6710, @shym) - Report an error if `dune init ...` would create a "dune" file in a location which already contains a "dune" directory (ocaml/dune#6705, @gridbugs) - Fix the parsing of alerts. They will now show up in diagnostics correctly. (ocaml/dune#6678, @rginberg) - Fix the compilation of modules generated at link time when `implicit_transitive_deps` is enabled (ocaml/dune#6642, @rgrinberg) - Allow `$ dune utop` to load libraries defined in data only directories defined using `(subdir ..)` (ocaml/dune#6631, @rgrinberg) - Format dune files when they are named `dune-file`. This occurs when we enable the alternative file names project option. (ocaml/dune#6566, @rgrinberg) - Move `$ dune ocaml-merlin -dump-config=$dir` to `$ dune ocaml merlin dump-config $dir`. (ocaml/dune#6547, @rgrinberg) - Allow compilation rules to be impacted by `(env ..)` stanzas that modify the environment or set binaries. (ocaml/dune#6527, @rgrinberg) - Coq native mode is now automatically detected by Dune starting with Coq lang 0.7. `(mode native)` has been deprecated in favour of detection from the configuration of Coq. (ocaml/dune#6409, @Alizter) - Print "Leaving Directory" whenever "Entering Directory" is printed. (ocaml/dune#6149, fixes ocaml/dune#138, @cpitclaudel, @rgrinberg) - Allow `$ dune ocaml dump-dot-merlin` to run in watch mode. Also this command shouldn't print "Entering Directory" mesages. (ocaml/dune#6497, @rgrinberg) - `dune clean` should no longer fail under Windows due to the inability to remove the `.lock` file. Also, bring the implementation of the global lock under Windows closer to that of Unix. (ocaml/dune#6523, @nojb) - Remove "Entering Directory" messages for `$ dune install`. (ocaml/dune#6513, @rgrinberg) - Stop passing `-q` flag in `dune coq top`, which allows for `.coqrc` to be loaded. (ocaml/dune#6848, fixes ocaml/dune#6847, @Alizter) - Fix missing dependencies when detecting the kind of C compiler we're using (ocaml/dune#6610, fixes ocaml/dune#6415, @emillon) - Allow `(include_subdirs qualified)` for OCaml projects. (ocaml/dune#6594, fixes ocaml/dune#1084, @rgrinberg) - Accurately determine merlin configuration for all sources selected with `copy#` and `copy_files#`. The old heuristic of looking for a module in parent directories is removed (ocaml/dune#6594, @rgrinberg) - Fix inline tests with *js_of_ocaml* and whole program compilation mode enabled (ocaml/dune#6645, @hhugo) - Fix *js_of_ocaml* separate compilation rules when `--enable=effects` ,`--enable=use-js-string` or `--toplevel` is used. (ocaml/dune#6714, ocaml/dune#6828, ocaml/dune#6920, @hhugo) - Fix *js_of_ocaml* separate compilation in presence of linkall (ocaml/dune#6832, ocaml/dune#6916, @hhugo) - Remove spurious build dir created when running `dune init proj ...` (ocaml/dune#6707, fixes ocaml/dune#5429, @gridbugs) - Allow `--sandbox` to affect `ocamldep` invocations. Previously, they were wrongly marked as incompatible (ocaml/dune#6749, @rgrinberg) - Validate the command line arguments for `$ dune ocaml top-module`. This command requires one positional argument (ocaml/dune#6796, fixes ocaml/dune#6793, @rgrinberg) - Add a `dune cache size` command for displaying the size of the cache (ocaml/dune#6638, @Alizter) - Add 4.14.0 MSVC to CI (ocaml/dune#6917, @jonahbeckford) - Fix dependency cycle when installing files to the bin section with `glob_files` (ocaml/dune#6764, fixes ocaml/dune#6708, @gridbugs) - Handle "Too many links" errors when using Dune cache on Windows (ocaml/dune#6993, @nojb)
…, dune-rpc, dune-rpc-lwt, dune-private-libs, dune-glob, dune-configurator, dune-build-info, dune-action-plugin and chrome-trace (3.7.0) CHANGES: - Allow running `$ dune exec` in watch mode (with the `-w` flag). In watch mode, `$ dune exec` the executed binary whenever it is recompiled. (ocaml/dune#6966, @gridbugs) - `coqdep` is now called once per theory, instead of one time per Coq file. This should significantly speed up some builds, as `coqdep` startup time is often heavy (ocaml/dune#7048, @Alizter, @ejgallego) - Add `map_workspace_root` dune-project stanza to allow disabling of mapping of workspace root to `/workspace_root`. (ocaml/dune#6988, fixes ocaml/dune#6929, @richardlford) - Fix handling of support files generated by odoc. (ocaml/dune#6913, @jonludlam) - Fix parsing of OCaml errors that contain code excerpts with `...` in them. (ocaml/dune#7008, @rgrinberg) - Pre-emptively clear screen in watch mode (ocaml/dune#6987, fixes ocaml/dune#6884, @rgrinberg) - Fix cross compilation configuration when a context with targets is itself a host of another context (ocaml/dune#6958, fixes ocaml/dune#6843, @rgrinberg) - Fix parsing of the `<=` operator in *blang* expressions of `dune` files. Previously, the operator would be interpreted as `<`. (ocaml/dune#6928, @tatchi) - Fix `--trace-file` output. Dune now emits a single *complete* event for every executed process. Unterminated *async* events are no longer written. (ocaml/dune#6892, @rgrinberg) - Fix preprocessing with `staged_pps` (ocaml/dune#6748, fixes ocaml/dune#6644, @rgrinberg) - Use colored output with MDX when Dune colors are enabled. (ocaml/dune#6462, @MisterDA) - Make `dune describe workspace` return consistent dependencies for executables and for libraries. By default, compile-time dependencies towards PPX-rewriters are from now not taken into account (but runtime dependencies always are). Compile-time dependencies towards PPX-rewriters can be taken into account by providing the `--with-pps` flag. (ocaml/dune#6727, fixes ocaml/dune#6486, @esope) - Print missing newline after `$ dune exec`. (ocaml/dune#6821, fixes ocaml/dune#6700, @rgrinberg, @Alizter) - Fix binary corruption when installing or promoting in parallel (ocaml/dune#6669, fixes ocaml/dune#6668, @edwintorok) - Use colored output with GCC and Clang when compiling C stubs. The flag `-fdiagnostics-color=always` is added to the `:standard` set of flags. (ocaml/dune#4083, @MisterDA) - Fix the parsing of decimal and hexadecimal escape literals in `dune`, `dune-package`, and other dune s-expression based files (ocaml/dune#6710, @shym) - Report an error if `dune init ...` would create a "dune" file in a location which already contains a "dune" directory (ocaml/dune#6705, @gridbugs) - Fix the parsing of alerts. They will now show up in diagnostics correctly. (ocaml/dune#6678, @rginberg) - Fix the compilation of modules generated at link time when `implicit_transitive_deps` is enabled (ocaml/dune#6642, @rgrinberg) - Allow `$ dune utop` to load libraries defined in data only directories defined using `(subdir ..)` (ocaml/dune#6631, @rgrinberg) - Format dune files when they are named `dune-file`. This occurs when we enable the alternative file names project option. (ocaml/dune#6566, @rgrinberg) - Move `$ dune ocaml-merlin -dump-config=$dir` to `$ dune ocaml merlin dump-config $dir`. (ocaml/dune#6547, @rgrinberg) - Allow compilation rules to be impacted by `(env ..)` stanzas that modify the environment or set binaries. (ocaml/dune#6527, @rgrinberg) - Coq native mode is now automatically detected by Dune starting with Coq lang 0.7. `(mode native)` has been deprecated in favour of detection from the configuration of Coq. (ocaml/dune#6409, @Alizter) - Print "Leaving Directory" whenever "Entering Directory" is printed. (ocaml/dune#6419, fixes ocaml/dune#138, @cpitclaudel, @rgrinberg) - Allow `$ dune ocaml dump-dot-merlin` to run in watch mode. Also this command shouldn't print "Entering Directory" mesages. (ocaml/dune#6497, @rgrinberg) - `dune clean` should no longer fail under Windows due to the inability to remove the `.lock` file. Also, bring the implementation of the global lock under Windows closer to that of Unix. (ocaml/dune#6523, @nojb) - Remove "Entering Directory" messages for `$ dune install`. (ocaml/dune#6513, @rgrinberg) - Stop passing `-q` flag in `dune coq top`, which allows for `.coqrc` to be loaded. (ocaml/dune#6848, fixes ocaml/dune#6847, @Alizter) - Fix missing dependencies when detecting the kind of C compiler we're using (ocaml/dune#6610, fixes ocaml/dune#6415, @emillon) - Allow `(include_subdirs qualified)` for OCaml projects. (ocaml/dune#6594, fixes ocaml/dune#1084, @rgrinberg) - Accurately determine merlin configuration for all sources selected with `copy#` and `copy_files#`. The old heuristic of looking for a module in parent directories is removed (ocaml/dune#6594, @rgrinberg) - Fix inline tests with *js_of_ocaml* and whole program compilation mode enabled (ocaml/dune#6645, @hhugo) - Fix *js_of_ocaml* separate compilation rules when `--enable=effects` ,`--enable=use-js-string` or `--toplevel` is used. (ocaml/dune#6714, ocaml/dune#6828, ocaml/dune#6920, @hhugo) - Fix *js_of_ocaml* separate compilation in presence of linkall (ocaml/dune#6832, ocaml/dune#6916, @hhugo) - Remove spurious build dir created when running `dune init proj ...` (ocaml/dune#6707, fixes ocaml/dune#5429, @gridbugs) - Allow `--sandbox` to affect `ocamldep` invocations. Previously, they were wrongly marked as incompatible (ocaml/dune#6749, @rgrinberg) - Validate the command line arguments for `$ dune ocaml top-module`. This command requires one positional argument (ocaml/dune#6796, fixes ocaml/dune#6793, @rgrinberg) - Add a `dune cache size` command for displaying the size of the cache (ocaml/dune#6638, @Alizter) - Fix dependency cycle when installing files to the bin section with `glob_files` (ocaml/dune#6764, fixes ocaml/dune#6708, @gridbugs) - Handle "Too many links" errors when using Dune cache on Windows (ocaml/dune#6993, @nojb) - Allow the `cinaps` stanza to set a custom alias. By default, if the alias is not set then the cinaps actions will be attached to both `@cinaps` and `@runtest` (ocaml/dune#6991, @rgrinberg) - Add `(using ctypes 0.3)`. When used, paths in `(ctypes)` are interpreted relative to where the stanza is defined. (ocaml/dune#6883, fixes ocaml/dune#5325, @emillon) - Auto-detect `dune-workspace` files as `dune` files in Emacs (ocaml/dune#7061, @ilankri) - Add native support for polling mode on Windows (ocaml/dune#7010, @yams-yams, @nojb)
Expected Behavior
No segfault
Actual Behavior
Segfault, and a hex comparison of before/after artifact substitution shows it has changed ~10400 bytes after the end of the placeholder:
Full output is here:
good.txt
bad.txt
Note that this isn't always a segfault, sometimes you get a fatal out of memory error, or it just hangs forever
(depending how exactly the binary is corrupted), but on a given dune+compiler version it always seems to corrupt the binary the same way.
This is pure OCaml code, no C stubs or Obj.magic and if you look at where the crash actually it, it is a NULL pointer deref in the C runtime startup code, which obviously became corrupted:
Reproduction
Specifications
Version of
dune
(output ofdune --version
):multiple
3.2.0
3.6.1
Version of
ocaml
(output ofocamlc --version
)4.13.1
4.14.0
Operating system (distribution and version):
Ubuntu 22.04.1 LTS
Centos Stream release 8
Additional information
I have reproduced this on 2 different machines running different Linux distributions, and also tried various dune and OCaml versions on the same machine too (see above for versions), they all fail:
Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz
Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
dune
with the--verbose
flag):output.txt
The text was updated successfully, but these errors were encountered: