Crashes and locks under Windows MingW #12230
FWIW, I came across a related MingW bug report that ticks some of the right keywords. I have yet to understand the new runtime well enough to judge whether this may be the underlying cause or a wild goose chase... 😬 Consider yourself warned!
I believe this is fixed by #12882. In more detail:
As the test uses a random combination of systhreads and domains performing both computation and allocation, it makes sense (in retrospect) that unsafe systhread yielding could be causing it. Let me know if you want me to perform more experiments in connection with this.
This is good news, thanks for the investigation!
Since 5.0 we have observed that a combination of Domains and Threads can cause either segfaults or dead/live-locks in the MingW Windows port. We have observed the issue when testing both backends (native code and bytecode), but it seems easier to trigger in bytecode mode. We suspect that both kinds of failures may be caused by the same underlying problem.
The test itself generates a combination of Domains and Threads as a dependency tree, encoded as a record of arrays.
For the generation part, there's a QCheck dependency (for now).
To recreate:
Install the dune and qcheck-core packages
dune build src/threadomain/threadomain.bc
while _build/default/src/threadomain/threadomain.bc -v -s 377546401; do :; done
The last line above simply repeats a bytecode version of the test until failure:
A live (or dead) lock shows up as no progress happening (and no QCheck callbacks being executed to update the test status) for 2 seconds or so:
For a while we have observed these timeouts and crashes occasionally on this test in our CI, but have struggled to cook up reproduction steps: ocaml-multicore/multicoretests#203
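To tell the two failure modes apart automatically, the repeat-until-failure loop can be wrapped in a time limit. The following is only a sketch under assumptions: the helper name `repeat_until_failure` is hypothetical, it relies on the GNU coreutils `timeout` command being available in the shell (it exits with status 124 when the limit is hit), and the limit applies to a whole run, so it has to be chosen larger than the duration of a normal passing run.

```shell
#!/bin/sh
# Hypothetical helper: repeat a command until it fails or hangs,
# then report which of the two happened and on which run.
# Assumes GNU coreutils `timeout` (exit status 124 means the limit was hit).
repeat_until_failure() {
  limit=$1; shift              # per-run time limit in seconds
  i=0
  while :; do
    i=$((i + 1))
    timeout "$limit" "$@"
    status=$?
    case $status in
      0)   ;;                  # run passed, go around again
      124) echo "run $i: no progress after ${limit}s (possible dead/live-lock)"
           return 124 ;;
      *)   echo "run $i: exited with status $status (possible crash)"
           return "$status" ;;
    esac
  done
}
```

It could be invoked as, for example, `repeat_until_failure 60 _build/default/src/threadomain/threadomain.bc -v -s 377546401`, where the 60-second limit is an arbitrary choice.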
To get a sense of the behaviour, here's a summary of 5 runs:
Above I use the seed 377546401, which works on my machine/setup.
I initially found this particular seed by running the same loop with random seeds:
Eventually this crashed on the 22nd iteration on
random seed: 377546401
which made me pass that with -s 377546401. To recreate, others may have more luck following the same process rather than simply using the same seed.
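For anyone following the same seed-hunting process, a small wrapper can count the iterations and pull the seed out of the failing run's output. This is a sketch, not the loop originally used: the helper name `find_failing_seed` is hypothetical, it assumes the test picks a fresh random seed on each run when no `-s` option is given, and it assumes the seed is printed on a line of the form `random seed: N` as quoted above. It only catches runs that exit with a failure status; a run that hangs will keep the loop waiting.

```shell
#!/bin/sh
# Hypothetical helper: rerun a command until it fails, then report the
# iteration count and the seed printed by the failing run.
# Assumes the output contains a line of the form "random seed: N".
find_failing_seed() {
  log=$(mktemp)
  i=0
  # Each run overwrites the log, so it ends up holding the failing run only.
  while "$@" >"$log" 2>&1; do
    i=$((i + 1))
  done
  # The trailing .* also swallows a carriage return from Windows line endings.
  seed=$(sed -n 's/^random seed: \([0-9][0-9]*\).*/\1/p' "$log" | tail -n 1)
  echo "failed on iteration $((i + 1)) with seed ${seed:-unknown}"
  rm -f "$log"
}
```

A possible invocation is `find_failing_seed _build/default/src/threadomain/threadomain.bc -v`, after which the reported seed can be passed back with `-s`.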
Credit to @shym for having written this nice torture instrument 😄