-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ocaml compiler distribution: use parallel build for OCaml 4.07 and above #14257
ocaml compiler distribution: use parallel build for OCaml 4.07 and above #14257
Conversation
996a2ee
to
dc1917c
Compare
🌤️ opam-lint warnings 7dbaa58
☀️ Installability check (+0) |
dc1917c
to
edca9a9
Compare
As I've said before, I'm very much in favor. |
[make "world"] | ||
[make "world.opt"] | ||
[make "world.opt"] {os = "cygwin"} | ||
[make "-j%{jobs}%" "world.opt"] {os != "cygwin"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being late to this - you can use filters on the terms in a command as well, so the line above could be deleted and this changed to:
[make "-j%{jobs}%" "world.opt"] {os != "cygwin"} | |
[make "-j%{jobs}%" {os != "cygwin"} "world.opt"] |
I'm not particularly sure which more clearly conveys that parallelism is disabled on Cygwin!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your proposal is clearer and more concise.
Is it definitely a good idea to enable this for the |
No strong opinion on Thanks for the local-condition suggestion, I'll amend the PR. |
Pinning is quite explicit - I agree that it’s fine there. I was thinking of possible users create their switches based on the trunk branch (if they exist - but I see this in other projects, and marvel somewhat...) |
edca9a9
to
91344af
Compare
I rebased this PR and used the I kept the option for |
How do we decide if we have consensus to merge? (I think it should be merged, FWIW.) |
+1 for merge |
And clearly @gasche wants to merge this or he wouldn't have proposed it, so that's three people. Any other votes? |
Thanks for the show of support! But note that this is something that has to be done carefully (this is why we dragged our feet for a long time). In particular, I would like to be sure to hear another round of feedback from @dra27 (playing the role of the opponent here, which improved the PR quite a bit) before you consider merging. |
Thinking more about the One last thought might be to put a build error message in the opam file to recommend retrying with |
Let's agree on a message formulation before I update the PR. Would the following work:
Do we agree with the idea of adding this message to all parallel-build-enabled compiler descriptions, not just the +trunk ones? |
That message seems fine. (I would have said "may be caused by a parallel build" and "to force a sequential build" (not "enforce") but it's understandable either way. |
I changed "enforce" into "force", thanks. "Build parallelism" sounds more complex but the idea is that it's really the fact of being parallel that causes a failure, not "a parallel build". |
As I said, either phrasing for either of them would be understood. Personally I'm pretty happy; on the question of adding this on all parallel builds, that seems fine, too. |
Note that it's
and for released branches, the message should be more severe for reporting a bug:
There is a problem, however, which is that the |
Just an update on this. The filters on the This can obviously (and should, given the nuisance that rebasing it is) be merged prior to opam 2.0.5 being released, although that's due in the next few weeks. |
Actually I never "rebase" this PR in the traditional sense, I update the script that is included at the end of the commit message and then I run it from a fresh branch off the current |
Hang on, I'm really not ok with this message formulation that the build 'might work'. opam isn't a package manager where builds work some of the time -- they should either be work every time for a given build environment and dependency graph, or fixed until they do work every time. In the case of OCaml itself, we compile it thousands of times a week in CI, and so we will find parallel build problems fairly quickly, and thus can either fix it (by reverting this parallel build option) or patching the offending rule. Exposing this workaround to users in the message for such a core package seems like extremely poor form to me. Do we or do we not believe that parallel builds work? If we're not sure, then we should leave a multicore machine running in a build loop of the compiler with different -j options for a few days and check it works reliably. Has anyone done that yet? If not, and noone else can do it, I'll do it when back -- but I'm travelling this week. |
@avsm - the most severe message is for trunk (i.e. developer builds) where such a guarantee can never be made (which was my original suggestion not to have it at all for trunk builds). This message is also only displayed if it goes wrong, being released in the belief that it should work all the time. Your bar for this is too high, given OCaml’s present build system - the overwhelming number of CI builds of OCaml are not run in parallel and, even on CI systems where they can be, they’re likely to be run with low parallelism. |
(I’m wondering if you misread it - the message is not formulated that it might work but that it may have failed because?) |
opam 2.0.5 was just released with the fix that makes OPAMJOBS=1 work as expected in this circumstance. I would like to converge on the messages to update the PR once again with something that people agree on. @avsm: with the wording proposed by @dra27 above, the message shown to non-trunk-build users is as follows:
Does that work with you? Would you like to suggest an alternative formulation, or do you strongly think that it is better not to have a message at all? |
(@gasche - I realise looking again that the message should also have the |
91344af
to
1599b27
Compare
In the old days, the OCaml compiler distribution did not support parallel build. This support has been contributed over time, and while the dependencies are still not 100% reliable (in particular, parallel rebuild from a partial build often fails with cmi inconsistencies), builds from a clean state are now very robust: they are routinely used by developers and tested daily on our CI machines, and we haven't seen a failure in a long while. Enabling parallel build improves build times for everyone, at the cost of a very low risk of breaking the build. Some context: - On my machine, a recent sequential build took 5m15s, and a parallel build from the same sources took 2m53s: we can save 2m20s for every new switch creation for users with similar hardware. When creating a switch with a small number of packages, the compiler build time may dominate the total installation time. - New switches are created much more often now that opam2 makes local switches easy. It is typical for users to create a switch for each project to isolate their dependencies, ending up with dozens of switches. The install time quickly builds up. If a user was to find a build failure due to parallel build, a workaround is to pass `-j 1` to opam to force sequential switch creation: opam switch create <switch-name> <compiler-version> -j 1 setting the environment variable OPAMJOBS=1 would also work. This commit was semi-automatically generated using the following script run from the `packages/` directory: ``` for f in ocaml-base-compiler/*4.07.*/opam ocaml-variants/*4.0[789].*/opam ocaml-variants/*4.10.*/opam; do echo $f sed 's/ \[make "world.opt"\]/ [make "-j%{jobs}%" {os != "cygwin"} "world.opt"]/' -i $f sed 's/ \[make "world"\]/ [make "-j%{jobs}%" {os != "cygwin"} "world"]/' -i $f if [[ $f =~ .*trunk.* ]] then cat <<EOF >> $f post-messages: [ "A failure in the middle of the build may be caused by build parallelism (enabled by default). See ocaml#14257 for more info." {failure & jobs > 1 & opam-version >= "2.0.5" & os != "cygwin"} "You can try installing again including --jobs=1 to force a sequential build instead." {failure & jobs > 1 & os != "cygwin" & opam-version >= "2.0.5"} ] EOF else cat <<EOF >> $f post-messages: [ "A failure in the middle of the build may be caused by build parallelism (enabled by default). Please file a bug report at https://github.com/ocaml/ocaml/issues" {failure & jobs > 1 & & os != "cygwin"} "You can try installing again including --jobs=1 to force a sequential build instead." {failure & jobs > 1 & os != "cygwin" & opam-version >= "2.0.5"} ] EOF fi done ```
I rebased the PR with the new version. I used the following variant on the proposed message, where only the workaround suggestion is conditioned on the opam version:
|
1599b27
to
7dbaa58
Compare
In the old days, the OCaml compiler distribution did not support parallel build. This support has been contributed over time, and while the dependencies are still not 100% reliable (in particular, parallel rebuild from a partial build often fails with cmi inconsistencies), builds from a clean state are now very robust: they are routinely used by developers and tested daily on our CI machines, and we haven't seen a failure in a long while. Enabling parallel build improves build times for everyone, at the cost of a very low risk of breaking the build. Some context: - On my machine, a recent sequential build took 5m15s, and a parallel build from the same sources took 2m53s: we can save 2m20s for every new switch creation for users with similar hardware. When creating a switch with a small number of packages, the compiler build time may dominate the total installation time. - New switches are created much more often now that opam2 makes local switches easy. It is typical for users to create a switch for each project to isolate their dependencies, ending up with dozens of switches. The install time quickly builds up. If a user was to find a build failure due to parallel build, a workaround is to pass `-j 1` to opam to force sequential switch creation: opam switch create <switch-name> <compiler-version> -j 1 setting the environment variable OPAMJOBS=1 would also work. This commit was semi-automatically generated using the following script run from the `packages/` directory: ``` for f in ocaml-base-compiler/*4.07.*/opam ocaml-variants/*4.0[789].*/opam ocaml-variants/*4.10.*/opam; do echo $f sed 's/ \[make "world.opt"\]/ [make "-j%{jobs}%" {os != "cygwin"} "world.opt"]/' -i $f sed 's/ \[make "world"\]/ [make "-j%{jobs}%" {os != "cygwin"} "world"]/' -i $f if [[ $f =~ .*trunk.* ]] then cat <<EOF >> $f post-messages: [ "A failure in the middle of the build may be caused by build parallelism (enabled by default). See ocaml#14257 for more info." {failure & jobs > 1 & os != "cygwin"} "You can try installing again including --jobs=1 to force a sequential build instead." {failure & jobs > 1 & os != "cygwin" & opam-version >= "2.0.5"} ] EOF else cat <<EOF >> $f post-messages: [ "A failure in the middle of the build may be caused by build parallelism (enabled by default). Please file a bug report at https://github.com/ocaml/ocaml/issues" {failure & jobs > 1 & os != "cygwin"} "You can try installing again including --jobs=1 to force a sequential build instead." {failure & jobs > 1 & os != "cygwin" & opam-version >= "2.0.5"} ] EOF fi done ```
Do everyone agree on this PR? |
I think there is general consensus. |
Thanks a lot! |
This being committed makes me happy! |
I think because this PR was so long-lived, |
Done, thanks! |
TL;DR: This is a follow-up on #14118 that enables parallel builds on 4.07 and more recent, except for Cygwin.
In the old days, the OCaml compiler distribution did not support
parallel build. This support has been contributed over time, and while
the dependencies are still not 100% reliable (in particular, parallel
rebuild from a partial build often fails with cmi inconsistencies),
builds from a clean state are now very robust: they are routinely used
by developers and tested daily on our CI machines, and we haven't seen
a failure in a long while.
Enabling parallel build improves build times for everyone, at the cost
of a very low risk of breaking the build. Some context:
On my machine, a recent sequential build took 5m15s, and a parallel
build from the same sources took 2m53s: we can save 2m20s for every
new switch creation for users with similar hardware. When creating
a switch with a small number of packages, the compiler build time
may dominate the total installation time.
New switches are created much more often now that opam2 makes local
switches easy. It is typical for users to create a switch for each
project to isolate their dependencies, ending up with dozens of
switches. The install time quickly builds up.
If a user was to find a build failure due to parallel build,
a workaround is to pass
-j 1
to opam to force sequential switchcreation:
setting the environment variable OPAMJOBS=1 would also work.
This commit was semi-automatically generated
using the following script run from the
packages/
directory: