Bazel's own build is not reproducible on Mac #4770
Comments
Is this the same issue as #4769? |
No, it's different. The #4769 is about digestKey of This is about ijar (that's not even written in Java) having different bits across Macs configured the same way. I could only trigger this on Mac and not on Linux. |
Sorry about the silence! Do you also see different binaries if you |
I've been helping Grzegorz debug this a bit. We've seen different ijar SHA1s from |
I built ijar on my corp and personal laptops, and verified that Apple's Clang generates different object code and linking options on different machines at the same compiler major version. Some of this is probably due to OS version skew, but it's evidence that the binaries are far more machine-dependent on macOS than they are on Linux. Notably, the Clang on my personal laptop seems to be linking its outputs with Objective-C runtimes.
|
On Grzegorz's machine (same compiler as my I looked a little bit at the CROSSTOOL generator but it doesn't seem to be doing anything obviously wrong. |
I think I've found the root cause: my work laptop doesn't have Xcode installed, which makes Bazel think it's not using Apple's toolchain. Some unknown set of macOS-specific options is not being set in CROSSTOOL, but compilation seems to work fine for C/C++ stuff. Here's the output of
|
@jmillikin-stripe : thanks for your debugging efforts! |
Has this been fixed? The test works on my macbook with latest released Bazel. |
@buchgr Which test? I don't think any test has been added to verify that the presence of Xcode doesn't affect ijar builds. |
I think I have mixed things up. I was under the impression that this made the |
I have verified that the
|
I'm likely the wrong assignee for this bug. Mac is neither my domain of expertise, nor do I have the capacity to work on this. |
Here's the log of the failure with #4945 in:
|
I can manually repro this on a CI machine, but not on my personal iMac. This is without any of our special flags, so without remote caching, etc. - interestingly, it's the same file with the same hashes in the diff, although I ran it on a different machine.
Note that the CI machines have the Bazel with embedded JDK installed and on my personal machine I have the version without an embedded JDK. I'm not sure if that might play a role here.
I copied the tools-ijar.jar from the out1 and out2 directories from the test_tmpdir and diffed their contents:
philwo@philwo-macbookpro ~/ijar jar tvf run1_tools-ijar.jar > 1
philwo@philwo-macbookpro ~/ijar jar tvf run2_tools-ijar.jar > 2
philwo@philwo-macbookpro ~/ijar diff -u 1 2
--- 1 2018-03-30 23:37:17.000000000 +0200
+++ 2 2018-03-30 23:37:21.000000000 +0200
@@ -2326,8 +2326,9 @@
302 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/javac/tree/Pretty$UncheckedIOException.class
10460 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/javac/tree/Pretty.class
16253 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/javac/tree/TreeCopier.class
+ 569 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/javac/tree/TreeInfo$PosKind.class
489 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/javac/tree/TreeInfo$TypeAnnotationFinder.class
- 6884 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/javac/tree/TreeInfo.class
+ 7031 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/javac/tree/TreeInfo.class
1936 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/javac/tree/TreeMaker$AnnotationBuilder.class
26481 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/javac/tree/TreeMaker.class
8291 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/javac/tree/TreeScanner.class
@@ -2889,7 +2890,7 @@
392 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/jdi/StratumLineInfo.class
284 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/jdi/StringReferenceImpl.class
1079 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/jdi/SunCommandLineLauncher.class
- 372 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/jdi/TargetVM$EventController.class
+ 356 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/jdi/TargetVM$EventController.class
578 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/jdi/TargetVM.class
278 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/jdi/ThreadAction.class
522 Fri Jan 01 00:00:00 CET 2010 com/sun/tools/jdi/ThreadGroupReferenceImpl$Cache.class |
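The `jar tvf` plus `diff` comparison above can also be done in one step. The following is a minimal sketch, not part of Bazel or its tests: the function names `jar_listing` and `diff_jars` are hypothetical, and it assumes that comparing each entry's uncompressed size and CRC-32 is enough to spot the kind of skew shown in the listing.

```python
import zipfile


def jar_listing(path):
    """Map each entry name to (uncompressed size, CRC-32) for one jar file."""
    with zipfile.ZipFile(path) as jar:
        return {info.filename: (info.file_size, info.CRC) for info in jar.infolist()}


def diff_jars(path1, path2):
    """Describe entries that were added, removed, or changed between two jars."""
    one, two = jar_listing(path1), jar_listing(path2)
    report = []
    for name in sorted(set(one) | set(two)):
        if name not in one:
            # Entry exists only in the second jar.
            report.append("+ %s %d" % (name, two[name][0]))
        elif name not in two:
            # Entry exists only in the first jar.
            report.append("- %s %d" % (name, one[name][0]))
        elif one[name] != two[name]:
            # Same name, different size or CRC-32.
            report.append("~ %s %d -> %d" % (name, one[name][0], two[name][0]))
    return report
```

Called as `diff_jars("run1_tools-ijar.jar", "run2_tools-ijar.jar")`, this would list an added class such as `TreeInfo$PosKind.class` with a `+` prefix and resized classes with a `~` prefix. Including the CRC-32 also catches entries whose size stayed the same but whose bytes changed, which a size-only listing would miss.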
Don't ask me how so many things can be wrong in a single test...

Progress towards #4770. FYI @rupertks @buchgr @ulfjack

## Replace 25000 invocations of perl with a single "sha256sum"

This speeds up the test by a factor 2x on my iMac (before: 1200s, now: 600s).

On macOS, "shasum" is a Perl script. Instead of simply passing all input files to the thing at once, we were invoking it once per file. This means roughly 25,000 invocations of Perl per test run.

And it's even worse - it wasn't just a call to that Perl script, it was wrapped in a "cat | shasum | cut" pipeline, resulting in silent data loss when you accidentally passed multiple input files to the thing, 75,000 processes being spawned just to compute hashes and losing the file name of what was actually hashed. WTF.

Also, we were using SHA256 to essentially verify that two directory trees are equal. For this purpose, relying on SHA1 should be absolutely fine - and that is, provided by a good native implementation, four times faster than `shasum`. It saves another 10 seconds of the overall run.

With this change, the test also prints the result of a failed determinism check in an easier to read format "filename hash" instead of "hash filename" and on top of that, it also prints the filenames in the diff on macOS, which was missing formerly. Without this, it was basically impossible to debug failures of this test on macOS, as you couldn't see *which files were different*. You had *one* job, bazel_determinism_test.

Before:

```
-- Test log: -----------------------------------------------------------
--- /private/var/tmp/_bazel_buildkite/30004132848cb6cbb0d8bc124cd9712b/bazel-sandbox/8820973750646175047/execroot/io_bazel/_tmp/e503f3f3df14b71e247bc3d7d9bf3608/sum1 2018-03-28 18:00:43.000000000 +0000
+++ /private/var/tmp/_bazel_buildkite/30004132848cb6cbb0d8bc124cd9712b/bazel-sandbox/8820973750646175047/execroot/io_bazel/_tmp/e503f3f3df14b71e247bc3d7d9bf3608/sum2 2018-03-28 18:10:34.000000000 +0000
@@ -10417,0 +10418 @@
+ecd53ba69a8d479d3fa4234e959f869cd10f7ebc68860d2b7915879f8b8b2c54
@@ -10605 +10605,0 @@
-f1954b59039b74d0a0ee3b2bced748604b95b8455a5bf80489296bd81878a5c8
------------------------------------------------------------------------
```

Now (I artificially introduced non-hermeticism to show how a failure would look like):

```
-- Test log: -----------------------------------------------------------
--- /private/var/tmp/_bazel_philwo/7a01905b4627ca044e5e3f5ad5b14d26/bazel-sandbox/5464595340038418595/execroot/io_bazel/_tmp/e503f3f3df14b71e247bc3d7d9bf3608/sum1 2018-03-30 17:12:39.000000000 +0000
+++ /private/var/tmp/_bazel_philwo/7a01905b4627ca044e5e3f5ad5b14d26/bazel-sandbox/5464595340038418595/execroot/io_bazel/_tmp/e503f3f3df14b71e247bc3d7d9bf3608/sum2 2018-03-30 17:17:27.000000000 +0000
@@ -903 +903 @@
-bazel-bin/src/bazel 31d811338ca364f0631560dd4d29406dd6a778ce
+bazel-bin/src/bazel 8f009173894730b00a1d1d6349af7d10f4d21cf3
@@ -5656 +5656 @@
-bazel-bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar f5ec8c4415ad8ecdc0385affc68f2dd4dbf241ef
+bazel-bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar 9899ae35cf431087a34a830bfdaf19d99616689c
@@ -8343 +8343 @@
-bazel-bin/src/main/java/com/google/devtools/build/lib/worker/_javac/worker/libworker_classes/com/google/devtools/build/lib/worker/WorkerFactory.class 780baa17c19ef99ef0b9291db1791ed8e0f1b231
+bazel-bin/src/main/java/com/google/devtools/build/lib/worker/_javac/worker/libworker_classes/com/google/devtools/build/lib/worker/WorkerFactory.class d45c14f09e73e7fcdf01f96aa32646c87b704bc2
@@ -8359 +8359 @@
-bazel-bin/src/main/java/com/google/devtools/build/lib/worker/libworker.jar 60e3afbfec17da7e44c1f0f61cf2a446196717be
+bazel-bin/src/main/java/com/google/devtools/build/lib/worker/libworker.jar 70f557e87d1b32b2e46c79554fe6bf3b89aeaf6e
@@ -11343 +11343 @@
-bazel-genfiles/src/install_base_key 3fad754e4ea19bd1120df5bf16e1f39372e6b9fe
+bazel-genfiles/src/install_base_key 7d7e8b62493912c5ec153032e104640e3980e6b3
@@ -11376 +11376 @@
-bazel-genfiles/src/package.zip 1ce3431b021ca338806162eca72ff84118001df5
+bazel-genfiles/src/package.zip 65f4801d91bbe10cba0d2d4d55c7cf319cd6722d
------------------------------------------------------------------------
test_determinism FAILED: Non-deterministic outputs found! .
```

## Remove obsolete check for BAZEL_TEST_XTRACE

That string does not appear anywhere in our repo, except for these two lines in the test, so there's no point in checking for it.

## Remove obsolete check for Java 7

That was about time.

## Performance improvements and usability fixes

- There's no need to use mktemp to create a unique directory under TEST_TMPDIR, as every test suite has its own TEST_TMPDIR.
- There's no need to remove stuff, as this will just degrade performance and make debugging harder. The surrounding Bazel or system will clean up later.
- There's no need to copy bazel-bin/src/bazel to ./bazel1 before calling it, as you can just call the built bazel from its original location.
- There's no need to run "bazel clean" before the second "bazel build" invocation - it's better to just use two separate output_bases. This is faster and also makes debugging easier, as you can compare the two output_bases in case of a test failure.
- There's no need to call "diff" twice - we can just save the output immediately in the `if` block.

Closes #4945.
PiperOrigin-RevId: 191118833
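The commit message above describes hashing both output trees in a single pass and printing mismatches as "filename hash". The following is a minimal Python sketch of that idea, not the actual shell code in bazel_determinism_test: the names `hash_tree` and `diff_trees` are hypothetical, and it assumes every file under each root is readable (a dangling symlink would raise an error).

```python
import hashlib
import os


def hash_tree(root):
    """Map each file path (relative to root) to its SHA-1 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            sha1 = hashlib.sha1()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large binaries are not loaded whole.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    sha1.update(chunk)
            digests[os.path.relpath(path, root)] = sha1.hexdigest()
    return digests


def diff_trees(root1, root2):
    """Return '-name hash' / '+name hash' line pairs for files that differ."""
    first, second = hash_tree(root1), hash_tree(root2)
    lines = []
    for name in sorted(set(first) | set(second)):
        if first.get(name) != second.get(name):
            lines.append("-%s %s" % (name, first.get(name, "<missing>")))
            lines.append("+%s %s" % (name, second.get(name, "<missing>")))
    return lines
```

An empty result from `diff_trees(out1, out2)` means the two builds produced identical files. Hashing happens in one process, which is the point the commit makes about avoiding one `shasum` invocation per file, and keeping the file name next to each digest makes a failure readable.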
This is a bit of a tangent, but I found that Bazel without an embedded JDK gives non-reproducible results: #4769 |
@philwo it seems that the two bazel binaries are using a different javac? |
This class was added to the JDK by the fix for JDK-8180660, which was backported to JDK 8u. I don't know how the determinism test works, but it looks like you're seeing skew between two different JDKs.
|
Is this still a problem? |
well, in 6a0a8de#diff-68be8e4b177e1489fffa0557873b6943 we enabled the determinism test again for Mac, so I assume it's fixed. If not, please reopen. |
Description of the problem / feature request:
We build bazel out of a pinned source code in -dist.zip downloaded from https://github.com/bazelbuild/bazel/releases/download. Bazel is built with the ./compile.sh on each Mac laptop separately. It turns out that despite all Macs having the same configuration, bazel's build digests are different. E.g. the ijar on each machine has a different hash.
Feature requests: what underlying problem are you trying to solve with this feature?
Share JDK-based (e.g. java rules) artifacts between Macs.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Build bazel out of -dist.zip on two Macs with the same configuration and in a sample bazel project run:
What operating system are you running Bazel on?
Mac
What's the output of bazel info release?
release 0.9.0- (@non-git)
If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.
See above.
Have you found anything relevant by searching the web?
Earlier discussion of me debugging java rules reproducibility: https://groups.google.com/d/msg/bazel-discuss/5M-QoZ4gPq8/d_y1dEWnAAAJ
Any other information, logs, or outputs that you want to share?
Digest logging (or some other simple way of recursively seeing what's going on into each action cache entry) would make it easier to pin down in the future.