bazel coverage fails over remote ex for java_test #4685
Comments
I see the same issue when running some of our C++/Python tests with remote execution. Bazel version is a slightly patched version of 0.25.3. |
I've come up with a workaround that basically involves the below patch. I have a feeling there is a more correct way to do this--possibly by including some implicit dependencies in the relevant coverage-aware Bazel rules, but I am not familiar enough with the codebase to efficiently tackle that.
commit ba16394fa207709b2b2a2fdc22de30cc72cf4398 (HEAD -> rbe-coverage)
Author: Victor Robertson <[email protected]>
Date: Wed Sep 11 23:01:49 2019 -0700
Fix coverage for RBE [EP-12079]
This patch fixes code coverage in RBE by removing the distinction
between JAVA_RUNFILES and TEST_SRCDIR in the collect_coverage.sh script.
This is important as Bazel will not send JAVA_RUNFILES to the remote
environment. Unfortunately, this means that a Java sdk will be included
with every coverage test--perhaps this isn't so bad.
This also means that tests with coverage support must include the
dependencies normally provided by Bazel. In cruise/cruise, we use the
following additional dependencies:
- @embedded_jdk//:jdk
- @bazel_tools//tools/jdk:JacocoCoverageRunner
- @bazel_tools//tools/test/CoverageOutputGenerator/java/com/google/devtools/coverageoutputgenerator:Main
diff --git a/tools/test/collect_coverage.sh b/tools/test/collect_coverage.sh
index a9e212ce5d..df3b22cc78 100755
--- a/tools/test/collect_coverage.sh
+++ b/tools/test/collect_coverage.sh
@@ -188,7 +188,8 @@ if [[ $DISPLAY_LCOV_CMD ]] ; then
echo "-----------------"
fi
-# JAVA_RUNFILES is set to the runfiles of the test, which does not necessarily
-# contain a JVM (it does only if the test has a Java binary somewhere). So let
-# the LCOV merger discover where its own runfiles tree is.
-JAVA_RUNFILES= exec $LCOV_MERGER_CMD
+# cruise: Since JAVA_RUNFILES would never be pushed to RBE, we allow it to
+# remain as it is and ensure that TEST_SRCDIR has the right dependencies to run
+# coverage reports. Normally this would be preceeded with JAVA_RUNFILES= to
+# clear it out.
+exec $LCOV_MERGER_CMD
As mentioned in the patch, this means our test configurations include this little kludge. You can imagine how this is used (
def add_coverage_test_runtime_data():
    return select({
        "@//tools:is_coverage_build": [
            "@embedded_jdk//:jdk",
            "@bazel_tools//tools/jdk:JacocoCoverageRunner",
            "@bazel_tools//tools/test/CoverageOutputGenerator/java/com/google/devtools/coverageoutputgenerator:Main",
        ],
        "//conditions:default": [],
    })
Hopefully this information helps. |
Requires:
- bazelbuild#10379 is merged
- a new remote coverage tools zip pushed
- coverage.WORKSPACE updated to the new tools
This changes the @bazel_tools//tools/test/BUILD file to fully delegate to the @remote_coverage_tools repository, which must contain rules for :lcov_merger and :coverage_report_generator. This makes the @remote_coverage_tools reference self-contained, which allows overriding the tools using --override_repository, and allows independently replacing or fixing them.
Progress on bazelbuild#4685.
Change-Id: I321c62332f00d910f4ccfb3244d63e60627d59ad
The test action only stages the files-to-build, but not the runfiles tree. The coverage tool used by Google is a single, self-contained executable file. This is at odds with the new coverage tool, which is built as a Java binary, and the wrapper script for java_binary expects a runfiles tree. This happens to work locally because the wrapper script manages to escape the symlink sandbox by following a symlink back to the exec root (it wouldn't work with stricter local sandboxing). I'm still not sure how best to fix it, but I can improve the situation. I'm reluctant to change how the test action stages the coverage tool. However, I have two changes to make the coverage tools repository self-contained (this is a nice cleanup in any case and makes it more flexible in the future), and I have a prototype for making the coverage tools repository compatible with the test action setup by making it work without a runfiles tree. That works on Linux and MacOS because it currently assumes the presence of /bin/bash. |
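To make the failure mode above concrete, here is a rough sketch of the kind of lookup a `java_binary` wrapper script performs. This is not the actual stub shipped with Bazel: the function name `locate_runfiles` and the temporary paths are made up for illustration, and the real script does more (for example, it follows symlinks, which is how it escapes the sandbox locally). The error text is the one quoted elsewhere in this thread.

```shell
#!/usr/bin/env bash

# Hypothetical, simplified stand-in for the runfiles lookup in a
# java_binary wrapper script; the real stub differs in detail.
locate_runfiles() {
  local self="$1"
  if [[ -n "${JAVA_RUNFILES:-}" ]]; then
    # A non-empty JAVA_RUNFILES short-circuits the search.
    printf '%s\n' "$JAVA_RUNFILES"
  elif [[ -d "${self}.runfiles" ]]; then
    # Conventional location: a sibling directory named <binary>.runfiles.
    printf '%s\n' "${self}.runfiles"
  else
    echo "Cannot locate runfiles directory. (Set \$JAVA_RUNFILES to inhibit searching.)" >&2
    return 1
  fi
}

# Case 1: only the executable is staged, as in the test action described above.
workdir="$(mktemp -d)"
touch "${workdir}/Main"
if ! (unset JAVA_RUNFILES; locate_runfiles "${workdir}/Main") 2>/dev/null; then
  echo "no runfiles tree next to ${workdir}/Main"
fi

# Case 2: once a runfiles tree sits next to the binary, the lookup succeeds.
mkdir "${workdir}/Main.runfiles"
(unset JAVA_RUNFILES; locate_runfiles "${workdir}/Main")
```

The sketch only shows why staging the files-to-build without the runfiles tree is enough to break a wrapper of this shape, while a single self-contained executable would not care.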
Currently, the coverage tool setup is smeared across @bazel_tools//tools/test/BUILD, @bazel_tools//tools/test/CoverageOutputGenerator/.../BUILD (both shipped inside Bazel), and @remote_coverage_tools//, which is separately uploaded as a zipped repository. The second BUILD file contains java_* rules which are actually used. The remote_coverage_tools repository contains a copy of the java_* rules, but they are never used, and the java_import in the former directly references the deploy jar in the latter.
Instead, the plan is to only ship @bazel_tools//tools/test/BUILD in Bazel, and use alias rules to point to the @remote_coverage_tools repository. There, we define two rules (:lcov_merger and :coverage_report_generator), which fully define the implementation. This allows the repository to be swapped out for another repository with a different implementation using --override_repository, which facilitates independent work on the merger and generator.
The underlying problem is that the merger currently does not work with remote execution, and there is no workaround because all the relevant parts are hard-coded in Bazel, which requires a Bazel release to fix. By making the coverage tools self-contained, we make it easier to solve such problems in the future.
This should not cause any problems for the existing workaround described in #4685.
Progress on #4685.
Change-Id: I26758db573a1966b40169314d4aec61eff83f83b
Closes #10379.
Change-Id: I26758db573a1966b40169314d4aec61eff83f83b
PiperOrigin-RevId: 284581461
Why the reluctance? Wouldn't adding the coverage binary properly as a tool to the test action (i.e., fix #4033) be the cleanest resolution? |
Definitely maybe. Merging the runfiles tree into the test's runfiles tree can potentially cause conflicts, and we also don't want tests to interact with it directly - both the tool and the contract are subject to change. We could stage the runfiles tree separately from the test runfiles tree, although I suspect that that isn't possible in Bazel right now. At the same time, the tool is a single self-contained deploy jar - it's only the java_binary wrapper script that needs a runfiles tree, not the tool itself. |
The JVM could be in runfiles, so I don't think they're trivial to dispense with for the coverage merger. I suppose you could split the coverage merging into a separate spawn like |
Actually, I discussed splitting it into a separate spawn earlier today, and I think that would have some advantages. We are already close to the per-action execution time limit in some cases, and doing coverage work in the same action is problematic. We can use tree artifacts to track the intermediate artifacts to avoid zipping / unzipping. You're right about the JVM. |
Requires:
- #10379 is merged
- a new remote coverage tools zip pushed
- coverage.WORKSPACE updated to the new tools
This changes the @bazel_tools//tools/test/BUILD file to fully delegate to the @remote_coverage_tools repository, which must contain rules for :lcov_merger and :coverage_report_generator. This makes the @remote_coverage_tools reference self-contained, which allows overriding the tools using --override_repository, and allows independently replacing or fixing them.
Progress on #4685.
Change-Id: I321c62332f00d910f4ccfb3244d63e60627d59ad
Closes #10383.
Change-Id: I321c62332f00d910f4ccfb3244d63e60627d59ad
PiperOrigin-RevId: 285246323
I'm wondering if moving the lcov merger to a post-process would allow us to use an aspect to attach it to the test. We currently require that all test rules depend on the lcov_merger in order to get the postprocessing into the test action, and that requires extra work from rule authors and is also not documented anywhere. |
That will probably fix #6293, too. |
Fully integrating Go coverage requires some path to generate lcov data. However, I agree that moving to a post-process (aspect or no) would make it more flexible; it allows pulling back the coverage data even if the coverage tools for language X don't support lcov conversion yet (or ever). There's a similar problem with Python coverage which I looked into today - coverage.py doesn't support generating lcov, so right now it is impossible to integrate into Bazel's coverage system, and you can't even manually post-process because the files are silently dropped. |
I'll look into declaring a tree artifact for the coverage output dir, if I find the time. |
Summary: Coverage runs were getting `nobuild_runfile_links` from an expansion of `remote_download_minimal`. This happens to cause issues. So explicitly set `build_runfile_links` for coverage runs. See bazelbuild/bazel#4685 (comment)
Test Plan: Ran the coverage build, most of the failures go away.
Reviewers: zasgar, michelle, jamesbartlett
Reviewed By: jamesbartlett
Signed-off-by: Vihang Mehta <[email protected]>
Differential Revision: https://phab.corp.pixielabs.ai/D11654
GitOrigin-RevId: d469d5d
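For anyone wanting to try the same thing, a minimal sketch of what that commit message describes, assuming the project keeps its flags in a `.bazelrc`-style file. The temporary file below is only a stand-in so the snippet runs anywhere; in a real workspace the line would go into the project's own rc file. `--build_runfile_links` is the positive form of the `nobuild_runfile_links` option named above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical rc file for this example only; a real project would
# normally add the line to its own .bazelrc instead.
rc_file="$(mktemp)"

# Lines starting with "coverage" apply only to `bazel coverage` runs,
# so builds and plain test runs keep whatever the other flags expand to.
cat > "$rc_file" <<'EOF'
coverage --build_runfile_links
EOF

# Show the resulting fragment.
cat "$rc_file"
```

Whether this alone is enough depends on the setup; the commit message itself only claims that most of the failures went away.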
When using
|
That should go away if you set |
I tried it, and I'm getting the same error on Bazel@HEAD (b598c51). I'm running it against the Gerrit project:
$ bazeldev coverage --experimental_split_coverage_postprocessing \
  --remote_download_outputs=all \
  --config=remote \
  --remote_instance_name=projects/$PROJECT/instances/default_instance \
  --coverage_report_generator=@bazel_tools//tools/test:coverage_report_generator \
  --combined_report=lcov \
  javatests/com/google/gerrit/common/... |
@davido Could you try again with |
@adam-azarchs I was able to reproduce the With this PR and the two flags @c-mita Do you see any potential issues if the two experimental flags were enabled by default? |
I don't really know of a fundamental reason why they couldn't be enabled by default, although there may be one or two lingering issues that should be resolved first (#15363 comes to mind). |
I'm on PTO this week but can check it out next week. However the last time I tried those flags in combination I got something that passed but didn't record any coverage information. |
The same issue can come up with a |
@fmeum I tried a coverage build with |
Could you also try #16556 when it's ready? That approach is much more reliable than #16475.
The most common reason is that |
@fmeum With this option I'm getting a different error:
|
@davido At least that error doesn't seem to be directly related to Bazel anymore. Only place I can find it is at bazelbuild/bazel-toolchains#870. |
@fmeum Thanks for confirming, but the above issue seems to be entirely unrelated to Gerrit Code Review project, where we are trying (for years) to make I've just tried to run
So that I'm still wondering if someone can run |
@davido Which remote backend are you running against? I can't find that error message in the Bazel codebase. |
We use Google cloud RBE, see the documentation upstream how to set it up and running: [1], [2]. We also use custom The relevant
http_archive(
    name = "rbe_jdk11",
    sha256 = "dbcfd6f26589ef506b91fe03a12dc559ca9c84699e4cf6381150522287f0e6f6",
    strip_prefix = "rbe_autoconfig-3.1.0",
    urls = [
        "https://gerrit-bazel.storage.googleapis.com/rbe_autoconfig/v3.1.0.tar.gz",
        "https://github.com/davido/rbe_autoconfig/archive/v3.1.0.tar.gz",
    ],
)
So, I tried to log Bazel's gRPC communication in binary protocol buffer format to
If I pass this option: [1] https://gerrit.googlesource.com/gerrit/+/master/Documentation/dev-bazel.txt#585 |
My time for testing this stuff has been limited, but with bazel 6.0.0 and the following in my
Without We do have some internal rules which were collecting coverage properly (both locally and remotely) without |
Do the docs need to be updated? https://bazel.build/configure/coverage#remote-execution |
I have a question about adding the downloader regex. I was using the one described above.
When running java tests it pulls back large
Seems to have worked and is a lot faster. Is that jar file needed for anything? Using bazel Also a little worried that removing |
Adds the new `test/shell/test_coverage_helper.sh` file, which defines (amongst other helpers) `COVERAGE_FLAGS` as containing:
- `--experimental_fetch_all_coverage_outputs`
- `--experimental_split_coverage_postprocessing`
This resolves the following error under Bazel 7.4.1 with Bzlmod enabled:
```txt
/mnt/engflow/worker/work/3/exec/bazel-out/
darwin_arm64-opt-exec-ST-f4dfef26580e/bin/external/
bazel_tools~remote_coverage_tools_extension~remote_coverage_tools/Main:
Cannot locate runfiles directory. (Set $JAVA_RUNFILES to inhibit searching.)
```
This error resembles bazelbuild/bazel#20577, but wasn't due to the presence of `--nobuild_runfile_links`, but to the lack of the aforementioned `--experimental_*` flags.
I learned about these flags from:
- bazelbuild/bazel#4685 (comment)
- bazelbuild/bazel#16556
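As a rough illustration of how those two flags might be passed, here is a small sketch. It is not the content of `test/shell/test_coverage_helper.sh`; only the array name `COVERAGE_FLAGS` and the two flags come from the commit message above, and the target label is a placeholder. The script prints the command rather than running it, so it works on a machine without Bazel.

```shell
#!/usr/bin/env bash
set -euo pipefail

# The two flags named in the commit message above. The array name mirrors
# the helper it describes, but this is an assumption-laden sketch, not
# that file's actual content.
COVERAGE_FLAGS=(
  --experimental_fetch_all_coverage_outputs
  --experimental_split_coverage_postprocessing
)

# Hypothetical target label, for illustration only.
target="//path/to:test"

# Print the invocation instead of executing it.
printf '%s\n' "bazel coverage ${COVERAGE_FLAGS[*]} ${target}"
```

Keeping the flags in one array makes it easy to reuse them across several test invocations, which appears to be the point of the helper file described above.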
Description of the problem / feature request:
Tooling for coverage fails when running under remote execution (LcovMerger specifically)
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Run a test target with
bazel coverage --spawn_strategy=remote --remote_executor=... //path/to:test
What operating system are you running Bazel on?
Ubuntu 14.04
What's the output of `bazel info release`?
release 0.10.1
Any other information, logs, or outputs that you want to share?
Relevant portion of coverage execution