Custom test rule without `_lcov_merger` attribute always fails under `bazel coverage` #13978

fmeum · 2021-09-11T21:24:28Z

Description of the problem / feature request:

When a custom Starlark test rule without explicit coverage support in the form of the _lcov_merger attribute is invoked via bazel coverage with Bazel after 5b216b2, it fails with

FAIL: //:custom_test (see /home/fmeum/.cache/bazel/_bazel_fhenneke/767fd922858d516eb34a4dddb64ca1a0/execroot/__main__/bazel-out/k8-fastbuild/testlogs/custom_test/test.log)
ERROR: /home/fmeum/git/bazel-custom-test-coverage/BUILD:3:12: output 'custom_test/coverage.dat' was not created
ERROR: /home/fmeum/git/bazel-custom-test-coverage/BUILD:3:12: Testing //:custom_test failed: not all outputs were created or valid
Target //:custom_test up-to-date:
  bazel-bin/custom_test
INFO: Elapsed time: 5.635s, Critical Path: 0.17s
INFO: 7 processes: 1 disk cache hit, 5 internal, 1 linux-sandbox.
FAILED: Build did NOT complete successfully
//:custom_test                                                           FAILED in 0.0s
  /home/fmeum/.cache/bazel/_bazel_fhenneke/767fd922858d516eb34a4dddb64ca1a0/execroot/__main__/bazel-out/k8-fastbuild/testlogs/custom_test/test.log

The log says:

exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //:custom_test
-----------------------------------------------------------------------------
--
Coverage runner: cannot locate file

With Bazel 4.2.1, the build succeeds:

INFO: Build completed successfully, 5 total actions
//:custom_test                                                           PASSED in 0.0s

Executed 1 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command
INFO: Build completed successfully, 5 total actions

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

#!/usr/bin/env sh
touch WORKSPACE

cat << EOF > BUILD
load("//:rules.bzl", "custom_test")

custom_test(
    name = "custom_test",
)
EOF

cat << EOF > rules.bzl
def _custom_test_impl(ctx):
    executable = ctx.actions.declare_file(ctx.attr.name)
    ctx.actions.write(executable, "", is_executable = True)
    return [
        DefaultInfo(
            executable = executable,
        )
    ]

custom_test = rule(
    implementation = _custom_test_impl,
    test = True,
)
EOF

bazel coverage //:custom_test

What operating system are you running Bazel on?

Linux

What's the output of `bazel info release`?

release 5.0.0-pre.20210831.2

I bisected the origin of this issue back to 5b216b2.

Have you found anything relevant by searching the web?

With 5b216b27435aeb9eb9c3bd3c552d6498e1050cc7, the custom_test rule in the reproducer now provides an InstrumentedFilesInfo, which causes TestActionBuilder to collect coverage. However, since the rule does not have the :lcov_merger attribute set, the LCOV_MERGER env variable isn't set when collect_coverage.sh is invoked, which causes the error message reproduced above.

The text was updated successfully, but these errors were encountered:

keith · 2021-09-12T03:23:18Z

I'm kinda surprised this ever worked before, I think the real core issue to fix is #10642

Since 5b216b2, all Starlark-defined test rules either need to set the `_lcov_merger` attribute or fail if run with `bazel coverage`. This is fixed by attaching the default lcov merger as a late-bound default to all Starlark rules via a new attribute and falling back to that attribute in the TestActionBuilder. This commit ensures backwards compatibility is maintained with existing Starlark test rules that set the `_lcov_merger` attribute. This is also verified in the included integration test. Fixes bazelbuild#13978.

fmeum · 2021-09-13T10:09:00Z

@keith Could you take a look at #13983 and verify whether it also fixes your issue?

Since 5b216b2, all Starlark-defined test rules either need to set the `_lcov_merger` attribute or fail if run with `bazel coverage`. This is fixed by attaching the default lcov merger as a late-bound default to all Starlark rules via a new attribute and falling back to that attribute in the TestActionBuilder. This commit ensures backwards compatibility is maintained with existing Starlark test rules that set the `_lcov_merger` attribute. This is also verified in the included integration test. Fixes bazelbuild#13978.

keith · 2021-09-13T16:42:56Z

With that patch it does remove the need to pass --test_env=LCOV_MERGER=/usr/bin/true to coverage invocations

brentleyjones · 2021-11-11T14:01:09Z

A friendly 2 month ping on this.

meteorcloudy · 2021-12-01T11:55:53Z

Can anyone send a PR to cherry-pick aa52f2d into release-5.0.0rc2?

Rules that provide InstrumentedFilesInfo but do not define the _lcov_merger attribute will result in this script being called without the required LCOV_MERGER environment variable being set. Since _all_ rules now provide an InstrumentedFilesInfo provider by default, but are not required to provide the attribute, this scenario will be encountered far more frequently and should not be an error. Because of the way coverage collection within Google differs to Bazel, it isn't safe to change TestActionBuilder to simply not prepare the collect_coverage.sh script the way it would if no coverage was to be collected, so we have to change the script itself. Fixes bazelbuild#13978 Closes bazelbuild#14352. PiperOrigin-RevId: 413369871 (cherry picked from commit aa52f2d)

brentleyjones · 2021-12-01T15:18:25Z

#14359

Rules that provide InstrumentedFilesInfo but do not define the _lcov_merger attribute will result in this script being called without the required LCOV_MERGER environment variable being set. Since _all_ rules now provide an InstrumentedFilesInfo provider by default, but are not required to provide the attribute, this scenario will be encountered far more frequently and should not be an error. Because of the way coverage collection within Google differs to Bazel, it isn't safe to change TestActionBuilder to simply not prepare the collect_coverage.sh script the way it would if no coverage was to be collected, so we have to change the script itself. Fixes #13978 Closes #14352. PiperOrigin-RevId: 413369871 (cherry picked from commit aa52f2d) Co-authored-by: Charles Mita <[email protected]>

Rules that provide InstrumentedFilesInfo but do not define the _lcov_merger attribute will result in this script being called without the required LCOV_MERGER environment variable being set. Since _all_ rules now provide an InstrumentedFilesInfo provider by default, but are not required to provide the attribute, this scenario will be encountered far more frequently and should not be an error. Because of the way coverage collection within Google differs to Bazel, it isn't safe to change TestActionBuilder to simply not prepare the collect_coverage.sh script the way it would if no coverage was to be collected, so we have to change the script itself. Fixes bazelbuild#13978 Closes bazelbuild#14352. PiperOrigin-RevId: 413369871

mattem · 2022-02-10T13:37:13Z

@meteorcloudy Hasn't this change in #14359 essentially made it so that any Starlark test rule not defining _lcov_merger or users setting --test_env=LCOV_MERGER=/usr/bin/true fail open? That exit 0 is causing the tests to pass, incorrectly.

For example, using diff_test from sky_lib now fails open. In the original repro in this, under bazel 5, adding exit 1 into generated test script does not cause the test to fail, as it's never called.

meteorcloudy · 2022-02-10T14:42:19Z

For example, using diff_test from sky_lib now fails open. In the original repro in this, under bazel 5, adding exit 1 into generated test script does not cause the test to fail, as it's never called.

@fmeum @c-mita Can you respond this? I'm not familiar with this feature.

fmeum · 2022-02-10T14:46:39Z

@meteorcloudy I didn't look into it much after #13983 was closed, but just at a glance, @mattem seems to have caught a real regression: I also think that tests executed with bazel coverage will generally pass even if they shouldn't. But @c-mita would know the details here.

Should probably be marked as a release blocker for 5.0.1.

mattem · 2022-02-10T14:49:09Z

Yeah, the diff_test was just an example of a OSS rule that doesn't have the _lcov_merger attr set. Adding the exit 1 to the generated test script above will repro the issue and the test will pass incorrectly under bazel coverage.

#!/usr/bin/env sh
touch WORKSPACE

cat << EOF > BUILD
load("//:rules.bzl", "custom_test")

custom_test(
    name = "custom_test",
)
EOF

cat << EOF > rules.bzl
def _custom_test_impl(ctx):
    executable = ctx.actions.declare_file(ctx.attr.name)
    ctx.actions.write(executable, "exit 1", is_executable = True)
    return [
        DefaultInfo(
            executable = executable,
        )
    ]

custom_test = rule(
    implementation = _custom_test_impl,
    test = True,
)
EOF

bazel coverage //:custom_test

I can open a new issue if that's easier too.

fmeum · 2022-02-10T14:51:53Z

However, the alternative is that all these tests will now fail until the _lcov_merger attribute is added to the rule definition, which would bring back the original regression. I think we need a fundamentally better fix here.

mattem · 2022-02-10T14:58:30Z

True, but in the current state, bazel 5.x is passing tests when they should be failing for those running coverage over their entire build. Is a reasonable workaround here to set --test_env=LCOV_MERGER=...? If so then there's a workaround, otherwise it's a little scary to have tests fail open.

alexeagle · 2022-02-10T15:26:40Z

@fmeum in your original post you point out that 5b216b2 is what actually broke this. Maybe we should try to revert that flag flip in a 5.0.x release?

fmeum · 2022-02-10T15:40:36Z

True, but in the current state, bazel 5.x is passing tests when they should be failing for those running coverage over their entire build. Is a reasonable workaround here to set --test_env=LCOV_MERGER=...? If so then there's a workaround, otherwise it's a little scary to have tests fail open.

Yes, both fail open and fail close are bad here, but I agree that fail open is even worse. I'm just saying that there is no obvious "fix forward" approach.

@fmeum in your original post you point out that 5b216b2 is what actually broke this. Maybe we should try to revert that flag flip in a 5.0.x release?

It would fix this issue, but would technically cause a regression from 5.0.0: bazel coverage would suddenly return coverage information in much fewer cases because the flip has been disabled. I don't know whether this is acceptable.

The only way I am aware of that preserves backwards compatible with both 4.2.2 (fixing the regression) and 5.0.0 (not introducing a new regression) is to provide some default _lcov_merger, either via the approach originally proposed in #13983 or the equivalent of --test_env=LCOV_MERGER=... as suggested by @mattem. But that may cause conflicts with Google's internal setup as per #13983 (comment).

mattem · 2022-02-11T13:53:46Z

The --test_env=LCOV_MERGER=... works internally for us, and all tests are now correctly green. It seems like this issue should be more broadly communicated though, as for some they could have any number of failing open tests and not realize.

Previously this short circuit meant the tests weren't actually run, and they would always exit 0, in the case the test rule didn't set the lcov related attributes. More context: bazelbuild#13978

keith · 2022-02-13T18:35:37Z

Here's a fix #14807

c-mita · 2022-02-14T16:57:47Z

Sorry for the delay in responding.

#14807 looks to me to be an ok patch for this scenario, allowing us to proceed without regressing, but I'm not satisfied with how things are.
Bazel needs to be more intelligent in handling the case of rules not providing an _lcov_merger attribute, especially since this isn't exactly a well documented feature.

keith · 2022-02-14T17:36:00Z

Related #10642

Previously this short circuit meant the tests weren't actually run, and they would always exit 0, in the case the test rule didn't set the lcov related attributes. More context: #13978 Closes #14807. RELNOTES: Fixed an issue where Bazel could erroneously report a test passes in coverage mode without actually running the test. PiperOrigin-RevId: 428756211

Previously this short circuit meant the tests weren't actually run, and they would always exit 0, in the case the test rule didn't set the lcov related attributes. More context: bazelbuild#13978 Closes bazelbuild#14807. RELNOTES: Fixed an issue where Bazel could erroneously report a test passes in coverage mode without actually running the test. PiperOrigin-RevId: 428756211 (cherry picked from commit 16de035)

Previously this short circuit meant the tests weren't actually run, and they would always exit 0, in the case the test rule didn't set the lcov related attributes. More context: #13978 Closes #14807. RELNOTES: Fixed an issue where Bazel could erroneously report a test passes in coverage mode without actually running the test. PiperOrigin-RevId: 428756211 (cherry picked from commit 16de035) Co-authored-by: Keith Smiley <[email protected]>

fmeum · 2022-03-12T18:52:15Z

I think that the fix is incomplete: Without LCOV_MERGER, tests will have the exec root set as their working directory rather than the runfiles root, see #15030.

fmeum mentioned this issue Sep 13, 2021

Provide the default lcov_merger to all Starlark tests #13983

Closed

gregestren added team-Rules-CPP Issues for C++ rules type: bug untriaged labels Sep 13, 2021

fmeum mentioned this issue Nov 28, 2021

Release 5.0 - January 2022 #14013

Closed

9 tasks

meteorcloudy assigned c-mita and oquenchil Nov 29, 2021

meteorcloudy added P2 We'll consider working on this in future. (Assignee optional) release blocker and removed untriaged labels Nov 29, 2021

fmeum mentioned this issue Nov 29, 2021

Add coverage support for fuzzing_regression_test bazel-contrib/rules_fuzzing#174

Closed

c-mita added the coverage label Nov 29, 2021

c-mita mentioned this issue Nov 30, 2021

Exit collect_coverage.sh early if LCOV_MERGER is not set. #14352

Closed

bazel-io closed this as completed in aa52f2d Dec 1, 2021

meteorcloudy reopened this Dec 1, 2021

brentleyjones mentioned this issue Dec 1, 2021

5.x: Exit collect_coverage.sh early if LCOV_MERGER is not set. #14359

Merged

meteorcloudy closed this as completed Dec 1, 2021

keith mentioned this issue Feb 13, 2022

Fix bazel coverage false negative #14807

Closed

brentleyjones mentioned this issue Feb 16, 2022

[5.1] Fix bazel coverage false negative #14836

Merged

kormide mentioned this issue Dec 26, 2022

feat: jest coverage output aspect-build/rules_jest#71

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Custom test rule without `_lcov_merger` attribute always fails under `bazel coverage` #13978

Custom test rule without `_lcov_merger` attribute always fails under `bazel coverage` #13978

fmeum commented Sep 11, 2021 •

edited

Loading

keith commented Sep 12, 2021

fmeum commented Sep 13, 2021

keith commented Sep 13, 2021

brentleyjones commented Nov 11, 2021

meteorcloudy commented Dec 1, 2021

brentleyjones commented Dec 1, 2021

mattem commented Feb 10, 2022

meteorcloudy commented Feb 10, 2022

fmeum commented Feb 10, 2022 •

edited

Loading

mattem commented Feb 10, 2022

fmeum commented Feb 10, 2022

mattem commented Feb 10, 2022 •

edited

Loading

alexeagle commented Feb 10, 2022

fmeum commented Feb 10, 2022

mattem commented Feb 11, 2022

keith commented Feb 13, 2022

c-mita commented Feb 14, 2022

keith commented Feb 14, 2022

fmeum commented Mar 12, 2022

Custom test rule without _lcov_merger attribute always fails under bazel coverage #13978

Custom test rule without _lcov_merger attribute always fails under bazel coverage #13978

Comments

fmeum commented Sep 11, 2021 • edited Loading

Description of the problem / feature request:

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

What operating system are you running Bazel on?

What's the output of bazel info release?

Have you found anything relevant by searching the web?

keith commented Sep 12, 2021

fmeum commented Sep 13, 2021

keith commented Sep 13, 2021

brentleyjones commented Nov 11, 2021

meteorcloudy commented Dec 1, 2021

brentleyjones commented Dec 1, 2021

mattem commented Feb 10, 2022

meteorcloudy commented Feb 10, 2022

fmeum commented Feb 10, 2022 • edited Loading

mattem commented Feb 10, 2022

fmeum commented Feb 10, 2022

mattem commented Feb 10, 2022 • edited Loading

alexeagle commented Feb 10, 2022

fmeum commented Feb 10, 2022

mattem commented Feb 11, 2022

keith commented Feb 13, 2022

c-mita commented Feb 14, 2022

keith commented Feb 14, 2022

fmeum commented Mar 12, 2022

Custom test rule without `_lcov_merger` attribute always fails under `bazel coverage` #13978

Custom test rule without `_lcov_merger` attribute always fails under `bazel coverage` #13978

fmeum commented Sep 11, 2021 •

edited

Loading

What's the output of `bazel info release`?

fmeum commented Feb 10, 2022 •

edited

Loading

mattem commented Feb 10, 2022 •

edited

Loading