Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom test rule without _lcov_merger attribute always fails under bazel coverage #13978

Closed
fmeum opened this issue Sep 11, 2021 · 19 comments
Closed
Assignees
Labels
coverage P2 We'll consider working on this in future. (Assignee optional) team-Rules-CPP Issues for C++ rules type: bug

Comments

@fmeum
Copy link
Collaborator

fmeum commented Sep 11, 2021

Description of the problem / feature request:

When a custom Starlark test rule without explicit coverage support in the form of the _lcov_merger attribute is invoked via bazel coverage with Bazel after 5b216b2, it fails with

FAIL: //:custom_test (see /home/fmeum/.cache/bazel/_bazel_fhenneke/767fd922858d516eb34a4dddb64ca1a0/execroot/__main__/bazel-out/k8-fastbuild/testlogs/custom_test/test.log)
ERROR: /home/fmeum/git/bazel-custom-test-coverage/BUILD:3:12: output 'custom_test/coverage.dat' was not created
ERROR: /home/fmeum/git/bazel-custom-test-coverage/BUILD:3:12: Testing //:custom_test failed: not all outputs were created or valid
Target //:custom_test up-to-date:
  bazel-bin/custom_test
INFO: Elapsed time: 5.635s, Critical Path: 0.17s
INFO: 7 processes: 1 disk cache hit, 5 internal, 1 linux-sandbox.
FAILED: Build did NOT complete successfully
//:custom_test                                                           FAILED in 0.0s
  /home/fmeum/.cache/bazel/_bazel_fhenneke/767fd922858d516eb34a4dddb64ca1a0/execroot/__main__/bazel-out/k8-fastbuild/testlogs/custom_test/test.log

The log says:

exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //:custom_test
-----------------------------------------------------------------------------
--
Coverage runner: cannot locate file

With Bazel 4.2.1, the build succeeds:

INFO: Build completed successfully, 5 total actions
//:custom_test                                                           PASSED in 0.0s

Executed 1 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command
INFO: Build completed successfully, 5 total actions

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

#!/usr/bin/env sh
touch WORKSPACE

cat << EOF > BUILD
load("//:rules.bzl", "custom_test")

custom_test(
    name = "custom_test",
)
EOF

cat << EOF > rules.bzl
def _custom_test_impl(ctx):
    executable = ctx.actions.declare_file(ctx.attr.name)
    ctx.actions.write(executable, "", is_executable = True)
    return [
        DefaultInfo(
            executable = executable,
        )
    ]

custom_test = rule(
    implementation = _custom_test_impl,
    test = True,
)
EOF

bazel coverage //:custom_test

What operating system are you running Bazel on?

Linux

What's the output of bazel info release?

release 5.0.0-pre.20210831.2

I bisected the origin of this issue back to 5b216b2.

Have you found anything relevant by searching the web?

With 5b216b27435aeb9eb9c3bd3c552d6498e1050cc7, the custom_test rule in the reproducer now provides an InstrumentedFilesInfo, which causes TestActionBuilder to collect coverage. However, since the rule does not have the :lcov_merger attribute set, the LCOV_MERGER env variable isn't set when collect_coverage.sh is invoked, which causes the error message reproduced above.

@keith
Copy link
Member

keith commented Sep 12, 2021

I'm kinda surprised this ever worked before, I think the real core issue to fix is #10642

fmeum added a commit to fmeum/bazel that referenced this issue Sep 13, 2021
Since 5b216b2, all Starlark-defined test rules either need to set the
`_lcov_merger` attribute or fail if run with `bazel coverage`. This is
fixed by attaching the default lcov merger as a late-bound default to
all Starlark rules via a new attribute and falling back to that
attribute in the TestActionBuilder.

This commit ensures backwards compatibility is maintained with existing
Starlark test rules that set the `_lcov_merger` attribute. This is also
verified in the included integration test.

Fixes bazelbuild#13978.
fmeum added a commit to fmeum/bazel that referenced this issue Sep 13, 2021
Since 5b216b2, all Starlark-defined test rules either need to set the
`_lcov_merger` attribute or fail if run with `bazel coverage`. This is
fixed by attaching the default lcov merger as a late-bound default to
all Starlark rules via a new attribute and falling back to that
attribute in the TestActionBuilder.

This commit ensures backwards compatibility is maintained with existing
Starlark test rules that set the `_lcov_merger` attribute. This is also
verified in the included integration test.

Fixes bazelbuild#13978.
fmeum added a commit to fmeum/bazel that referenced this issue Sep 13, 2021
Since 5b216b2, all Starlark-defined test rules either need to set the
`_lcov_merger` attribute or fail if run with `bazel coverage`. This is
fixed by attaching the default lcov merger as a late-bound default to
all Starlark rules via a new attribute and falling back to that
attribute in the TestActionBuilder.

This commit ensures backwards compatibility is maintained with existing
Starlark test rules that set the `_lcov_merger` attribute. This is also
verified in the included integration test.

Fixes bazelbuild#13978.
@fmeum
Copy link
Collaborator Author

fmeum commented Sep 13, 2021

@keith Could you take a look at #13983 and verify whether it also fixes your issue?

fmeum added a commit to fmeum/bazel that referenced this issue Sep 13, 2021
Since 5b216b2, all Starlark-defined test rules either need to set the
`_lcov_merger` attribute or fail if run with `bazel coverage`. This is
fixed by attaching the default lcov merger as a late-bound default to
all Starlark rules via a new attribute and falling back to that
attribute in the TestActionBuilder.

This commit ensures backwards compatibility is maintained with existing
Starlark test rules that set the `_lcov_merger` attribute. This is also
verified in the included integration test.

Fixes bazelbuild#13978.
@keith
Copy link
Member

keith commented Sep 13, 2021

With that patch it does remove the need to pass --test_env=LCOV_MERGER=/usr/bin/true to coverage invocations

@brentleyjones
Copy link
Contributor

A friendly 2 month ping on this.

@meteorcloudy
Copy link
Member

Can anyone send a PR to cherry-pick aa52f2d into release-5.0.0rc2?

brentleyjones pushed a commit to brentleyjones/bazel that referenced this issue Dec 1, 2021
Rules that provide InstrumentedFilesInfo but do not define the
_lcov_merger attribute will result in this script being called without
the required LCOV_MERGER environment variable being set.

Since _all_ rules now provide an InstrumentedFilesInfo provider by
default, but are not required to provide the attribute, this scenario
will be encountered far more frequently and should not be an error.

Because of the way coverage collection within Google differs to Bazel,
it isn't safe to change TestActionBuilder to simply not prepare the
collect_coverage.sh script the way it would if no coverage was to be
collected, so we have to change the script itself.

Fixes bazelbuild#13978

Closes bazelbuild#14352.

PiperOrigin-RevId: 413369871
(cherry picked from commit aa52f2d)
@brentleyjones
Copy link
Contributor

#14359

meteorcloudy pushed a commit that referenced this issue Dec 1, 2021
Rules that provide InstrumentedFilesInfo but do not define the
_lcov_merger attribute will result in this script being called without
the required LCOV_MERGER environment variable being set.

Since _all_ rules now provide an InstrumentedFilesInfo provider by
default, but are not required to provide the attribute, this scenario
will be encountered far more frequently and should not be an error.

Because of the way coverage collection within Google differs to Bazel,
it isn't safe to change TestActionBuilder to simply not prepare the
collect_coverage.sh script the way it would if no coverage was to be
collected, so we have to change the script itself.

Fixes #13978

Closes #14352.

PiperOrigin-RevId: 413369871
(cherry picked from commit aa52f2d)

Co-authored-by: Charles Mita <[email protected]>
Bencodes pushed a commit to Bencodes/bazel that referenced this issue Jan 10, 2022
Rules that provide InstrumentedFilesInfo but do not define the
_lcov_merger attribute will result in this script being called without
the required LCOV_MERGER environment variable being set.

Since _all_ rules now provide an InstrumentedFilesInfo provider by
default, but are not required to provide the attribute, this scenario
will be encountered far more frequently and should not be an error.

Because of the way coverage collection within Google differs to Bazel,
it isn't safe to change TestActionBuilder to simply not prepare the
collect_coverage.sh script the way it would if no coverage was to be
collected, so we have to change the script itself.

Fixes bazelbuild#13978

Closes bazelbuild#14352.

PiperOrigin-RevId: 413369871
@mattem
Copy link
Contributor

mattem commented Feb 10, 2022

@meteorcloudy Hasn't this change in #14359 essentially made it so that any Starlark test rule not defining _lcov_merger or users setting --test_env=LCOV_MERGER=/usr/bin/true fail open? That exit 0 is causing the tests to pass, incorrectly.

For example, using diff_test from sky_lib now fails open. In the original repro in this, under bazel 5, adding exit 1 into generated test script does not cause the test to fail, as it's never called.

@meteorcloudy
Copy link
Member

For example, using diff_test from sky_lib now fails open. In the original repro in this, under bazel 5, adding exit 1 into generated test script does not cause the test to fail, as it's never called.

@fmeum @c-mita Can you respond this? I'm not familiar with this feature.

@fmeum
Copy link
Collaborator Author

fmeum commented Feb 10, 2022

@meteorcloudy I didn't look into it much after #13983 was closed, but just at a glance, @mattem seems to have caught a real regression: I also think that tests executed with bazel coverage will generally pass even if they shouldn't. But @c-mita would know the details here.

Should probably be marked as a release blocker for 5.0.1.

@mattem
Copy link
Contributor

mattem commented Feb 10, 2022

Yeah, the diff_test was just an example of a OSS rule that doesn't have the _lcov_merger attr set. Adding the exit 1 to the generated test script above will repro the issue and the test will pass incorrectly under bazel coverage.

#!/usr/bin/env sh
touch WORKSPACE

cat << EOF > BUILD
load("//:rules.bzl", "custom_test")

custom_test(
    name = "custom_test",
)
EOF

cat << EOF > rules.bzl
def _custom_test_impl(ctx):
    executable = ctx.actions.declare_file(ctx.attr.name)
    ctx.actions.write(executable, "exit 1", is_executable = True)
    return [
        DefaultInfo(
            executable = executable,
        )
    ]

custom_test = rule(
    implementation = _custom_test_impl,
    test = True,
)
EOF

bazel coverage //:custom_test

I can open a new issue if that's easier too.

@fmeum
Copy link
Collaborator Author

fmeum commented Feb 10, 2022

However, the alternative is that all these tests will now fail until the _lcov_merger attribute is added to the rule definition, which would bring back the original regression. I think we need a fundamentally better fix here.

@mattem
Copy link
Contributor

mattem commented Feb 10, 2022

True, but in the current state, bazel 5.x is passing tests when they should be failing for those running coverage over their entire build. Is a reasonable workaround here to set --test_env=LCOV_MERGER=...? If so then there's a workaround, otherwise it's a little scary to have tests fail open.

@alexeagle
Copy link
Contributor

@fmeum in your original post you point out that 5b216b2 is what actually broke this. Maybe we should try to revert that flag flip in a 5.0.x release?

@fmeum
Copy link
Collaborator Author

fmeum commented Feb 10, 2022

True, but in the current state, bazel 5.x is passing tests when they should be failing for those running coverage over their entire build. Is a reasonable workaround here to set --test_env=LCOV_MERGER=...? If so then there's a workaround, otherwise it's a little scary to have tests fail open.

Yes, both fail open and fail close are bad here, but I agree that fail open is even worse. I'm just saying that there is no obvious "fix forward" approach.

@fmeum in your original post you point out that 5b216b2 is what actually broke this. Maybe we should try to revert that flag flip in a 5.0.x release?

It would fix this issue, but would technically cause a regression from 5.0.0: bazel coverage would suddenly return coverage information in much fewer cases because the flip has been disabled. I don't know whether this is acceptable.

The only way I am aware of that preserves backwards compatible with both 4.2.2 (fixing the regression) and 5.0.0 (not introducing a new regression) is to provide some default _lcov_merger, either via the approach originally proposed in #13983 or the equivalent of --test_env=LCOV_MERGER=... as suggested by @mattem. But that may cause conflicts with Google's internal setup as per #13983 (comment).

@mattem
Copy link
Contributor

mattem commented Feb 11, 2022

The --test_env=LCOV_MERGER=... works internally for us, and all tests are now correctly green. It seems like this issue should be more broadly communicated though, as for some they could have any number of failing open tests and not realize.

keith added a commit to keith/bazel that referenced this issue Feb 13, 2022
Previously this short circuit meant the tests weren't actually run, and
they would always exit 0, in the case the test rule didn't set the lcov
related attributes.

More context: bazelbuild#13978
@keith
Copy link
Member

keith commented Feb 13, 2022

Here's a fix #14807

@c-mita
Copy link
Member

c-mita commented Feb 14, 2022

Sorry for the delay in responding.

#14807 looks to me to be an ok patch for this scenario, allowing us to proceed without regressing, but I'm not satisfied with how things are.
Bazel needs to be more intelligent in handling the case of rules not providing an _lcov_merger attribute, especially since this isn't exactly a well documented feature.

@keith
Copy link
Member

keith commented Feb 14, 2022

Related #10642

bazel-io pushed a commit that referenced this issue Feb 15, 2022
Previously this short circuit meant the tests weren't actually run, and
they would always exit 0, in the case the test rule didn't set the lcov
related attributes.

More context: #13978

Closes #14807.

RELNOTES: Fixed an issue where Bazel could erroneously report a test passes in coverage mode without actually running the test.
PiperOrigin-RevId: 428756211
brentleyjones pushed a commit to brentleyjones/bazel that referenced this issue Feb 16, 2022
Previously this short circuit meant the tests weren't actually run, and
they would always exit 0, in the case the test rule didn't set the lcov
related attributes.

More context: bazelbuild#13978

Closes bazelbuild#14807.

RELNOTES: Fixed an issue where Bazel could erroneously report a test passes in coverage mode without actually running the test.
PiperOrigin-RevId: 428756211
(cherry picked from commit 16de035)
Wyverald pushed a commit that referenced this issue Feb 16, 2022
Previously this short circuit meant the tests weren't actually run, and
they would always exit 0, in the case the test rule didn't set the lcov
related attributes.

More context: #13978

Closes #14807.

RELNOTES: Fixed an issue where Bazel could erroneously report a test passes in coverage mode without actually running the test.
PiperOrigin-RevId: 428756211
(cherry picked from commit 16de035)

Co-authored-by: Keith Smiley <[email protected]>
@fmeum
Copy link
Collaborator Author

fmeum commented Mar 12, 2022

I think that the fix is incomplete: Without LCOV_MERGER, tests will have the exec root set as their working directory rather than the runfiles root, see #15030.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
coverage P2 We'll consider working on this in future. (Assignee optional) team-Rules-CPP Issues for C++ rules type: bug
Projects
None yet
9 participants