Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests with "exclusive" tag do not get results from remote cache #3791

Closed
grandseiken opened this issue Sep 22, 2017 · 24 comments
Closed

Tests with "exclusive" tag do not get results from remote cache #3791

grandseiken opened this issue Sep 22, 2017 · 24 comments
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug

Comments

@grandseiken
Copy link

grandseiken commented Sep 22, 2017

Description of the problem / feature request / question:

I am trying out the remote cache functionality described here. Currently I'm running against a local hazelcast instance. This works great most of the time. One thing I noticed is that tests with the exclusive tag, despite being cached as normal in the local cache, will not seem to use the remote cache and will always rerun if there is no passing result in the local cache. This seems like an oversight?

If possible, provide a minimal example to reproduce the problem:

BUILD

cc_test(
  name = "test",
  srcs = ["test.cc"],
  tags = ["exclusive"], # comment this out to fix the problem
)

test.cc

#include <chrono>
#include <thread>

int main() {
  std::this_thread::sleep_for(std::chrono::seconds(10));
  return 0;
}

.bazelrc

startup --host_jvm_args=-Dbazel.DigestFunction=SHA1

build --spawn_strategy=remote
build --experimental_strict_action_env
build --remote_rest_cache=http://localhost:5701/hazelcast/rest/maps/cache

test --spawn_strategy=remote
test --experimental_strict_action_env
test --remote_rest_cache=http://localhost:5701/hazelcast/rest/maps/cache

Environment info

  • Operating System:
    Ubuntu

  • Bazel version (output of bazel info release):
    release 0.5.3

  • Hazelcast version
    3.8.6

Anything else, information or logs or outputs that would be helpful?

Here's an example run. The second test will pass in 0.0s if either the exclusive tag is removed, or the second bazel clean command is omitted.

$ bazel clean && bazel test //... && bazel clean && bazel test //...                                         
INFO: Reading 'startup' options from /home/stu/tmp/.bazelrc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
INFO: Reading 'startup' options from /home/stu/tmp/.bazelrc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Analysed target //:test (9 packages loaded).
INFO: Found 1 test target...
Target //:test up-to-date:
  bazel-bin/test
INFO: Elapsed time: 10.405s, Critical Path: 10.20s
INFO: Build completed successfully, 7 total actions
//:test                                                                  PASSED in 10.0s

Executed 1 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
INFO: Reading 'startup' options from /home/stu/tmp/.bazelrc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
INFO: Reading 'startup' options from /home/stu/tmp/.bazelrc: --host_jvm_args=-Dbazel.DigestFunction=SHA1
INFO: Analysed target //:test (9 packages loaded).
INFO: Found 1 test target...
Target //:test up-to-date:
  bazel-bin/test
INFO: Elapsed time: 10.209s, Critical Path: 10.02s
INFO: Build completed successfully, 7 total actions
//:test                                                                  PASSED in 10.0s

Executed 1 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
@damienmg damienmg added P1 I'll work on this now. (Assignee required) type: bug labels Sep 22, 2017
@damienmg
Copy link
Contributor

/cc @philwo because I remember there was some discussion because it also drop out of sandbox when we do that.

@hlopko
Copy link
Member

hlopko commented Oct 10, 2017

@buchgr

@philwo
Copy link
Member

philwo commented Jul 19, 2018

Reassigning to @buchgr, because remote execution.

@buchgr buchgr added P2 We'll consider working on this in future. (Assignee optional) and removed P1 I'll work on this now. (Assignee required) labels Jul 19, 2018
@jin jin added team-Remote-Exec Issues and PRs for the Execution (Remote) team and removed team-Execution labels Jan 14, 2019
@RNabel
Copy link
Contributor

RNabel commented Aug 5, 2019

@buchgr Is there a timeline for fixing this?

@buchgr
Copy link
Contributor

buchgr commented Aug 6, 2019

@RNabel yes. should be in 0.29.0

@phb
Copy link

phb commented Aug 20, 2019

@buchgr it looks like your fix #8983 got rolled back and as far as I can tell, not included in 0.29

@buchgr
Copy link
Contributor

buchgr commented Aug 21, 2019

yeah it did get rolled back :(. I broke tons of code internally. I ll try to rollforward for the 1.0 release.

@mandrean
Copy link

mandrean commented Dec 2, 2019

@buchgr any updates on this ticket?

@buchgr
Copy link
Contributor

buchgr commented Dec 2, 2019

Yes, unfortunately it got rolled back because it broke many internal tests. I'll need to fix those before I can roll it forward. I don't have the cycles to do this right now.

@buchgr buchgr removed their assignment Jan 9, 2020
@Qinusty
Copy link
Contributor

Qinusty commented Jan 30, 2020

This patch is extremely beneficial, any timeline towards getting this merged?

@buchgr
Copy link
Contributor

buchgr commented Feb 3, 2020

I unfortunately no longer work on Bazel and won't have the time to fix all the internal tests broken by this change.

@olib963
Copy link

olib963 commented Feb 21, 2020

Is there anyone that would be able to pick this up or is it possible for someone external to pick this work up? This would be a really helpful, we currently have to work around this by calling the exclusive tests ourselves after the tests that can be run in parallel.

@cocreature
Copy link

Hi, we’ve been using the patch from #8983 for a couple of months now without any issues. What I noticed is that this change enables both sandboxing and caching. Without knowing how the internal tests failed, could it be that you just need to add a local tag to disable sandboxing?

@mcwilson07
Copy link

We would be really interested in getting this into Bazel as well. We have tests that need to run exclusively but we occasionally miss dependencies that would be caught with sandboxing.

@stepango
Copy link

stepango commented Jun 4, 2020

Is there any timelines to get it fixed? If not how about introducing the flag to enable #8983 conditionally, in this case many teams could benefit from exclusive tests caching as well as keep internal tests green. Thoughts?

@JayThomason
Copy link

Here is a concrete use-case for why this bug is important:
We have a lot of tests that require usage of the GPU. These tests use the exclusive tag to prevent OOM issues that can arise when running multiple tests that require the GPU at once. Because of this bug these tests have to run every single time that we do a test CI build.

In my opinion this bug breaks one of the core features of bazel: targets are being built which do not need to be built because the dependencies did not change.

@JayThomason
Copy link

After some hours I was able to figure out a workaround for my use case.

We have a macro that wraps our test rule which I edited such that when the "exclusive" tag is set for a test instead of just creating that test target normally we actually create two test targets.

The first target is tagged exclusive as normal but also gets tagged gpu_test. This ensures that developers can still run bazel test foo/... and still have all their tests run as expected.

The second target is tagged manual and gpu_test. This test is effectively identical to the first but it will be excluded from any sort of foo/... queries in bazel.

The effect is that for CI we run bazel test with --test_tag_filters=-exclusive,-gpu_test to exclude the exclusive targets from our bazel test step while also adding a new test step where we query for the tests tagged as manual and gpu_test and pass those to bazel test with --jobs=1 to effectively run the tests one at a time while maintaining remote caching.

Note that the purpose of the gpu_test tag is to ensure that we aren't running other unnecessary manual tests in this step.

@philwo
Copy link
Member

philwo commented Sep 21, 2020

@coeuvre For reference, this was fixed in cl/260916180, but rolled back in cl/261644804.

dhalperi added a commit to batfish/batfish that referenced this issue Oct 13, 2020
This tells Bazel that the test requires 4 CPUs to run. On enormous machines,
tests will run in parallel, on laptops tests will run serially or maybe 2 at a
time. This strikes a better balance than exclusive tags.

It also enables caching of ref tests; there is a known issue that prevents
caching for `exclusive` tests: bazelbuild/bazel#3791
@coeuvre
Copy link
Member

coeuvre commented Oct 15, 2020

#8983 is rolled forward with fix as 5e5eb86:

  1. Add --incompatible_exclusive_test_sandboxed flag so users can enable this feature conditionally.
  2. With that flag enabled, users who want to run exclusive tests locally can add a 'local' tag.

@MikhailTymchukFT
Copy link

Which release contains this flag?

@coeuvre
Copy link
Member

coeuvre commented Dec 29, 2020

It should be contained in release 4.0. You can track the release here #12455.

@gasparev
Copy link
Contributor

Hi, I'm trying to fix this but I need to test the change on the downstream projects. From the doc, it seems that the labels incompatible-change and migration-ready are needed. Can someone add them?

@fmeum
Copy link
Collaborator

fmeum commented Nov 28, 2022

@gasparev Could you create a separate tracking issue as explained in the docs? You can then ping @meteorcloudy to have him add the necessary flags.

copybara-service bot pushed a commit that referenced this issue Dec 6, 2022
Solves #3791 #16871 bazel-contrib/SIG-rules-authors#40
Steps to do before merging:

- [x] Bazel checks green #16868
- [x] Downstream projects green https://buildkite.com/bazel/bazelisk-plus-incompatible-flags/builds/1348

Closes #16867.

PiperOrigin-RevId: 493257706
Change-Id: I0f46d092a47a7c08e2436183c9cdc2cd92a0c379
@gasparev
Copy link
Contributor

gasparev commented Dec 6, 2022

This can be closed after 23580aa

@coeuvre coeuvre closed this as completed Dec 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug
Projects
None yet
Development

No branches or pull requests