Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rules_go tests are failing on Windows with Bazel 1.0 #9995

Closed
meteorcloudy opened this issue Oct 11, 2019 · 23 comments
Closed

Rules_go tests are failing on Windows with Bazel 1.0 #9995

meteorcloudy opened this issue Oct 11, 2019 · 23 comments
Labels
area-Windows Windows-specific issues and feature requests breakage P1 I'll work on this now. (Assignee required) team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website type: bug

Comments

@meteorcloudy
Copy link
Member

https://buildkite.com/bazel/rules-go-golang/builds/1491

Executing tests from //tests/legacy/examples/stamped_bin:stamped_test
-----------------------------------------------------------------------------
--- FAIL: TestBuild (1.83s)
    stamped_test.go:101: Starting local Bazel server and connecting to it...
        Server crashed during startup. Now printing c:\users\b\_bazel_b\ibnjdh6i\server\jvm.out
        I/O Error: C:/users/b/_bazel_b/ibnjdh6i/server/server_info.rawproto.tmp -> C:/users/b/_bazel_b/ibnjdh6i/server/server_info.rawproto (Permission denied)
        exit status 37
--- FAIL: TestBuildWithoutStamp (1.83s)
    stamped_test.go:111: expected tests to have failed (instead got exit code 37)
FAIL
cleanup error: remove D:\b\jylqiqkr\bazel_testing\bazel_go_test\main: The process cannot access the file because it is being used by another process

Bazel 1.0 crashes inside the test due to some permission denied error.

We didn't catch this in downstream test because the rules_go tests invoke bazel from PATH. On CI, it would actually be Bazelisk, but Bazelisk always points to the latest stable version instead of using Bazel@HEAD.

@meteorcloudy meteorcloudy added type: bug breakage area-Windows Windows-specific issues and feature requests release blocker labels Oct 11, 2019
@meteorcloudy
Copy link
Member Author

@dslomov This is probably a release blocker for 1.0.1, but I'm still investigating the true culprit.

@meteorcloudy
Copy link
Member Author

b7f5605 is probably the culprit.

@meteorcloudy
Copy link
Member Author

But I found Bazel@HEAD didn't fail, so there must be a commit after 1.0 that already fixed the problem.

@meteorcloudy
Copy link
Member Author

Another bisect told me b698e20 fixed this issue. We'll need to either revert b7f5605 or cherry-pick b698e20 into 1.0

@meteorcloudy
Copy link
Member Author

/cc related parties
@dslomov for 1.0.1 release
@philwo for Bazelisk improvement to catch similar problem in future
@michajlo for the original culprit
@laszlocsomor for the fix
@jayconrod for the rules_go owner

@meteorcloudy
Copy link
Member Author

What I still don't understand is how did b7f5605 caused the failure and how did b698e20 fix it...

@meteorcloudy meteorcloudy added the P1 I'll work on this now. (Assignee required) label Oct 11, 2019
@ozio85
Copy link

ozio85 commented Oct 14, 2019

I can just confirm this happening for me as well. Occationally on my local machine, but every time on the server.

@dslomov
Copy link
Contributor

dslomov commented Oct 14, 2019

Should this block 1.1? (#9982)

@meteorcloudy
Copy link
Member Author

Yes, although this issue is not happening from HEAD since b698e20, but I don't think that's a proper fix.

@laszlocsomor
Copy link
Contributor

I've seen this bug on local builds too. I have no stable repro, but once it hits, it stays; you must remove the ".rawproto" file to fix it.

@meteorcloudy
Copy link
Member Author

I have a stable reproduce locally and found a fix for this. Sending a CL now.

dslomov pushed a commit that referenced this issue Oct 14, 2019
renameTo is sometimes flaky on Windows, moveFile is a more robust implementation.

Fixes #9995

RELNOTES: None
PiperOrigin-RevId: 274539649
@michajlo michajlo reopened this Oct 14, 2019
@michajlo
Copy link
Contributor

The mv there is supposed to be atomic, moveFile isn't. Can we find another way to work around the issue?

@laszlocsomor
Copy link
Contributor

Reopening because of a rollback: ac99c29

@jayconrod
Copy link
Contributor

@meteorcloudy I wanted to follow up on your original comment:

We didn't catch this in downstream test because the rules_go tests invoke bazel from PATH. On CI, it would actually be Bazelisk, but Bazelisk always points to the latest stable version instead of using Bazel@HEAD.

Is there a way that rules_go tests can invoke the version of Bazel being tested? From an example test log, I see that bazel is invoked directly as D:\temp\tmpzmgcwe5d\bazel-bin\src\bazel.exe.

I remember running into an issue like this a long time ago back on Jenkins, and there used to be a BAZEL environment variable. I'd be happy to make the tests use something like that if it were added back, but it might be easiest just to add that temp directory to the front of PATH, assuming there's nothing else in the way.

@meteorcloudy
Copy link
Member Author

meteorcloudy commented Oct 18, 2019

So currently, we set USE_BAZEL_VERSION=D:\temp\tmpzmgcwe5d\bazel-bin\src\bazel.exe for downstream pipeline. If you just invoke Bazel from path, Bazelisk will correctly invoke the bazel binary being tested. You don't have to do anything now. :)

tomaszstrejczek added a commit to tomaszstrejczek/bazel that referenced this issue Oct 20, 2019
Signed-off-by: Tomasz Strejczek <[email protected]>
@ozio85
Copy link

ozio85 commented Oct 21, 2019

Why was this fix not included in 1.1? :(

Edit: It seems like the fix is in.
However, if the file exists from a previous version, Bazel won't recover. I think it is a bit unstable to assume there is no file? Maybe check if serverInfoFile exists, and if it is, delete it first?

If Bazel crashes before "deleteAtExit" happens, this unrecoverable error (or manual deletion is required) happens.

@meteorcloudy
Copy link
Member Author

meteorcloudy commented Oct 22, 2019

618e5a2 is included in 1.1, the moveFile function always try to delete the target file before moving that's why switching to this function will work.
Rules_go is green now: https://buildkite.com/bazel/rules-go-golang

@ozio85
Copy link

ozio85 commented Oct 22, 2019

I just did a quick test in Bazel 1.1 on Windows:

touch /path/to/workspace/server/server_info.rawproto
bazel build //...

Starting local Bazel server and connecting to it...
Server crashed during startup. Now printing /path/to/workspace/server\jvm.out
I/O Error: /path/to/workspace/server/server_info.rawproto.tmp -> /path/to/workspace/server/server_info.rawproto (Permission denied)

@meteorcloudy
Copy link
Member Author

pcloudy@pcloudy0-w MSYS ~/workspace/my_tests/bazel
$ bazel clean --expunge
INFO: Invocation ID: b3f448cb-677e-4372-9533-50a08d489c76
INFO: Reading 'startup' options from c:\tools\msys64\home\pcloudy\.bazelrc: --output_user_root=C:/src/tmp
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=320
INFO: Options provided by the client:
  Inherited 'build' options: --python_path=C:/Python36/python.exe
INFO: Reading rc options for 'clean' from c:\tools\msys64\home\pcloudy\workspace\my_tests\bazel\.bazelrc:
  Inherited 'build' options: --android_aapt=aapt2
INFO: Reading rc options for 'clean' from c:\tools\msys64\home\pcloudy\.bazelrc:
  Inherited 'build' options: --curses=yes --color=yes --verbose_failures --announce_rc --disk_cache=C:/src/tmp/bazel_disk_cache
INFO: Starting clean.
WARNING: Waiting for server process to terminate (waited 5 seconds, waiting at most 60)
WARNING: Waiting for server process to terminate (waited 10 seconds, waiting at most 60)

pcloudy@pcloudy0-w MSYS ~/workspace/my_tests/bazel
$ mkdir -p /c/src/tmp/wa6d46al/server

pcloudy@pcloudy0-w MSYS ~/workspace/my_tests/bazel
$ touch /c/src/tmp/wa6d46al/server/server_info.rawproto

pcloudy@pcloudy0-w MSYS ~/workspace/my_tests/bazel
$ ls /c/src/tmp/wa6d46al/server/server_info.rawproto
/c/src/tmp/wa6d46al/server/server_info.rawproto

pcloudy@pcloudy0-w MSYS ~/workspace/my_tests/bazel
$ bazel
Starting local Bazel server and connecting to it...
                                                           [bazel release 1.1.0]
Usage: bazel <command> <options> ...

Available commands:
  analyze-profile     Analyzes build profile data.
  aquery              Analyzes the given targets and queries the action graph.
  build               Builds the specified targets.
  canonicalize-flags  Canonicalizes a list of bazel options.
  clean               Removes output files and optionally stops the server.
  coverage            Generates code coverage report for specified test targets.
  cquery              Loads, analyzes, and queries the specified targets w/ configurations.
  dump                Dumps the internal state of the bazel server process.
  fetch               Fetches external repositories that are prerequisites to the targets.
  help                Prints help for commands, or the index.
  info                Displays runtime info about the bazel server.
  license             Prints the license of this software.
  mobile-install      Installs targets to mobile devices.
  print_action        Prints the command line args for compiling a file.
  query               Executes a dependency graph query.
  run                 Runs the specified target.
  shutdown            Stops the bazel server.
  sync                Syncs all repositories specified in the workspace file
  test                Builds and runs the specified test targets.
  version             Prints version information for bazel.

Getting more help:
  bazel help <command>
                   Prints help and options for <command>.
  bazel help startup_options
                   Options for the JVM hosting bazel.
  bazel help target-syntax
                   Explains the syntax for specifying targets.
  bazel help info-keys
                   Displays a list of keys used by the info command.

@meteorcloudy
Copy link
Member Author

@ozio85 Are you sure you are using 1.1? I don't have this problem when I tried it.

@ozio85
Copy link

ozio85 commented Oct 22, 2019

I too can run bazel, you have to run an actual command, like build or clean.

Edit: clean seems to work though..

Edit 2: Hmm.. after running clean, it seems like it is working again. I will keep investigating if it has something todo with first running 0.29.1, then 1.1.0

@ozio85
Copy link

ozio85 commented Oct 24, 2019

My error! I had by mistake uploaded Bazel 1.0.0 to our artifactory, but named it Bazel 1.1.0 .. so i was running 1.1.0 locally, but the server ran 1.0.0..! Sorry for the trouble!

@meteorcloudy
Copy link
Member Author

No worries at all, glad it worked! \o/

@philwo philwo added the team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website label Jun 15, 2020
luca-digrazia pushed a commit to luca-digrazia/DatasetCommitsDiffSearch that referenced this issue Sep 4, 2022
    Fixes bazelbuild/bazel#9995

    RELNOTES: None
    PiperOrigin-RevId: 274772226
luca-digrazia pushed a commit to luca-digrazia/DatasetCommitsDiffSearch that referenced this issue Sep 4, 2022
    *** Reason for rollback ***

    This operation is supposed to be atomic. The windows flakiness should be addressed more
    directly.

    *** Original change description ***

    Use FileSystemUtils.moveFile instead of renameTo function of Path

    renameTo is sometimes flaky on Windows, moveFile is a more robust implementation.

    Fixes bazelbuild/bazel#9995

    RELNOTES: None
    PiperOrigin-RevId: 274606746
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Windows Windows-specific issues and feature requests breakage P1 I'll work on this now. (Assignee required) team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website type: bug
Projects
None yet
Development

No branches or pull requests

7 participants