Add Windows support #144

Merged
merged 17 commits into from
Jan 18, 2019

Conversation

jchv
Contributor

@jchv jchv commented Oct 7, 2018

This PR makes ibazel build and run on Windows. Tests do not pass on Windows, and Process Group is not tested at all (at least not directly) so it is definitely not quite ready. I am posting it mostly to get a litmus test on the approach as well as to potentially sync efforts.

Issue: #105.

@jchv jchv force-pushed the windows-support branch from d8b799f to 31fffd0 Compare October 7, 2018 19:38
@dslomov dslomov self-assigned this Oct 8, 2018
@dslomov dslomov requested a review from achew22 October 8, 2018 12:57
@dslomov dslomov removed their assignment Oct 8, 2018
@googlebot

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of all the commit author(s), set the cla label to yes (if enabled on your project), and then merge this pull request when appropriate.

@googlebot googlebot added cla: no and removed cla: yes labels Oct 8, 2018
@meteorcloudy
Member

The tests are failing with

2018/10/09 00:19:16 could not change to test directory: chdir C:\users\b\_bazel_b\dnil4uob\execroot\__main__\bazel-out\x64_windows-fastbuild\bin\bazel\windows_amd64_stripped\go_default_test.exe.runfiles\__main__\bazel: The system cannot find the file specified.

On Windows the runfiles tree is not enabled by default, so it's recommended to use a runfiles library to access the data files.
You can also try the `--experimental_enable_runfiles` flag; maybe that will make the test pass.

@achew22
Member

achew22 commented Oct 19, 2018

bazel test --show_progress_rate_limit=5 --curses=yes --color=yes --keep_going --jobs=32 --build_event_json_file=D:\temp\tmp3dc_dunz\test_bep.json --experimental_build_event_json_file_path_conversion=false --announce_rc --sandbox_tmpfs_path=/tmp --experimental_multi_threaded_digest --flaky_test_attempts=3 --build_tests_only --local_test_jobs=8 --test_tag_filters=-nowindows --experimental_enable_runfiles ...
  | INFO: Options provided by the client:
  | Inherited 'common' options: --isatty=0 --terminal_columns=80
  | INFO: Options provided by the client:
  | Inherited 'build' options: --python_path=C:/python3/python.exe
  | INFO: Reading rc options for 'test' from d:\b\bk-worker-windows-java8-w5ct\bazel\bazel-watcher\tools\bazel.rc:
  | Inherited 'build' options: --workspace_status_command=tools/workplace_status.sh
  | ERROR: building runfiles is not supported on Windows
  | ERROR: Build options are invalid
  | INFO: Elapsed time: 0.398s
  | INFO: 0 processes.
  | ERROR: Couldn't start the build. Unable to run tests
  | FAILED: Build did NOT complete successfully (0 packages loaded)
  | At least one test failed or was flaky.

@achew22
Member

achew22 commented Oct 19, 2018

@meteorcloudy Do you know what's up with the | ERROR: building runfiles is not supported on Windows error output from Bazel on Windows?

@meteorcloudy
Member

Looks like we're still using Bazel 0.17.2 on CI, but building runfiles tree on Windows only works with Bazel 0.18.0 or later. I'll ping you when it's upgraded.

@meteorcloudy
Member

Hi @achew22, Bazel has been upgraded to 0.18.0 on CI. I'm rerunning the tests on Windows.

@meteorcloudy
Member

Three tests are passing now

//bazel:go_default_test                                                  PASSED in 0.7s
//ibazel:go_default_test                                                 PASSED in 0.8s
//ibazel/output_runner:go_default_test                                   PASSED in 0.7s

The rest are failing with:

panic: unable to find file "ibazel/windows_amd64_pure_stripped/ibazel"

Is it supposed to be ibazel.exe on Windows?
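
If the answer is yes, the lookup needs a platform-dependent name. A minimal sketch in Go, assuming the only difference is the `.exe` suffix; `binaryName` is a hypothetical helper and not code from this PR:

```go
package main

import (
	"fmt"
	"runtime"
)

// binaryName appends ".exe" when the target OS is Windows. The OS is passed
// in as a parameter so the behaviour can be exercised on any host.
func binaryName(base, goos string) string {
	if goos == "windows" {
		return base + ".exe"
	}
	return base
}

func main() {
	// runtime.GOOS is the OS the running program was built for.
	fmt.Println(binaryName("ibazel/windows_amd64_pure_stripped/ibazel", runtime.GOOS))
}
```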

@meteorcloudy
Member

Another question: does ibazel itself need the runfiles symlink tree to run, or is it just needed for the tests?

@achew22
Member

achew22 commented Oct 23, 2018

Another question: does ibazel itself need the runfiles symlink tree to run, or is it just needed for the tests?

No, ibazel takes all its data dependencies and converts them to string variables in a go_embed_data rule which means I shouldn't have any production runtime deps. I wrote this in go so it could be statically linked and distributed without any .sos.

bazel-io pushed a commit to bazelbuild/bazel that referenced this pull request Oct 26, 2018
--script_path should write a batch file instead of bash file on Windows.

Related: bazelbuild/bazel-watcher#144
(ibazel uses --script_path)

RELNOTES: None
PiperOrigin-RevId: 218828314
@meteorcloudy
Member

meteorcloudy commented Oct 26, 2018

Hi @jchv, I debugged ibazel on Windows with your PR and found that ibazel run still doesn't work on Windows; it requires fixes in both Bazel and ibazel.

From Bazel side:
We need to fix --script_path on Windows: it always writes a bash script, but we need a batch script on Windows.
This is already done at bazelbuild/bazel@4634c20. Please build Bazel from HEAD and use it for ibazel.

From ibazel side:

John, since you authored this change, can you help debug and make ibazel run work with Bazel@HEAD?
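
To illustrate why the script type matters to the caller, here is a rough sketch in Go of choosing how to launch the file written by --script_path. It assumes a batch file is run through `cmd.exe /C` and that a bash script is directly executable; `scriptArgv` is a hypothetical helper and does not reflect how ibazel actually launches the script.

```go
package main

import (
	"fmt"
	"runtime"
)

// scriptArgv returns the argument vector used to launch a generated script.
// Assumption: on Windows the script is a batch file that needs the command
// interpreter; elsewhere it is an executable shell script.
func scriptArgv(goos, scriptPath string) []string {
	if goos == "windows" {
		return []string{"cmd.exe", "/C", scriptPath}
	}
	return []string{scriptPath}
}

func main() {
	// The path is a placeholder; a caller would pass argv[0] and argv[1:]
	// to exec.Command.
	fmt.Println(scriptArgv(runtime.GOOS, "run_script"))
}
```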

@jchv
Contributor Author

jchv commented Oct 26, 2018

Sure, I'll take another look this weekend. I had been using ibazel test which was at some point working for me, so I didn't catch that ibazel run wasn't working.

There are probably bugs in process_group_windows.go, but those aside, I also need to clean up the error handling, which is probably not sufficient at the moment.

@jchv
Contributor Author

jchv commented Nov 3, 2018

Sorry for lack of updates. I went at it again this weekend, but I'm sad to say I don't know how to progress.

So, I updated to Bazel HEAD today, and ran into some new issues, specifically in @io_bazel_rules_go//:go_context_data.

Starlark APIs accessing compilation flags has been removed. Use the new API on cc_common (see --incompatible_disable_legacy_flags_cc_toolchain_api on https://docs.bazel.build/versions/master/skylark/backward-compatibility.html#disable-legacy-c-toolchain-api for migration notes).

I re-enabled it. Then got this:

Starlark APIs accessing compilation flags has been removed. Use the new API on cc_common (see --incompatible_disable_legacy_flags_cc_toolchain_api on https://docs.bazel.build/versions/master/skylark/backward-compatibility.html#disable-legacy-c-toolchain-api for migration notes).

Sadly, looks like this one can't be re-enabled.

I was hoping maybe there was a version of Bazel I could use before this deprecation, so I went ahead and built bazel at ea225fe. This didn't seem to work either, same exact issues. I continued on with this version of Bazel.

The problem was fixed in io_bazel_rules_go 0.6.1. I tried that:

ERROR: C:/bazel/n7nxee47/external/com_google_protobuf/BUILD:70:1: C++ compilation of rule '@com_google_protobuf//:protobuf_lite' failed (Exit 2)
cl : Command line error D8021 : invalid numeric argument '/Wwrite-strings'

I traced it down to this:

COPTS = select({
    ":msvc" : MSVC_COPTS,
    "//conditions:default": [
        "-DHAVE_PTHREAD",
        "-Wall",
        "-Wwrite-strings",
        "-Woverloaded-virtual",
        "-Wno-sign-compare",
        "-Wno-unused-function",
        # Prevents ISO C++ const string assignment warnings for pyext sources.
        "-Wno-writable-strings",
    ],
})

:msvc is

config_setting(
    name = "msvc",
    values = { "compiler": "msvc-cl" },
)

OK, so maybe if I explicitly specify --compiler=msvc-cl it will work?

Well, I got past that issue. But then:

ERROR: C:/projects/go/src/github.com/bazelbuild/bazel-watcher/npm/BUILD:17:1: Executing genrule //npm:package failed (Exit 2)
panic: The version string was not overriden. Please rebuild with --stamp

Adding --stamp does not help.

This is where I am currently at. Sadly, trying to get anything working on Windows is an exercise in frustration at the moment. I suppose in order to progress further, there are more upstream changes that will need to be made, and right now I do not have the expertise to trace this any further.

I might try to take another look at the problem later, but if anyone has any insight as to how I can move forward it would be appreciated.

@achew22
Member

achew22 commented Nov 4, 2018

@jchv, Thanks so much for the work you've done here! I can't tell you how much I appreciate you putting in so much effort.

I just spent the last hour doing a bit of cleanup around the repo and I think some of these problems should be fixed for you. For example, I just merged #153 which I believe fixes the --stamp problem. It turns out that Windows doesn't have a .sh interpreter so you have to use python. Who knew? I also moved the .bazelrc file into the correct location so that if you're using a modern bazel it will behave correctly (#154). I upgraded rules_go past the version you specified to 0.16.1 in (#152).

I think this fixes a few more of the problems but there are almost certainly more lingering around.

I might try to take another look at the problem later, but if anyone has any insight as to how I can move forward it would be appreciated.

@meteorcloudy is the person on the Bazel team who is responsible for Bazel on Windows and he has been really helpful to me in debugging issues. I think I fixed the stamp problem but if you see something else please comment on here and we can figure it out together!

Really, thank you so much! I don't think I could have done this.

@achew22
Member

achew22 commented Nov 4, 2018

For the record, we're back to

-----------------------------------------------------------------------------
goroutine 5 [running]:
runtime/debug.Stack(0xc000020200, 0xc00000a1e0, 0x57)
	GOROOT/src/runtime/debug/stack.go:24 +0xae
runtime/debug.PrintStack()
	GOROOT/src/runtime/debug/stack.go:16 +0x29
github.com/bazelbuild/bazel-watcher/e2e.(*IBazelTester).ExpectOutput(0xc000475e10, 0x70c497, 0x1e)
	e2e/ibazel.go:72 +0x438
github.com/bazelbuild/bazel-watcher/e2e/live_reload.TestLiveReload(0xc000020200)
	e2e/live_reload/live_reload_test.go:79 +0x30b
testing.tRunner(0xc000020200, 0x719598)
	GOROOT/src/testing/testing.go:827 +0xc6
created by testing.(*T).Run
	GOROOT/src/testing/testing.go:878 +0x35a
--- FAIL: TestLiveReload (10.01s)
    ibazel.go:71: Expected iBazel output after 10s to be:
        Wanted [Live reload url: http://.+:\d+], got []
    live_reload_test.go:81: Output: ''
panic: runtime error: slice bounds out of range
	panic: TerminateProcess: Access is denied. [recovered]
	panic: TerminateProcess: Access is denied.

https://storage.googleapis.com/bazel-buildkite-artifacts/2c8cf4c0-f0d3-43a4-af93-4d8937bac77e/e2e%5Clive_reload%5Cgo_default_test%5Cbazel0.15.2%5Cattempt_1.log

@achew22
Member

achew22 commented Nov 4, 2018

golang/go#5615 might have some more info that's useful to us here.

That is expected - if you start process, you get permission to kill it by default.
I think, if there are no objections, I will try to do what I suggested in #1. I won't be
able to try for a few days.

@achew22
Member

achew22 commented Nov 4, 2018

I'm wondering if this might be a problem that is finicky enough that we don't care to solve it. Poking around at the error I think it comes from the test case triggering an event in iBazel such that it needs to restart the process (and thus kills it). This wouldn't be a problem normally, but we are using process groups which means everything is hard. Maybe we can enhance our permissions when we start the process. I will now do something that I never thought I would ever do -- link to some MS win32api documentation. Shudder

https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-openprocess

Looks like there is a permission that can be passed at construction time where you can allow TERMINATE_PROCESS.

We may have to use the Windows-specific Go API to do this, https://godoc.org/golang.org/x/sys/windows, which exports that value.

@achew22
Member

achew22 commented Nov 4, 2018

https://godoc.org/golang.org/x/sys/windows#OpenProcess is the API that takes that permission

@achew22
Member

achew22 commented Nov 4, 2018

I was hoping for a Windows-specific SysProcAttr, but I don't think we are so lucky

https://godoc.org/golang.org/x/sys/windows#SysProcAttr

@achew22
Member

achew22 commented Nov 4, 2018

Looks like there are two problems

  1. The ibazel.go e2e helper isn't properly killing subtasks on Windows.
  2. The e2e helper is not hooking up stdout reading properly, which means that GetOutput() isn't working. One thing to check is to expand the ExpectOutput method to panic if the expected output isn't received. If that happens, printing out all the process information seems like a reasonable next step.

Control flow that is causing this problem:

  1. Enter test
  2. Launch iBazel
  3. Defer a kill of iBazel
  4. GetOutput() fails. I think this is the key right here.
  5. The defer is executed, which then gives us the following stack trace:

goroutine 19 [running]:
testing.tRunner.func1(0xc00013c100)
	GOROOT/src/testing/testing.go:792 +0x38e
panic(0x6b1820, 0xc0000046e0)
	GOROOT/src/runtime/panic.go:513 +0x1c7
github.com/bazelbuild/bazel-watcher/e2e.(*IBazelTester).Kill(0xc000473e10)
	e2e/ibazel.go:105 +0x6a
runtime.Goexit()
	GOROOT/src/runtime/panic.go:397 +0xfd
testing.(*common).FailNow(0xc00013c100)
	GOROOT/src/testing/testing.go:590 +0x40
testing.(*common).Fatal(0xc00013c100, 0xc000473d10, 0x1, 0x1)
	GOROOT/src/testing/testing.go:628 +0x76
github.com/bazelbuild/bazel-watcher/e2e/live_reload.TestLiveReload(0xc00013c100)
	e2e/live_reload/live_reload_test.go:84 +0x1008
testing.tRunner(0xc00013c100, 0x7195c8)
	GOROOT/src/testing/testing.go:827 +0xc6
created by testing.(*T).Run
	GOROOT/src/testing/testing.go:878 +0x35a

@achew22
Member

achew22 commented Nov 4, 2018

With even more staring at the log output:

    ibazel.go:71: Expected iBazel output after 10s to be:
        Wanted [Live reload url: http://.+:\d+], got []

Looks like there is something hinky there. On the plus side, it is actually running in CI now. It's really late at night for me right now so I'm going to go to sleep and contemplate this in the morning. STDOUT isn't hooked up in Windows? Is that possible?

@meteorcloudy
Member

@achew22 Thanks for making the --workspace_status_command work on Windows! We also need to fix the script file extension with:

	suffix := ""
	if runtime.GOOS == "windows" {
		suffix = ".bat"
	}

@jchv Yes, the protoc compile failure can be worked around with --compiler=msvc-cl. We have to update the protobuf version to include fixes from @scentini https://github.com/protocolbuffers/protobuf/commits?author=scentini

With the additional fixes from @achew22, I think you'll be able to move forward now. Please ping me if you have any other questions.

@achew22
Member

achew22 commented Nov 5, 2018

I just did the .bat suffix code in #158

Member

@meteorcloudy meteorcloudy left a comment

Works like a charm, thank you so much for the fix!!!

@achew22
Member

achew22 commented Jan 14, 2019

@meteorcloudy, I'm not sure that's the case. It looks like the CI server isn't handling things correctly. It only ran 4 of the tests and none of the e2e tests were included. I'm aware that it is due to an incompatible change in Bazel, but it really seems like that should result in a CI failure.

INFO: 4 processes: 4 local.
//bazel:go_default_test                                                  PASSED in 0.6s
//ibazel:go_default_test                                                 PASSED in 0.7s
//ibazel/command:go_default_test                                         PASSED in 0.6s
//ibazel/output_runner:go_default_test                                   PASSED in 0.6s

Executed 4 out of 4 tests: 4 tests pass.

@meteorcloudy
Member

@achew22 Yes, that is definitely a Bazel bug, filed bazelbuild/bazel#7115

To make the test work again, the fastest way is to fix bazel_integration_testing and upgrade it in bazel-watcher.

As for ibazel on Windows, I manually tested ibazel run and ibazel test on Windows, they both work quite well!

@meteorcloudy
Member

@achew22 Can you help fix bazel_integration_testing? I'm currently busy with other CI breakages.

@achew22
Member

achew22 commented Jan 14, 2019

Did the e2e tests pass on your box?

@meteorcloudy
Member

Still need to get e2e tests working. That work depends on bazelbuild/bazel-integration-testing#96

As @jchv pointed out, we need to make bazel-integration-testing work on Windows first

@meteorcloudy
Member

meteorcloudy commented Jan 14, 2019

The e2e tests are not even running on Linux; I think that's due to the incompatible change. Let's not merge this PR until we enable them again.

@meteorcloudy
Member

@jchv Can you rebase to HEAD so that we can trigger the presubmit again?

@achew22
Member

achew22 commented Jan 16, 2019

@meteorcloudy I was able to trigger this from my side. Thanks for pinging the PR

@meteorcloudy
Member

@achew22 Thanks! Looks like the Windows failures are expected, e2e tests are failing because bazel_integration_test doesn't work on Windows yet.

@jchv
Contributor Author

jchv commented Jan 17, 2019

I'm interested in getting the e2e tests working, but in the meantime should we mark the tests to not run on Windows? There's really no need to rush to get this PR ready to merge imo, but some Windows support is probably better than none, and it seems movements to the other projects may take a long time.

(I still have a couple of other things to fix before I think it could possibly be reviewed and merged, namely the tests are still hacky.)

@jchv jchv mentioned this pull request Jan 17, 2019
@achew22
Member

achew22 commented Jan 17, 2019

@jchv TBH, I'm pretty reluctant to release into windows without any real test coverage. I know that a small amount comes in from the tests in //ibazel/... but if someone were to file a bug against me on Windows I would be shooting in the dark as to what happened.

Since I can only release to windows one time and I think there will be a flood of people (50-100 ppl), I don't want to leave a bad taste in their mouth for their first, and probably only if things go poorly, experience with ibazel.

@jchv
Contributor Author

jchv commented Jan 18, 2019

SGTM. Maybe to help reduce code rot I could at least split some of the less Windows-related work into other PRs to reduce the surface area, though. I am admittedly paranoid of having a large and invasive rebase down the line if it takes as long as I'm imagining to fix the upstream e2e test problems, since it's pending on something that seems to have been stalled for a while.

@meteorcloudy
Member

@achew22 Merging this PR doesn't mean Windows support is ready and we have to do a release for Windows, right? We can still keep #105 open until e2e tests are also enabled for Windows.

I think merging this change now could reduce the rebasing work @jchv has to do. Also, we'll at least have some test coverage on Bazel CI to prevent any breakage from Bazel itself. I just enabled Bazel watcher in downstream and it already caught a Bazel remote execution bug.

@achew22
Member

achew22 commented Jan 18, 2019

You two are both totally correct and I'm 100% wrong. I'm going to tag the tests as nowindows and merge it; then we can add the release of the exe in another PR when the e2e tests are good to go.

@achew22 achew22 added cla: yes and removed cla: no labels Jan 18, 2019
@googlebot

A Googler has manually verified that the CLAs look good.

(Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.)

@achew22
Member

achew22 commented Jan 18, 2019

Dear CLA bot, @jchv and @achew22 are both Google employees and we have both signed the CLA.

@jchv, thanks so much for all your hard work on this! I'm really excited to get to a place where we are releasing windows binaries.

@achew22 achew22 merged commit da33b5a into bazelbuild:master Jan 18, 2019
luca-digrazia pushed a commit to luca-digrazia/DatasetCommitsDiffSearch that referenced this pull request Sep 4, 2022

5 participants