Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: Building py binaries remotely is broken ("rule is missing dependency declarations") #5087

Closed
jasharpe opened this issue Apr 24, 2018 · 21 comments
Labels
area-Windows Windows-specific issues and feature requests team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website untriaged

Comments

@jasharpe
Copy link

Description of the problem / feature request:

Building py binaries remotely on Windows (on the local remote worker) is broken, due to an error like this:

ERROR: C:/temp/_bazel_jsharpe/0vhzcpls/external/bazel_tools/third_party/ijar/BUILD:40:1: undeclared inclusion(s) in rule '@bazel_tools//third_party/ijar:zlib_client':
this rule is missing dependency declarations for the following files included by 'external/bazel_tools/third_party/ijar/zlib_client.cc':
  'C:/tmp/lre/build-908a6a0b-0ac0-457d-b58c-06ef6220bf72/external/bazel_tools/third_party/zlib/zconf.h'

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

  1. Build and run Local Remote Worker:
$ cd $home
$ git clone https://github.com/bazelbuild/bazel.git
$ cd bazel
$ bazel build //src/tools/remote:worker
$ bazel-out/x64_windows-fastbuild/bin/src/tools/remote/worker.exe --listen_port=8080 --work_path=C:\tmp\lre --debug
  1. Create an empty WORKSPACE and add these files: https://gist.github.com/jasharpe/bd4f346d3338abe430ef6022a6c85e39

  2. Demonstrate that it works locally:

$ bazel clean; bazel build :py_bin
  1. Demonstrate that it doesn't work remotely with flags from bazelrc:
$ bazel --bazelrc=repro.bazelrc clean; bazel --bazelrc=repro.bazelrc build :py_bin

Get error similar to:

ERROR: C:/temp/_bazel_jsharpe/0vhzcpls/external/bazel_tools/third_party/ijar/BUILD:40:1: undeclared inclusion(s) in rule '@bazel_tools//third_party/ijar:zlib_client':
this rule is missing dependency declarations for the following files included by 'external/bazel_tools/third_party/ijar/zlib_client.cc':
  'C:/tmp/lre/build-908a6a0b-0ac0-457d-b58c-06ef6220bf72/external/bazel_tools/third_party/zlib/zconf.h'

What operating system are you running Bazel on?

Windows Server 2016

What's the output of bazel info release?

release 0.12.0

(Note, I've also tried this with 0.11.1 and HEAD Bazel and it fails in the same way.)

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

N/A

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

N/A

Have you found anything relevant by searching the web?

No, lots of "rule is missing dependency declarations" type of bugs, but nothing quite like this.

Any other information, logs, or outputs that you want to share?

No.

@jasharpe
Copy link
Author

jasharpe commented Apr 24, 2018

@jasharpe
Copy link
Author

jasharpe commented Apr 24, 2018

Some progress. It turns out that adding "third_party/zlib" prefix (similar to what is done in third_party/ijar, third_party/def_parser) to all the includes fixes the issue.

See this commit in my fork: jasharpe@8c016bf

I'm not sure if this is the correct fix (someone else with more context may be able to say), but sheds some light on the origin of the problem.

@jasharpe
Copy link
Author

More partial progress. The problem is here:

problems.assertProblemFree(action, sourceFile);

If you comment out this line then the build succeeds (without the change I have above). Something about this is overzealous for remote builds (or underzealous for local builds).

@jasharpe
Copy link
Author

jasharpe commented Apr 24, 2018

Ah, I have figured out the specific issue, but I'm not sure what the correct fix is.

We're hitting this "problem":

problems.add(execPathFragment.getPathString());

The execPathFragment here is something like C:/tmp/lre/build-9902c525-6e0e-4a2e-99f2-88256f891b4e/external/bazel_tools/third_party/zlib/zutil.h. Note that this is a path under the local remote worker work path.

This problem is not encountered locally because we fall into the "funky but tolerable" exception here (since the file when running locally is under the local exec root, but it isn't when executing remotely):

execPathFragment = execPath.relativeTo(execRoot); // funky but tolerable path

So, the question remains: what should be done about it? Is it WAI that we should have to add the "third_party/zlib" prefixes to includes so we don't fall into the "funky but tolerable" bucket? If we leave it in this state it's pretty hostile to remote execution use cases, since they're treated much more strictly and we may run into this problem again.

@meteorcloudy
Copy link
Member

I think we should make Bazel realize the remote exec root and use it during header checking.
@lberki can confirm if that's viable.
BTW, do we have the similar issue on Linux?

@lberki
Copy link
Contributor

lberki commented Apr 26, 2018

/cc @mhlopko

Well, it's definitely not trivial, but I think the right thing to do here is to make input validation work with remote execution. In a pinch, we could do with just not doing input validation and rely on the right files being shipped to the executor.

I haven't thought very deeply about the implementation, but CppCompileActionContext returning enough data to make sense of the .d files / /showIncludes output makes sense. An alternative would be do do the reported path -> exec path parsing remotely, but that would imply wrapper scripts, where I'd prefer not to go if at all possible.

@jasharpe
Copy link
Author

This include validation is blocking us pretty hard on building non-trivial things remotely on Windows.

Any possibility that a short-term fix like an experimental flag to disable include validation would be okay?

@meteorcloudy
Copy link
Member

@jasharpe Do you know why it's not an issue on Linux?

@jasharpe
Copy link
Author

@nlopezgi had some insight into how this works on Linux (AFAIK it is not an issue anymore).

@jasharpe
Copy link
Author

A workaround that doesn't involve making changes to Bazel itself is to add something like this to CROSSTOOL (path matching whatever is given to the local remote worker's --work_path flag):

cxx_builtin_include_directory: "C:\\tmp\\lre"

@meteorcloudy
Copy link
Member

cxx_builtin_include_directory: "C:\\tmp\\lre" should work, you can use it to unblock you. But it also means disabling all header checking, so I don't think what's what we want eventually.
I'm thinking maybe we can use a similar way as Linux to solve this problem. But I don't know how header checking on Linux remote execution works.
Can you explain a bit here? @nlopezgi

@nlopezgi
Copy link
Contributor

nlopezgi commented May 1, 2018

iirc, the headers file produced by gcc/clang in Linux have relative paths (e.g., instead of "C:/tmp/lre/build-908a6a0b-0ac0-457d-b58c-06ef6220bf72/external/bazel_tools/third_party/zlib/zconf.h" the line would be "external/bazel_tools/third_party/zlib/zconf.h"), if an only if, it was invoked in "the right way". I.e., invoking it from an absolute path resulted in a params file with absolute paths and invoking it in a relative path (from the bazel execroot) resulted in a params file with relative paths.

If the files are produced with relative paths, header checking in Linux verifies these relative paths as files in the Bazel runfiles tree, and for files in absolute paths (which are always outside of the runfiles tree) we use cxx_builtin_include_directory. We did have some issues with this check when we were doing hermetic toolchains, and at some point we did think we would have to include wrapper scripts to change paths, but we got around it by calling the compiler in "the right way".

Now, I don't know if this is possible for Windows. I think that issue here is that the folder in which the bazel command runs inside the remote worker (inside docker) should not be leaking in the outputs (in the params file), so we have two options:

  • Somehow get the compiler to produce the params file with relative paths
  • Use wrapper scripts as part of the toolchain to strip these absolute paths (i'd advice to try the first one to exhaustion before trying this second option)

@jasharpe
Copy link
Author

jasharpe commented May 1, 2018

For the record, here's a typical compile step (found with the --subcommands flag, imagining I have a "foo/repro.cc" that includes "foo/foo.h"):

    SET INCLUDE=C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE;C:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt;C:\Program Files (x86)\Windows Kits\8.1\include\shared;C:\Program Files (x86)\Windows Kits\8.1\include\um;C:\Program Files (x86)\Windows Kits\8.1\include\winrt;
    SET LIB=C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\LIB\amd64;C:\Program Files (x86)\Windows Kits\10\lib\10.0.10240.0\ucrt\x64;C:\Program Files (x86)\Windows Kits\8.1\lib\winv6.3\um\x64;
    SET PATH=C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\amd64;C:\Windows\Microsoft.NET\Framework64\v4.0.30319;C:\Windows\Microsoft.NET\Framework64\;C:\Program Files (x86)\Windows Kits\8.1\bin\x64;C:\Program Files (x86)\Windows Kits\8.1\bin\x86;;C:\Windows\system32
    SET PWD=/proc/self/cwd
    SET TEMP=C:\tmp
    SET TMP=C:\tmp
"C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/amd64/cl.exe" /c foo/repro.cc /Fobazel-out/x64_windows-fastbuild/bin/foo/_objs/repro/foo/repro.o /nologo /DCOMPILER_MSVC /DNOMINMAX /D_WIN32_WINNT=0x0600 /D_CRT_SECURE_NO_DEPRECATE /D_CRT_SECURE_NO_WARNINGS /D_SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS /bigobj /Zm500 /J /Gy /GF /EHsc /wd4351 /wd4291 /wd4250 /wd4996 /I. /Ibazel-out/x64_windows-fastbuild/genfiles /Iexternal/bazel_tools /Ibazel-out/x64_windows-fastbuild/genfiles/external/bazel_tools /showIncludes /MD /Od /Z7 /DDEBUG

When you run this, it prints out a bunch of system includes, but also the problematic:

Note: including file: c:\tmp\lre\build-d31427fb-1ab1-4c21-b7ea-fdb610bda377\foo\foo.h

The /showIncludes flag is doing the magic. So far I haven't figured out a way to have it provide relative paths.

@jasharpe
Copy link
Author

jasharpe commented May 1, 2018

I tried out whether cl.exe cares whether it's invoked via absolute path or via the PATH (as Nick said gcc/clang care about), and it doesn't seem to make a difference to the output of /showIncludes.

@jasharpe
Copy link
Author

jasharpe commented May 2, 2018

So, here's a naive suggestion. If we were able to modify the cl.exe command to be this (note the cd on the first line):

cd
    SET INCLUDE=C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE;C:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt;C:\Program Files (x86)\Windows Kits\8.1\include\shared;C:\Program Files (x86)\Windows Kits\8.1\include\um;C:\Program Files (x86)\Windows Kits\8.1\include\winrt;
    SET LIB=C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\LIB\amd64;C:\Program Files (x86)\Windows Kits\10\lib\10.0.10240.0\ucrt\x64;C:\Program Files (x86)\Windows Kits\8.1\lib\winv6.3\um\x64;
    SET PATH=C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\amd64;C:\Windows\Microsoft.NET\Framework64\v4.0.30319;C:\Windows\Microsoft.NET\Framework64\;C:\Program Files (x86)\Windows Kits\8.1\bin\x64;C:\Program Files (x86)\Windows Kits\8.1\bin\x86;;C:\Windows\system32
    SET PWD=/proc/self/cwd
    SET TEMP=C:\tmp
    SET TMP=C:\tmp
"C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/amd64/cl.exe" /c foo/repro.cc /Fobazel-out/x64_windows-fastbuild/bin/foo/_objs/repro/foo/repro.o /nologo /DCOMPILER_MSVC /DNOMINMAX /D_WIN32_WINNT=0x0600 /D_CRT_SECURE_NO_DEPRECATE /D_CRT_SECURE_NO_WARNINGS /D_SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS /bigobj /Zm500 /J /Gy /GF /EHsc /wd4351 /wd4291 /wd4250 /wd4996 /I. /Ibazel-out/x64_windows-fastbuild/genfiles /Iexternal/bazel_tools /Ibazel-out/x64_windows-fastbuild/genfiles/external/bazel_tools /showIncludes /MD /Od /Z7 /DDEBUG

Then we could trivially parse the exec root in https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/rules/cpp/ShowIncludesFilter.java and remove this directory from included headers wherever it appears as a prefix.

The issue with this is adding an extra command here doesn't seem that easy, or even possible. Maybe this is tantamount to a wrapper in any case.

@meteorcloudy
Copy link
Member

@jasharpe Sorry for the late reply.
I don't quite understand the above proposal. What does the cd command do? And how could ShowIncludesFilter.java know the exec root?

@jasharpe
Copy link
Author

Sorry, let me clarify. Running cd by itself in Windows prints out the current working directory. ShowIncludesFilter could read this output and then use it to strip off the current working directory (i.e. remote exec root).

I don't think this is actually feasible since it gets out of the realm of running a single program for an action.

Another possible solution off the top of my head could be to add a "remote exec root" field to ExecuteResponse returned by the remote execution API, and have Bazel use this.

@meteorcloudy
Copy link
Member

I like the second solution, it would also work for other platforms if needed.

@jasharpe
Copy link
Author

jasharpe commented Jun 5, 2018

Here's another idea. I just discovered the USE_MSVC_WRAPPER option for Windows CC configuration. We could likely add the absolute => relative transformation of the reported paths into the wrapper script, and then generate the CROSSTOOL for remote use with USE_MSVC_WRAPPER=1. This is attractive to me because it doesn't require figuring out how to do a wrapper from scratch, and doesn't require an API change (instead, what is reported back from the worker is already agnostic to execution root).

Is there some reason I shouldn't use USE_MSVC_WRAPPER?

@meteorcloudy
Copy link
Member

@jasharpe I not sure we should re-enable the wrapper script again.
The python wrapper script was introduced a long time ago because we didn't have good CROSSTOOL support. I and Marcel worked hard to remove it. Currently, the only reason it still exists is that TensorFlow needs it for their Windows GPU build (for transforming options for nvcc compiler). And my plan is to move it to TensorFlow repo (they already have similar thing for Linux GPU build) so that we don't have wrapper scripts in Bazel at all.

The main reason we don't like it is because it's slow. I had a 40% build time reduce for //src/main/cpp:client after getting rid of it. Besides, it adds Python as a dependency.

@jin jin added area-Windows Windows-specific issues and feature requests and removed team-Rules-Python Native rules for Python labels Feb 20, 2019
@katre
Copy link
Member

katre commented May 13, 2020

Closed for lack of activity. Please ping the reviewers to reopen if this is still a problem.

@katre katre closed this as completed May 13, 2020
@philwo philwo added the team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website label Jun 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Windows Windows-specific issues and feature requests team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website untriaged
Projects
None yet
Development

No branches or pull requests

8 participants