"Could not find module" error on Windows with remote caching #1260

Closed
aherrmann opened this issue Mar 4, 2020 · 5 comments · Fixed by #1281

@aherrmann
Member

Describe the bug
On Windows with remote caching enabled, builds occasionally fail with an error of the form

ERROR: D:/a/1/s/compiler/damlc/BUILD.bazel:112:1: HaskellBuildLibrary //compiler/damlc:damlc-lib failed (Exit 1)
compiler\damlc\lib\DA\Cli\Damlc\IdeState.hs:18:1: error:
    Could not find module `DA.Daml.Compiler.Scenario'
    Use -v to see a list of the files searched for.
   |
18 | import qualified DA.Daml.Compiler.Scenario as Scenario
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A build of the same target without the remote cache passes.

That is, the target should build without error: the modules that the compiler could not find are in fact present. That a non-cached build passes suggests that either the cache is poisoned or Bazel fetches from the cache improperly.

To Reproduce
Unfortunately, I haven't found a way to reproduce the issue. On the daml repository this happens occasionally when we update rules_haskell.

Expected behavior
The target should build without error whether or not remote caching is enabled.

Environment

  • OS name + version: Windows Server 2016
  • Bazel version: 2.0.0
  • Version of the rules: eaa8985

Additional context
Looking at the contents of exposed-modules-<target>, <target>.static.manifest, or the generated package configuration reveals that none of them lists any modules or object files. These files are generated by actions that take the objects_dir or interfaces_dir as an input, i.e. they depend on directories rather than on individual files. It seems that the contents of these directories are lost somewhere between the remote cache and the build.
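To illustrate why a lost directory would go unnoticed, here is a minimal Python sketch of a manifest step that lists the object files under a directory. It is not the actual rules_haskell action; `write_static_manifest` and `demo` are hypothetical names, and the sketch only assumes that the manifest is a sorted listing of `*.o` files. An empty objects directory yields an empty manifest and no error.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def write_static_manifest(objects_dir: Path, manifest: Path) -> int:
    """Write a sorted list of all *.o files under objects_dir, one per line.

    Returns the number of object files found. Note that an empty
    directory is not an error: it simply produces an empty manifest.
    """
    objects = sorted(p.as_posix() for p in objects_dir.rglob("*.o"))
    manifest.write_text("".join(f"{o}\n" for o in objects))
    return len(objects)


def demo() -> Tuple[int, int]:
    """Compare an empty objects directory with a populated one."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        empty = root / "empty_obj"
        empty.mkdir()
        full = root / "full_obj"
        (full / "Bazel").mkdir(parents=True)
        (full / "Bazel" / "Runfiles.o").write_bytes(b"")

        n_empty = write_static_manifest(empty, root / "empty.manifest")
        n_full = write_static_manifest(full, root / "full.manifest")

        # The empty directory silently produces a zero-byte manifest.
        assert (root / "empty.manifest").read_text() == ""
        return n_empty, n_full
```

A step of this shape succeeds either way, so any failure only surfaces later, in a dependent compile or link action.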

I have tried adding the target's haskell source files as inputs to the corresponding actions. However, this did not fix the issue.

I have tried to reproduce the issue on a simpler example, but was unsuccessful.

One possibility to avoid this issue could be to generate the exposed modules list and object manifest files directly during the compilation action, so that they don't depend on a directory, but rather directly on the source files. It is unclear if this would just shift the error to the linking step though. Another possibility would be to explicitly track the object and interface files rather than only the directories containing them. This is more difficult as their paths are dependent on the module names and not the source file paths and therefore cannot generally be predicted within Bazel. However, the compilation action could contain a post-processing step that moves these files into the positions that Bazel expects.
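As a rough sketch of the second option, the following Python snippet maps a module name to the object and interface paths that would have to be declared explicitly. It assumes the module name is already known (which, as noted, is the hard part inside Bazel) and that outputs follow the module hierarchy with `.o` and `.hi` suffixes. `expected_outputs` is a hypothetical helper, not part of rules_haskell.

```python
from pathlib import PurePosixPath
from typing import Tuple


def expected_outputs(
    module: str, objects_dir: str, interfaces_dir: str
) -> Tuple[str, str]:
    """Return (object path, interface path) for a Haskell module name.

    Assumption: outputs mirror the module hierarchy, so module A.B.C
    ends up at <objects_dir>/A/B/C.o and <interfaces_dir>/A/B/C.hi.
    """
    rel = PurePosixPath(*module.split("."))
    obj = PurePosixPath(objects_dir) / rel.parent / (rel.name + ".o")
    iface = PurePosixPath(interfaces_dir) / rel.parent / (rel.name + ".hi")
    return str(obj), str(iface)
```

For example, `expected_outputs("Bazel.Runfiles", "_obj/runfiles", "_iface")` gives `("_obj/runfiles/Bazel/Runfiles.o", "_iface/Bazel/Runfiles.hi")`. With such paths declared as individual outputs, Bazel could report a missing file instead of accepting an empty directory.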

@aherrmann
Member Author

I was able to reproduce this issue with a rules_haskell update on the daml repository from revision eaa8985 to 14f61c4. The failing target was //libs-haskell/bazel-runfiles with the error message

ERROR: D:/a/1/s/libs-haskell/bazel-runfiles/BUILD.bazel:6:1: HaskellBuildLibrary //libs-haskell/bazel-runfiles:bazel-runfiles failed (Exit 1)

libs-haskell\bazel-runfiles\src\DA\Bazel\Runfiles.hs:13:1: error:
    Could not find module `Bazel.Runfiles'
    Use -v to see a list of the files searched for.
   |
13 | import qualified Bazel.Runfiles
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The execution log reveals that the following files are empty:

  • bazel-out/x64_windows-opt/bin/external/rules_haskell/tools/runfiles/exposed-modules-runfiles
  • bazel-out/x64_windows-opt/bin/external/rules_haskell/tools/runfiles/_obj/runfiles.static.manifest

Their hash is e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 which is the hash returned by sha256sum /dev/null. We also see that the compile action for //libs-haskell/bazel-runfiles does not receive an input for bazel-out/x64_windows-opt/bin/external/rules_haskell/tools/runfiles/externalZSrulesZUhaskellZStoolsZSrunfilesZSrunfiles/_iface/Bazel/Runfiles.hi. I.e. it is as if the compile action did not compile any modules.
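The empty-file digest can be checked without `sha256sum`. The small Python sketch below computes the SHA-256 of zero bytes and offers a helper for spotting empty outputs in an execution log. `is_empty_digest` is a hypothetical name introduced here for illustration.

```python
import hashlib

# SHA-256 of zero bytes, i.e. the digest of any empty file.
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()


def is_empty_digest(hex_digest: str) -> bool:
    """Return True if the given hex digest is that of an empty file."""
    return hex_digest.strip().lower() == EMPTY_SHA256
```

Scanning the output digests of an execution log with a check like this is one way to find further artifacts that were cached as empty files.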

To test whether this is a reproducible issue in rules_haskell, I reran the build on rules_haskell revision eaa8985 and introduced a trivial change to @rules_haskell//tools/runfiles to force a rebuild. The execution log shows that runfiles.static.manifest lists the expected object file and that the compile action for //libs-haskell/bazel-runfiles does indeed receive an input for bazel-out/x64_windows-opt/bin/external/rules_haskell/tools/runfiles/externalZSrulesZUhaskellZStoolsZSrunfilesZSrunfiles/_iface/Bazel/Runfiles.hi. In other words, the issue was not reproducible with the older revision, so some transient error must have caused a past build to upload empty outputs into the remote cache.

The object and interface outputs of the compilation action are only tracked as directories (ctx.actions.declare_directory). I.e. Bazel does not know exactly which artifacts should be produced.

@aherrmann
Member Author

I have encountered such an issue on the rules_haskell Windows CI pipeline on PR #1410. The changes should not have caused that error, and CI did indeed succeed after a rerun.

Azure build log
Executed 1 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
INFO: Build completed successfully, 38 total actions
Loading: 
Loading: 0 packages loaded
Analyzing: 9 targets (3 packages loaded, 0 targets configured)
INFO: Analyzed 9 targets (3 packages loaded, 12 targets configured).
INFO: Found 8 targets and 1 test target...
[0 / 3] [Prepa] BazelWorkspaceStatusAction stable-status.txt
SUBCOMMAND: # //tests/package-id-clash-binary/b:foo [action 'HaskellBuildLibrary //tests/package-id-clash-binary/b:foo', configuration: 63f8c1aee551c642405568e6da6b8a844eeebe0a77095f8ba225466dd5618815]
cd C:/users/vssadministrator/_bazel_vssadministrator/w3d6ug6o/execroot/rules_haskell
  SET LANG=C.UTF-8
  bazel-out/host/bin/haskell/ghc_wrapper.exe bazel-out/x64_windows-fastbuild/bin/tests/package-id-clash-binary/b/compile_flags_foo_HaskellBuildLibrary bazel-out/x64_windows-fastbuild/bin/tests/package-id-clash-binary/b/extra_args_foo_HaskellBuildLibrary
SUBCOMMAND: # //tests/package-id-clash-binary/a:foo [action 'HaskellBuildLibrary //tests/package-id-clash-binary/a:foo', configuration: 63f8c1aee551c642405568e6da6b8a844eeebe0a77095f8ba225466dd5618815]
cd C:/users/vssadministrator/_bazel_vssadministrator/w3d6ug6o/execroot/rules_haskell
  SET LANG=C.UTF-8
  bazel-out/host/bin/haskell/ghc_wrapper.exe bazel-out/x64_windows-fastbuild/bin/tests/package-id-clash-binary/a/compile_flags_foo_HaskellBuildLibrary bazel-out/x64_windows-fastbuild/bin/tests/package-id-clash-binary/a/extra_args_foo_HaskellBuildLibrary
SUBCOMMAND: # //tests/package-id-clash-binary:bin [action 'HaskellRegisterPackage tests/package-id-clash-binary/link-config-bin/link-config-bin.conf', configuration: 63f8c1aee551c642405568e6da6b8a844eeebe0a77095f8ba225466dd5618815]
cd C:/users/vssadministrator/_bazel_vssadministrator/w3d6ug6o/execroot/rules_haskell
  SET PATH=C:/Program Files/Git/usr/bin
  external/rules_haskell_ghc_windows_amd64/bin/ghc-pkg.exe recache --package-db=bazel-out/x64_windows-fastbuild/bin/tests/package-id-clash-binary/link-config-bin -v0 --no-expand-pkgroot
SUBCOMMAND: # //tests/package-id-clash-binary/a:foo [action 'Action tests/package-id-clash-binary/a/exposed-modules-foo', configuration: 63f8c1aee551c642405568e6da6b8a844eeebe0a77095f8ba225466dd5618815]
cd C:/users/vssadministrator/_bazel_vssadministrator/w3d6ug6o/execroot/rules_haskell
bazel-out/host/bin/haskell/ls_modules.exe False bazel-out/x64_windows-fastbuild/bin/tests/package-id-clash-binary/a/testsZSpackage-id-clash-binaryZSaZSfoo/_iface bazel-out/host/bin/external/rules_haskell_ghc_windows_amd64/ghc-global-pkgdb bazel-out/x64_windows-fastbuild/bin/tests/package-id-clash-binary/a/hidden-modules-foo bazel-out/x64_windows-fastbuild/bin/tests/package-id-clash-binary/a/reexported-modules-foo bazel-out/x64_windows-fastbuild/bin/tests/package-id-clash-binary/a/exposed-modules-foo
SUBCOMMAND: # //tests/package-id-clash-binary/a:foo [action 'Action tests/package-id-clash-binary/a/_obj/foo.static.manifest', configuration: 63f8c1aee551c642405568e6da6b8a844eeebe0a77095f8ba225466dd5618815]
cd C:/users/vssadministrator/_bazel_vssadministrator/w3d6ug6o/execroot/rules_haskell
C:/Program Files/Git/usr/bin/bash.exe -c 
        "C:/Program Files/Git/usr/bin/find.exe" bazel-out/x64_windows-fastbuild/bin/tests/package-id-clash-binary/a/_obj/foo -name '*.o' | "C:/Program Files/Git/usr/bin/sort.exe" > bazel-out/x64_windows-fastbuild/bin/tests/package-id-clash-binary/a/_obj/foo.static.manifest
        
SUBCOMMAND: # //tests/package-id-clash-binary/a:foo [action 'HaskellLinkStaticLibrary tests/package-id-clash-binary/a/libHStestsZSpackage-id-clash-binaryZSaZSfoo.a', configuration: 63f8c1aee551c642405568e6da6b8a844eeebe0a77095f8ba225466dd5618815]
cd C:/users/vssadministrator/_bazel_vssadministrator/w3d6ug6o/execroot/rules_haskell
external/rules_haskell_ghc_windows_amd64/mingw/bin/ar qc bazel-out/x64_windows-fastbuild/bin/tests/package-id-clash-binary/a/libHStestsZSpackage-id-clash-binaryZSaZSfoo.a @bazel-out/x64_windows-fastbuild/bin/tests/package-id-clash-binary/a/_obj/foo.static.manifest
SUBCOMMAND: # //tests/package-id-clash-binary/a:foo [action 'Action tests/package-id-clash-binary/a/testsZSpackage-id-clash-binaryZSaZSfoo/testsZSpackage-id-clash-binaryZSaZSfoo.conf', configuration: 63f8c1aee551c642405568e6da6b8a844eeebe0a77095f8ba225466dd5618815]
cd C:/users/vssadministrator/_bazel_vssadministrator/w3d6ug6o/execroot/rules_haskell
INFO: Elapsed time: 3.481s, Critical Path: 2.27s
INFO: 13 processes: 13 local.
FAILED: Build did NOT complete successfully
//tests/package-id-clash-binary:bin                             FAILED TO BUILD

@aherrmann
Member Author

And another instance on #1411 which succeeded after a CI rerun.

Azure build log

@aherrmann
Member Author

With #1281 in place in the daml repo we have encountered possibly related issues:

ERROR: D:/a/2/s/compiler/daml-lf-reader/BUILD.bazel:6:19: output 'compiler/daml-lf-reader/_obj/daml-lf-reader/DA/Daml/LF/Reader.o' was not created
ERROR: D:/a/2/s/compiler/daml-lf-reader/BUILD.bazel:6:19: output 'compiler/daml-lf-reader/compilerZSdaml-lf-readerZSdaml-lf-reader/_iface/DA/Daml/LF/Reader.hi' was not created
ERROR: D:/a/2/s/compiler/daml-lf-reader/BUILD.bazel:6:19: not all outputs were created or valid

This could be related to this Bazel issue, where the presence of items in the action cache (AC) without corresponding items in the CAS leads to missing outputs.

@cocreature
Collaborator

cocreature commented Jan 27, 2021

After some debugging with @aherrmann and @garyverhaegen-da, it looks like the one above is a red herring. We tracked down which AC entry was queried by Bazel (if someone has a better way of doing that than Wireshark, I’m all ears; the eventlog and execlog don’t seem to work). The AC entry that was queried contains neither output_files nor output_directories and has an empty stdout and stderr. We haven’t figured out so far what exactly causes that AC entry to be created, but it does correspond to the broken target. Upon deleting this AC entry, the target gets rebuilt and everything seems to work as expected.

At least in our current cache, there don’t appear to be any legitimate targets with no output directories, no output files, and an empty stdout and stderr, so as a workaround we are considering periodically iterating over all AC entries and deleting those of this type.
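The check behind that workaround could look roughly like the Python sketch below. The `ActionResult` dataclass is a hypothetical in-memory stand-in for a cached action result and is not a real client API; how entries are actually listed and deleted depends on the cache backend and is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ActionResult:
    """Hypothetical stand-in for a cached action result (not a real API)."""
    output_files: List[str] = field(default_factory=list)
    output_directories: List[str] = field(default_factory=list)
    stdout: bytes = b""
    stderr: bytes = b""


def is_suspicious(result: ActionResult) -> bool:
    """True if the entry has no outputs at all and empty stdout and stderr."""
    return not (
        result.output_files
        or result.output_directories
        or result.stdout
        or result.stderr
    )


def suspicious_keys(entries: Dict[str, ActionResult]) -> List[str]:
    """Return the sorted keys of all entries that should be deleted."""
    return sorted(key for key, result in entries.items() if is_suspicious(result))
```

Given a mapping from action digest to result, `suspicious_keys` returns the candidates for deletion. Entries with any output or any captured stdout or stderr are left alone.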

@mergify mergify bot closed this as completed in #1281 Nov 2, 2021