Unexpected IO error (Not a directory) #4751

Closed
buchgr opened this issue Mar 2, 2018 · 37 comments
Labels: P0 (This is an emergency and more important than other current work. Assignee required), platform: apple, team-Remote-Exec (Issues and PRs for the Execution (Remote) team), team-Rules-CPP (Issues for C++ rules)

Comments

@buchgr
Contributor

buchgr commented Mar 2, 2018

ERROR: /Users/buildkite/builds/darwin-x86-64-1-1/bazel/re2/BUILD:26:1: Couldn't build
file _objs/re2/re2/filtered_re2.o: C++ compilation of rule '//:re2' failed: Unexpected IO error.:
/private/var/tmp/_bazel_buildkite/90b05a586a8f6522c740e54d1334c7a0/execroot/
com_googlesource_code_re2/external/local_config_cc/wrapped_clang (Not a directory)

We see this error frequently on macOS. @mhlopko suggested that it's a somewhat known bug in external repositories.

cc: @aehlig @dslomov

@laszlocsomor
Contributor

Is anyone investigating?

@hlopko
Member

hlopko commented Mar 5, 2018

I just tried to collect evidence that this is an instance of an existing issue, but I cannot find any. I do vaguely remember strange issues like this one with Skylark repositories on Mac, though.

@laszlocsomor
Contributor

Thanks. Marcel, may I assign this to you or is there someone for whom it's a better fit?

@hlopko
Member

hlopko commented Mar 5, 2018

I'd suggest @aehlig (for external repositories expertise) or @philwo (for mac expertise). I don't plan to investigate this issue further.

@laszlocsomor
Contributor

Thanks! Assigning to @philwo because @aehlig is out on vacation.

@hlopko
Member

hlopko commented Mar 23, 2018

failed: Unexpected IO error.: /private/var/tmp/_bazel_buildkite/5325b99c04dd286d6e24d5b36b141d80/execroot/com_googlesource_code_re2/external/bazel_tools/tools/test/test-setup.sh (No such file or directory)
 

@buchgr
Contributor Author

buchgr commented Mar 27, 2018

I have managed to get a stacktrace for the error:

ERROR: /Users/buchgr/code/bazel/src/main/cpp/util/BUILD:126:1: C++ compilation of rule '//src/main/cpp/util:strings' failed: Unexpected IO error.java.io.FileNotFoundException: /private/var/tmp/_bazel_buchgr/5de787c83f067e12ca3f7ef44fb23d3f/execroot/io_bazel/external/local_config_cc/wrapped_ar (No such file or directory)
	at com.google.devtools.build.lib.unix.NativePosixFiles.stat(Native Method)
	at com.google.devtools.build.lib.unix.UnixFileSystem.statInternal(UnixFileSystem.java:177)
	at com.google.devtools.build.lib.unix.UnixFileSystem.isExecutable(UnixFileSystem.java:248)
	at com.google.devtools.build.lib.vfs.Path.isExecutable(Path.java:839)
	at com.google.devtools.build.lib.remote.TreeNodeRepository.getOrComputeDirectory(TreeNodeRepository.java:375)
	at com.google.devtools.build.lib.remote.TreeNodeRepository.computeMerkleDigests(TreeNodeRepository.java:407)
	at com.google.devtools.build.lib.remote.TreeNodeRepository.computeMerkleDigests(TreeNodeRepository.java:405)
	at com.google.devtools.build.lib.remote.TreeNodeRepository.computeMerkleDigests(TreeNodeRepository.java:405)
	at com.google.devtools.build.lib.remote.RemoteSpawnCache.lookup(RemoteSpawnCache.java:98)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:91)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:64)
	at com.google.devtools.build.lib.rules.cpp.SpawnGccStrategy.execWithReply(SpawnGccStrategy.java:65)
	at com.google.devtools.build.lib.rules.cpp.CppCompileAction.execute(CppCompileAction.java:1187)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeActionTask(SkyframeActionExecutor.java:892)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.prepareScheduleExecuteAndCompleteAction(SkyframeActionExecutor.java:823)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.access$900(SkyframeActionExecutor.java:112)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:690)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:644)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:414)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:440)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:194)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:347)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:355)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
: /private/var/tmp/_bazel_buchgr/5de787c83f067e12ca3f7ef44fb23d3f/execroot/io_bazel/external/local_config_cc/wrapped_ar (No such file or directory)

However, after Bazel crashes, the file in question does exist. The error is hard to reproduce, and I have only ever managed to reproduce it when the machine was under load. The error only seems to happen for files from remote repositories.

@buchgr
Contributor Author

buchgr commented Mar 27, 2018

Some more instances of the error:

https://buildkite.com/bazel/publish-bazel-binaries/builds/50#fc59eefc-b8d3-4c53-9854-5df86d2061d1
/Users/buildkite/builds/buildkite-macpro-1-1/bazel/publish-bazel-binaries/src/main/cpp/util/BUILD:32:1: Couldn't build file src/main/cpp/util/_objs/file/src/main/cpp/util/file.o: C++ compilation of rule '//src/main/cpp/util:file' failed: Unexpected IO error.: /private/var/tmp/_bazel_buildkite/e8261f14c63cbece55e044e3e278423a/execroot/io_bazel/external/local_config_cc/cc_wrapper.sh (Not a directory)

https://buildkite.com/bazel/bazel-bazel/builds/1058#a5d34e61-0b0e-47f8-a6cf-ddbb4c774628
ERROR: /Users/buildkite/builds/buildkite-macpro-2-1/bazel/bazel-bazel/src/main/cpp/util/BUILD:96:1: Couldn't build file src/main/cpp/util/_objs/logging/src/main/cpp/util/logging.o: C++ compilation of rule '//src/main/cpp/util:logging' failed: Unexpected IO error.: /private/var/tmp/_bazel_buildkite/7f1cd7ef09bcdfc83a912d002941c64f/execroot/io_bazel/external/local_config_cc/wrapped_clang (Not a directory)

@buchgr
Contributor Author

buchgr commented Mar 27, 2018

The code that triggers this error is

execRoot.getRelative(input.getExecPathString()).isExecutable()

with execRoot being a Path object and Path.isExecutable() leading to the stat system call.

cc: @ulfjack

@buchgr
Contributor Author

buchgr commented Mar 27, 2018

It's also quite possible that this is a bug in NativePosixFileSystem, as it only ever appears on macOS. I have never seen this error on Windows or Linux. I suggest setting io.bazel.EnableJni=0 for macOS on Buildkite and seeing whether the error continues to happen.

@ulfjack
Contributor

ulfjack commented Mar 27, 2018

So far, I think we've only seen this error for executable files, right? I wonder if it's something in macOS, maybe related to its verification of executable files?

@buchgr
Contributor Author

buchgr commented Mar 27, 2018

@ulfjack IIUC all output files of an action are marked executable in Bazel?

@ulfjack
Contributor

ulfjack commented Mar 27, 2018

None of the paths posted so far contain "bazel-out". I'm not sure how exactly the files under external/ come into existence - the paths are related to external repositories, but that doesn't really help us narrow it down. It could be coincidence that we only see it happen for external repositories paths. The remote cache is implicitly sorting the files by name, so maybe it's just the external/ paths always get sorted first or something like that.
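
As a rough illustration of the sorting hypothesis above (not Bazel's actual code): if the inputs of an action are visited in lexicographic order of their exec paths, an `external/` path is reached before any `src/` path, so a failure on the first problematic input would tend to name an `external/` file. The class and method names below are hypothetical, and the paths are modelled on the ones quoted in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class SortedInputsDemo {
  /** Returns the exec path that a name-sorted traversal would visit first. */
  public static String firstInSortedOrder(List<String> execPaths) {
    // TreeSet orders strings lexicographically, a stand-in for "sorted by name".
    return new TreeSet<>(execPaths).first();
  }

  public static void main(String[] args) {
    List<String> inputs = Arrays.asList(
        "src/main/cpp/util/strings.cc",
        "external/local_config_cc/wrapped_ar",
        "src/main/cpp/util/strings.h");
    // Prints the external/ path, because 'e' sorts before 's'.
    System.out.println(firstInSortedOrder(inputs));
  }
}
```

Note that a path under `bazel-out/` would sort earlier still, so this ordering alone would not explain why no `bazel-out` path shows up in the reports.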

@buchgr
Contributor Author

buchgr commented Mar 27, 2018

I have disabled the NativePosixFileSystem and this error then happens also with the JavaIoFilesystem.

ERROR: /Users/buchgr/code/bazel/src/main/cpp/util/BUILD:83:1: C++ compilation of rule '//src/main/cpp/util:port' failed: Unexpected IO error.java.io.FileNotFoundException: /private/var/tmp/_bazel_buchgr/5de787c83f067e12ca3f7ef44fb23d3f/execroot/io_bazel/external/local_config_cc/wrapped_clang_pp (No such file or directory)
	at com.google.devtools.build.lib.vfs.JavaIoFileSystem.isExecutable(JavaIoFileSystem.java:164)
	at com.google.devtools.build.lib.vfs.Path.isExecutable(Path.java:839)
	at com.google.devtools.build.lib.remote.TreeNodeRepository.getOrComputeDirectory(TreeNodeRepository.java:375)
	at com.google.devtools.build.lib.remote.TreeNodeRepository.computeMerkleDigests(TreeNodeRepository.java:407)
	at com.google.devtools.build.lib.remote.TreeNodeRepository.computeMerkleDigests(TreeNodeRepository.java:405)
	at com.google.devtools.build.lib.remote.TreeNodeRepository.computeMerkleDigests(TreeNodeRepository.java:405)
	at com.google.devtools.build.lib.remote.RemoteSpawnCache.lookup(RemoteSpawnCache.java:98)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:91)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:64)
	at com.google.devtools.build.lib.rules.cpp.SpawnGccStrategy.execWithReply(SpawnGccStrategy.java:65)
	at com.google.devtools.build.lib.rules.cpp.CppCompileAction.execute(CppCompileAction.java:1187)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeActionTask(SkyframeActionExecutor.java:892)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.prepareScheduleExecuteAndCompleteAction(SkyframeActionExecutor.java:823)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.access$900(SkyframeActionExecutor.java:112)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:690)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:644)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:414)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:440)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:194)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:347)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:355)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
: /private/var/tmp/_bazel_buchgr/5de787c83f067e12ca3f7ef44fb23d3f/execroot/io_bazel/external/local_config_cc/wrapped_clang_pp (No such file or directory)

@buchgr
Contributor Author

buchgr commented Mar 27, 2018

@aehlig @dslomov can you comment on how files from external repositories come into existence in the execroot?

@buchgr
Contributor Author

buchgr commented Mar 27, 2018

@ulfjack any ideas on how to proceed?

@ulfjack
Contributor

ulfjack commented Mar 27, 2018

The next thing to do is rule out the likely answers. Does the file exist when we do the call?

@ulfjack
Contributor

ulfjack commented Mar 27, 2018

Also, if we're writing the file, are we properly closing it afterwards?

@buchgr
Contributor Author

buchgr commented Mar 27, 2018

I added the following code before constructing the TreeNodeRepository.

    for (ActionInput input : inputMap.values()) {
      Path p = execRoot.getRelative(input.getExecPathString());
      try {
        p.isExecutable();
      } catch (IOException e) {
        File f = new File(p.getPathString());
        report(Event.debug("stacktrace: " + Throwables
            .getStackTraceAsString(e) + ", exists: " + f.isFile()));
      }
    }

According to java.io.File.isFile(), all files for which Path.isExecutable() threw an exception do indeed exist. What's more interesting is that after having added this code, there are no more failures in the TreeNodeRepository. So either the call to Path.isExecutable() or the call to File.isFile() seems to influence the result of the subsequent Path.isExecutable() call. I'd assume that both calls translate to stat. I'll try to find out more.

@buchgr
Contributor Author

buchgr commented Mar 28, 2018

The following patch makes the error go away, which makes me conclude that it's a problem with both our FileSystem implementations.

diff --git a/src/main/java/com/google/devtools/build/lib/remote/TreeNodeRepository.java b/src/main/java/com/google/devtools/build/lib/remote/TreeNodeRepository.java
index 7767cb8c7b..c1a4f34817 100644
--- a/src/main/java/com/google/devtools/build/lib/remote/TreeNodeRepository.java
+++ b/src/main/java/com/google/devtools/build/lib/remote/TreeNodeRepository.java
@@ -39,6 +39,7 @@ import com.google.devtools.build.lib.vfs.Symlinks;
 import com.google.devtools.remoteexecution.v1test.Digest;
 import com.google.devtools.remoteexecution.v1test.Directory;
 import com.google.protobuf.ByteString;
+import java.io.File;
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.Arrays;
@@ -372,7 +373,7 @@ public final class TreeNodeRepository {
             b.addFilesBuilder()
                 .setName(entry.getSegment())
                 .setDigest(DigestUtil.getFromInputCache(input, inputFileCache))
-                .setIsExecutable(execRoot.getRelative(input.getExecPathString()).isExecutable());
+                .setIsExecutable(new File(execRoot.getRelative(input.getExecPathString()).getPathString()).canExecute());
           }
         } else {
           Digest childDigest = Preconditions.checkNotNull(treeNodeDigestCache.get(child));
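
The thread does not say why this patch silences the error. One factor that may contribute, offered here as an assumption rather than a diagnosis: `java.io.File.canExecute()` returns `false` for a path that cannot be found, whereas a stat-style check raises an exception. The sketch below uses only JDK classes (not Bazel's `Path`), and the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

class ExecutableCheckDemo {
  /** Stat-style check: reading the attributes fails loudly if the path is missing. */
  public static boolean strictIsExecutable(String path) throws IOException {
    Path p = Paths.get(path);
    Files.readAttributes(p, BasicFileAttributes.class); // throws for a missing path
    return Files.isExecutable(p);
  }

  /** java.io.File check: a missing path silently yields false. */
  public static boolean lenientIsExecutable(String path) {
    return new File(path).canExecute();
  }

  /** Runs the strict check and reports either its result or the exception type. */
  public static String describeStrict(String path) {
    try {
      return String.valueOf(strictIsExecutable(path));
    } catch (IOException e) {
      return "threw " + e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    // A path that is assumed not to exist on the machine running this sketch.
    String missing = System.getProperty("java.io.tmpdir") + "/no-such-dir-4751/wrapped_clang";
    System.out.println("lenient: " + lenientIsExecutable(missing));
    System.out.println("strict:  " + describeStrict(missing));
  }
}
```

If this reading applies, the patched code would record such an input as non-executable instead of failing the action, which hides the symptom without explaining why the path could not be resolved.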

@buchgr buchgr assigned buchgr and unassigned philwo Mar 28, 2018
@lfpino lfpino added the P1 I'll work on this now. (Assignee required) label Apr 5, 2018
@buchgr buchgr added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed P1 I'll work on this now. (Assignee required) labels Jan 16, 2019
@buchgr buchgr removed their assignment Jan 16, 2019
@buchgr buchgr added team-Remote-Exec Issues and PRs for the Execution (Remote) team and removed category: local execution / caching labels Jan 16, 2019
@buchgr buchgr self-assigned this Jan 17, 2019
@buchgr
Contributor Author

buchgr commented Jan 17, 2019

I have finally found the problem and the solution to this bug. The issue is that we always create Path objects using execRoot.getRelativePath(rootRelativePath). That mostly works, except for external dependencies, because these aren't symlinked under the execroot but only below their artifact root, which happens to be the output base.

The solution is to create the Path object for an Artifact using only its ArtifactRoot (Path p = artifact.getRoot().getRoot().getRelative(rootRelativePath)) and to only use the exec path for ActionInput objects.
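
The difference between the two roots can be sketched with plain JDK file APIs. This is an illustration under assumptions, not Bazel code: the class name, helper names and directory layout are hypothetical and only modelled on the paths quoted earlier in this thread. The same root-relative path is resolved once against the execroot and once against the output base.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ArtifactRootDemo {
  /** Returns true if relativePath exists when resolved against the given root. */
  public static boolean existsUnder(Path root, String relativePath) {
    return Files.exists(root.resolve(relativePath));
  }

  /**
   * Builds a throwaway layout in which the tool exists below the output base
   * (standing in for the artifact root) but is not linked into the execroot.
   */
  public static Path createLayout() {
    try {
      Path outputBase = Files.createTempDirectory("output_base");
      Path tool = outputBase.resolve("external/local_config_cc/wrapped_clang");
      Files.createDirectories(tool.getParent());
      Files.createFile(tool);
      Files.createDirectories(outputBase.resolve("execroot/io_bazel"));
      return outputBase;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path outputBase = createLayout();
    Path execRoot = outputBase.resolve("execroot/io_bazel");
    String rel = "external/local_config_cc/wrapped_clang";
    System.out.println("under execroot:      " + existsUnder(execRoot, rel));   // false
    System.out.println("under artifact root: " + existsUnder(outputBase, rel)); // true
  }
}
```

Resolving against the execroot depends on something else having linked the external repository into it, while resolving against the root that owns the file does not.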

I'll send out a fix.

@philwo
Member

philwo commented Jan 17, 2019

Wow! Thanks!! 😀

I’m curious: Why does it only sometimes fail though (and mostly on macOS)? Is there a race somewhere that lets this work most of the time even though the path is wrong?

@laszlocsomor
Contributor

Wow, that's quite impressive. How did you find this bug?

Also, it seems to be easy to do the wrong thing and hard to do the right thing, i.e. I would never guess that artifact.getRoot() then another getRoot() and finally getRelative() is the way to go. Do you have any suggestion about simplifying the API or somehow making it easy to create Artifacts the right way?

buchgr added a commit to buchgr/bazel that referenced this issue Jan 18, 2019
@buchgr buchgr added P0 This is an emergency and more important than other current work. (Assignee required) and removed P3 We're not considering working on this, but happy to review a PR. (No assignee) labels Jan 18, 2019
@buchgr buchgr closed this as completed Jan 19, 2019
coeuvre added a commit to coeuvre/bazel that referenced this issue Jan 13, 2021
The "always mark" was introduced by 3e3b71a which was a workaround for bazelbuild#4751. However, that issue was then fixed by fc44891. There is no reason to keep the workaround which is causing other issues e.g. bazelbuild#12818.
coeuvre added a commit to coeuvre/bazel that referenced this issue Jan 13, 2021
When build without bytes is enabled, we use isExecutable field of OutputFile for intermediate input files. This is achieved by injecting the metadata into the MetadataProvider.

The "always mark" was introduced by 3e3b71a which was a workaround for bazelbuild#4751. However, that issue was then fixed by fc44891. There is no reason to keep the workaround which is causing other issues e.g. bazelbuild#12818.
bazel-io pushed a commit that referenced this issue Jan 15, 2021
The "always mark" was introduced by 3e3b71a which was a workaround for #4751. However, that issue was then fixed by fc44891. There is no reason to keep the workaround which is causing other issues e.g. #12818.

Fixes #12818.

Closes #12820.

PiperOrigin-RevId: 351940694
coeuvre added a commit to coeuvre/bazel that referenced this issue Jan 18, 2021
When build without bytes is enabled, we use isExecutable field of OutputFile for intermediate input files. This is achieved by injecting the metadata into the MetadataProvider.

The "always mark" was introduced by 3e3b71a which was a workaround for bazelbuild#4751. However, that issue was then fixed by fc44891. There is no reason to keep the workaround which is causing other issues e.g. bazelbuild#12818.
philwo pushed a commit that referenced this issue Mar 15, 2021
The "always mark" was introduced by 3e3b71a which was a workaround for #4751. However, that issue was then fixed by fc44891. There is no reason to keep the workaround which is causing other issues e.g. #12818.

Fixes #12818.

Closes #12820.

PiperOrigin-RevId: 351940694