Non-deterministic (i.e. racy) filesystem failures under heavy load #3281
Comments
From the symptoms, this looks like the race condition bug in #2712. It should be fixed in the latest Insider build in the Fast Ring, and if that is confirmed to work, a fix will be released for 17134 too.
I have this same problem - micropython/micropython#3976
Still facing the same issue.
I also have the same issue, while building OpenThread on WSL (Ubuntu 18.04)
Fixed it installing the windows insider build
@canda Thanks for the tip. I actually installed build 19H1, and it's fixed. Apparently, build 17677 onwards should contain the fix. Thanks!
Alright let's call this dupe #2712 until proven otherwise.
This bug-tracker is monitored by developers and other technical types. We like detail! So please use this form and tell us, concisely but precisely, what's up. Please fill out ALL THE FIELDS!
Your Windows build number: Microsoft Windows [Version 10.0.17134.48]
What you're doing and what's happening: Building a large piece of software with gcc using the ninja build system (so many file accesses happen in parallel across 40+ cores), and files randomly fail to be included.
What's wrong / what should be happening instead: The compiler generates an error message about not being able to find a particular include file. When GCC encounters this type of error, it retries the same compilation. The second time, however, it does not fail. This causes GCC to report that it is a non-deterministic failure, and probably a result of some OS or hardware failure.
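The retry-and-compare behaviour described above can be sketched outside of GCC. This is only an illustration of the idea, not GCC's actual implementation: the function name `classify` and the three result labels are made up for this example, and `cmd` stands for whatever compile command failed.

```python
import subprocess


def classify(cmd):
    """Run cmd; if it fails, run it once more and compare the outcomes.

    Returns "ok" if the first run succeeds, "non-deterministic" if the
    first run fails but the retry succeeds, and "reproducible" if both fail.
    (Labels are hypothetical, chosen for this sketch.)
    """
    first = subprocess.run(cmd, capture_output=True)
    if first.returncode == 0:
        return "ok"
    # Same command, same inputs: a different result points at the
    # environment (OS, filesystem, hardware) rather than the source code.
    second = subprocess.run(cmd, capture_output=True)
    if second.returncode == 0:
        return "non-deterministic"
    return "reproducible"
```

Passing the failing compile command as a list, for example `classify(["gcc", "-c", "foo.c"])` with a placeholder file name, would separate a genuine missing-header error from the flaky one reported here.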
Strace of the failing command, if applicable: Unfortunately the problem does not reproduce under strace. Presumably this is because it's a race condition and strace changes the timing of the commands (or more specifically, it vastly slows everything down). So some internal buffer, which is probably reaching its limit, is not getting hit under strace because there is not as much load on the filesystem.
Luckily, it's (relatively) easy to reproduce: you just have to get set up to build an open source project (which thankfully is pretty easy).
After running for some time, you will get an error such as:
However, if you try again, the problem will not happen again for the same file, but it will happen again later on a different file.
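A smaller stress test might help narrow this down without a full build. The sketch below is an assumption about what could trigger the race, not a confirmed reproduction: it repeatedly opens a file that is known to exist from many threads and counts how often the open fails with "file not found". The function name `hammer` and its default parameters are invented for this example. The real build uses many separate processes, so a thread-based test may not generate the same load.

```python
from concurrent.futures import ThreadPoolExecutor


def hammer(path, workers=64, attempts=20000):
    """Open an existing file many times in parallel.

    Returns the number of attempts that raised FileNotFoundError. On a
    healthy filesystem this should be 0 for a file that exists.
    """
    def try_open(_):
        try:
            with open(path, "rb") as f:
                f.read(1)
            return 0
        except FileNotFoundError:
            return 1

    # Threads release the GIL during the open/read system calls, so the
    # filesystem does see concurrent requests.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(try_open, range(attempts)))
```

Pointing `hammer` at a header inside the source tree and getting a non-zero count would show the same symptom as the compiler error, since the file is never removed during the run.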
Update: Of (possible) importance here is that the source code is on an NTFS mount. That is, in my particular setup, the llvm-project folder is on an NTFS mount on my D drive, and I've created a symlink from ~/llvm-project to /mnt/d/src/llvm-project. That being said, there should be no writes going to NTFS, only reads.