Every .bzl file must have a corresponding package, but X does not have one #12630

Closed
GMNGeoffrey opened this issue Dec 4, 2020 · 25 comments
Labels: P1 (I'll work on this now; assignee required), team-Starlark-Integration (issues involving Bazel's integration with Starlark, excluding builtin symbols), type: bug


@GMNGeoffrey
Contributor

GMNGeoffrey commented Dec 4, 2020

Description of the problem / feature request:

Twice in the last two days, when returning to rebuild a Bazel project, I've seen an error saying Bazel is unable to find the .bzl file for the rules_cc rules. I think this has to do with Bazel implicitly loading rules_cc, since that file isn't mentioned anywhere in the repository (https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/google/llvm-bazel%24+%22cc:defs.bzl%22&patternType=regexp).

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Because this is non-deterministic, it's a bit hard to reproduce. It seems to happen basically every time I fetch new commits in https://github.com/google/iree or https://github.com/google/llvm-bazel.

What operating system are you running Bazel on?

Linux Debian

What's the output of bazel info release?

Happens with Bazel 3.3.1 and 3.7.1

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

$ git remote get-url origin ; git rev-parse main ; git rev-parse HEAD
git@github.com:GMNGeoffrey/llvm-bazel.git
e463a75a963dc583ef1ccc6122aceaaabd5b8e09
e463a75a963dc583ef1ccc6122aceaaabd5b8e09

Have you found anything relevant by searching the web?

No

@GMNGeoffrey
Contributor Author

This happens to me just about every time I fetch new stuff from the repository at this point.

@GMNGeoffrey
Contributor Author

Oops, it looks like I didn't finish filling out the issue report. I am now running bazel clean --expunge multiple times a day for different repositories because of this issue.

@vnghia
Contributor

vnghia commented Dec 31, 2020

@GMNGeoffrey I ran into this issue recently and found two ways to fix it:

  • Just load it using load("@rules_cc//cc:defs.bzl", "cc_library") without adding rules_cc to your WORKSPACE (I think this uses the rules_cc that is bundled with Bazel by default).
  • If you add rules_cc to your WORKSPACE, you have to add a strip_prefix; that works for me. (With this rules_cc, you can use more advanced features like cc_shared_library.)
http_archive(
    name = "rules_cc",
    strip_prefix = "",
    urls = ["https://github.com/bazelbuild/rules_cc/archive/TODO"],
    sha256 = "TODO",
)

@GMNGeoffrey
Contributor Author

Thanks. That then brings us to bazelbuild/buildtools#923 😁 I also get this with rules_python.

@GMNGeoffrey
Contributor Author

Now I'm also getting this with @bazel_skylib//:workspace.bzl:

ERROR: error loading package '': Every .bzl file must have a corresponding package, but '@bazel_skylib//:workspace.bzl' does not have one. Please create a BUILD file in the same or any parent directory. Note that this BUILD file does not need to do anything except exist.

We use this in our WORKSPACE file and do load it normally, I think, so maybe this isn't just a rules_cc thing?

https://github.com/google/llvm-bazel/blob/4c8b546e53eebc708c77ba19a2110926a8732642/llvm-bazel/WORKSPACE#L60-L69

@vnghia
Contributor

vnghia commented Jan 7, 2021

@GMNGeoffrey

Just adding the strip_prefix will solve your problem.

I think this is indeed how Bazel works. If you don't add strip_prefix, Bazel will download the http_archive https://github.com/bazelbuild/bazel-skylib/releases/download/1.0.2/bazel-skylib-1.0.2.tar.gz and extract it into a folder named bazel-skylib-1.0.2. To load something from that folder, you might have to write something like @bazel-skylib//bazel-skylib-1.0.2/....

What do you think?

@GMNGeoffrey
Contributor Author

Hmmm, what you describe matches my experience with other http_archive usage from GitHub releases. However, I believe I copied this from the instructions at https://github.com/bazelbuild/bazel-skylib/releases/tag/1.0.2, and it looks identical. Also, the error I'm seeing is non-deterministic, which isn't the kind of error I'd expect from a wrong strip_prefix.

Testing it out, the bazel-skylib release archive does behave differently here. I believe it's packaged specifically to avoid the need for a strip_prefix:

gcmn@lt 2021-01-07 16:56 ~/Downloads/bazel-extract-test$ mv ../bazel-toolchains-3.3.1.tar.gz ./

gcmn@lt 2021-01-07 16:56 ~/Downloads/bazel-extract-test$ tar -xzf bazel-toolchains-3.3.1.tar.gz 

gcmn@lt 2021-01-07 16:56 ~/Downloads/bazel-extract-test$ ls
bazel-toolchains-3.3.1  bazel-toolchains-3.3.1.tar.gz

gcmn@lt 2021-01-07 16:56 ~/Downloads/bazel-extract-test$ rm -rf *

gcmn@lt 2021-01-07 16:57 ~/Downloads/bazel-extract-test$ mv ../bazel-skylib-1.0.2.tar.gz ./

gcmn@lt 2021-01-07 16:57 ~/Downloads/bazel-extract-test$ tar -xzf bazel-skylib-1.0.2.tar.gz 

gcmn@lt 2021-01-07 16:57 ~/Downloads/bazel-extract-test$ ls
bazel-skylib-1.0.2.tar.gz  bzl_library.bzl  CONTRIBUTORS       internal_setup.bzl  lib.bzl  rules                toolchains   workspace.bzl
BUILD                      CODEOWNERS       internal_deps.bzl  lib                 LICENSE  skylark_library.bzl  version.bzl

Note that the bazel_skylib archive doesn't have the nesting: its files sit at the top level rather than inside a bazel-skylib-1.0.2 directory.
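The difference above decides whether http_archive needs a strip_prefix: when every file in the archive sits under one top-level directory (as with bazel-toolchains-3.3.1), that directory is normally what gets stripped, and when the files sit at the top level (as with bazel-skylib-1.0.2), no prefix is needed. Below is a minimal sketch for checking an archive before writing the WORKSPACE entry, using Python's standard tarfile module. The helper names are hypothetical, and the single-directory heuristic is an assumption, not Bazel's own logic.

```python
import sys
import tarfile


def archive_member_names(archive_path):
    """List member paths of a tar archive (compression auto-detected)."""
    with tarfile.open(archive_path, "r:*") as tar:
        return tar.getnames()


def suggest_strip_prefix(names):
    """Hypothetical helper: return the single wrapping directory to use as
    strip_prefix, or "" when the archive's files are not nested in one."""
    split = []
    for name in names:
        # Normalize "./foo/bar" and "foo/bar/" to ["foo", "bar"].
        parts = [p for p in name.split("/") if p not in ("", ".")]
        if parts:
            split.append(parts)
    tops = {parts[0] for parts in split}
    # Exactly one top-level entry, and it has children, so it is a directory.
    if len(tops) == 1 and any(len(parts) > 1 for parts in split):
        return tops.pop()
    return ""


if __name__ == "__main__":
    for path in sys.argv[1:]:
        prefix = suggest_strip_prefix(archive_member_names(path))
        print(f"{path}: strip_prefix = {prefix!r}")
```

Run it as `python3 check_prefix.py bazel-toolchains-3.3.1.tar.gz bazel-skylib-1.0.2.tar.gz` (the script name is hypothetical). Based on the two listings above, one would expect the first archive to report 'bazel-toolchains-3.3.1' and the second to report an empty prefix.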

@oquenchil added the P3 label (We're not considering working on this, but happy to review a PR. No assignee) and removed the untriaged label Jan 12, 2021
@GMNGeoffrey
Contributor Author

I'm running clean multiple times a day... I'll just drop this quote from the Bazel docs:

Bazel's design is such that these problems are fixable; we consider such bugs a high priority, and will do our best [to] fix them. If you ever find an incorrect incremental build, please file a bug report. We encourage developers to get out of the habit of using clean and into that of reporting bugs in the tools.

https://docs.bazel.build/versions/master/user-manual.html#the-clean-command

P3 👀

@lberki added the P1 and team-Starlark-Integration labels and removed the P3 and team-Rules-CPP labels Apr 7, 2021
@lberki assigned philwo and unassigned philwo Apr 7, 2021
@lberki
Contributor

lberki commented Apr 7, 2021

Sounds like a correctness issue, why was this classified as P3 in the first place?

@jin @oquenchil It doesn't seem to have much to do with C++, though; it's more likely an issue with either the Starlark loading-phase machinery or the external repositories.

@GMNGeoffrey, it'll be hard to fix this without a reproduction that works at least some of the time. Mind coming up with something that reproduces the issue at least sometimes?

@lberki
Contributor

lberki commented Apr 7, 2021

@philwo, the external repository machinery is more likely to be the culprit than Starlark, because otherwise we'd see regular reports of this in google3, so I'm assigning this to you for further routing (or a clever idea).

@philwo assigned Wyverald and unassigned philwo Apr 7, 2021
@philwo
Member

philwo commented Apr 7, 2021

@Wyverald Could you have a look at this?

@Wyverald
Member

Wyverald commented Apr 7, 2021

@GMNGeoffrey Could you provide some repro steps? Running what Bazel command gives you the error message?

I tried to clone the llvm-bazel repo and build it according to the Usage section, but immediately got the following error:

$ bazel build --config=generic_clang @llvm-project//...
INFO: Invocation ID: 0152e67e-8646-4027-9203-893578cd7630
INFO: Repository llvm-project instantiated at:
  no stack (--record_rule_instantiation_callstack not enabled)
Repository rule overlay_directories defined at:
  /Users/wyv/github/llvm-bazel/llvm-bazel/overlay_directories.bzl:70:38: in <toplevel>
ERROR: An error occurred during the fetch of repository 'llvm-project':
   Failed to execute overlay script: '/usr/local/bin/python3 /Users/wyv/github/llvm-bazel/llvm-bazel/overlay_directories.py --src /Users/wyv/github/llvm-bazel/llvm-bazel/../third_party/llvm-project --overlay /Users/wyv/github/llvm-bazel/llvm-bazel/llvm-project-overlay --target .'
Exited with code 1
stdout:

stderr:
Traceback (most recent call last):
  File "/Users/wyv/github/llvm-bazel/llvm-bazel/overlay_directories.py", line 92, in <module>
    main(parse_arguments())
  File "/Users/wyv/github/llvm-bazel/llvm-bazel/overlay_directories.py", line 83, in main
    for src_entry in os.listdir(os.path.join(args.src, rel_root)):
FileNotFoundError: [Errno 2] No such file or directory: '/Users/wyv/github/llvm-bazel/llvm-bazel/../third_party/llvm-project/clang'

ERROR: Failed to execute overlay script: '/usr/local/bin/python3 /Users/wyv/github/llvm-bazel/llvm-bazel/overlay_directories.py --src /Users/wyv/github/llvm-bazel/llvm-bazel/../third_party/llvm-project --overlay /Users/wyv/github/llvm-bazel/llvm-bazel/llvm-project-overlay --target .'
Exited with code 1
stdout:

stderr:
Traceback (most recent call last):
  File "/Users/wyv/github/llvm-bazel/llvm-bazel/overlay_directories.py", line 92, in <module>
    main(parse_arguments())
  File "/Users/wyv/github/llvm-bazel/llvm-bazel/overlay_directories.py", line 83, in main
    for src_entry in os.listdir(os.path.join(args.src, rel_root)):
FileNotFoundError: [Errno 2] No such file or directory: '/Users/wyv/github/llvm-bazel/llvm-bazel/../third_party/llvm-project/clang'

INFO: Elapsed time: 0.252s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)

@GMNGeoffrey
Contributor Author

GMNGeoffrey commented Apr 7, 2021

@Wyverald, it looks like you didn't initialize the submodules (git submodule update --init). One sec, I'll provide a more detailed rundown of what I'm seeing.

@GMNGeoffrey
Contributor Author

GMNGeoffrey commented Apr 7, 2021

Ok further details:

This happens to me with both https://github.com/google/iree and https://github.com/google/llvm-bazel. These repositories use different versions of Bazel: 3.3.1 and 3.7.1. Fetching new commits does not appear to be the proximate cause, as I had originally guessed; rather, it seems to happen whenever the Bazel server starts up again. The error also doesn't always name the same package (such as rules_cc). Here is the one I saw most recently:

ERROR: error loading package '': Every .bzl file must have a corresponding package, but '@bazel_skylib//:workspace.bzl' does not have one. Please create a BUILD file in the same or any parent directory. Note that this BUILD file does not need to do anything except exist.

My home .bazelrc is:

build --disk_cache=~/.cache/bazel-disk-cache
build --sandbox_base=/dev/shm

This is another factor common to both repositories. However, I generally build with RBE, and our RBE config sets --disk_cache="" (because that's required), so I don't think the disk cache should be causing the issue. I'm a bit suspicious that this comes from some spooky interaction between these two builds, because no one else on my team building IREE has reported the issue. That said, building with --nosystem_rc --nohome_rc still gives the error.

Is there some kind of archive I can package up to help demonstrate the issue? I made an archive of my entire git repository (happy to share it in whatever manner, though it's 1.8 GB), which is currently experiencing the issue. If I extract that on the same computer, the error repeats, but if I copy it to another computer running the same OS, the build works fine (the destination path and various other factors are the same across these two machines, so I think symlinks and such just happen to line up). If I also package up the outputBase directory and copy it over, I get the same error on the other machine. In fact, the source archive isn't even necessary, so it's some kind of corruption in the output base.

@GMNGeoffrey changed the title from "Every .bzl file must have a corresponding package, but '@rules_cc//cc:defs.bzl' does not have one" to "Every .bzl file must have a corresponding package, but X does not have one" Apr 7, 2021
@Wyverald
Member

Wyverald commented Apr 8, 2021

I tried building llvm-bazel, shutting down the Bazel server, and then rebuilding llvm-bazel, and nothing has gone wrong so far. But as you said, the problem doesn't always happen after a server restart, so I'll try a rebuild every now and then to see if I can catch a repro.

What's your RBE setup? Maybe that could be related?

Also could you go into $outputBase/external/bazel_skylib on a reproing machine and see what's there? Is there really no BUILD file in the root directory? Are all files seemingly intact?

@GMNGeoffrey
Contributor Author

Yeah, I also haven't been able to repro by just manually shutting down the server. Maybe it has something to do with idle server shutdown? It seems to happen the first time I build each day (which is why I thought it had to do with syncing, since I usually do that). The RBE setup is in the .bazelrc and WORKSPACE; it's basically the default setup with rbe_autoconfig. IREE has essentially the same setup.

Looking in $outputBase/external/bazel_skylib, there really isn't a BUILD file:

gcmn@ws 2021-04-08 10:47 ~/.cache/bazel/_bazel_gcmn/1655d0ec87ba8569e84422cbb0aeec46/external/bazel_skylib$ ls
lib  rules  toolchains  WORKSPACE

There's also no file called workspace.bzl, though... so that's confusing. Is this some kind of sneaky auto-loading thing?

If I go into the same directory on a machine without the issue (I don't think it's anything machine-specific; I just happen to have access to two computers):

gcmn@ct 2021-04-08 10:51 ~/.cache/bazel/_bazel_gcmn/1655d0ec87ba8569e84422cbb0aeec46-bak/external/bazel_skylib$ ls
BUILD            CODEOWNERS    internal_deps.bzl   lib      LICENSE  skylark_library.bzl  version.bzl  workspace.bzl
bzl_library.bzl  CONTRIBUTORS  internal_setup.bzl  lib.bzl  rules    toolchains           WORKSPACE

So yeah for some reason those files aren't showing up. Seems like an issue with repo rules?
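The error message itself states the rule being violated: the .bzl file needs a BUILD file (BUILD or BUILD.bazel) in its own directory or in a parent directory within the repository. Below is a minimal sketch of that lookup, useful for checking a suspect directory such as $outputBase/external/bazel_skylib. It is a simplified model that ignores details such as .bazelignore and nested repositories, and the function names are hypothetical. The lookup takes a callback so the logic can be tested without touching the disk.

```python
import os
import sys

BUILD_FILE_NAMES = ("BUILD.bazel", "BUILD")


def find_package_dir(bzl_relpath, has_build_file):
    """Return the repo-relative directory ("" = repo root) of the nearest
    directory at or above bzl_relpath that has a BUILD file, else None.

    has_build_file(rel_dir) -> bool reports whether rel_dir has a BUILD file.
    """
    current = os.path.dirname(os.path.normpath(bzl_relpath))
    while True:
        if has_build_file(current):
            return current
        parent = os.path.dirname(current)
        if parent == current:
            return None  # reached the top without finding a BUILD file
        current = parent


def filesystem_checker(repo_root):
    """Build a has_build_file callback backed by a directory on disk."""
    def has_build_file(rel_dir):
        directory = os.path.join(repo_root, rel_dir)
        return any(os.path.isfile(os.path.join(directory, name))
                   for name in BUILD_FILE_NAMES)
    return has_build_file


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: find_package.py <external repo dir> <path of .bzl inside it>
    repo_root, bzl = sys.argv[1], sys.argv[2]
    package = find_package_dir(bzl, filesystem_checker(repo_root))
    if package is None:
        print(f"{bzl}: no BUILD file in its directory or any parent")
    else:
        print(f"{bzl}: belongs to package //{package}")
```

Pointed at the broken bazel_skylib directory listed above with workspace.bzl, it should report that no BUILD file exists in its directory or any parent, while the healthy copy resolves to the root package.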

@GMNGeoffrey
Contributor Author

Same issue with rules_cc:

gcmn@ws 2021-04-08 10:52 ~/.cache/bazel/_bazel_gcmn/1655d0ec87ba8569e84422cbb0aeec46/external/rules_cc$ ls
cc  distro  examples  third_party  tools  WORKSPACE

vs

gcmn@ct 2021-04-08 10:52 ~/.cache/bazel/_bazel_gcmn/1655d0ec87ba8569e84422cbb0aeec46-bak/external/rules_cc$ ls
BUILD  CODEOWNERS       distro    internal_deps.bzl   ISSUE_TEMPLATE.md  README.md      third_party  WORKSPACE
cc     CONTRIBUTING.md  examples  internal_setup.bzl  LICENSE            renovate.json  tools

Then, if I run bazel clean --expunge, the directory is deleted and bazel test ... passes:

gcmn@ws 2021-04-08 10:55 ~/.cache/bazel/_bazel_gcmn/1655d0ec87ba8569e84422cbb0aeec46$ ls external/bazel_skylib/
BUILD            CODEOWNERS    internal_deps.bzl   lib      LICENSE  skylark_library.bzl  version.bzl  workspace.bzl
bzl_library.bzl  CONTRIBUTORS  internal_setup.bzl  lib.bzl  rules    toolchains           WORKSPACE

gcmn@ws 2021-04-08 10:55 ~/.cache/bazel/_bazel_gcmn/1655d0ec87ba8569e84422cbb0aeec46$ ls external/rules_cc/
BUILD  CODEOWNERS       distro    internal_deps.bzl   ISSUE_TEMPLATE.md  README.md      third_party  WORKSPACE
cc     CONTRIBUTING.md  examples  internal_setup.bzl  LICENSE            renovate.json  tools

The directories are correctly populated. So... something is deleting those files?

I'm poking around for Linux utilities that can tattle on a process that deletes a file.

@GMNGeoffrey
Contributor Author

GMNGeoffrey commented Apr 8, 2021

Ok so more details. I tried following https://askubuntu.com/questions/48844/how-to-find-the-pid-of-the-process-which-has-deleted-a-file to audit the deletion of that file, but couldn't get it to work. In the meantime, manually deleting $outputBase/external/bazel_skylib/BUILD causes the same issue, unsurprisingly. No server restart necessary.

@GMNGeoffrey
Contributor Author

OK, I think I've set up an auditd rule to watch for deletions in bazel_skylib, so we'll see how that goes.

@GMNGeoffrey
Contributor Author

Running what Bazel command gives you the error message?

I realized I never answered this question: basically any bazel build/test of the repository. The simplest is just bazel test ... from the llvm-bazel subdirectory.

@GMNGeoffrey
Contributor Author

Oh no... I think I know the problem... and it's my fault... 🤦

@daily find /usr/local/google/home/gcmn/.cache/bazel* -mtime +12 -type f -delete

Bazel kept on using up all of my disk space, so I figured I could empty out old cache entries. It's called "cache" so surely it must be ok to delete things from a correctness perspective. Apparently that's not true 😁

But also

$ ll  $HOME/.cache/bazel/_bazel_gcmn/1655d0ec87ba8569e84422cbb0aeec46/external/bazel_skylib/BUILD
-r--r--r-- 1 gcmn primarygroup 1591 Dec 31  1999 /usr/local/google/home/gcmn/.cache/bazel/_bazel_gcmn/1655d0ec87ba8569e84422cbb0aeec46/external/bazel_skylib/BUILD

that doesn't seem right?
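That 1999 timestamp explains why the cron job hit these files even right after a fresh fetch. GNU find's -mtime +12 counts whole 24-hour periods since the file's mtime, discarding any fraction, and matches when that count is greater than 12. The extracted files keep the archive's timestamp, so they qualify immediately, whereas a file written at fetch time would only match once it is at least 13 days old. Directories are left behind because of -type f, which is consistent with the broken listings above, where mostly directories remained. Below is a small Python model of that test. It mirrors find's documented rounding rather than calling find, and every date except the one from the listing is hypothetical.

```python
import datetime as dt


def find_mtime_plus(mtime, now, n):
    """Model of GNU find's `-mtime +n`: count whole 24-hour periods since
    mtime (fractional part discarded) and match if the count exceeds n."""
    whole_days = int((now - mtime).total_seconds() // 86400)
    return whole_days > n


if __name__ == "__main__":
    # mtime shown by `ll` for external/bazel_skylib/BUILD above.
    archive_mtime = dt.datetime(1999, 12, 31)
    # Hypothetical: the cron run happens an hour after the repo was fetched.
    cron_run = dt.datetime(2021, 4, 8)
    fetched = cron_run - dt.timedelta(hours=1)
    print("file with archive mtime deleted:", find_mtime_plus(archive_mtime, cron_run, 12))
    print("file written at fetch time deleted:", find_mtime_plus(fetched, cron_run, 12))
```

The first line prints True and the second prints False: only the files carrying the archive's timestamp are swept up on the first nightly run.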

So I did a bad thing for sure, and this is my fault. But I think there are some Bazel issues here that led me to do this (in retrospect, obviously bad) thing:

  • Using tons of disk space. I've got a half terabyte drive. That's not nothing. And Bazel was filling it up like ~weekly (?)
  • The name "cache" kind of implies that it's not used for persistent storage ;-P

So I deleted that cron job for now. I would love to come up with a safe replacement before Bazel fills up my drive again.

FTR, I think I never would've figured this out without the suggestion of where to look for a BUILD file. The error about '@bazel_skylib//:workspace.bzl' gave me nothing to go on: it doesn't correspond to any of the symlinks created in the source directory, which I did explore.

Also lol at all my guesses as to why I was seeing this "daily" and what that could correlate with. Turns out it just correlated with... days...

@lberki
Contributor

lberki commented Apr 9, 2021

I don't think Bazel (or any software) can be resilient to deleting various data files from under it.

I am somewhat surprised that Bazel uses hundreds of gigabytes of storage. It should create exactly one directory directly under .cache/bazel for each workspace root, and the disk usage within one such directory should be a function of the amount of source code being built.

If you want to clean up old entries under .cache/bazel, check for the mtime of $OUTPUT_BASE/lock. That should be a reliable indicator of when Bazel touched that output base last.
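Following that suggestion, a safer cleanup treats each output base as a unit and uses the mtime of its lock file as the last-used time. Below is a minimal dry-run sketch in Python that lists stale output bases and their sizes without deleting anything. Assumptions not confirmed in this thread: output bases sit directly under ~/.cache/bazel/_bazel_$USER (as in the paths above), entries without a lock file are skipped, and the 30-day threshold is arbitrary. Anything acted on should be removed as a whole directory, which later comments describe as safe, and only when no Bazel server is using it.

```python
import os
import stat
import time

DAY_SECONDS = 86400


def lock_mtimes(user_root):
    """Map each output base directly under user_root to the mtime of its
    lock file. Entries without a lock file are assumed not to be output bases."""
    result = {}
    for entry in os.listdir(user_root):
        base = os.path.join(user_root, entry)
        lock = os.path.join(base, "lock")
        if os.path.isfile(lock):
            result[base] = os.stat(lock).st_mtime
    return result


def select_stale(mtimes, now, max_age_days):
    """Return [(output_base, age_in_days)] older than max_age_days, oldest first."""
    aged = [(base, (now - mtime) / DAY_SECONDS) for base, mtime in mtimes.items()]
    stale = [item for item in aged if item[1] > max_age_days]
    return sorted(stale, key=lambda item: item[1], reverse=True)


def dir_size_bytes(path):
    """Total size of regular files under path; symlinks are not followed."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # file vanished or is unreadable
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


if __name__ == "__main__":
    # Assumed default location on Linux; adjust if you use --output_user_root.
    user_root = os.path.expanduser("~/.cache/bazel/_bazel_" + os.environ.get("USER", ""))
    max_age_days = 30  # arbitrary threshold for this sketch
    if os.path.isdir(user_root):
        for base, age in select_stale(lock_mtimes(user_root), time.time(), max_age_days):
            size_gib = dir_size_bytes(base) / 2**30
            # Dry run: print candidates only; nothing is deleted here.
            print(f"{base}\t{age:.0f} days idle\t{size_gib:.1f} GiB")
```

Because the selection logic takes a plain mapping of paths to timestamps, it can be checked without a real cache directory. The size calculation skips symlinks, since output bases contain many of them and following them could double-count or walk outside the directory.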

@lberki closed this as completed Apr 9, 2021
@Wyverald
Member

Wyverald commented Apr 9, 2021

I wanted to respond to a few points:

It's called "cache" so surely it must be ok to delete things from a correctness perspective. Apparently that's not true 😁

That's true, up to a point. You can nuke the entire cache directory and Bazel will still work fine. You can even nuke some subdirectories under there (for example ~/.cache/bazel/_bazel_gcmn/<some_hash>). But I don't think it's fair to expect Bazel to keep working when random files inside the cache folder are deleted.

that doesn't seem right?

If you mean the mtime being in 1999 doesn't seem right, that's on bazel_skylib's packaging. If you download the tar.gz and unpack it yourself, you'll see that all the files in there have the same mtime.

@lberki
Contributor

lberki commented Apr 9, 2021

The mtimes are that old for reasons of hermeticity; we don't want the exact time used for the build to influence its outputs so we stub out the timestamps.

@GMNGeoffrey
Contributor Author

Thanks @Wyverald for asking the critical question that made it possible for me to figure out what was going on here:

Also could you go into $outputBase/external/bazel_skylib on a reproing machine and see what's there?

This was key :-)

I don't think Bazel (or any software) can be resilient to deleting various data files from under it.

Agreed. I misunderstood which things were safe to delete. Sorry about that!

I am somewhat surprised about Bazel using hundreds of gigabytes of storage. It should create exactly one directory directly under .cache/bazel for each workspace root and the disk usage within one such directory is should be a function of the amount source code being built.

I'm going to try not cleaning things up for a while and see what starts filling up. I think it was some combination of these caches and the Bazel disk cache, and it's possible that while I was in there I decided to try to clean up both. So in some ways this is linked to #1035. Bazel seems to be missing a pretty important secondary feature of caches: cache eviction. It's an example of how it carries the legacy of being a remote-mostly tool for developers at a big tech company with tons of resources. I'm using a top-of-the-line workstation and still running into issues, so it doesn't surprise me when folks with less at their fingertips are reluctant to choose Bazel.

If you want to clean up old entries under .cache/bazel, check for the mtime of $OUTPUT_BASE/lock. That should be a reliable indicator of when Bazel touched that output base last.

Thanks for the tip. If I have to revive this custom cleanup process in another form, I'll use that to make sure it's safer.


7 participants