glob(["**"]) with cyclic symlinks can OOM without a useful error message #10783

Closed
celskeggs opened this issue Feb 14, 2020 · 6 comments
Labels: P2 (We'll consider working on this in future; assignee optional), stale (no activity for 30 days), team-Core (Skyframe, bazel query, BEP, options parsing, bazelrc), type: bug

Comments


celskeggs commented Feb 14, 2020

Description of the problem / feature request:

Attempting to glob a directory that contains cyclic symbolic links normally produces a reasonably clear error (as in #133), which tells the developer to fix their input so that it no longer contains cyclic symbolic links.

I encountered a scenario where glob(["**"]) on a directory with cyclic symlinks instead causes Bazel to appear to hang and eventually terminate with an unhelpful OutOfMemoryError, like the following:

/homeworld/ceph$ bazel build @ceph//:all
ERROR: bazel crash in async thread:
java.lang.OutOfMemoryError: Java heap space
Loading: 0 packages loaded
    currently loading: @ceph//

Server terminated abruptly (error code: 14, error message: 'Socket closed', log file: '/home/user/.cache/bazel/_bazel_user/88d7a4b1be88f6a8de40798ebf47b263/server/jvm.out')

I wouldn't expect Bazel to support globs over directories containing cyclic symlinks, but this error message is confusing. I was only able to work out what was happening (namely, that the crash came from cyclic symlinks in the source tarball of the external dependency I was trying to build) by running a memory analyzer on the crash dump and reading the Bazel source manually.

I suggest that Bazel should explicitly check for cyclic symlinks, and exit with a clear error message if a loop is detected.
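As a rough illustration of the kind of check suggested here, the sketch below walks a directory tree in Python, follows symlinks, and raises a descriptive error as soon as a symlink leads back into a directory that is already on the traversal stack. This is not Bazel's code or its glob algorithm: `glob_all` and `SymlinkCycleError` are hypothetical names, and the approach (comparing `os.path.realpath` results) is just one simple way to detect a loop.

```python
import os
import tempfile


class SymlinkCycleError(Exception):
    """Hypothetical error type: the walk re-entered a directory it is already inside."""


def glob_all(root):
    """List every non-directory entry under root, following symlinks.

    Raises SymlinkCycleError instead of recursing forever when a symlink
    leads back to a directory on the current traversal stack.
    """
    results = []

    def visit(path, active):
        real = os.path.realpath(path)
        if real in active:
            raise SymlinkCycleError(
                f"symlink cycle: {path} resolves to {real}, "
                "which is already being traversed"
            )
        active.add(real)
        for entry in sorted(os.listdir(path)):
            child = os.path.join(path, entry)
            if os.path.isdir(child):  # isdir() follows symlinks
                visit(child, active)
            else:
                results.append(child)
        active.remove(real)

    visit(root, set())
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "pkg"))
        # pkg/loop -> pkg, a one-link cycle
        os.symlink("../pkg", os.path.join(tmp, "pkg", "loop"))
        try:
            glob_all(os.path.join(tmp, "pkg"))
        except SymlinkCycleError as err:
            print(err)
```

Only the directories on the current stack are tracked, so two different symlinks to the same non-cyclic directory are not reported as a loop.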

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Put this in a WORKSPACE file:

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "ceph",
    url = "https://download.ceph.com/tarballs/ceph_14.2.4.orig.tar.gz",
    sha256 = "7180b6afcac57d858f1b7cf49615e8903fbf1a701ae00fb956a5656d608fd0f8",
    build_file_content = """filegroup(name = "all", srcs = glob(["**"]), visibility = ["//visibility:public"])""",
)

Then run this:

$ bazel build @ceph//:all

After running for a long time, I eventually get the error shown previously.

EDIT: @cryslith's comment provides an easier way to reproduce this, which works on more machines (because it sets a smaller memory limit): #10783 (comment)

This can also be reproduced by unpacking the Ceph tarball under an empty WORKSPACE and defining a filegroup in a BUILD file. Demonstrating that this is specifically due to the cyclic symlinks can be done by deleting all .qa symlinks and trying again. (The Ceph tarball also has other issues that prevent its use in a filegroup, but Bazel appears to be able to identify those accurately and report useful errors for them.)
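To find which symlinks to delete for that experiment, something like the following Python sketch may help. It lists symlinks whose resolved target is the link's own directory or one of its ancestors, which is the simplest kind of cycle. The function name `find_ancestor_symlinks` is hypothetical, the sketch assumes a POSIX filesystem, and it will not catch loops formed by symlinks that point sideways at each other.

```python
import os


def find_ancestor_symlinks(root):
    """Yield symlinks under root whose target is the link's own directory or an ancestor of it."""
    # followlinks=False: symlinks to directories are listed but never descended into.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        here = os.path.realpath(dirpath)
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            # The target is an ancestor of (or equal to) the link's directory
            # when it is the common prefix of both paths.
            if os.path.commonpath([here, target]) == target:
                yield path
```

Calling `list(find_ancestor_symlinks("."))` from the unpacked source tree would print the candidate links without modifying anything.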

What operating system are you running Bazel on?

I'm using a Debian buster chroot.

$ cat /etc/apt/sources.list
deb http://debian.csail.mit.edu/debian buster main

What's the output of bazel info release?

release 2.1.0

Have you found anything relevant by searching the web?

I found issues #133, #1293, #2927, and #6350, which are about other issues with globbing and symlinks, but none of them directly address this problem.

Any other information, logs, or outputs that you want to share?

Two relevant screenshots from running Eclipse Memory Analyzer on the .hprof dump:

[Screenshot: Screenshot_2020-02-13_20-24-42]
[Screenshot: Screenshot_2020-02-13_20-24-16]

These show that the OOM crash is preceded by a very large number of entries being added to com.google.devtools.build.lib.vfs.UnixGlob$GlobVisitor.results, because of paths that cycle through the .qa symlinks contained in the Ceph source tarball.
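To give a feel for why a result list can grow this fast, the Python sketch below builds a tiny directory with two symlinks that lead back to its root and counts how many entries a naive, symlink-following walk records at increasing depth limits. It is an illustration only and does not reproduce Bazel's UnixGlob code; `make_cyclic_tree`, `count_entries`, and the depth cap are all assumptions made for the demo.

```python
import os
import tempfile


def make_cyclic_tree(base):
    """Create a/{f, b/, c -> ../a} and a/b/d -> ../c under base; return the path to a."""
    a = os.path.join(base, "a")
    os.makedirs(os.path.join(a, "b"))
    open(os.path.join(a, "f"), "w").close()
    os.symlink("../a", os.path.join(a, "c"))
    os.symlink("../c", os.path.join(a, "b", "d"))
    return a


def count_entries(path, max_depth):
    """Count the entries a symlink-following walk records, stopping at max_depth."""
    if max_depth == 0:
        return 0
    total = 0
    for entry in os.listdir(path):
        total += 1
        child = os.path.join(path, entry)
        if os.path.isdir(child):  # isdir() follows symlinks, so the walk re-enters a/
            total += count_entries(child, max_depth - 1)
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = make_cyclic_tree(tmp)
        for depth in range(1, 13):
            print(depth, count_entries(root, depth))
```

In this layout each additional level contributes roughly the sum of the two previous counts, so the total grows exponentially with depth even though the tree holds only one real file.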

aiuto added the team-Core (Skyframe, bazel query, BEP, options parsing, bazelrc) and untriaged labels Feb 14, 2020

cryslith commented Feb 14, 2020

I can reliably reproduce this with the following script, which doesn't require the entire ceph source tree.

#!/bin/bash

mkdir test
pushd test
touch WORKSPACE
cat > BUILD.bazel <<EOF
filegroup(
    name = "a",
    srcs = glob(["a/**"]),
)
EOF
mkdir a
pushd a
touch long_long_long_long_long_long_long_long_long_long_long_long_filename
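ln -s ../a c    # a/c -> a (points back at its own parent directory)
mkdir b
pushd b
ln -s ../c d    # a/b/d -> a/c -> a (a second path into the same cycle)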
popd
popd

bazel clean --expunge
bazel --host_jvm_args=-Xmx1g build //:a

popd

jin (Member) commented Feb 26, 2020

FYI @irengrig @lberki @haxorz

haxorz (Contributor) commented Feb 26, 2020

Very brief comment: Bazel normally handles cases like this directly and elegantly (explained in my BazelCon 2019 Lightning Talk https://youtu.be/EoYdWmMcqDs)... except for "legacy globbing" (#10610 (comment)), which is where the issue here is happening.

janakdr (Contributor) commented Apr 20, 2020

Seems like this is a potential vulnerability for any production query environment? Giving to @haxorz for triage. Another team member might be interested, will ping them.

janakdr added the P2 (We'll consider working on this in future; assignee optional) label Nov 8, 2020
github-actions bot commented

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

github-actions bot added the stale (no activity for 30 days) label Apr 26, 2023
github-actions bot commented

This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please reach out to the triage team (@bazelbuild/triage). Thanks!

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) May 11, 2023