Bazel failure when running tests #539

Closed
colemakdvorak opened this issue Sep 1, 2019 · 15 comments

@colemakdvorak
Contributor

Hello!

I am trying to get the recent master branch to build & test for an issue I am trying to resolve. I am running on a fresh virtual env with python 3.6, bazel 0.28.1, and pip installation of tf-nightly on Manjaro Linux 18.0.4.

For reference, here's the output for pip list

Package              Version             
-------------------- --------------------
absl-py              0.8.0               
astor                0.8.0               
gast                 0.2.2               
google-pasta         0.1.7               
grpcio               1.23.0              
h5py                 2.9.0               
Keras-Applications   1.0.8               
Keras-Preprocessing  1.1.0               
Markdown             3.1.1               
numpy                1.17.1              
opt-einsum           3.0.1               
pip                  19.2.3              
protobuf             3.9.1               
setuptools           40.6.2              
six                  1.12.0              
tb-nightly           1.15.0a20190831     
termcolor            1.1.0               
tf-estimator-nightly 1.14.0.dev2019083101
tf-nightly           1.15.0.dev20190730  
Werkzeug             0.15.5              
wheel                0.33.6              
wrapt                1.11.2  

Running

bazel test --copt=-O3 --copt=-march=native //tensorflow_probability/...

Results in

(tf) [dvorak@qwerty probability]$ bazel test --copt=-O3 --copt=-march=native //tensorflow_probability/...
WARNING: The following rc files are no longer being read, please transfer their contents or import their path into one of the standard rc files:
/home/dvorak/Workspace/probability/tools/bazel.rc
INFO: Analyzed 1808 targets (78 packages loaded, 4681 targets configured).
INFO: Found 1278 targets and 530 test targets...
ERROR: /home/dvorak/Workspace/probability/tensorflow_probability/python/experimental/substrates/numpy/distributions/BUILD:134:2: Executing genrule //tensorflow_probability/python/experimental/substrates/numpy/distributions:rewrite_vector_student_t failed (Exit 1) bash failed: error executing command /bin/bash -c ... (remaining 1 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
Traceback (most recent call last):
  File "/home/dvorak/.cache/bazel/_bazel_dvorak/4f8040e5b73e034f489a3157c8a93f87/sandbox/processwrapper-sandbox/50/execroot/tensorflow_probability/bazel-out/host/bin/tensorflow_probability/python/experimental/substrates/numpy/rewrite.runfiles/tensorflow_probability/tensorflow_probability/python/experimental/substrates/numpy/rewrite.py", line 24, in <module>
    from absl import app
  File "/home/dvorak/Workspace/tf/lib/python3.6/site-packages/absl/app.py", line 41, in <module>
    from absl import logging
  File "/home/dvorak/Workspace/tf/lib/python3.6/site-packages/absl/logging/__init__.py", line 83, in <module>
    import socket
  File "/usr/lib/python3.6/socket.py", line 52, in <module>
    import os, sys, io, selectors
  File "/usr/lib/python3.6/selectors.py", line 10, in <module>
    import math
  File "/home/dvorak/Workspace/probability/tensorflow_probability/python/experimental/substrates/numpy/math/__init__.py", line 22, in <module>
    from tensorflow_probability.python.experimental.substrates.numpy.math.generic import log_add_exp
ModuleNotFoundError: No module named 'tensorflow_probability.python.experimental.substrates.numpy.math'

As far as I can tell from the stack trace, it's trying to import math from an internal namespace instead of importing the standard python math module. Unfortunately, this is my first time with bazel so I don't have a clear picture of what can be done to fix the issue.
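The shadowing described above can be reproduced outside Bazel. The sketch below is only an illustration under stated assumptions: the temporary directory stands in for the `substrates/numpy` package directory, and the helper names `resolve_origin` and `demo_shadowing` are made up for this example. It asks Python's path-based finder which file `import math` would resolve to when a directory containing a `math` subpackage is searched before the standard library locations.

```python
import os
import sys
import tempfile
from importlib.machinery import PathFinder


def resolve_origin(name, search_path):
    """Return the file a path-based import of `name` would load, or None."""
    spec = PathFinder.find_spec(name, search_path)
    return spec.origin if spec else None


def demo_shadowing():
    # Hypothetical layout: a directory that contains a subpackage named `math`,
    # like the numpy substrate directory in the traceback.
    with tempfile.TemporaryDirectory() as pkg_dir:
        os.makedirs(os.path.join(pkg_dir, "math"))
        with open(os.path.join(pkg_dir, "math", "__init__.py"), "w") as f:
            f.write("")
        # Python places the script's directory first on sys.path;
        # simulate that by searching pkg_dir before everything else.
        return pkg_dir, resolve_origin("math", [pkg_dir] + sys.path)


if __name__ == "__main__":
    pkg_dir, origin = demo_shadowing()
    print("import math would load:", origin)
```

Path entries are searched in order, so the local `math/__init__.py` is found first. Whether a real `import math` is affected also depends on the interpreter build: where `math` is compiled into the interpreter or already imported, the path search is never reached.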

I looked through a few past Bazel-related issues (e.g. #179, #141), but they didn't help me resolve this one.

Thank you for taking the time to read this issue.

@csuter
Member

csuter commented Sep 1, 2019

You'll need some of the flags used in our travis tests here: https://github.com/tensorflow/probability/blob/master/testing/run_tests.sh#L108

Specifically, I think you'll need

    --action_env=PATH \
    --action_env=LD_LIBRARY_PATH \
    --noincompatible_py3_is_default

Give those a try and let us know if that doesn't fix it.

@colemakdvorak
Contributor Author

colemakdvorak commented Sep 1, 2019

Almost missed your answer because it was so fast :p

Thanks for the direction. Trying

bazel test --compilation_mode=opt --copt=-O3 --copt=-march=native --action_env=PATH --action_env=LD_LIBRARY_PATH --test_output=errors --noincompatible_py3_is_default //tensorflow_probability/...

now gives me a different error:

ERROR: /home/dvorak/Workspace/probability/tensorflow_probability/python/internal/backend/jax/BUILD:83:5: Executing genrule //tensorflow_probability/python/internal/backend/jax:rewrite_test_lib failed (Exit 1) bash failed: error executing command /bin/bash -c ... (remaining 1 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
Traceback (most recent call last):
  File "/home/dvorak/.cache/bazel/_bazel_dvorak/4f8040e5b73e034f489a3157c8a93f87/sandbox/processwrapper-sandbox/6/execroot/tensorflow_probability/bazel-out/host/bin/tensorflow_probability/python/internal/backend/jax/rewrite.runfiles/tensorflow_probability/tensorflow_probability/python/internal/backend/jax/rewrite.py", line 24, in <module>
    from absl import app
ImportError: No module named absl
----------------
Note: The failure of target //tensorflow_probability/python/internal/backend/jax:rewrite (with exit code 1) may have been caused by the fact that it is a Python 3 program that was built in the host configuration, which uses Python 2. You can change the host configuration (for the entire build) to instead use Python 3 by setting --host_force_python=PY3.

If this error started occurring in Bazel 0.27 and later, it may be because the Python toolchain now enforces that targets analyzed as PY2 and PY3 run under a Python 2 and Python 3 interpreter, respectively. See https://github.com/bazelbuild/bazel/issues/7899 for more information.
----------------
INFO: Elapsed time: 19.529s, Critical Path: 0.67s
INFO: 0 processes.

which feels odd given absl should be available. I'll try following the issue linked in the error log, and stay open to continued guidance.

update: I guess the name of noincompatible_py3_is_default confused me so I turned it off. Just back to the same error. I'll check the other flags later.

@colemakdvorak
Contributor Author

Gave it another go just now with the suggested flags after a bazel clean, and I'm still seeing the same symptom: the internal math module shadows the Python standard library math.

The closest related thing I've found is in an external repo: RobotLocomotion/drake#8041. However, as mentioned before, I have very limited knowledge of Bazel, so I don't know whether it is still relevant to recent Bazel versions.

@brianwa84
Contributor

brianwa84 commented Sep 3, 2019 via email

@colemakdvorak
Contributor Author

colemakdvorak commented Sep 3, 2019

Thanks for the suggestion, but taking the flag out brought me back to the shadowing issue.

update: I guess the name of noincompatible_py3_is_default confused me so I turned it off. Just back to the same error. I'll check the other flags later.

I tried the action_env flag in various ways. If I run with --action_env=PATH, it just shows the shadowing behavior I described.

@brianwa84
Contributor

brianwa84 commented Sep 4, 2019 via email

@colemakdvorak
Contributor Author

I tried setting up a fresh virtualenv again according to

VENV=$(mktemp -d)
virtualenv -p python3.6 $VENV
source $VENV/bin/activate
pip install tf-nightly cloudpickle hypothesis numpy mock scipy decorator matplotlib

Running the following command on master inside that venv

(tmp.ir3OWSdj5I) [dvorak@qwerty probability]$ bazel test --compilation_mode=opt --copt=-O3 --copt=-march=native --test_tag_filters=-gpu,-requires-gpu-sm35,-notap,-no-oss-ci,-tfp_jax --action_env=PATH //tensorflow_probability/... --test_output=errors

results in

ERROR: /home/dvorak/Workspace/probability/tensorflow_probability/python/experimental/substrates/jax/bijectors/BUILD:93:2: Executing genrule //tensorflow_probability/python/experimental/substrates/jax/bijectors:rewrite_identity failed (Exit 1) bash failed: error executing command /bin/bash -c ... (remaining 1 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
Traceback (most recent call last):
  File "/home/dvorak/.cache/bazel/_bazel_dvorak/4f8040e5b73e034f489a3157c8a93f87/sandbox/processwrapper-sandbox/171/execroot/tensorflow_probability/bazel-out/host/bin/tensorflow_probability/python/experimental/substrates/numpy/rewrite.runfiles/tensorflow_probability/tensorflow_probability/python/experimental/substrates/numpy/rewrite.py", line 24, in <module>
    from absl import app
  File "/tmp/tmp.ir3OWSdj5I/lib/python3.6/site-packages/absl/app.py", line 41, in <module>
    from absl import logging
  File "/tmp/tmp.ir3OWSdj5I/lib/python3.6/site-packages/absl/logging/__init__.py", line 83, in <module>
    import socket
  File "/usr/lib64/python3.6/socket.py", line 52, in <module>
    import os, sys, io, selectors
  File "/usr/lib64/python3.6/selectors.py", line 10, in <module>
    import math
  File "/home/dvorak/Workspace/probability/tensorflow_probability/python/experimental/substrates/numpy/math/__init__.py", line 22, in <module>
    from tensorflow_probability.python.experimental.substrates.numpy.math.generic import log_add_exp
ModuleNotFoundError: No module named 'tensorflow_probability.python.experimental.substrates.numpy.math'
INFO: Elapsed time: 21.475s, Critical Path: 6.38s
INFO: 48 processes: 48 processwrapper-sandbox.
FAILED: Build did NOT complete successfully

So back to the same error. Is there any other log or setting info that I could provide to demystify this?

@brianwa84
Contributor

brianwa84 commented Sep 4, 2019 via email

@colemakdvorak
Contributor Author

colemakdvorak commented Sep 4, 2019

Yeah, I tried LD_LIBRARY_PATH but without much luck.

In tensorflow_probability/python/experimental/substrates/numpy/rewrite.py

the import lines are as follows:

"""Rewrite script for TF->JAX."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections

# Dependency imports
from absl import app
from absl import flags

I tried injecting a print of sys.path (right before import collections) and looking at its output during the build. It yielded

['/home/dvorak/Workspace/probability/tensorflow_probability/python/experimental/substrates/numpy', '/home/dvorak/.cache/bazel/_bazel_dvorak/4f8040e5b73e034f489a3157c8a93f87/sandbox/processwrapper-sandbox/233/execroot/tensorflow_probability/bazel-out/host/bin/tensorflow_probability/python/experimental/substrates/numpy/rewrite.runfiles', '/home/dvorak/.cache/bazel/_bazel_dvorak/4f8040e5b73e034f489a3157c8a93f87/sandbox/processwrapper-sandbox/233/execroot/tensorflow_probability/bazel-out/host/bin/tensorflow_probability/python/experimental/substrates/numpy/rewrite.runfiles/tensorflow_probability', '/home/dvorak/.cache/bazel/_bazel_dvorak/4f8040e5b73e034f489a3157c8a93f87/sandbox/processwrapper-sandbox/233/execroot/tensorflow_probability/bazel-out/host/bin/tensorflow_probability/python/experimental/substrates/numpy/rewrite.runfiles/bazel_tools', '/tmp/tmp.ir3OWSdj5I/lib/python36.zip', '/tmp/tmp.ir3OWSdj5I/lib/python3.6', '/tmp/tmp.ir3OWSdj5I/lib/python3.6/lib-dynload', '/usr/lib64/python3.6', '/usr/lib/python3.6', '/tmp/tmp.ir3OWSdj5I/lib/python3.6/site-packages']
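In the list above, the first entry is the `substrates/numpy` source directory, which is where the `math` subpackage from the traceback lives. A small helper can scan a path list like this for entries that could supply a given module name. This is a minimal sketch, not part of the project: `find_shadowing_entries` and its arguments are hypothetical names, and it only checks for `name/__init__.py` and `name.py`, ignoring extension modules and zip files.

```python
import os


def find_shadowing_entries(path_entries, module_names):
    """List (entry, name) pairs where a path entry holds a module or package called `name`."""
    hits = []
    for entry in path_entries:
        for name in module_names:
            as_package = os.path.join(entry, name, "__init__.py")
            as_module = os.path.join(entry, name + ".py")
            if os.path.isfile(as_package) or os.path.isfile(as_module):
                hits.append((entry, name))
    return hits


if __name__ == "__main__":
    import sys

    # Names taken from the traceback; extend the list as needed.
    for entry, name in find_shadowing_entries(sys.path, ["math", "socket", "selectors"]):
        print(name, "can be supplied by", entry)
```

Hits come back in search order, so for each name the first hit is the one a path-based import would pick. A hit that appears before the standard library directory is a candidate shadow.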

@brianwa84
Contributor

brianwa84 commented Sep 5, 2019 via email

@brianwa84
Contributor

brianwa84 commented Sep 5, 2019 via email

@colemakdvorak
Contributor Author

colemakdvorak commented Sep 7, 2019

Hm, I don't see any genrule in tensorflow_probability/python/experimental/substrates/numpy/BUILD. The only thing I know about genrule is that it's a Bazel rule for generating files with a user-defined bash command during the build.

Not sure how I can chain the cd / && to rewrite py_binary.

Edit: Looking at the error log again, I guess I should have looked at tensorflow_probability/python/experimental/substrates/jax/bijectors/BUILD instead?

Edit 2:

Changing

cmd = "$(location //tensorflow_probability/python/experimental/substrates/numpy:rewrite) $(SRCS) 

to

cmd = "cd / && $(location //tensorflow_probability/python/experimental/substrates/numpy:rewrite) $(SRCS) --numpy_to_jax > $@"

yields

ERROR: /home/dvorak/Workspace/probability/tensorflow_probability/python/experimental/substrates/jax/distributions/BUILD:128:2: Executing genrule //tensorflow_probability/python/experimental/substrates/jax/distributions:rewrite_mvn_linear_operator failed (Exit 1) bash failed: error executing command /bin/bash -c ... (remaining 1 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
Traceback (most recent call last):
  File "/home/dvorak/.cache/bazel/_bazel_dvorak/4f8040e5b73e034f489a3157c8a93f87/sandbox/processwrapper-sandbox/160/execroot/tensorflow_probability/bazel-out/host/bin/tensorflow_probability/python/experimental/substrates/numpy/rewrite.runfiles/tensorflow_probability/tensorflow_probability/python/experimental/substrates/numpy/rewrite.py", line 25, in <module>
    from absl import app
  File "/tmp/tmp.ir3OWSdj5I/lib/python3.6/site-packages/absl/app.py", line 41, in <module>
    from absl import logging
  File "/tmp/tmp.ir3OWSdj5I/lib/python3.6/site-packages/absl/logging/__init__.py", line 83, in <module>
    import socket
  File "/usr/lib64/python3.6/socket.py", line 52, in <module>
    import os, sys, io, selectors
  File "/usr/lib64/python3.6/selectors.py", line 10, in <module>
    import math
  File "/home/dvorak/Workspace/probability/tensorflow_probability/python/experimental/substrates/numpy/math/__init__.py", line 22, in <module>
    from tensorflow_probability.python.experimental.substrates.numpy.math.generic import log_add_exp
ModuleNotFoundError: No module named 'tensorflow_probability.python.experimental.substrates.numpy.math'
INFO: Elapsed time: 1.495s, Critical Path: 0.60s
INFO: 0 processes.
FAILED: Build did NOT complete successfully

So uh, I guess I should try to apply cd / && to every build file that throws the same error..?
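One likely reason the `cd / &&` prefix changed nothing: when Python runs a script file, `sys.path[0]` is set to the directory containing that script, not to the current working directory. The sketch below checks this with a throwaway script. The script name `show_path.py` and the helper name are made up for the example, and it assumes an interpreter run without options that disable the script-directory entry (such as `-P` in newer Python versions).

```python
import os
import subprocess
import sys
import tempfile


def script_dir_seen_by_python(cwd):
    """Run a throwaway script from `cwd`; return (script_dir, reported sys.path[0])."""
    with tempfile.TemporaryDirectory() as script_dir:
        script = os.path.join(script_dir, "show_path.py")  # hypothetical script name
        with open(script, "w") as f:
            f.write("import sys\nprint(sys.path[0])\n")
        result = subprocess.run(
            [sys.executable, script],
            cwd=cwd,  # working directory differs from the script's directory
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
        # Compare resolved paths so symlinked temp directories do not matter.
        return os.path.realpath(script_dir), os.path.realpath(result.stdout.strip())


if __name__ == "__main__":
    script_dir, reported = script_dir_seen_by_python(cwd=os.sep)
    print("script dir: ", script_dir)
    print("sys.path[0]:", reported)
```

Both values refer to the same directory even though the child process was started from the filesystem root, which matches the unchanged traceback after adding `cd /`.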

tensorflow-copybara pushed a commit that referenced this issue Sep 9, 2019
Quoting http://python-notes.curiousefficiency.org/en/latest/python_concepts/import_traps.html#the-double-import-trap:

"""
There's a reason the general "no package directories on sys.path" guideline
exists, and the fact that the interpreter itself doesn't follow it when
determining sys.path[0] is the root cause of all sorts of grief.
"""

Also relevant: bazelbuild/bazel#7091

This is addressed toward issue #539

PiperOrigin-RevId: 268058051
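
The commit's diff is not shown in this thread, so the following is not a description of what it changed. It is only a sketch of one common way to sidestep the trap quoted above: remove the script's own directory from `sys.path` before any other imports run. The helper name `without_script_dir` is hypothetical, and the approach assumes the script reaches its own package only through absolute imports that resolve via other path entries (here, the runfiles directories).

```python
import os


def without_script_dir(path_entries, script_file):
    """Return `path_entries` minus any entry that is the directory containing `script_file`."""
    script_dir = os.path.realpath(os.path.dirname(os.path.abspath(script_file)))
    # An empty entry means "current directory", so resolve it as os.curdir.
    return [p for p in path_entries if os.path.realpath(p or os.curdir) != script_dir]


# Intended use, at the very top of a script that lives inside a package:
#
#   import sys
#   sys.path[:] = without_script_dir(sys.path, __file__)
#
# After that, `import math` can no longer resolve to a sibling `math/` package.
```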
@brianwa84
Contributor

Hopefully that resolves your issue! Let me know.

@colemakdvorak
Contributor Author

Worked like a charm! Thank you for the help and guidance @brianwa84 @csuter .

I learned a lot through this process: about some Bazel flags, and about the quirks of the Bazel + Python sandbox that cause the shadowing issue via the import trap. Still a little embarrassed about my obliviousness to Bazel and the build scripts in this project, but you guys made the experience pleasant.

I hope to learn more, and contribute in the future.

@csuter
Member

csuter commented Sep 10, 2019

Glad you are up and running again. No cause for embarrassment, no one knows it all and we are all learning new things every day :)

brianwa84 added a commit to brianwa84/probability that referenced this issue Sep 30, 2019
Quoting http://python-notes.curiousefficiency.org/en/latest/python_concepts/import_traps.html#the-double-import-trap:

"""
There's a reason the general "no package directories on sys.path" guideline
exists, and the fact that the interpreter itself doesn't follow it when
determining sys.path[0] is the root cause of all sorts of grief.
"""

Also relevant: bazelbuild/bazel#7091

This is addressed toward issue tensorflow#539

PiperOrigin-RevId: 268058051