Cannot launch TensorBoard from source due to debugger plugin #431

Closed
wchargin opened this issue Aug 28, 2017 · 24 comments

Comments

@wchargin
Contributor

TensorBoard master, with TensorFlow 1.3.0 from pip, cannot run: it fails to import a Python library related to gRPC.

The error is:

Traceback (most recent call last):
  File "/home/wchargin/.cache/bazel/_bazel_wchargin/3f99396cfb979f2f5a2059c1fd233f92/execroot/org_tensorflow_tensorboard/bazel-out/local-fastbuild/bin/tensorboard/tensorboard.runfiles/org_tensorflow_tensorboard/tensorboard/main.py", line 38, in <module>
    from tensorboard.plugins.debugger import debugger_plugin as debugger_plugin_lib
  File "/home/wchargin/.cache/bazel/_bazel_wchargin/3f99396cfb979f2f5a2059c1fd233f92/execroot/org_tensorflow_tensorboard/bazel-out/local-fastbuild/bin/tensorboard/tensorboard.runfiles/org_tensorflow_tensorboard/tensorboard/plugins/debugger/debugger_plugin.py", line 35, in <module>
    from tensorboard.plugins.debugger import debugger_server_lib
  File "/home/wchargin/.cache/bazel/_bazel_wchargin/3f99396cfb979f2f5a2059c1fd233f92/execroot/org_tensorflow_tensorboard/bazel-out/local-fastbuild/bin/tensorboard/tensorboard.runfiles/org_tensorflow_tensorboard/tensorboard/plugins/debugger/debugger_server_lib.py", line 33, in <module>
    from tensorflow.python.debug.lib import grpc_debug_server
ImportError: cannot import name grpc_debug_server

The first bad commit is (unsurprisingly) a856e61, which I identified by using git bisect with the following script:

#!/bin/bash
! bazel run tensorboard 2>&1 | grep -F 'cannot import name grpc_debug_server'
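
For anyone repeating the bisection, the same predicate could be sketched in Python. This is an illustrative port, not the script that was actually used; the names `NEEDLE`, `bisect_exit_code`, and `main` are hypothetical. It relies on the `git bisect run` convention that exit status 0 marks a commit as good and exit status 1 marks it as bad.

```python
import subprocess

# The error text that identifies a bad commit (taken from the traceback above).
NEEDLE = "cannot import name grpc_debug_server"


def bisect_exit_code(output):
    """Map captured `bazel run` output to a `git bisect run` exit status.

    Returns 0 ("good") when the import error is absent and 1 ("bad")
    when it is present.
    """
    return 1 if NEEDLE in output else 0


def main():
    """Run the build and classify its combined output."""
    # Merge stderr into stdout, mirroring `2>&1` in the shell version.
    proc = subprocess.Popen(
        ["bazel", "run", "tensorboard"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    output, _ = proc.communicate()
    return bisect_exit_code(output)
```

To use it with `git bisect run`, the script would end with `import sys; sys.exit(main())` so that the status reaches git.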

Steps to reproduce:

$ virtualenv /tmp/tensorflow-1.3.0-fresh
$ source /tmp/tensorflow-1.3.0-fresh/bin/activate
$ pip install tensorflow==1.3.0
$ git checkout b1a4d2586a0eae1ce7f3a18b4db188b62c4daaee  # current origin/master
$ bazel run tensorboard -- --logdir /tmp/data

The following patch fixes the problem:

diff --git a/tensorboard/main.py b/tensorboard/main.py
index ec84e25..fb5d2cd 100644
--- a/tensorboard/main.py
+++ b/tensorboard/main.py
@@ -35,7 +35,7 @@ from tensorboard.backend import application
 from tensorboard.backend.event_processing import event_file_inspector as efi
 from tensorboard.plugins.audio import audio_plugin
 from tensorboard.plugins.core import core_plugin
-from tensorboard.plugins.debugger import debugger_plugin as debugger_plugin_lib
+#from tensorboard.plugins.debugger import debugger_plugin as debugger_plugin_lib
 from tensorboard.plugins.distribution import distributions_plugin
 from tensorboard.plugins.graph import graphs_plugin
 from tensorboard.plugins.histogram import histograms_plugin
@@ -240,11 +240,12 @@ def main(unused_argv=None):
     efi.inspect(FLAGS.logdir, event_file, FLAGS.tag)
     return 0
   else:
-    def ConstructDebuggerPluginWithGrpcPort(context):
-      debugger_plugin = debugger_plugin_lib.DebuggerPlugin(context)
-      if FLAGS.debugger_data_server_grpc_port is not None:
-        debugger_plugin.listen(FLAGS.debugger_data_server_grpc_port)
-      return debugger_plugin
+    pass
+    #def ConstructDebuggerPluginWithGrpcPort(context):
+    #  debugger_plugin = debugger_plugin_lib.DebuggerPlugin(context)
+    #  if FLAGS.debugger_data_server_grpc_port is not None:
+    #    debugger_plugin.listen(FLAGS.debugger_data_server_grpc_port)
+    #  return debugger_plugin
 
     plugins = [
         core_plugin.CorePlugin,
@@ -258,7 +259,7 @@ def main(unused_argv=None):
         projector_plugin.ProjectorPlugin,
         text_plugin.TextPlugin,
         profile_plugin.ProfilePlugin,
-        ConstructDebuggerPluginWithGrpcPort,
+        #ConstructDebuggerPluginWithGrpcPort,
     ]
 
     tb = create_tb_app(plugins)

Versions:

$ bazel version
Build label: 0.5.4
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Aug 25 10:00:00 2017 (1503655200)
Build timestamp: 1503655200
Build timestamp as int: 1503655200
$ pip --version
pip 9.0.1 from /tmp/tensorflow-1.3.0-fresh/local/lib/python2.7/site-packages (python 2.7)
$ lsb_release -a
No LSB modules are available.
Distributor ID:	LinuxMint
Description:	Linux Mint 18.2 Sonya
Release:	18.2
Codename:	sonya
@wchargin
Contributor Author

@chihuahua

@chihuahua
Member

Hmm, I'm trying to repro. I ran

pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.3.0rc2-cp27-none-linux_x86_64.whl

and TensorBoard at master HEAD seems to run fine.

@chihuahua
Member

One thing to note: TensorBoard used to fail to start for me, but I fixed that by pip-installing grpcio. However, the error I got then looked different.

INFO: Running command line: bazel-bin/tensorboard/tensorboard '--logdir=~/Desktop/pr_curve_demo'
Traceback (most recent call last):
  File "/private/var/tmp/_bazel_chizeng/1b1399fef0aaaae96df4708880f141bb/execroot/org_tensorflow_tensorboard/bazel-out/darwin_x86_64-fastbuild/bin/tensorboard/tensorboard.runfiles/org_tensorflow_tensorboard/tensorboard/main.py", line 38, in <module>
    from tensorboard.plugins.debugger import debugger_plugin as debugger_plugin_lib
  File "/private/var/tmp/_bazel_chizeng/1b1399fef0aaaae96df4708880f141bb/execroot/org_tensorflow_tensorboard/bazel-out/darwin_x86_64-fastbuild/bin/tensorboard/tensorboard.runfiles/org_tensorflow_tensorboard/tensorboard/plugins/debugger/debugger_plugin.py", line 35, in <module>
    from tensorboard.plugins.debugger import debugger_server_lib
  File "/private/var/tmp/_bazel_chizeng/1b1399fef0aaaae96df4708880f141bb/execroot/org_tensorflow_tensorboard/bazel-out/darwin_x86_64-fastbuild/bin/tensorboard/tensorboard.runfiles/org_tensorflow_tensorboard/tensorboard/plugins/debugger/debugger_server_lib.py", line 33, in <module>
    from tensorflow.python.debug.lib import grpc_debug_server
  File "/Users/chizeng/anaconda/lib/python3.6/site-packages/tensorflow/python/debug/lib/grpc_debug_server.py", line 27, in <module>
    import grpc
ModuleNotFoundError: No module named 'grpc'

The error you noted seems to instead indicate that the grpc_debug_server module is unavailable.

@wchargin
Contributor Author

Using 1.3.0rc2 instead of 1.3.0, with the link that you provided, does not fix the problem.

Additionally installing grpcio does not fix the problem.

In my site packages, the tensorflow.python.debug.lib package contains no file grpc_debug_server.py, so it is no wonder that the import fails. You don't seem to have this problem: could you please post your output for

from tensorflow.python.debug.lib import grpc_debug_server
print(grpc_debug_server.__file__)

Note that this file does exist in nightly TensorFlow. However, (a) I'd thought that we no longer wanted to depend on nightly since the 1.3 release (correct me if wrong?), and (b) the import still fails because a transitive dependency is missing: if I write

$ virtualenv /tmp/tensorflow-nightly-20170828
$ source /tmp/tensorflow-nightly-20170828/bin/activate
$ pip install 'https://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-1.3.0-cp27-none-linux_x86_64.whl'
$ bazel run tensorboard

then the error is

Traceback (most recent call last):
  File "/home/wchargin/.cache/bazel/_bazel_wchargin/3f99396cfb979f2f5a2059c1fd233f92/execroot/org_tensorflow_tensorboard/bazel-out/local-fastbuild/bin/tensorboard/tensorboard.runfiles/org_tensorflow_tensorboard/tensorboard/main.py", line 38, in <module>
    from tensorboard.plugins.debugger import debugger_plugin as debugger_plugin_lib
  File "/home/wchargin/.cache/bazel/_bazel_wchargin/3f99396cfb979f2f5a2059c1fd233f92/execroot/org_tensorflow_tensorboard/bazel-out/local-fastbuild/bin/tensorboard/tensorboard.runfiles/org_tensorflow_tensorboard/tensorboard/plugins/debugger/debugger_plugin.py", line 35, in <module>
    from tensorboard.plugins.debugger import debugger_server_lib
  File "/home/wchargin/.cache/bazel/_bazel_wchargin/3f99396cfb979f2f5a2059c1fd233f92/execroot/org_tensorflow_tensorboard/bazel-out/local-fastbuild/bin/tensorboard/tensorboard.runfiles/org_tensorflow_tensorboard/tensorboard/plugins/debugger/debugger_server_lib.py", line 33, in <module>
    from tensorflow.python.debug.lib import grpc_debug_server
  File "/tmp/tensorflow-nightly-20170828/local/lib/python2.7/site-packages/tensorflow/python/debug/lib/grpc_debug_server.py", line 26, in <module>
    from concurrent import futures
ImportError: No module named concurrent

@wchargin
Contributor Author

To summarize, the only configuration that I have found to work is to install both TensorFlow nightly and the separate grpcio package, which provides the concurrent package. The former might be acceptable, but the latter isn't and should be fixed.
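
A quick way to see which of these pieces a given virtualenv is missing is to try each import named in the tracebacks above and report the failures. The following is only a sketch; `REQUIRED_MODULES`, `find_missing`, and `report` are hypothetical names, not part of TensorBoard.

```python
import importlib

# Modules named in the tracebacks in this thread.
REQUIRED_MODULES = (
    "concurrent.futures",
    "grpc",
    "tensorflow.python.debug.lib.grpc_debug_server",
)


def find_missing(module_names):
    """Return (name, error message) pairs for modules that fail to import."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as e:
            missing.append((name, str(e)))
    return missing


def report(module_names=REQUIRED_MODULES):
    """Print one line per module that cannot be imported."""
    for name, error in find_missing(module_names):
        print("missing %s: %s" % (name, error))
```

Calling `report()` inside the activated virtualenv prints nothing when all three imports succeed.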

@ioeric
Contributor

ioeric commented Aug 28, 2017

FYI, I ran into the same problem, and I did pip install grpc which seemed to fix the problem.

@caisq
Contributor

caisq commented Aug 28, 2017

I think this may have to do with the recent update in the tensorboard version that tensorflow 1.3.0 depends on. The new version includes the PR that open-sourced plugin/debugger: #310.

But plugin/debugger depends on grpc_debug_server, which is not available in tensorflow 1.3.0. It is available in tensorflow HEAD, though.

So we have a few options:

  1. Put out a patch release of tensorboard with the PR reverted.
  2. Put out a patch release of tensorflow with the grpc_debug_server cherry picked.

@jart

@caisq
Contributor

caisq commented Aug 28, 2017

@wchargin, I may have misunderstood the issue in my previous comment. Now I realize that the issue happens only for developers working at tensorboard master HEAD. For this developer workflow, the way to resolve this issue is to install the nightly tensorflow, instead of tensorflow 1.3.0. tensorflow 1.3.0 doesn't have the grpc_debug_server. The nightly install instructions can be found here:
https://github.com/tensorflow/tensorflow#installation

Note that the Travis testing we have is performed against nightly tensorflow, not latest-release tensorflow.

@luchensk

I also hit this issue before and fixed it by using the master branch of tensorflow, as @caisq said above.

@luchensk

BTW, if you work on macOS, please refer to tensorflow/tensorflow#12123, which includes a workaround for compiling tensorflow on macOS: replace -Werror with -Wno-excessive-errors in add_boringssl_s390x.patch.

@RenatoUtsch

Just update to Bazel 0.5.4; the -Werror hack is not needed anymore.

@wchargin
Contributor Author

wchargin commented Oct 3, 2017

Bump—this issue continues to occur on a fresh clone (repro below), and using TF nightly does not fix the issue. @caisq

Here is a revised repro script:

#!/bin/sh
set -eux
tmpdir="$(mktemp -d --suffix _tensorflow)"
virtualenv "${tmpdir}"
. "${tmpdir}/bin/activate"
pip install 'https://ci.tensorflow.org/view/tf-nightly/job/tf-nightly-linux/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/52/artifact/pip_test/whl/tf_nightly-1.head-cp27-none-linux_x86_64.whl'
# pip install futures
# pip install grpc
bazel build //tensorboard
./bazel-bin/tensorboard/tensorboard --logdir ~/data/

This yields:

Traceback (most recent call last):
  File "/home/wchargin/git/tensorboard/bazel-bin/tensorboard/tensorboard.runfiles/org_tensorflow_tensorboard/tensorboard/main.py", line 38, in <module>
    from tensorboard.plugins.debugger import debugger_plugin as debugger_plugin_lib
  File "/home/wchargin/git/tensorboard/bazel-bin/tensorboard/tensorboard.runfiles/org_tensorflow_tensorboard/tensorboard/plugins/debugger/debugger_plugin.py", line 35, in <module>
    from tensorboard.plugins.debugger import debugger_server_lib
  File "/home/wchargin/git/tensorboard/bazel-bin/tensorboard/tensorboard.runfiles/org_tensorflow_tensorboard/tensorboard/plugins/debugger/debugger_server_lib.py", line 33, in <module>
    from tensorflow.python.debug.lib import grpc_debug_server
  File "/tmp/tmp.xP0p6ZLUpx_tensorflow/local/lib/python2.7/site-packages/tensorflow/python/debug/lib/grpc_debug_server.py", line 26, in <module>
    from concurrent import futures
ImportError: No module named concurrent

Uncommenting the first commented line yields:

Traceback (most recent call last):
  File "/home/wchargin/git/tensorboard/bazel-bin/tensorboard/tensorboard.runfiles/org_tensorflow_tensorboard/tensorboard/main.py", line 38, in <module>
    from tensorboard.plugins.debugger import debugger_plugin as debugger_plugin_lib
  File "/home/wchargin/git/tensorboard/bazel-bin/tensorboard/tensorboard.runfiles/org_tensorflow_tensorboard/tensorboard/plugins/debugger/debugger_plugin.py", line 35, in <module>
    from tensorboard.plugins.debugger import debugger_server_lib
  File "/home/wchargin/git/tensorboard/bazel-bin/tensorboard/tensorboard.runfiles/org_tensorflow_tensorboard/tensorboard/plugins/debugger/debugger_server_lib.py", line 33, in <module>
    from tensorflow.python.debug.lib import grpc_debug_server
  File "/tmp/tmp.2juuukpm8w_tensorflow/local/lib/python2.7/site-packages/tensorflow/python/debug/lib/grpc_debug_server.py", line 27, in <module>
    import grpc
ImportError: No module named grpc

Uncommenting the second line works, although there is still a spurious log entry:

Import grpc:No module named gevent.socket

Note that I've had to go back to TensorFlow build 52 because of a regression introduced recently (#595 (comment)).

Surely this must be fixed. We have dependencies that we are failing to express; I just don't know what the right place to put them is. cc @jart

@jart
Contributor

jart commented Oct 3, 2017

It's assumed that, when working from source, you'll pip install futures and grpcio manually into your virtualenv, because it's nontrivial to express them in our Bazel build.

It's hard to integrate futures because, in the pip world, installing that package on Python3 is treated as a no-op. I'm not quite certain how to express that in a Bazel build. Integrating grpcio would require a lot of BUILD configuration and a lot of time spent compiling on Travis. It's not a beautiful thing.
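
To make the Python 2 / Python 3 split concrete, here is a minimal sketch of the rule a developer follows by hand today. The function name `source_build_requirements` is hypothetical; the only outside fact it relies on is that `concurrent.futures` has been part of the standard library since Python 3.2, so the `futures` backport matters only on Python 2.

```python
import sys


def source_build_requirements(version_info=sys.version_info):
    """Return the pip packages to install by hand for a source build."""
    requirements = ["grpcio"]
    if version_info[0] < 3:
        # concurrent.futures ships with Python 3, so the `futures`
        # backport is only needed on Python 2.
        requirements.append("futures")
    return requirements
```

For example, `source_build_requirements((2, 7, 13))` includes `futures`, while `source_build_requirements((3, 6, 0))` does not.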

I will however note that I've encountered some other strange errors relating to the debugger plugin and grpc. Please see this comment. It seems like our Travis build might be broken and I'm not sure why.

@wchargin
Contributor Author

wchargin commented Oct 3, 2017

@jart: Thanks for the summary. That's quite unfortunate. I'll add that to DEVELOPMENT.md, but I propose that this issue remain open: if we have some opportunity to fix it (a fixit day, or someone just feels like it some time), then that will be nice.

I linked to that comment of yours near the end of my comment; I can reproduce the issues when using TensorFlow nightly, and I have not found a resolution (though I have not looked too deeply, either).

@caisq
Contributor

caisq commented Oct 3, 2017

@wchargin, @jart: futures and grpcio are listed as dependencies of the tensorboard pip package in setup.py. Obviously, setup.py does not affect bazel runs, which is the reason for the ImportErrors that @wchargin mentioned. The ImportErrors do not occur when the pip package is built and installed in a virtualenv.

As for the weird issue that @jart mentioned, I just ran bazel test tensorboard/... on my machine in a virtualenv with futures and grpcio installed. I saw some breakage related to SummaryMetadata, but not the one that @jart pasted:

AttributeError: 'SymbolDatabase' object has no attribute 'RegisterServiceDescriptor'

@jart, can you let me know which test shows this particular error?

@wchargin
Contributor Author

wchargin commented Oct 3, 2017

@caisq I can reproduce @jart's exact error with the script in #431 (comment), by changing the build number from 52 to 56. Moreover, 56 is the earliest bad build. Simply running bazel run tensorboard triggers the error.

That is, the following script reproduces:

#!/bin/sh
set -eux
tmpdir="$(mktemp -d --suffix _tensorflow)"
virtualenv "${tmpdir}"
. "${tmpdir}/bin/activate"
pip install 'https://ci.tensorflow.org/view/tf-nightly/job/tf-nightly-linux/TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON2,label=cpu-slave/56/artifact/pip_test/whl/tf_nightly-1.head-cp27-none-linux_x86_64.whl'
pip install futures
pip install grpc
bazel build //tensorboard
./bazel-bin/tensorboard/tensorboard --logdir ~/data/

@wchargin
Contributor Author

wchargin commented Oct 3, 2017

Here's the commit diff from 54→56 (there is no build 55); one of these changes causes the regression: tensorflow/tensorflow@e3ceea3...64f0ebd

@chihuahua
Member

Have we tried changing the version of protobuf?
googleapis/google-cloud-python#3967

I think I've seen that AttributeError before while using TensorFlow, and I resolved it by installing protobuf 3.1.0. https://www.tensorflow.org/versions/r0.12/get_started/os_setup#protobuf_library_related_issues

@wchargin
Contributor Author

wchargin commented Oct 3, 2017

@chihuahua downgrading protobuf from 3.4.0 to 3.1.0 does not fix the issue.

@wchargin
Contributor Author

wchargin commented Oct 3, 2017

I observe the following commit in the list: "Update protobuf to 3.4.1" (tensorflow/tensorflow@d16262d). It seems probable that this is related.

@caisq
Contributor

caisq commented Oct 3, 2017

I have some rough ideas of what might be the cause and how to fix it from the tensorflow side. Will give it a shot tomorrow.

@jart
Contributor

jart commented Oct 3, 2017

Upgrading grpc and protobuf doesn't fix the issue either. How stable is grpc? I'm concerned that issues like these could cause problems for TensorBoard and TensorFlow users if we make it a dependency. Should we rework the debugger code so that it can survive if importing grpc fails? Then have an "inactive plugin" page that tells the user to pip install grpc if they want to use it?
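
The fallback described here could look roughly like the sketch below. All names (`try_import`, `InactiveDebuggerPlugin`, `make_debugger_plugin`) are hypothetical and are not TensorBoard's plugin API; the point is only that the grpc-dependent import is attempted lazily and a placeholder is returned when it fails.

```python
import importlib


def try_import(module_name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


class InactiveDebuggerPlugin(object):
    """Hypothetical placeholder used when grpc is not installed."""

    plugin_name = "debugger"

    def is_active(self):
        return False

    def message(self):
        return ("The debugger plugin needs grpc; "
                "install it (for example, `pip install grpcio`) to enable it.")


def make_debugger_plugin(real_factory, grpc_module_name="grpc"):
    """Build the real plugin if grpc imports, else the inactive placeholder.

    `real_factory` is a zero-argument callable that performs the
    grpc-dependent imports and construction, so nothing grpc-related
    is touched unless the import check succeeds.
    """
    if try_import(grpc_module_name) is None:
        return InactiveDebuggerPlugin()
    return real_factory()
```

With this shape, main.py would no longer fail at import time when grpc is absent; the debugger tab would simply show the placeholder's message.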

@caisq
Contributor

caisq commented Oct 3, 2017

@jart That sounds good to me, too. I will look into that rework.

@caisq
Contributor

caisq commented May 24, 2018

This issue is obsolete now. Closing it.

@caisq closed this as completed May 24, 2018