Starting tensorboard inline within a Jupyter notebook consistently times out #4300
Comments
Hi @joyceerhl, thanks for the report. I tried reproducing it and failed to do so. It may truly be OS dependent. Could you try to run [...] Technically speaking, we should never take more than a minute to load TensorFlow or TensorBoard, so this is bad, but the above exercise may shed some light on the underlying issue.
AFAICT this is not an import latency bug, because TensorBoard actually starts up almost immediately. I can even access TensorBoard in a browser running on localhost less than 2 seconds after launching it inline in a notebook. The issue seems to be that even after TensorBoard is started, the notebook extension running inside the Jupyter kernel does not send back the iframe that is the result of executing [...]
@joyceerhl, [...]
Sorry for not getting back to you, @joyceerhl. I had some trouble setting up my Windows environment then I forgot about this bug :( I was able to reproduce your bug and was able to narrow it down a little. When running on Windows, we never get past this: tensorboard/tensorboard/manager.py Lines 433 to 435 in a46a6f6
In order to narrow the problem down, I tried a few things.
# subprogram.py
import os
print("subprogram", os.getpid())
# In TensorBoard's main.py
print("tb main", os.getpid())
# test_main.py
import subprocess
# p = subprocess.Popen(["python", "[PATH_TO]/subprogram.py"],)
p = subprocess.Popen(["tensorboard"],)
print("main", p.pid)
The weird thing is that when I run my subprogram, the PID reported by subprocess equals the one queried within subprogram.py, while the same is not true for running [...]
I can reproduce the same issue on Python 3.7.1. @wchargin, would you have any idea how this can happen?
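The PID comparison described above can be packaged as a single script. This is a minimal sketch, not code from the TensorBoard repository: `compare_pids` is a hypothetical helper, and it assumes the launched command prints its own PID as the only thing on stdout. Launching the interpreter directly is used here as the baseline case. To probe a console script, it would have to be instrumented to print its PID, as with the `print("tb main", os.getpid())` line above.

```python
import subprocess
import sys


def compare_pids(command):
    """Launch `command`, which must print its own PID, and return both PIDs.

    Returns (pid seen by the parent via Popen, pid reported by the child).
    """
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
    stdout, _ = proc.communicate(timeout=30)
    reported_pid = int(stdout.strip())
    return proc.pid, reported_pid


if __name__ == "__main__":
    # Baseline: run the interpreter directly, with no console-script wrapper.
    child = [sys.executable, "-c", "import os; print(os.getpid())"]
    popen_pid, child_pid = compare_pids(child)
    print("Popen.pid:", popen_pid, "child os.getpid():", child_pid)
    print("match" if popen_pid == child_pid else "mismatch (wrapper process?)")
```

A mismatch would suggest that an intermediate process sits between the parent and the program that actually runs.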
If I understand correctly, you’re saying that when you launch [...]
If so, I wonder whether there is some kind of wrapper script around [...]
Maybe one fix would be to relax the [...] This would explain why retrying [...]
@joyceerhl: Unfortunately, I have no Windows box on which to test (Linux [...] Let me try to come up with something and send it your way.
Summary:
When the `%tensorboard` cell magic is invoked, we compute a cache key for the “hermetic environment”, primarily args to `%tensorboard` and the working directory. We first check whether any running TensorBoard instances match that cache key, and launch a new instance if none do.
But then, while polling for the new instance to have launched, we had a different matching criterion, checking for a process ID match instead of a cache key match. The idea was that “is this TensorBoard instance’s PID equal to the PID of the subprocess that we just spawned?” would be a more reliable check.
But on Windows ((╯°□°)╯︵ ┻━┻) this is not the case, presumably because the `tensorboard` console script has some kind of wrapper process in certain versions of Python. This manifested as “`%tensorboard` always times out on the first invocation, but works immediately when I invoke it again”, since invoking it again triggers the cache key check rather than the PID check.
So we now just check by cache key in all cases, and the logic is consistent, if a bit less precise overall.
Fixes #4300.
Test Plan:
Still works for me on Linux, with both new and existing TensorBoard processes across multiple (concurrent) cache keys. @stephanwlee can repro the bug and fix on Windows with Python 3.8.
wchargin-branch: notebook-poll-no-pid-filter
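The matching change described in this summary can be illustrated with a small, self-contained sketch. It is not the actual TensorBoard manager code: `InstanceInfo`, `poll_for_instance`, and the `list_instances` callback are hypothetical names introduced only for illustration, and the timeout values are arbitrary.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class InstanceInfo:
    """Hypothetical record that a running server publishes about itself."""
    pid: int
    port: int
    cache_key: str


def poll_for_instance(
    list_instances: Callable[[], Iterable[InstanceInfo]],
    cache_key: str,
    timeout_seconds: float = 5.0,
    poll_interval_seconds: float = 0.05,
) -> Optional[InstanceInfo]:
    """Return the first instance whose cache key matches, or None on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        for info in list_instances():
            # Match on cache key only. Comparing info.pid with the PID of the
            # spawned subprocess can fail when a wrapper process sits between
            # the launcher and the real server.
            if info.cache_key == cache_key:
                return info
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval_seconds)
```

For example, if the launcher spawned a wrapper with PID 100 and the real server registered itself with PID 200, a PID filter would never match, while `poll_for_instance(lambda: [InstanceInfo(200, 6006, "abc")], "abc")` returns the instance on the first pass.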
Hi @joyceerhl, we think we have a fix for this in #4407. It should go out [...]
Patched the diff in and it works perfectly! Thanks so much for tracking down the fix 😊
Excellent! Thanks for letting us know.
Environment information (required)
Please run diagnose_tensorboard.py (link below) in the same environment from which you normally run TensorFlow/TensorBoard, and paste the output here:
Diagnostics
Diagnostics output
For browser-related issues, please additionally specify:
Issue description
Please describe the bug as clearly as possible. How can we reproduce the
problem without additional resources (including external data files and
proprietary Python modules)?
Repro steps:
python -m pip install jupyter tensorflow
jupyter notebook
Run %load_ext tensorboard in one cell, then %tensorboard --logdir logs/fit in a second cell.
If I rerun the cell with %tensorboard --logdir logs/fit, tensorboard does show up inline.
The diagnostic info above is based on my local machine. I initially thought my local box might simply have a busted tensorboard install, but the problem persisted even after uninstalling and reinstalling tensorflow and tensorboard. I was able to repro this on a clean Windows VM with the above repro steps.
Debugging the iopub messages sent by the Jupyter kernel, the Jupyter kernel does send a message with the 'Launching TensorBoard...' message in response to the execution request for %tensorboard --logdir logs/fit the first time, but it doesn't ever send the iframe back. For some reason requesting the tensorboard launch again results in an immediate response about 'Reusing TensorBoard on port <...>', followed by a message with the iframe for display in the cell output.
Happy to provide additional information that will help with diagnosing the issue!
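The check described above (launch text arrives, iframe never does) could be automated over a list of captured iopub messages. This is a rough sketch under assumptions: `summarize_iopub` is a hypothetical helper, the messages are assumed to be plain dicts shaped like Jupyter messaging-protocol messages (`msg_type` plus a `content` dict), and the iframe is assumed to arrive as a `text/html` entry in a display MIME bundle.

```python
def summarize_iopub(messages):
    """Return (saw_launch_text, saw_iframe) for captured iopub messages."""
    saw_launch_text = False
    saw_iframe = False
    for msg in messages:
        msg_type = msg.get("msg_type") or msg.get("header", {}).get("msg_type")
        content = msg.get("content", {})
        if msg_type == "stream":
            # Stream messages carry their text directly.
            texts = [content.get("text", "")]
            html = ""
        elif msg_type in ("display_data", "update_display_data", "execute_result"):
            # Display-style messages carry a MIME bundle in content["data"].
            data = content.get("data", {})
            texts = [data.get("text/plain", "")]
            html = data.get("text/html", "")
        else:
            continue
        if any("Launching TensorBoard" in text for text in texts):
            saw_launch_text = True
        if "<iframe" in html:
            saw_iframe = True
    return saw_launch_text, saw_iframe
```

With the behavior reported in this issue, the first execution would be expected to yield `(True, False)`.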