Confirm that the connection to tensorboard works or change to localhost #2371
Thanks for taking this on :)
tensorboard/program.py
Outdated
# Confirm that the connection is open, otherwise change to `localhost`
socket.setdefaulttimeout(1)
try:
  socket.socket().connect((display_host, self.server_port))
Let's use socket.create_connection() instead - it's a higher level API that should handle, for example, IPv4 and IPv6 (rather than picking just one, at which point if the hostname is only available on the other protocol it will incorrectly fall back to localhost).
Also, I think it only makes sense to do this fallback when the user didn't specify an explicit host via the --host flag, since if --host was passed we should use it without modification. That would mean moving this fallback code up into the first case of the if statement above, where we do socket.gethostname(). And actually it's probably better to replace socket.gethostname() with socket.getfqdn() so we try to resolve the hostname to a reachable domain name first if we can.
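For illustration, here is a minimal sketch of the suggestion above, assuming Python 3. The helper names `can_connect` and `choose_url_host` are hypothetical and not part of TensorBoard; only `socket.create_connection()` and `socket.getfqdn()` come from the standard library.

```python
import socket


def can_connect(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection() resolves the name and tries every returned
        # address (IPv4 and IPv6) in turn. The timeout applies to this
        # socket only, not to the whole socket module.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_url_host(flag_host, port, timeout=1.0):
    """Pick the host to print in the URL (hypothetical helper)."""
    if flag_host:
        # An explicit --host value is used without modification.
        return flag_host
    host = socket.getfqdn()
    return host if can_connect(host, port, timeout) else "localhost"
```

With this split, the network probe only runs on the branch where no host was passed explicitly.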
I agree and addressed this in 88e25, 6a5fc8, and 6c796 respectively.
tensorboard/program.py
Outdated
@@ -526,6 +526,14 @@ def get_url(self):
host = self._flags.host
display_host = (
    '[%s]' % host if ':' in host and not host.startswith('[') else host)

# Confirm that the connection is open, otherwise change to `localhost`
socket.setdefaulttimeout(1)
Rather than changing the default for the entire socket module, let's use .settimeout() on the individual socket object - or better yet, pass it as an argument into create_connection() per the comment below.
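A small sketch of the difference, assuming Python 3: setting the timeout on one socket object leaves the module-wide default untouched, so other code using `socket` is unaffected. The variable names are illustrative only.

```python
import socket

default_before = socket.getdefaulttimeout()

sock = socket.socket()
sock.settimeout(1.0)  # affects only this socket object
per_socket_timeout = sock.gettimeout()
sock.close()

# The module-wide default is unchanged by the per-socket setting.
default_after = socket.getdefaulttimeout()
```

By contrast, `socket.setdefaulttimeout(1)` changes the timeout of every socket created afterwards anywhere in the process.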
Well spotted, fixed in 1ae50.
tensorboard/program.py
Outdated
socket.setdefaulttimeout(1)
try:
  socket.socket().connect((display_host, self.server_port))
except socket.timeout as e:
Let's catch socket.error generally so that if there is some other error attempting the connection (not a timeout), we fall back to localhost in that case too. For example, it looks like if the DNS name doesn't resolve at all, this line fails with socket.gaierror rather than socket.timeout:
>>> socket.socket().connect(("foobartensorboard.com", 6006))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
socket.gaierror: [Errno -2] Name or service not known
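To make the exception relationships concrete, here is a short sketch assuming Python 3.3 or later, where `socket.error` is an alias of `OSError` and both `socket.gaierror` and `socket.timeout` are subclasses of it. The traceback above is from Python 2.7, where the same subclass relationship to `socket.error` holds. The function name `triggers_fallback` is hypothetical.

```python
import socket


def triggers_fallback(exc):
    """Return True if `except socket.error` would catch this exception."""
    # In Python 3.3+, socket.error is an alias of OSError.
    return isinstance(exc, socket.error)


examples = [
    socket.gaierror(-2, "Name or service not known"),   # DNS failure
    socket.timeout("timed out"),                        # connect timeout
    ConnectionRefusedError(111, "Connection refused"),  # closed port
]
results = [triggers_fallback(exc) for exc in examples]
```

All three cases are caught by the broader handler, whereas `except socket.timeout` would catch only the second.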
Very well spotted, fixed in 71307.
tensorboard/program.py
Outdated
# Confirm that the connection is open, otherwise change to `localhost`
socket.setdefaulttimeout(1)
try:
Now that creating the URL involves an actual network request, let's avoid doing that on each call to get_url() and just cache the constructed URL in a property and return that if it's called another time.
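The caching idea can be sketched in isolation as follows. `CachedUrlServer` and its `build_count` counter are hypothetical stand-ins, not TensorBoard classes; `_build_url()` represents the expensive step (network probe plus formatting).

```python
class CachedUrlServer:
    """Hypothetical stand-in for a server that caches its URL."""

    def __init__(self, host, port):
        self._host = host
        self._port = port
        self._url = None
        self.build_count = 0  # counts how often the expensive step runs

    def _build_url(self):
        # Stand-in for the network probe plus string formatting.
        self.build_count += 1
        return "http://%s:%d/" % (self._host, self._port)

    def get_url(self):
        # Build the URL on first use only; later calls return the cached value.
        if self._url is None:
            self._url = self._build_url()
        return self._url
```

Calling `get_url()` repeatedly on one instance runs `_build_url()` once.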
Yes, I proposed a solution in bf05d. Another solution could be to move the code into __init__() of class WerkzeugServer and change get_url() to only the last two lines of the current function (return 'http:// ... .rstrip('/'))). Also, why do we have the property _auto_wildcard if it's simply not host?
Thanks for your patience, I was on vacation for a while.
FYI, we're planning to change the default behavior of TensorBoard soon so that it only serves on localhost, in which case the printed URL would always use "localhost". (Basically, it'll be like passing --host=localhost by default.) So after that change, the logic in this PR would only affect cases where users specifically opt in to the wildcard behavior.
tensorboard/program.py
Outdated
else:
  host = self._flags.host
display_host = (
if not self.display_host:
Let's make the variable self._url instead and just store the full URL.
It would probably read better if we factor out something like get_url_host(), e.g.
def get_url_host(self):
  if not self._auto_wildcard:
    return self._flags.host
  host = socket.getfqdn()
  try:
    # Close the probe socket right away; we only need to know it connected.
    socket.create_connection((host, self.server_port), timeout=1).close()
    return host
  except socket.error:
    return 'localhost'

def get_url(self):
  if not self._url:
    host = self.get_url_host()
    display_host = (
        '[%s]' % host if ':' in host and not host.startswith('[') else host)
    self._url = 'http://%s:%d%s/' % (display_host, self.server_port,
                                     self._flags.path_prefix.rstrip('/'))
  return self._url
tensorboard/program.py
Outdated
if not self.display_host:

  if self._auto_wildcard:
    display_host = socket.getfqdn()
Just "host" is a better name here since this is still a real hostname; the name "display_host" was meant to be only for the host after adding brackets around an IPv6 address so it renders correctly when printed out in the URL. See also suggestion above for a get_url_host() helper.
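The distinction between "host" and "display_host" can be sketched as below. The function names `to_display_host` and `build_url` are hypothetical; the bracket expression and the URL format string mirror the ones quoted in the review.

```python
def to_display_host(host):
    """Wrap a bare IPv6 literal in brackets so it is valid inside a URL."""
    return "[%s]" % host if ":" in host and not host.startswith("[") else host


def build_url(host, port, path_prefix=""):
    """Format the printed URL from a real hostname (hypothetical helper)."""
    return "http://%s:%d%s/" % (
        to_display_host(host), port, path_prefix.rstrip("/"))
```

Here `build_url("::1", 6006, "/tb/")` yields `http://[::1]:6006/tb/`, while a plain name such as `localhost` passes through unchanged.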
@nfelt Thanks for your patience too, I was overwhelmed for a while. Given the move of tensorboard and the conflicts with the base branch, does this PR still make sense, or shall I open a new one from the new codebase?
@miguelmorin Thanks for following up - I might be misunderstanding but I don't think anything actually moved? There was a behavior change in #2589 though that as mentioned in my last comment makes it so that now by default the URL would use This PR would still affect the default value shown when using
I misnamed the change of default behavior as a "move". Yes, I still want to update the PR for this edge case. What is simpler for you: that I start a new branch from the latest code or that I rebase master onto this branch? In #2589 you mention "wildcard dual-binding behavior": is that when TensorBoard "defaults to serving on the entire local network"?
@miguelmorin the easiest option is probably if you can rebase your local changes onto latest master and then force-push that branch to
Yes, this was the default behavior prior to 2.0.
Hi @nfelt, I've finally made time to finish this. The rebasing was straightforward and I force-pushed the branch. Your requested changes are still marked as unsolved and those commits are now gone. How would you like to proceed?
Thanks for resuming this! Mostly looks good, just a couple last comments.
tensorboard/program.py
Outdated
@@ -597,6 +597,7 @@ def __init__(self, wsgi_app, flags):
host = "localhost"

self._host = host
self.display_host = None  # Will be set by get_url() below
Let's just cache the entire URL for this, so self._url.
That way we can skip even the last formatting step in get_url() if we already have generated the URL.
Yes, I cached the entire URL in 3f99276.
tensorboard/program.py
Outdated
if not self.display_host:
  if self._auto_wildcard:
    self.display_host = socket.getfqdn()
nit: can we leave out the blank lines inside the function, since it's not that long?
If we want separation for clarity I'd recommend just factoring out a private helper function as described in the previous comment.
Do you mean just removing line 734, which I did in f3bbb87? If not, I don't see what you mean by "as described in the previous comment". And what do you mean by "nit: "?
Well I had in mind both blank lines (so also 742), but it's fine.
The second part of the comment was an alternative suggestion, that for the purpose of clarifying the sub-parts of the function, instead of setting them off with blank lines, to just refactor it into smaller functions. By "previous comment" I meant from earlier in the review thread (#2371 (comment)).
Anyway, sorry that was a bit confusing, this is fine as is.
Sorry, I missed that second part of the 18 July comment. My changes already seem outdated, do you want me to pull the master branch and do a manual rebase of my changes?
I don't see any conflicts with the base branch? I can go ahead and merge now unless there were further changes you were going to make.
No, I don't see conflicts with the base branch. I see
Outdated just means the comment is on code that isn't at the most recent commit touching that file; it's fine in this case. Will merge.
The TensorBoard log told me to go to http://laptop-name.corp.domain.tld:6006 and that connection failed, but http://localhost:6006 worked.
I tested the connection to the host with socket(), and if it is closed, the host becomes localhost.
No UI changes.
I ran export PYTHONPATH="$PYTHONPATH:$HOME/code/tensorflow:$HOME/code:tensorboard"; python3 tensorboard/program.py, which threw this unrelated error (and both repos are in sync with upstream):
None.