Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: bind to localhost rather than all interfaces #2589

Merged
merged 5 commits into from
Sep 18, 2019

Conversation

wchargin
Copy link
Contributor

@wchargin wchargin commented Aug 22, 2019

Summary:
Prior to this change, TensorBoard would default to serving on the entire
local network; now, TensorBoard serves to the local machine only, and
the flag --bind_all can be used to dual-bind to IPv4 and IPv6 on the
entire local network (the previous default). See #2387 and comments
therein for details.

Test Plan:
On my Debian machine, running with strace -e trace=%network:

  • running with no --host flag:
    • can connect to loopback on IPv4 only
    • cannot connect over LAN
    • strace shows binding on AF_INET
    • a notice about --bind_all is printed to stderr
  • running with --host=localhost:
    • same behavior as with no --host flag, but no notice is printed
  • running with --host='::1':
    • can connect to loopback on IPv6 only
    • cannot connect over LAN
    • strace shows binding on AF_INET6
  • running with --host=0.0.0.0:
    • can connect to loopback on IPv4 only
    • can connect over LAN
    • strace shows binding on AF_INET
  • running with --host='::0':
    • can connect on both IPv4 and IPv6
    • can connect over LAN
    • strace shows binding on AF_INET6
  • running with --bind_all:
    • can connect on both IPv4 and IPv6
    • can connect over LAN
    • strace shows binding on AF_INET6 with an additional syscall
      to setsockopt(3, SOL_IPV6, IPV6_V6ONLY, [0], 4) to facilitate
      the dual-binding, which is not present in any other tested case

In all cases, the printed serving URL (“TensorBoard x.y.z running at…”)
bears the exact --host flag, or my full hostname if --bind_all was
given, or localhost if neither was given. In all cases, the URL is a
clickable link in my gnome-terminal.

Note that on my system dual binding to ::0 works without an explicit
syscall—i.e., IPV6_V6ONLY defaults to 0—but this is not portable.

Connection testing was performed via

for ipv in 4 6; do
    if curl -sfL -"${ipv}" localhost:6006/data/logdir >/dev/null; then
        printf 'v%d OK\n' "${ipv}"
    else
        printf 'v%d FAIL\n' "${ipv}"
    fi
done

in all cases.

wchargin-branch: localhost-only

Summary:
Prior to this change, TensorBoard would default to serving on the entire
local network; now, TensorBoard serves to the local machine only, and
the flag `--host='*'` can be used to dual-bind to IPv4 and IPv6 on the
entire local network (the previous default). See #2387 and comments
therein for details.

Test Plan:
On my Debian machine, running with `strace -e trace=%network`, and
testing connection with `curl -4 localhost:6006/data/logdir` (or `-6`):

  - running with no `--host` flag, or `--host=localhost`:
      - can connect to loopback on IPv4 only
      - cannot connect over LAN
      - `strace` shows binding on `AF_INET`
  - running with `--host='::1'`:
      - can connect to loopback on IPv6 only
      - cannot connect over LAN
      - `strace` shows binding on `AF_INET6`
  - running with `--host=0.0.0.0`:
      - can connect to loopback on IPv4 only
      - **can** connect over LAN
      - `strace` shows binding on `AF_INET`
  - running with `--host='*'`:
      - can connect on both IPv4 and IPv6
      - **can** connect over LAN
      - `strace` shows binding on `AF_INET6` with an additional syscall
        to `setsockopt(3, SOL_IPV6, IPV6_V6ONLY, [0], 4)` to facilitate
        the dual-binding, which is not present in any other tested case

wchargin-branch: localhost-only
@wchargin
Copy link
Contributor Author

I’ve attempted to implement this to the spec in #2387. I’ve read through
#1320 (comment),
but I can claim no expertise here, so @nfelt’s review and testing are
warmly welcomed.

@wchargin
Copy link
Contributor Author

From offline discussion: we don’t much like that the --host='*' form
uses a literal asterisk, because if one writes --host * without quotes
then glob expansion will produce an argument list that is not what was
intended. (Technically, --host=* unquoted could do the same, but this
less unlikely.)

We chose “*” in the first place because it’s one of very few strings
that can’t be a valid address. For instance, choosing --host=all as
the magic incantation would be confusing on machines with literal
hostname all. An option that is neither a POSIX shell special
character nor a valid address would be --host=., but this is not
terribly clear, as . more typically refers to the current directory
and there’s nothing to suggest that it should mean “bind publicly to all
addresses”.

We propose instead the following:

  • If no flags are specified, use ::1, an IPv6 loopback address (like
    IPv4’s 127.0.0.1), as the default host.
  • If a new flag --public is given, change the default host to ::,
    the IPv6 wildcard address (like IPv4’s 0.0.0.0). It shall be an
    error to use both --public and --host.
  • Always attempt to dual-bind an IPv6 socket as IPv4. This is
    implemented by using setsockopt to clear the IPV6_V6ONLY flag,
    so will have no effect if the host is explicitly IPv4: e.g., if the
    user passes --host 0.0.0.0 or --host localhost. We see no harm
    in dual-binding to IPv4 even when the user explicitly gives us an
    IPv6 host like ::1.

This way, the stock answer to user questions of the form “why can’t my
collaborators see my TensorBoards?” can be “just add --public”, and
it’s clear from the command line at least that --public is some sort
of ACL expansion.

@nfelt
Copy link
Contributor

nfelt commented Aug 23, 2019

So... after mulling this over, the flag --public still doesn't sit quite right with me since it's not clearly related to the host flag, and the implications of "public" are a little ambiguous (it's not necessarily actually public in a meaningful sense). And conversely, any use of --host that isn't with localhost might also result in a "public" tensorboard, so the absence of --public isn't any guarantee that your instance is inaccessible.

Would using all as a sentinel value really be that bad? It seems very unlikely to be an entire hostname in practice, since even if the local part of the hostname was all, there would typically be further parts to actually form a FQDN of any practical use.

Also, I think we probably want to special-case localhost to ensure it gets dual-mapped (and maybe even 127.0.0.1 as well) because it's not necessarily IPv4-only. In fact, what I recall from testing this is that at least on my machine and presumably many others, localhost has DNS mappings for both IPv4 and IPv6, and which one actually gets used by your browser is browser-dependent. In particular, if you don't dual-map it becomes possible in some cases to run two different servers on localhost on the same port, and two tabs pointed at that address might actually end up connecting over different IP flavors to get different backends.

@wchargin
Copy link
Contributor Author

Okay, I’ll hold off for now. A few quick notes:

Would using all as a sentinel value really be that bad? It seems
very unlikely to be an entire hostname in practice, since even if the
local part of the hostname was all, there would typically be further
parts to actually form a FQDN of any practical use.

It doesn’t have to be the entire hostname; with current TensorBoard,
invoking

tensorboard --logdir /tmp/x1 --port 6006 --host "$(hostname -s)"

(where my hostname -s is a single label) exposes the instance to LAN.
Special-casing all would change that behavior.

Not to say that we can’t bite the bullet and do that anyway.

it becomes possible in some cases to run two different servers on
localhost on the same port, and two tabs pointed at that address might
actually end up connecting over different IP flavors to get different
backends

Yes; I ran into this while writing manager.py. A simple example:

$ tensorboard --logdir /tmp/x1 --port 6006 --host 127.0.0.1
TensorBoard 1.15.0a20190822 at http://127.0.0.1:6006/ (Press CTRL+C to quit)
^Z
[1]+  Stopped                 tensorboard --logdir /tmp/x1 --port 6006 --host 127.0.0.1
$ tensorboard --logdir /tmp/x2 --port 6006 --host ::1
TensorBoard 1.15.0a20190822 at http://[::1]:6006/ (Press CTRL+C to quit)
^Z
[2]+  Stopped                 tensorboard --logdir /tmp/x2 --port 6006 --host ::1
$ bg %1 %2
[1]- tensorboard --logdir /tmp/x1 --port 6006 --host 127.0.0.1 &
[2]+ tensorboard --logdir /tmp/x2 --port 6006 --host ::1 &
$ curl -4 localhost:6006/data/logdir; printf '\n'
{"logdir": "/tmp/x1"}
$ curl -6 localhost:6006/data/logdir; printf '\n'
{"logdir": "/tmp/x2"}

When given an IPv4 loopback address (possibly including localhost), we
can’t actually force it to be dual-bound, right? On my testing on
current Debian, binding one socket to 127.0.0.1 and another to ::1
on the same port seems to always succeed and allow version-specific
connections to both (and in particular binding to ::1 immediately
resets the IPV6_V6ONLY socket option to 1), so if the user specifies
those two hosts I don’t see how any dual-binding configuration helps.

@wchargin
Copy link
Contributor Author

We chose “*” in the first place because it’s one of very few strings
that can’t be a valid address

FWIW, --host=... would be another unambiguous specifier, which may be
more or less confusing than --host=. but is still not great.

@nfelt
Copy link
Contributor

nfelt commented Sep 17, 2019

Okay, what about the following path forward:

  • add a --bind-all flag separate from --host to activate today's wildcard dual-binding behavior
  • make it an error to pass --bind-all and --host
  • default --host to None
  • if --host is None (the default), behave like passing --host localhost, except also print a message to stderr just before the normal launch message that says:

Serving TensorBoard on localhost - to expose to the network use a proxy or pass --bind-all

Edited to add: oh and yeah, I expected that dual-binding would work for ::1 but apparently it doesn't based on my testing and some internet searches, so I guess we'll just have to accept the not great behavior until such point as we want to actually implement two-socket serving ourselves.

wchargin-branch: localhost-only
wchargin-source: c05af50a7db71f2ab0b762f05f02ddf24ab76103
wchargin-branch: localhost-only
wchargin-source: c05af50a7db71f2ab0b762f05f02ddf24ab76103
@wchargin
Copy link
Contributor Author

That plan sounds good to me. Done; PTAL.

wchargin-branch: localhost-only
wchargin-source: f3749cf614b1530c0031a2380c63dc3a629206ba
Copy link
Contributor

@nfelt nfelt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elif host is None:
host = 'localhost'

self._host = host
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is self._host actually used anywhere? Later down you read self.host, which is the property set by werkzeug:
https://github.com/pallets/werkzeug/blob/bd60d52ba14f32a38caffc674fb17c9090ef70ce/src/werkzeug/serving.py#L728

Not sure if those are intentionally different or a typo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch; this was an oversight. I’ll conservatively use self._host.

@wchargin wchargin requested a review from nfelt September 18, 2019 21:12
wchargin-branch: localhost-only
wchargin-source: 3fbdc41b7ce84cf605953a9cc3f5252380d88d13
Copy link
Contributor Author

@wchargin wchargin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the docs and the _host. Thanks!

elif host is None:
host = 'localhost'

self._host = host
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch; this was an oversight. I’ll conservatively use self._host.

@wchargin wchargin merged commit 9dfd82b into master Sep 18, 2019
@wchargin wchargin deleted the wchargin-localhost-only branch September 18, 2019 23:18
wchargin added a commit that referenced this pull request Sep 20, 2019
Summary:
These tests are currently broken on IPv6-only systems. While it’s valid
to open a _socket_ that binds to `("localhost", 0)` even if the system
only supports IPv6, it’s not valid to start a _Werkzeug server_ with
host `localhost`, because Werkzeug detects `localhost` as an IPv4
address and will open an `AF_INET` socket, causing catastrophe.

This restores the test behavior prior to #2589, which IMHO isn’t great,
but unbreaks these tests on IPv6-only systems.

Test Plan:
This test still works on my dual-IPv4/IPv6 machine, and now also works
on an IPv6-only machine where previously it failed with `EAFNOSUPPORT`.

wchargin-branch: program-test-bind-all
wchargin added a commit that referenced this pull request Sep 23, 2019
Summary:
These tests are currently broken on IPv6-only systems. While it’s valid
to open a _socket_ that binds to `("localhost", 0)` even if the system
only supports IPv6, it’s not valid to start a _Werkzeug server_ with
host `localhost`, because Werkzeug detects `localhost` as an IPv4
address and will open an `AF_INET` socket, causing catastrophe.

This restores the test behavior prior to #2589, which IMHO isn’t great,
but unbreaks these tests on IPv6-only systems.

Test Plan:
This test still works on my dual-IPv4/IPv6 machine, and now also works
on an IPv6-only machine where previously it failed with `EAFNOSUPPORT`.

wchargin-branch: program-test-bind-all
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants