Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli,sdnotify: improve the handling of unix socket directories #84532

Merged
merged 3 commits into from
Jul 19, 2022

Conversation

knz
Copy link
Contributor

@knz knz commented Jul 16, 2022

tldr: This patch makes it easier for users to understand when
CockroachDB has a problem using/creating unix sockets, and
what they can do to avert the situation.

Fixes #76904.

Example, before;

$ cockroach start-single-node --insecure --background
...
listen unixgram /data/home/kena/cockroach/this-is-a-very-long-directory-name-the-point-is-to-be-more-than-one-hundred-and-twenty-three-characters/sdnotify2139543236/notify.sock: bind: invalid argument

Example, after:

$ cockroach start-single-node --insecure --background
ERROR: no suitable directory found for the --background notify socket
HINT: Avoid using --background altogether (preferred), or use a
shorter directory name for --socket-dir, TMPDIR or the current directory.

Context:

CockroachDB uses unix sockets both to handle --background and to
create a postgres-compatible SQL client socket via --socket-dir. For
--background, the logic was previously hardwired to always use
os.TempDir(), i.e. either $TMPDIR or /tmp.

On Unix, socket objects are commonly limited to 128 bytes;
that's e.g. 104 path characters including final NUL on BSD, 108 on glibc.
When the actual length is larger, creating the socket fails with
a confusing error bind: invalid argument.

The problem:

Users have reported issues in the following circumstances:

  • when setting their CWD/$TMPDIR to a long directory name,
    they expected --socket-dir to also influence the socket
    name used for --background. Instead, --background
    failed with a hard-to-understand error. (cli/start: sdnotify socket listen chokes if $PWD is too long #76904)

  • when running tests inside a container/sandbox with a long
    prefix, they were confused that --background / --socket-dir=.
    did not work properly.

  • they use macOS which has very large directory names in $TMPDIR.

To improve user experience, this patch changes the behavior
as described in the release notes below below.

Release note (cli change): CockroachDB now produces a clearer error
when the path specified via --socket-dir is too long.

Release note (cli change): When the flag --background is specified,
CockroachDB now makes 3 attempts to find a suitable directory to
create the notification socket: the value of --socket-dir if
specified, the value of $TMPDIR (or /tmp if the env var is empty),
and the current working directory. If none of these directories has a
name short enough, an explanatory error is printed.

@knz knz requested review from bdarnell and stevendanna July 16, 2022 22:36
@knz knz requested review from a team as code owners July 16, 2022 22:36
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz knz force-pushed the 20220716-unix branch from 58b5f5a to fc8aeeb Compare July 16, 2022 23:02
@knz knz requested a review from a team July 16, 2022 23:02
@knz knz force-pushed the 20220716-unix branch 2 times, most recently from d871771 to ea14c9f Compare July 16, 2022 23:12
knz added 2 commits July 17, 2022 01:13
tldr: This patch makes it easier for users to understand when
CockroachDB has a problem using/creating unix sockets, and
what they can do to avert the situation.

Example, before;
```
$ cockroach start-single-node --insecure --background
...
listen unixgram /data/home/kena/cockroach/this-is-a-very-long-directory-name-the-point-is-to-be-more-than-one-hundred-and-twenty-three-characters/sdnotify2139543236/notify.sock: bind: invalid argument
```

Example, after:
```
$ cockroach start-single-node --insecure --background
ERROR: no suitable directory found for the --background notify socket
HINT: Avoid using --background altogether (preferred), or use a
shorter directory name for --socket-dir, TMPDIR or the current directory.
```

**Context:**

CockroachDB uses unix sockets both to handle `--background` and to
create a postgres-compatible SQL client socket via `--socket-dir`. For
`--background`, the logic was previously hardwired to always use
`os.TempDir()`, i.e. either `$TMPDIR` or `/tmp`.

On Unix, socket objects are commonly limited to 128 bytes;
that's e.g. 104 path characters including final NUL on BSD, 108 on glibc.
When the actual length is larger, creating the socket fails with
a confusing error `bind: invalid argument`.

**The problem:**

Users have reported issues in the following circumstances:

- when setting their CWD/$TMPDIR to a long directory name,
  they expected `--socket-dir` to also influence the socket
  name used for `--background`. Instead, `--background`
  failed with a hard-to-understand error.

- when running tests inside a container/sandbox with a long
  prefix, they were confused that `--background` / `--socket-dir=.`
  did not work properly.

- they use macOS which has very large directory names in $TMPDIR.

To improve user experience, this patch changes the behavior
as described in the release notes below below.

Release note (cli change): CockroachDB now produces a clearer error
when the path specified via `--socket-dir` is too long.

Release note (cli change): When the flag `--background` is specified,
CockroachDB now makes 3 attempts to find a suitable directory to
create the notification socket: the value of `--socket-dir` if
specified, the value of `$TMPDIR` (or /tmp if the env var is empty),
and the current working directory. If none of these directories has a
name short enough, an explanatory error is printed.
Release note (bug fix): The server process does not any more announce
that it is shutting down on stdout when running with `--background`.
@knz knz force-pushed the 20220716-unix branch 2 times, most recently from b526d14 to 82d92c5 Compare July 18, 2022 12:19
This commit changes `TestAuthenticationAndHBARules` (pgwire) and
`TestSDNotify` (sdnotify) to use `/tmp` as socket directory if the
default temporary directory name is too long.

Release note: None
@jreut jreut self-requested a review July 18, 2022 14:57
Copy link

@jreut jreut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I see mention of long macOS $TMP paths. Have these tests simply historically failed on macOS?

Reviewed 7 of 7 files at r1, 1 of 1 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @stevendanna)


pkg/cli/flags.go line 1013 at r1 (raw file):

		} else {
			socketName := ".s.PGSQL." + serverListenPort
			// On BSD, binding to a socket is limited to a path length of 104 characters

For my own education, could you link to the code or docs that support these facts?


pkg/cli/interactive_tests/test_socket_name.tcl line 65 at r1 (raw file):

system "$crdb sql --insecure -e 'select 1'"
stop_server $crdb
end_test

TIL we have TCL-based tests in CRDB. These are a joy to read!

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have these tests simply historically failed on macOS?

They do when you try to run them outside of the docker sandbox.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @stevendanna)


pkg/cli/flags.go line 1013 at r1 (raw file):

Previously, jreut (jordan ryan reuter) wrote…

For my own education, could you link to the code or docs that support these facts?

https://www.man7.org/linux/man-pages/man7/unix.7.html

      A UNIX domain socket address is represented in the following
       structure:

           struct sockaddr_un {
               sa_family_t sun_family;               /* AF_UNIX */
               char        sun_path[108];            /* Pathname */
           };

       The sun_family field always contains AF_UNIX.  On Linux, sun_path
       is 108 bytes in size

https://www.freebsd.org/cgi/man.cgi?query=unix&sektion=4

UNIX-domain addresses are variable-length file system pathnames of	at
     most 104 characters.  The include file <sys/un.h> defines this address:

	   struct sockaddr_un {
		   u_char  sun_len;
		   u_char  sun_family;
		   char	   sun_path[104];
	   };

@knz
Copy link
Contributor Author

knz commented Jul 19, 2022

TFYR!

bors r=jreut

@craig
Copy link
Contributor

craig bot commented Jul 19, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cli/start: sdnotify socket listen chokes if $PWD is too long
3 participants