TSan: async signals are never being delivered when the target thread is blocked waiting in select #1813

Open
vries opened this issue Nov 5, 2024 · 1 comment

vries commented Nov 5, 2024

In man select we find:
If timeout is specified as NULL, select() blocks indefinitely waiting for a file descriptor to become ready.

However, in sanitizer_common_syscalls.inc we find that select is not given any COMMON_SYSCALL_BLOCKING_START / COMMON_SYSCALL_BLOCKING_END treatment:

PRE_SYSCALL(select)
(long n, __sanitizer___kernel_fd_set *inp, __sanitizer___kernel_fd_set *outp,
 __sanitizer___kernel_fd_set *exp, void *tvp) {}

POST_SYSCALL(select)
(long res, long n, __sanitizer___kernel_fd_set *inp,
 __sanitizer___kernel_fd_set *outp, __sanitizer___kernel_fd_set *exp,
 void *tvp) {
  if (res >= 0) {
    if (inp)
      POST_WRITE(inp, sizeof(*inp));
    if (outp)
      POST_WRITE(outp, sizeof(*outp));
    if (exp)
      POST_WRITE(exp, sizeof(*exp));
    if (tvp)
      POST_WRITE(tvp, timeval_sz);
  }
}

Consequently, it can happen that a program that uses select runs fine without ThreadSanitizer, but hangs with it.

We ran into this when building gdb with -fsanitize=thread and running the gdb testsuite. It causes a fair number of timeouts.

I've proposed a gdb workaround, but this should be fixed in ThreadSanitizer.

I haven't got an MRE, but it should roughly look like this:

  • create a self-pipe
  • install a signal handler that writes to the self-pipe
  • raise a signal, making sure to trigger the !is_sync_signal path
  • call select to read from the self-pipe with timeout == NULL, in an EINTR loop

vries commented Nov 22, 2024

MRE: select.c

#define _GNU_SOURCE
#include <signal.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
#include <assert.h>
#include <stdio.h>
#include <sys/select.h>

#ifdef VERBOSE
int verbose = 1;
#else
int verbose = 0;
#endif

/* Self-pipe.  */

static int fds[2] = { -1, -1 };

/* Setup self-pipe.  */

static void
setup_self_pipe (void)
{
  int ret;

  ret = pipe2 (fds, O_CLOEXEC);
  assert (ret != -1);
  ret = fcntl (fds[0], F_SETFL, O_NONBLOCK);
  assert (ret != -1);
  ret = fcntl (fds[1], F_SETFL, O_NONBLOCK);
  assert (ret != -1);
}

/* Notify self-pipe.  */

static void
notify_self_pipe (void)
{
  while (1)
    {
      int ret = write (fds[1], "+", 1);
      if (ret == -1 && errno == EINTR)
        continue;

      assert (ret != -1);
      break;
    }
}

/* Wait for self-pipe notification.  */

static void
wait_for_self_pipe_notification (void)
{
  fd_set set;
  int n = fds[0];

  while (1)
    {
      FD_ZERO (&set);
      FD_SET (n, &set);
      int ret = select (n + 1, &set, NULL, NULL, NULL);
      if (ret == -1 && errno == EINTR)
        continue;

      assert (ret != -1);
      break;
    }
}

/* Signal handler that notifies self-pipe.  */

static void
handler (int signum)
{
  assert (signum == SIGCHLD);

  int save_errno = errno;
  if (verbose)
    printf ("Parent: handler: Signal received\n");
  notify_self_pipe ();
  if (verbose)
    printf ("Parent: handler: Notification sent\n");
  errno = save_errno;
}

static void
signal_parent (void)
{
  pid_t parent = getppid ();

  /* A more elaborate example would instead call exec ("/usr/bin/kill"),
     taking the generation of the signal out of the scope of compilation.  */
  kill (parent, SIGCHLD);
}

/* The program consists of two parts: generation and handling of a signal.

   The signal is generated in a fork-child.

   Upon arrival of the signal in the fork-parent, the following should happen:
   - signal handler called
   - self-pipe notified
   - wait for self-pipe notification ended
   - program exit.

   Possible output with -DVERBOSE:

   Parent: Waiting for notification
   Child: Waiting to send signal
   <1 second>
   Child: Signal sent
   Parent: handler: Signal received
   Parent: handler: Notification sent
   Parent: Notification received
*/

int
main (void)
{
  setup_self_pipe ();

  pid_t pid;
  pid = fork ();
  assert (pid != -1);

  if (pid == 0)
    {
      /* Fork-child.  */

      /* Wait long enough for fork-parent to block in select.  */
      if (verbose)
        printf ("Child: Waiting to send signal\n");
      sleep (1);

      signal_parent ();
      if (verbose)
        printf ("Child: Signal sent\n");
      exit (0);
    }

  /* Fork-parent.  */

  /* Install signal handler.  */
  sighandler_t signal_ret = signal (SIGCHLD, handler);
  assert (signal_ret != SIG_ERR);

  if (verbose)
    printf ("Parent: Waiting for notification\n");
  wait_for_self_pipe_notification ();
  if (verbose)
    printf ("Parent: Notification received\n");

  return 0;
}

Without ThreadSanitizer:

$ gcc select.c
$ ./a.out 
<1 second>
$ 

With ThreadSanitizer:

$ gcc select.c -fsanitize=thread
$ ./a.out
<hangs>

saagarjha pushed a commit to ahjragaas/binutils-gdb that referenced this issue Nov 22, 2024
When building gdb with -O0 and -fsanitize=thread, I run into a large number of
timeouts caused by gdb hanging, for instance:
...
(gdb) continue^M
Continuing.^M
[Inferior 1 (process 378) exited normally]^M
FAIL: gdb.multi/stop-all-on-exit.exp: continue until exit (timeout)
...

What happens is the following:
- two inferiors are added, stopped at main
- inferior 1 is set up to exit after 1 second
- inferior 2 is set up to exit after 10 seconds
- the continue command is issued
- because of set schedule-multiple on, both inferiors continue
- the first inferior exits
- gdb sends a SIGSTOP to the second inferior
- the second inferior receives the SIGSTOP, and raises a SIGCHLD
- gdb calls select, and blocks
- the signal arrives, and interrupts select
- ThreadSanitizer's signal handler is called, which marks the signal pending
  internally
- select returns -1 with errno == EINTR
- gdb calls select again, and blocks
- gdb hangs, waiting for gdb's sigchild_handler to be called

This is a bug [1] in ThreadSanitizer.  When select is called with
timeout == nullptr, it is blocking but ThreadSanitizer doesn't consider it so,
and consequently doesn't see the need to call sigchild_handler.
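
The sequence above can be illustrated with a toy model.  This is only a
sketch of the behaviour as described in this message, not ThreadSanitizer's
actual code: every name in it (toy_runtime, toy_signal_arrives, and so on)
is hypothetical.  An arriving signal is only recorded as pending, and the
user handler runs only when a call that the runtime recognizes as blocking
is made.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; all names are hypothetical and do not exist in
   ThreadSanitizer.  */

typedef void (*toy_handler_fn) (int signo);

struct toy_runtime
{
  toy_handler_fn user_handler;	/* Handler installed by the program.  */
  int pending_signo;		/* 0 means no signal is pending.  */
};

/* An async signal arrives: the runtime only records it as pending.  */
static void
toy_signal_arrives (struct toy_runtime *rt, int signo)
{
  rt->pending_signo = signo;
}

/* Deliver the pending signal, if any, to the user handler.  */
static void
toy_run_pending (struct toy_runtime *rt)
{
  if (rt->pending_signo != 0 && rt->user_handler != NULL)
    {
      int signo = rt->pending_signo;
      rt->pending_signo = 0;
      rt->user_handler (signo);
    }
}

/* A call the runtime recognizes as blocking (usleep in the description
   above): pending signals are delivered.  */
static void
toy_recognized_blocking_call (struct toy_runtime *rt)
{
  toy_run_pending (rt);
}

/* A call the runtime does not recognize as blocking (select with a NULL
   timeout in the description above): it returns -1 as if interrupted,
   and the pending signal stays pending.  */
static int
toy_unrecognized_blocking_call (struct toy_runtime *rt)
{
  (void) rt;
  return -1;
}

/* Counting handler used to observe deliveries.  */
static int toy_handler_calls;

static void
toy_count_handler (int signo)
{
  (void) signo;
  toy_handler_calls++;
}
```

In this model, repeating toy_unrecognized_blocking_call any number of times
never runs the handler, which mirrors the EINTR loop around select that
never makes progress.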

Work around this by:
- instead of using the blocking select variant, forcing a small timeout and
- upon timeout calling a function that ThreadSanitizer does consider
  blocking: usleep, forcing sigchild_handler to be called.
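
A minimal sketch of this kind of workaround, for a read-only descriptor
set, could look as follows.  It is not the code from the gdb patch: the
helper name select_with_tsan_workaround, the 10 ms timeout, and the
usleep (0) argument are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper (not gdb's function).  Behaves roughly like
   select (nfds, readfds, NULL, NULL, NULL), but never blocks
   indefinitely inside select itself.  */

static int
select_with_tsan_workaround (int nfds, fd_set *readfds)
{
  assert (readfds != NULL);

  /* select overwrites the set, so keep a copy to restore on each retry.  */
  fd_set saved = *readfds;

  for (;;)
    {
      /* Assumption: a 10 ms timeout; the value is arbitrary.  */
      struct timeval tv = { .tv_sec = 0, .tv_usec = 10000 };

      *readfds = saved;
      int ret = select (nfds, readfds, NULL, NULL, &tv);

      /* Either descriptors are ready (ret > 0) or an error occurred
	 (ret == -1, errno set, possibly EINTR).  Let the caller decide.  */
      if (ret != 0)
	return ret;

      /* Timeout.  Call usleep, which per the description above
	 ThreadSanitizer does consider blocking, so that a pending
	 signal handler gets a chance to run before we retry.  */
      usleep (0);
    }
}
```

A caller would keep its existing EINTR loop and simply replace the blocking
select call with this helper.  The cost is a periodic wakeup while idle,
which is why this is a workaround rather than a fix.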

Tested on x86_64-linux.

PR build/32295
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32295

[1] google/sanitizers#1813