gdb: fix b/p conditions with infcalls in multi-threaded inferiors

This commit fixes PR 28942: a conditional breakpoint in a
multi-threaded inferior fails when the breakpoint condition includes
an inferior function call.

Currently, when a user creates such a breakpoint and then continues
the inferior, GDB fails with:

  (gdb) break infcall-from-bp-cond-single.c:61 if (return_true ())
  Breakpoint 2 at 0x4011fa: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-single.c, line 61.
  (gdb) continue
  Continuing.
  [New Thread 0x7ffff7c5d700 (LWP 2460150)]
  [New Thread 0x7ffff745c700 (LWP 2460151)]
  [New Thread 0x7ffff6c5b700 (LWP 2460152)]
  [New Thread 0x7ffff645a700 (LWP 2460153)]
  [New Thread 0x7ffff5c59700 (LWP 2460154)]
  Error in testing breakpoint condition:
  Couldn't get registers: No such process.
  An error occurred while in a function called from GDB.
  Evaluation of the expression containing the function
  (return_true) will be abandoned.
  When the function is done executing, GDB will silently stop.
  Selected thread is running.
  (gdb)

Or, in some cases, like this:

  (gdb) break infcall-from-bp-cond-simple.c:56 if (is_matching_tid (arg, 1))
  Breakpoint 2 at 0x401194: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-simple.c, line 56.
  (gdb) continue
  Continuing.
  [New Thread 0x7ffff7c5d700 (LWP 2461106)]
  [New Thread 0x7ffff745c700 (LWP 2461107)]
  ../../src.release/gdb/nat/x86-linux-dregs.c:146: internal-error: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.

The precise error depends on the exact thread state, so there are race
conditions depending on which threads have fully started and which
have not.  But the underlying problem is always the same: when GDB
tries to execute the inferior function call from within the breakpoint
condition, it incorrectly tries to resume threads that are already
running, because GDB doesn't realise that some threads might already
be running.

The solution proposed in this patch requires an additional member
variable, thread_control_state::in_cond_eval, which is reached through
thread_info::control.  This flag is set to true (in breakpoint.c)
while GDB is evaluating a breakpoint condition.
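
As a rough illustration of the "set while evaluating, restore
afterwards" pattern, here is a minimal sketch.  Only the in_cond_eval
name comes from the patch; fake_thread_control, scoped_flag and
eval_condition are hypothetical stand-ins and are not GDB's
scoped_restore or breakpoint code.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-thread control state.  Only the
// in_cond_eval name is taken from the patch.
struct fake_thread_control
{
  bool in_cond_eval = false;
};

// Set *FLAG to VALUE for the lifetime of the object, then put the old
// value back.  The destructor also runs when an exception unwinds the
// scope, so the flag cannot be left set by a failing condition.
class scoped_flag
{
public:
  scoped_flag (bool *flag, bool value)
    : m_flag (flag), m_saved (false)
  {
    assert (flag != nullptr);
    m_saved = *flag;
    *m_flag = value;
  }

  ~scoped_flag ()
  {
    *m_flag = m_saved;
  }

  scoped_flag (const scoped_flag &) = delete;
  scoped_flag &operator= (const scoped_flag &) = delete;

private:
  bool *m_flag;
  bool m_saved;
};

// Evaluate COND (any callable returning bool) with the flag set.
template<typename Cond>
bool
eval_condition (fake_thread_control &control, Cond cond)
{
  scoped_flag in_eval (&control.in_cond_eval, true);
  return cond ();
}
```

Code that runs inside the callable sees in_cond_eval as true, and the
flag is false again once eval_condition returns or throws.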

In user_visible_resume_ptid (infrun.c), when the in_cond_eval flag is
true, GDB will only try to resume the current thread, that is, the
thread for which the breakpoint condition is being evaluated.  This
solves the problem of GDB trying to resume threads that are already
running.

The next problem is that inferior function calls are assumed to be
synchronous, that is, GDB doesn't expect to start an inferior function
call in thread #1, then receive a stop from thread #2 for some other,
unrelated reason.  To prevent GDB responding to an event from another
thread, we update fetch_inferior_event and do_target_wait in infrun.c,
so that, when an inferior function call (on behalf of a breakpoint
condition) is in progress, we only wait for events from the current
thread (the one evaluating the condition).
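
The effect of narrowing the wait can be sketched with a toy event
queue.  Everything here (event_model, any_thread, next_event) is a
hypothetical model for illustration and does not correspond to GDB's
actual data structures.

```cpp
#include <cassert>
#include <vector>

// Hypothetical event record; GDB's real events carry far more state.
struct event_model
{
  int thread_id;
};

// Thread id meaning "any thread", playing the role that minus_one_ptid
// plays in the text above.
const int any_thread = -1;

// Return the first pending event acceptable to WAIT_THREAD, or nullptr
// if there is none.  Events from other threads are left in the queue.
const event_model *
next_event (const std::vector<event_model> &pending, int wait_thread)
{
  // Real thread ids are non-negative in this model.
  assert (wait_thread >= any_thread);

  for (const event_model &ev : pending)
    if (wait_thread == any_thread || ev.thread_id == wait_thread)
      return &ev;

  return nullptr;
}
```

With events queued for threads 2 and 1, waiting on any_thread reports
thread 2 first, while waiting on thread 1 skips over the unrelated
event and leaves it pending for later.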

In do_target_wait I had to change the inferior_matches lambda
function, which is used to select which inferior to wait on.
Previously the logic was this:

   auto inferior_matches = [&wait_ptid] (inferior *inf)
     {
       return (inf->process_target () != nullptr
               && ptid_t (inf->pid).matches (wait_ptid));
     };

This compares the pid of the inferior against the complete ptid we
want to wait on.  Before this commit wait_ptid was only ever
minus_one_ptid (which is special, and means any process), and so every
inferior would match.

After this commit, though, wait_ptid might represent a specific thread
in a specific inferior.  A ptid built from only the inferior's pid
never matches such a thread-specific ptid, so no inferior would be
selected.  The fix is to compare against the pid extracted from
wait_ptid, not against the complete wait_ptid itself.
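
The mismatch can be reproduced with a small model.  This is a sketch
under assumptions: ptid_model keeps only a pid and an lwp, and the
matching rules paraphrase the behaviour described above rather than
copying GDB's ptid_t.  The names inferior_matches_old and
inferior_matches_new are made up for the comparison.

```cpp
#include <cassert>

// Simplified stand-in for a ptid: a process id plus a thread (LWP) id.
// An lwp of 0 means "the whole process".
struct ptid_model
{
  int pid;
  long lwp;
};

// Wildcard value, playing the role of minus_one_ptid.
const ptid_model any_ptid {-1, 0};

bool
same (const ptid_model &a, const ptid_model &b)
{
  return a.pid == b.pid && a.lwp == b.lwp;
}

// Assumed matching rules: the wildcard matches everything, a pid-only
// filter matches anything in that process, and any other filter
// requires exact equality.
bool
matches (const ptid_model &candidate, const ptid_model &filter)
{
  if (same (filter, any_ptid))
    return true;
  if (filter.pid > 0 && filter.lwp == 0 && candidate.pid == filter.pid)
    return true;
  return same (candidate, filter);
}

// Old logic: compare the inferior's pid against the complete wait ptid.
bool
inferior_matches_old (int inf_pid, const ptid_model &wait_ptid)
{
  assert (inf_pid > 0);
  return matches ({inf_pid, 0}, wait_ptid);
}

// New logic: compare against only the pid part of the wait ptid.
bool
inferior_matches_new (int inf_pid, const ptid_model &wait_ptid)
{
  assert (inf_pid > 0);
  return matches ({inf_pid, 0}, {wait_ptid.pid, 0});
}
```

For a wait ptid of {100, 7}, the old comparison rejects inferior 100
while the new one accepts it, and both still accept every inferior
when the wait ptid is the wildcard.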

In fetch_inferior_event, after receiving the event, we only want to
stop all the other threads, and call inferior_event_handler with
INF_EXEC_COMPLETE, if we are not evaluating a conditional breakpoint.
If we are, then all the other threads should be left doing whatever
they were before.  The inferior_event_handler call will be performed
once the breakpoint condition has finished being evaluated and GDB
has decided whether or not to stop.

The final problem that needs solving relates to GDB's commit-resume
mechanism, which allows GDB to collect resume requests into a single
packet in order to reduce traffic to a remote target.

The problem is that the commit-resume mechanism will not send any
resume requests for an inferior if there are already events pending on
the GDB side.

Imagine an inferior with two threads.  Both threads hit a breakpoint,
maybe the same conditional breakpoint.  At this point there are two
pending events, one for each thread.

GDB selects one of the events and spots that this is a conditional
breakpoint, so GDB evaluates the condition.

The condition includes an inferior function call, so GDB sets up for
the call and resumes the one thread.  The resume request is added to
the commit-resume queue.

When the commit-resume queue is committed, GDB sees that there is a
pending event from another thread, and so doesn't send any resume
requests to the actual target.  GDB is assuming that when we wait we
will select the event from the other thread.

However, as this is an inferior function call for a condition
evaluation, we will not select the event from the other thread.  We
only care about events from the thread that is evaluating the
condition, and the resume for this thread was never sent to the
target.

And so, GDB hangs, waiting for an event from a thread that was never
fully resumed.

To fix this issue I have added the concept of "forcing" the
commit-resume queue.  When enabling commit resume, if the force flag
is true, then any resumes will be committed to the target, even if
there are other threads with pending events.
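
The hang and the effect of the force flag can be modelled in a few
lines.  This is only a sketch: target_model and commit_resumes are
hypothetical names, and the real logic lives in
maybe_set_commit_resumed_all_targets and operates on whole targets.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one target's state; none of these names are
// GDB's.
struct target_model
{
  // True when an event is already queued on the GDB side.
  bool has_pending_events = false;

  // Thread ids waiting in the commit-resume queue.
  std::vector<int> queued_resumes;

  // Thread ids whose resume request has reached the target.
  std::vector<int> sent_resumes;
};

// Flush the queue, unless a pending event suggests GDB will handle
// that event first and FORCE is not set.  Return true if the queued
// resumes were sent.
bool
commit_resumes (target_model &target, bool force)
{
  if (!force && target.has_pending_events)
    return false;

  for (int tid : target.queued_resumes)
    target.sent_resumes.push_back (tid);
  target.queued_resumes.clear ();

  assert (target.queued_resumes.empty ());
  return true;
}
```

With a pending event from thread 2 and a queued resume for thread 1,
calling commit_resumes with force false sends nothing, which is the
hang described above.  With force true the resume for thread 1 is
sent.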

A note on authorship: this patch was based on some work done by
Natalia Saiapova and Tankut Baris Aktemur from Intel[1].  I have made
some changes to their work in this version.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28942

[1] https://sourceware.org/pipermail/gdb-patches/2020-October/172454.html

Co-authored-by: Natalia Saiapova <[email protected]>
Co-authored-by: Tankut Baris Aktemur <[email protected]>
Reviewed-By: Tankut Baris Aktemur <[email protected]>
Tested-By: Luis Machado <[email protected]>
Tested-By: Keith Seitz <[email protected]>
3 people committed Mar 25, 2024
1 parent 07505b6 commit 3df7843
Showing 11 changed files with 952 additions and 15 deletions.
2 changes: 2 additions & 0 deletions gdb/breakpoint.c
@@ -5665,6 +5665,8 @@ bpstat_check_breakpoint_conditions (bpstat *bs, thread_info *thread)
         {
           try
             {
+              scoped_restore reset_in_cond_eval
+                = make_scoped_restore (&thread->control.in_cond_eval, true);
               condition_result = breakpoint_cond_eval (cond);
             }
           catch (const gdb_exception_error &ex)
3 changes: 3 additions & 0 deletions gdb/gdbthread.h
@@ -173,6 +173,9 @@ struct thread_control_state
      command. This is used to decide whether "set scheduler-locking
      step" behaves like "on" or "off". */
   int stepping_command = 0;
+
+  /* True if the thread is evaluating a BP condition. */
+  bool in_cond_eval = false;
 };

 /* Inferior thread specific part of `struct infcall_suspend_state'. */
6 changes: 6 additions & 0 deletions gdb/infcall.c
@@ -672,6 +672,12 @@ run_inferior_call (std::unique_ptr<call_thread_fsm> sm,

       proceed (real_pc, GDB_SIGNAL_0);

+      /* Enable commit resume, but pass true for the force flag. This
+         ensures any thread we set running in proceed will actually be
+         committed to the target, even if some other thread in the current
+         target has a pending event. */
+      scoped_enable_commit_resumed enable ("infcall", true);
+
       infrun_debug_show_threads ("non-exited threads after proceed for inferior-call",
                                  all_non_exited_threads ());

64 changes: 50 additions & 14 deletions gdb/infrun.c
@@ -2406,6 +2406,14 @@ user_visible_resume_ptid (int step)
          mode. */
       resume_ptid = inferior_ptid;
     }
+  else if (inferior_ptid != null_ptid
+           && inferior_thread ()->control.in_cond_eval)
+    {
+      /* The inferior thread is evaluating a BP condition. Other threads
+         might be stopped or running and we do not want to change their
+         state, thus, resume only the current thread. */
+      resume_ptid = inferior_ptid;
+    }
   else if (!sched_multi && target_supports_multi_process ())
     {
       /* Resume all threads of the current process (and none of other
@@ -3201,12 +3209,24 @@ schedlock_applies (struct thread_info *tp)
                                             execution_direction)));
 }

-/* Set process_stratum_target::COMMIT_RESUMED_STATE in all target
-   stacks that have threads executing and don't have threads with
-   pending events. */
+/* When FORCE_P is false, set process_stratum_target::COMMIT_RESUMED_STATE
+   in all target stacks that have threads executing and don't have threads
+   with pending events.
+
+   When FORCE_P is true, set process_stratum_target::COMMIT_RESUMED_STATE
+   in all target stacks that have threads executing regardless of whether
+   there are pending events or not.
+
+   Passing FORCE_P as false makes sense when GDB is going to wait for
+   events from all threads and will therefore spot the pending events.
+   However, if GDB is only going to wait for events from select threads
+   (i.e. when performing an inferior call) then a pending event on some
+   other thread will not be spotted, and if we fail to commit the resume
+   state for the thread performing the inferior call, then the inferior
+   call will never complete (or even start). */

 static void
-maybe_set_commit_resumed_all_targets ()
+maybe_set_commit_resumed_all_targets (bool force_p)
 {
   scoped_restore_current_thread restore_thread;

@@ -3235,7 +3255,7 @@ maybe_set_commit_resumed_all_targets ()
          status to report, handle it before requiring the target to
          commit its resumed threads: handling the status might lead to
          resuming more threads. */
-      if (proc_target->has_resumed_with_pending_wait_status ())
+      if (!force_p && proc_target->has_resumed_with_pending_wait_status ())
         {
           infrun_debug_printf ("not requesting commit-resumed for target %s, a"
                                " thread has a pending waitstatus",
@@ -3245,7 +3265,7 @@ maybe_set_commit_resumed_all_targets ()

       switch_to_inferior_no_thread (inf);

-      if (target_has_pending_events ())
+      if (!force_p && target_has_pending_events ())
         {
           infrun_debug_printf ("not requesting commit-resumed for target %s, "
                                "target has pending events",
@@ -3338,7 +3358,7 @@ scoped_disable_commit_resumed::reset ()
     {
       /* This is the outermost instance, re-enable
          COMMIT_RESUMED_STATE on the targets where it's possible. */
-      maybe_set_commit_resumed_all_targets ();
+      maybe_set_commit_resumed_all_targets (false);
     }
   else
     {
@@ -3371,7 +3391,7 @@ scoped_disable_commit_resumed::reset_and_commit ()
 /* See infrun.h. */

 scoped_enable_commit_resumed::scoped_enable_commit_resumed
-  (const char *reason)
+  (const char *reason, bool force_p)
   : m_reason (reason),
     m_prev_enable_commit_resumed (enable_commit_resumed)
 {
@@ -3383,7 +3403,7 @@ scoped_enable_commit_resumed::scoped_enable_commit_resumed

       /* Re-enable COMMIT_RESUMED_STATE on the targets where it's
          possible. */
-      maybe_set_commit_resumed_all_targets ();
+      maybe_set_commit_resumed_all_targets (force_p);

       maybe_call_commit_resumed_all_targets ();
     }
@@ -4136,10 +4156,11 @@ do_target_wait (ptid_t wait_ptid, execution_control_state *ecs,
      polling the rest of the inferior list starting from that one in a
      circular fashion until the whole list is polled once. */

-  auto inferior_matches = [&wait_ptid] (inferior *inf)
+  ptid_t wait_ptid_pid {wait_ptid.pid ()};
+  auto inferior_matches = [&wait_ptid_pid] (inferior *inf)
     {
       return (inf->process_target () != nullptr
-              && ptid_t (inf->pid).matches (wait_ptid));
+              && ptid_t (inf->pid).matches (wait_ptid_pid));
     };

   /* First see how many matching inferiors we have. */
@@ -4628,7 +4649,17 @@ fetch_inferior_event ()
        the event. */
     scoped_disable_commit_resumed disable_commit_resumed ("handling event");

-    if (!do_target_wait (minus_one_ptid, &ecs, TARGET_WNOHANG))
+    /* Is the current thread performing an inferior function call as part
+       of a breakpoint condition evaluation? */
+    bool in_cond_eval = (inferior_ptid != null_ptid
+                         && inferior_thread ()->control.in_cond_eval);
+
+    /* If the thread is in the middle of the condition evaluation, wait for
+       an event from the current thread. Otherwise, wait for an event from
+       any thread. */
+    ptid_t waiton_ptid = in_cond_eval ? inferior_ptid : minus_one_ptid;
+
+    if (!do_target_wait (waiton_ptid, &ecs, TARGET_WNOHANG))
       {
         infrun_debug_printf ("do_target_wait returned no event");
         disable_commit_resumed.reset_and_commit ();
@@ -4686,7 +4717,12 @@ fetch_inferior_event ()
             bool should_notify_stop = true;
             bool proceeded = false;

-            stop_all_threads_if_all_stop_mode ();
+            /* If the thread that stopped just completed an inferior
+               function call as part of a condition evaluation, then we
+               don't want to stop all the other threads. */
+            if (ecs.event_thread == nullptr
+                || !ecs.event_thread->control.in_cond_eval)
+              stop_all_threads_if_all_stop_mode ();

             clean_up_just_stopped_threads_fsms (&ecs);

@@ -4713,7 +4749,7 @@ fetch_inferior_event ()
                 proceeded = normal_stop ();
               }

-            if (!proceeded)
+            if (!proceeded && !in_cond_eval)
               {
                 inferior_event_handler (INF_EXEC_COMPLETE);
                 cmd_done = 1;
3 changes: 2 additions & 1 deletion gdb/infrun.h
@@ -406,7 +406,8 @@ extern void maybe_call_commit_resumed_all_targets ();

 struct scoped_enable_commit_resumed
 {
-  explicit scoped_enable_commit_resumed (const char *reason);
+  explicit scoped_enable_commit_resumed (const char *reason,
+                                         bool force_p = false);
   ~scoped_enable_commit_resumed ();

   DISABLE_COPY_AND_ASSIGN (scoped_enable_commit_resumed);
135 changes: 135 additions & 0 deletions gdb/testsuite/gdb.threads/infcall-from-bp-cond-other-thread-event.c
@@ -0,0 +1,135 @@
+/* Copyright 2022-2024 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#include <pthread.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <sched.h>
+
+#define NUM_THREADS 2
+
+pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
+
+/* Some global variables to poke, just for something to do. */
+volatile int global_var_0 = 0;
+volatile int global_var_1 = 0;
+
+/* This flag is updated from GDB. */
+volatile int raise_signal = 0;
+
+/* Implement the breakpoint condition function. Release the other thread
+   and try to give the other thread a chance to run. Then return ANSWER. */
+int
+condition_core_func (int answer)
+{
+  /* This unlock should release the other thread. */
+  if (pthread_mutex_unlock (&mutex) != 0)
+    abort ();
+
+  /* And this yield and sleep should (hopefully) give the other thread a
+     chance to run. This isn't guaranteed of course, but once the other
+     thread does run it should hit a breakpoint, which GDB should
+     (temporarily) ignore, so there's no easy way for us to know the other
+     thread has done what it needs to, thus, yielding and sleeping is the
+     best we can do. */
+  sched_yield ();
+  sleep (2);
+
+  return answer;
+}
+
+void
+stop_marker ()
+{
+  int a = 100; /* Final breakpoint here. */
+}
+
+/* A breakpoint condition function that always returns true. */
+int
+condition_true_func ()
+{
+  return condition_core_func (1);
+}
+
+/* A breakpoint condition function that always returns false. */
+int
+condition_false_func ()
+{
+  return condition_core_func (0);
+}
+
+void *
+worker_func (void *arg)
+{
+  volatile int *ptr = 0;
+  int tid = *((int *) arg);
+
+  switch (tid)
+    {
+    case 0:
+      global_var_0 = 11; /* First thread breakpoint. */
+      break;
+
+    case 1:
+      if (pthread_mutex_lock (&mutex) != 0)
+        abort ();
+      if (raise_signal)
+        global_var_1 = *ptr; /* Signal here. */
+      else
+        global_var_1 = 99; /* Other thread breakpoint. */
+      break;
+
+    default:
+      abort ();
+    }
+
+  return NULL;
+}
+
+int
+main ()
+{
+  pthread_t threads[NUM_THREADS];
+  int args[NUM_THREADS];
+
+  /* Set an alarm, just in case the test deadlocks. */
+  alarm (300);
+
+  /* We want the mutex to start locked. */
+  if (pthread_mutex_lock (&mutex) != 0)
+    abort ();
+
+  for (int i = 0; i < NUM_THREADS; i++)
+    {
+      args[i] = i;
+      pthread_create (&threads[i], NULL, worker_func, &args[i]);
+    }
+
+  for (int i = 0; i < NUM_THREADS; i++)
+    {
+      void *retval;
+      pthread_join (threads[i], &retval);
+    }
+
+  /* Unlock once we're done, just for cleanliness. */
+  if (pthread_mutex_unlock (&mutex) != 0)
+    abort ();
+
+  stop_marker ();
+
+  return 0;
+}