Skip to content
This repository has been archived by the owner on Aug 17, 2022. It is now read-only.

Build error: implicit declaration of function ‘CPU_MACH’ #2

Closed
cliffordwolf opened this issue Nov 4, 2015 · 5 comments
Closed

Comments

@cliffordwolf
Copy link

Trying to build this package I get the following error:

gcc -DHAVE_CONFIG_H   -DWITH_DEFAULT_MODEL='"RV64G"'  -DPROFILE=1 -DWITH_PROFILE=-1   -DWITH_ALIGNMENT=NONSTRICT_ALIGNMENT -DWITH_TARGET_WORD_BITSIZE=64 -DWITH_TARGET_BYTE_ORDER=LITTLE_ENDIAN -DWITH_ENVIRONMENT=ALL_ENVIRONMENT   -DWITH_HOST_BYTE_ORDER=LITTLE_ENDIAN        -Wall -Wdeclaration-after-statement -Wpointer-arith -Wpointer-sign -Wno-unused -Wunused-value -Wunused-function -Wno-switch -Wno-char-subscripts -Wmissing-prototypes -Wdeclaration-after-statement -Wempty-body -Wmissing-parameter-type -Wold-style-declaration -Wold-style-definition -Wformat-nonliteral       -I. -I../../../sim/riscv -I../common -I../../../sim/riscv/../common -I../../include -I../../../sim/riscv/../../include -I../../bfd -I../../../sim/riscv/../../bfd -I../../opcodes -I../../../sim/riscv/../../opcodes  -g -O2 -c -o sim-model.o -MT sim-model.o -MMD -MP -MF .deps/sim-model.Tpo ../../../sim/riscv/../common/sim-model.c
../../../sim/riscv/../common/sim-model.c: In function ‘model_set’:
../../../sim/riscv/../common/sim-model.c:108:3: warning: implicit declaration of function ‘CPU_MACH’ [-Wimplicit-function-declaration]
   CPU_MACH (cpu) = MODEL_MACH (model);
   ^
../../../sim/riscv/../common/sim-model.c:108:18: error: lvalue required as left operand of assignment
   CPU_MACH (cpu) = MODEL_MACH (model);
                  ^
../../../sim/riscv/../common/sim-model.c:109:3: warning: implicit declaration of function ‘CPU_MODEL’ [-Wimplicit-function-declaration]
   CPU_MODEL (cpu) = model;
   ^
../../../sim/riscv/../common/sim-model.c:109:19: error: lvalue required as left operand of assignment
   CPU_MODEL (cpu) = model;
                   ^
[....]

I'm trying to build using the following sequence of commands (same with --target=riscv32):

mkdir build
cd build/
../configure --program-prefix=riscv- --target=riscv
make

Also: How does this package relate to https://github.com/mythdraenor/riscv-gdb? The webpage (http://riscv.org/download.html#tab_gdb) points to that repo, but it has not seen an update in over a year..

@cliffordwolf cliffordwolf changed the title Build error: Build error: implicit declaration of function ‘CPU_MACH’ Nov 4, 2015
@palmer-dabbelt
Copy link
Contributor

I just built the latest master on Ubuntu 14.04 and I don't get any errors. sim/common/cgen-types.h makes it look like SIM_HAVE_MODEL is always defined, which sim/common/sim-cpu.h uses to define CPU_MACH and CPU_MODEL.

Can you try some standard stuff like doing a clean clone and build? If that doesn't work, can you give me some details of your setup so I can try and actually fix this?

I believe this is the correct repository to point to, not riscv-gdb.

@palmer-dabbelt
Copy link
Contributor

I'm trying to get a version of our stuff rebased on upstream's HEAD, and I'm seeing these errors. I'm not sure why you're getting them and I'm not, but with any luck I'll end up fixing this.

palmer-dabbelt pushed a commit to palmer-dabbelt/riscv-binutils-gdb that referenced this issue Nov 9, 2015
The main motivation for this is making non-stop / all-stop behave
similarly on initial connection, in order to move in the direction of
reimplementing all-stop mode with the remote target always running in
non-stop mode.

When we connect to a remote target in non-stop mode, we may find
threads either running or already stopped.  The act of connecting
itself does not force threads to stop.  To handle that, the remote
non-stop connection is currently roughly like this:

 riscvarchive#1 - Fetch list of remote threads (qXfer:threads:read, qfThreadInfo,
    etc).  All threads are assumed to be running until the target
    reports an asynchronous stop reply for them.

 riscvarchive#2 - Fetch the initial set of threads that were already stopped, with
    the '?'  packet.  (In non-stop, this is coupled with the vStopped
    mechanism to be able to retrieve the status of more than one
    thread.)

The stop replies fetched in riscvarchive#2 are placed in the pending stop reply
queue, and left for the regular event loop to process.  That is,
"target remote" finishes and returns _before_ those stops are
processed.

That means that it's possible to have GDB process further commands
before the initial set of stopped threads is reported to the user.

E.g., before the patch, note how the prompt is printed before the
frame:

 Remote debugging using :9999
 (gdb)
 [Thread 15296] riscvarchive#1 stopped.
 0x0000003615a011f0 in ?? ()

Even though thread riscvarchive#1 was not running, for a moment, the user can see
it as such:

 $ gdb a.out -ex "set non-stop 1" -ex "tar rem :9999"  -ex "info threads" -ex "info registers"
 Remote debugging using :9999
   Id   Target Id         Frame
 * 1    Thread 4772       (running)
 Target is executing.                 <<<<<<< info registers
 (gdb)
 [Thread 4772] riscvarchive#1 stopped.
 0x0000003615a011f0 in ?? ()

To fix that, this commit makes gdb process all threads found already
stopped at connection time, before giving the prompt to the user.

The fix takes a cue from fork-child.c:startup_inferior [1], and
processes the events locally in remote.c, avoiding the whole
wait_for_inferior/handle_inferior_event path.  I decided to try this
approach after noticing that:

 - several cases in handle_inferior_event miss checking stop_soon.
 - we don't want to fetch the thread list in normal_stop.

and trying to fix them was resulting in sprinkling stop_soon checks in
many places, and uglifying normal_stop even more.

While with this patch, I'm avoiding changing GDB's output other than
when the prompt is printed, I think this approach is more flexible if
we do want to change it.  And also, it's likely easier to get rid of
the MI *running event that is still sent for threads that are
initially found stopped, if we want to.

This happens to fix the testsuite too.  All non-stop tests are racy
against "target remote" / gdbserver testing currently.  That is,
sometimes the tests run, but other times they're just skipped without
any indication of PASS/FAIL.  When that happens, the logs show:

 target remote localhost:2346
 Remote debugging using localhost:2346
 (gdb)
 [Thread 25418] riscvarchive#1 stopped.
 0x0000003615a011f0 in ?? ()
 ^CQuit
 (gdb) Remote debugging from host 127.0.0.1
 Killing process(es): 25418
 monitor exit
 (gdb) Remote connection closed
 (gdb) testcase /home/pedro/gdb/mygit/build/../src/gdb/testsuite/gdb.threads/multi-create-ns-info-thr.exp completed in 61 seconds

The trouble here is that there's output after the prompt, and the
regex in question doesn't expect that:

   -re "Remote debugging using .*$serialport_re.*$gdb_prompt $" {
	verbose "Set target to $targetname"
	return 0
    }

[1] - before startup_inferior was added, we'd go through
wait_for_inferior/handle_inferior_event while going through the shell,
and that turned out problematic.

Tested on x86_64 Fedora 20, gdbserver.

gdb/ChangeLog:
2015-08-20  Pedro Alves  <[email protected]>

	* infrun.c (print_target_wait_results): Make extern.
	* infrun.h (print_target_wait_results): Declare.
	* remote.c (set_stop_requested_callback): Delete.
	(process_initial_stop_replies): New function.
	(remote_start_remote): Use it.
	(stop_reply_queue_length): New function.

gdb/testsuite/ChangeLog:
2015-08-20  Pedro Alves  <[email protected]>

	* gdb.server/connect-stopped-target.c: New file.
	* gdb.server/connect-stopped-target.exp: New file.
palmer-dabbelt pushed a commit to palmer-dabbelt/riscv-binutils-gdb that referenced this issue Nov 9, 2015
…g-bp.exp

Running that test in a loop, I found a gdbserver core dump with the
following back trace:

 Core was generated by `../gdbserver/gdbserver --once --multi :2346'.
 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0x0000000000406ab6 in inferior_regcache_data (inferior=0x0) at src/gdb/gdbserver/inferiors.c:236
 236       return inferior->regcache_data;
 (gdb) up
 riscvarchive#1  0x0000000000406d7f in get_thread_regcache (thread=0x0, fetch=1) at src/gdb/gdbserver/regcache.c:31
 31        regcache = (struct regcache *) inferior_regcache_data (thread);
 (gdb) bt
 #0  0x0000000000406ab6 in inferior_regcache_data (inferior=0x0) at src/gdb/gdbserver/inferiors.c:236
 riscvarchive#1  0x0000000000406d7f in get_thread_regcache (thread=0x0, fetch=1) at src/gdb/gdbserver/regcache.c:31
 riscvarchive#2  0x0000000000409271 in prepare_resume_reply (buf=0x20dd593 "", ptid=..., status=0x20edce0) at src/gdb/gdbserver/remote-utils.c:1147
 riscvarchive#3  0x000000000040ab0a in vstop_notif_reply (event=0x20edcc0, own_buf=0x20dd590 "T05") at src/gdb/gdbserver/server.c:183
 riscvarchive#4  0x0000000000426b38 in notif_write_event (notif=0x66e6c0 <notif_stop>, own_buf=0x20dd590 "T05") at src/gdb/gdbserver/notif.c:69
 riscvarchive#5  0x0000000000426c55 in handle_notif_ack (own_buf=0x20dd590 "T05", packet_len=8) at src/gdb/gdbserver/notif.c:113
 riscvarchive#6  0x000000000041118f in handle_v_requests (own_buf=0x20dd590 "T05", packet_len=8, new_packet_len=0x7fff742c77b8)
     at src/gdb/gdbserver/server.c:2862
 riscvarchive#7  0x0000000000413850 in process_serial_event () at src/gdb/gdbserver/server.c:4148
 riscvarchive#8  0x0000000000413945 in handle_serial_event (err=0, client_data=0x0) at src/gdb/gdbserver/server.c:4196
 riscvarchive#9  0x000000000041a1ef in handle_file_event (event_file_desc=5) at src/gdb/gdbserver/event-loop.c:429
 riscvarchive#10 0x00000000004199b6 in process_event () at src/gdb/gdbserver/event-loop.c:184
 riscvarchive#11 0x000000000041a735 in start_event_loop () at src/gdb/gdbserver/event-loop.c:547
 riscvarchive#12 0x00000000004123d2 in captured_main (argc=4, argv=0x7fff742c7ac8) at src/gdb/gdbserver/server.c:3562
 riscvarchive#13 0x000000000041252e in main (argc=4, argv=0x7fff742c7ac8) at src/gdb/gdbserver/server.c:3631

Clearly this means that a thread pushed a stop reply in the event
queue, and then before GDB confused the event, the whole process died,
along with its thread.  But the pending thread event was left
dangling.  When GDB fetched that event, gdbserver looked up the
corresponding thread, but found NULL; not expecting this, gdbserver
crashes when it tries to read this thread's registers.

gdb/gdbserver/
2015-08-21  Pedro Alves  <[email protected]>

	PR gdb/18749
	* inferiors.c (remove_thread): Discard any pending stop reply for
	this thread.
	* server.c (remove_all_on_match_pid): Rename to ...
	(remove_all_on_match_ptid): ... this.  Work with a filter ptid
	instead of a pid.
	(discard_queued_stop_replies): Change parameter to a ptid.  Now
	extern.
	(handle_v_kill, kill_inferior_callback)
	(process_serial_event): Adjust.
	(captured_main): Call initialize_notif before starting the
	program, thus before threads are created.
	* server.h (discard_queued_stop_replies): Declare.
palmer-dabbelt pushed a commit to palmer-dabbelt/riscv-binutils-gdb that referenced this issue Nov 9, 2015
Assuming displaced stepping is enabled, and a breakpoint is set in the
memory region of the scratch pad, things break.  One of two cases can
happen:

riscvarchive#1 - The breakpoint wasn't inserted yet (all threads were stopped), so
     after setting up the displaced stepping scratch pad with the
     adjusted copy of the instruction we're trying to single-step, we
     insert the breakpoint, which corrupts the scratch pad, and the
     inferior executes the wrong instruction.  (Example below.)
     This is clearly unacceptable.

riscvarchive#2 - The breakpoint was already inserted, so setting up the displaced
     stepping scratch pad overwrites the breakpoint.  This is OK in
     the sense that we already assume that no thread is going to
     executes the code in the scratch pad range (after initial
     startup) anyway.

This commit addresses both cases by simply punting on displaced
stepping if we have a breakpoint in the scratch pad range.

The riscvarchive#1 case above explains a few regressions exposed by the AS/NS
series on x86:

 Running ./gdb.dwarf2/callframecfa.exp ...
 FAIL: gdb.dwarf2/callframecfa.exp: set display for call-frame-cfa
 FAIL: gdb.dwarf2/callframecfa.exp: step 1 for call-frame-cfa
 FAIL: gdb.dwarf2/callframecfa.exp: step 2 for call-frame-cfa
 FAIL: gdb.dwarf2/callframecfa.exp: step 3 for call-frame-cfa
 FAIL: gdb.dwarf2/callframecfa.exp: step 4 for call-frame-cfa
 Running ./gdb.dwarf2/typeddwarf.exp ...
 FAIL: gdb.dwarf2/typeddwarf.exp: continue to breakpoint: continue to typeddwarf.c:53
 FAIL: gdb.dwarf2/typeddwarf.exp: check value of x at typeddwarf.c:53
 FAIL: gdb.dwarf2/typeddwarf.exp: check value of y at typeddwarf.c:53
 FAIL: gdb.dwarf2/typeddwarf.exp: check value of z at typeddwarf.c:53
 FAIL: gdb.dwarf2/typeddwarf.exp: continue to breakpoint: continue to typeddwarf.c:73
 FAIL: gdb.dwarf2/typeddwarf.exp: check value of w at typeddwarf.c:73
 FAIL: gdb.dwarf2/typeddwarf.exp: check value of x at typeddwarf.c:73
 FAIL: gdb.dwarf2/typeddwarf.exp: check value of y at typeddwarf.c:73
 FAIL: gdb.dwarf2/typeddwarf.exp: check value of z at typeddwarf.c:73

Enabling "maint set target-non-stop on" implies displaced stepping
enabled as well, and it's the latter that's to blame here.  We can see
the same failures with "maint set target-non-stop off + set displaced
on".

Diffing (good/bad) gdb.log for callframecfa.exp shows:

 @@ -99,29 +99,29 @@ Breakpoint 2 at 0x80481b0: file q.c, lin
  continue
  Continuing.

 -Breakpoint 2, func (arg=77) at q.c:2
 +Breakpoint 2, func (arg=52301) at q.c:2
  2      in q.c
  (gdb) PASS: gdb.dwarf2/callframecfa.exp: continue to breakpoint: continue to breakpoint for call-frame-cfa
  display arg
 -1: arg = 77
 -(gdb) PASS: gdb.dwarf2/callframecfa.exp: set display for call-frame-cfa
 +1: arg = 52301
 +(gdb) FAIL: gdb.dwarf2/callframecfa.exp: set display for call-frame-cfa

The problem is here, when setting up the func call:

 Breakpoint 1, main (argc=-13345, argv=0x0) at q.c:7
 7       in q.c

 (gdb) disassemble
 Dump of assembler code for function main:
    0x080481bb <+0>:     push   %ebp
    0x080481bc <+1>:     mov    %esp,%ebp
    0x080481be <+3>:     sub    $0x4,%esp
 => 0x080481c1 <+6>:     movl   $0x4d,(%esp)
    0x080481c8 <+13>:    call   0x80481b0 <func>
    0x080481cd <+18>:    leave
    0x080481ce <+19>:    ret
 End of assembler dump.
 (gdb) disassemble /r
 Dump of assembler code for function main:
    0x080481bb <+0>:     55      push   %ebp
    0x080481bc <+1>:     89 e5   mov    %esp,%ebp
    0x080481be <+3>:     83 ec 04        sub    $0x4,%esp
 => 0x080481c1 <+6>:     c7 04 24 4d 00 00 00    movl   $0x4d,(%esp)
    0x080481c8 <+13>:    e8 e3 ff ff ff  call   0x80481b0 <func>
    0x080481cd <+18>:    c9      leave
    0x080481ce <+19>:    c3      ret
 End of assembler dump.

Note the breakpoint at main is set at 0x080481c1.  Right at the
instruction that sets up func's argument.  Executing that instruction
should write 0x4d to the address pointed at by $esp.  However, if we
stepi, the program manages to write 52301/0xcc4d there instead (0xcc
is int3, the x86 breakpoint instruction), because the breakpoint
address is 4 bytes inside the scratch pad location, which is
0x080481bd:

 (gdb) p 0x080481c1 - 0x080481bd
 $1 = 4

IOW, instead of executing:

  "c7 04 24 4d 00 00 00" [ movl $0x4d,(%esp) ]

the inferior executes:

  "c7 04 24 4d cc 00 00" [ movl $0xcc4d,(%esp) ]

gdb/ChangeLog:
2015-10-30  Pedro Alves  <[email protected]>

	* breakpoint.c (breakpoint_in_range_p)
	(breakpoint_location_address_range_overlap): New functions.
	* breakpoint.h (breakpoint_in_range_p): New declaration.
	* infrun.c (displaced_step_prepare_throw): If there's a breakpoint
	in the scratch pad range, don't displaced step.
@palmer-dabbelt
Copy link
Contributor

I don't know what github is doing with the issues, but how does this look

commit 3d29d2a30eb2efd30f8d01a0f4811a3438ebffee
Author: Palmer Dabbelt <[email protected]>
Date:   Sun Nov 8 23:49:42 2015 -0800

    Fix RISC-V sim for new binutils/gdb versions

    I'm not sure when or why this broke, or why adding tconfig.h fixes it, but it
    appears to.  Note that if I add tconfig.h and don't do a clean build then I get
    some linking errors.

    Pretty much all of this came from the bfin sim port, as that looks most similar
    to ours.

diff --git a/sim/riscv/machs.c b/sim/riscv/machs.c
index ef08319..c968109 100644
--- a/sim/riscv/machs.c
+++ b/sim/riscv/machs.c
@@ -21,6 +21,10 @@
 #include "config.h"

 #include "sim-main.h"
+#include "gdb/sim-riscv.h"
+#include "bfd.h"
+
+#include "sim-hw.h"

 static void
 riscv_model_init (SIM_CPU *cpu)
diff --git a/sim/riscv/tconfig.h b/sim/riscv/tconfig.h
new file mode 100644
index 0000000..b109533
--- /dev/null
+++ b/sim/riscv/tconfig.h
@@ -0,0 +1,2 @@
+/* This was copied from bfin, and is listed as a hack there.  */
+#define SIM_HAVE_MODEL

@cliffordwolf
Copy link
Author

I've checked out riscv-rebase-2015.10.08 (3d29d2a) from your git repo and I can successfully build gdb from it using the following commands:

mkdir build && cd build
../configure --target=riscv32-unknown-elf --prefix=/opt/riscv32i
make -j4 && make install

Thanks!

Maybe you can also point me in the right direction or refer me to the right person to ask regarding the following issue:

I want to write a gdbserver for PicoRV32 (going to implement a hw debug interface for it). According to the gdb docs only the following packets need to be implemented for a minimal gdb server: g, G, m, M, c, s. However, when the gdb I just built connects to my dummy gdb server I just wrote I see the following requests sent to by gdb:

$qSupported:multiprocess+;swbreak+;hwbreak+;qRelocInsn+;vContSupported+#ff
$Hg0#df
$qTStatus#49
$?#3f
$qfThreadInfo#bb
$qL1200000000000000000#50
$Hc-1#09
$qC#b4
$qAttached#8f

(Note that none of them is in g, G, m, M, c, s.)

I respond to all of them by sending +, followed by $#00, as this is the respond that I should send for an unsupported command, according to the docs as I understand them. After that sequence of pkts sent to my gdb server, gdb just hangs. This is the output I see:

(gdb) target remote :1234
Remote debugging using :1234
warning: Invalid remote reply: 

Is there an existing simple gdb stub (preferably for RISC-V) that I could use as a reference? Any other useful hints that might help me?

vapier pushed a commit that referenced this issue Nov 18, 2015
Hi,
I build GDB with -fsanitize=address, and run testsuite.  In
gdb.base/callfuncs.exp, I see the following error,

p/c fun1()
=================================================================^M
==9601==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffee858530 at pc 0x6df079 bp 0x7fffee8583a0 sp 0x7fffee858398
WRITE of size 16 at 0x7fffee858530 thread T0
    #0 0x6df078 in regcache_raw_read /home/yao/SourceCode/gnu/gdb/git/gdb/regcache.c:673
    #1 0x6dfe1e in regcache_cooked_read /home/yao/SourceCode/gnu/gdb/git/gdb/regcache.c:751
    #2 0x4696a3 in aarch64_extract_return_value /home/yao/SourceCode/gnu/gdb/git/gdb/aarch64-tdep.c:1708
    #3 0x46ae57 in aarch64_return_value /home/yao/SourceCode/gnu/gdb/git/gdb/aarch64-tdep.c:1918

We are extracting return value from V registers (128 bit), but only
allocate X_REGISTER_SIZE-byte array, which isn't sufficient.  This
patch changes the array to V_REGISTER_SIZE.

gdb:

2015-11-16  Yao Qi  <[email protected]>

	* aarch64-tdep.c (aarch64_extract_return_value):  Change array
	buf's length to V_REGISTER_SIZE.
@vapier
Copy link
Collaborator

vapier commented Nov 24, 2015

you probably want to use the sw-dev mailing list for questions like this

@vapier vapier closed this as completed Nov 24, 2015
@vapier vapier added the bug label Nov 24, 2015
palmer-dabbelt pushed a commit that referenced this issue Nov 30, 2015
Hi,
I build GDB with -fsanitize=address, and run testsuite.  In
gdb.base/callfuncs.exp, I see the following error,

p t_float_values(0.0,0.0)
=================================================================
==8088==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000cb650 at pc 0x6e195c bp 0x7fff164f9770 sp 0x7fff164f9768
READ of size 16 at 0x6020000cb650 thread T0^
    #0 0x6e195b in regcache_raw_write /home/yao/SourceCode/gnu/gdb/git/gdb/regcache.c:912
    #1 0x6e1e52 in regcache_cooked_write /home/yao/SourceCode/gnu/gdb/git/gdb/regcache.c:945
    #2 0x466d69 in pass_in_v /home/yao/SourceCode/gnu/gdb/git/gdb/aarch64-tdep.c:1101
    #3 0x467512 in pass_in_v_or_stack /home/yao/SourceCode/gnu/gdb/git/gdb/aarch64-tdep.c:1196
    #4 0x467d7d in aarch64_push_dummy_call /home/yao/SourceCode/gnu/gdb/git/gdb/aarch64-tdep.c:1335

The code in pass_in_v read contents from V registers (128 bit), but the
data passed through V registers can be less than 128 bit.  In this case,
float is passed.  So writing V registers contents into contents buff
will cause overflow.  In this patch, we add an array reg[V_REGISTER_SIZE],
which is to hold the contents from V registers, and then copy useful
bits to buf.

gdb:

2015-11-18  Yao Qi  <[email protected]>

	* aarch64-tdep.c (pass_in_v): Add argument len.  Add local array
	reg.  Callers updated.
palmer-dabbelt pushed a commit that referenced this issue Nov 30, 2015
The following issue has been observed on arm-android, trying to step
over the following line of code:

        Put_Line (">>> " & Integer'Image (Message (I)));

Below is a copy of the GDB transcript:

    (gdb) cont
    Breakpoint 1, q.dump (message=...) at q.adb:11
    11               Put_Line (">>> " & Integer'Image (Message (I)));
    (gdb) next
    0x00016000 in system.concat_2.str_concat_2 ()

The expected behavior for the "next" command is to step over
the call to Put_Line and stop at line 12:

    (gdb) next
    12               I := I + 1;

What happens during the next step is that the code for line 11
above make a call to system.concat_2.str_concat_2 (to implement
the '&' string concatenation operator) before making the call
to Put_Line. While stepping, GDB stops eventually stops at the
first instruction of that function, and fails to detect that
it's a function call from where we were before, and so decides
to stop stepping.

And the reason why it fails to detect that we landed inside a function
call is because it fails to unwind from that function:

    (gdb) bt
    #0  0x00016000 in system.concat_2.str_concat_2 ()
    #1  0x0001bc74 in ?? ()

Debugging GDB, I found that GDB decides to use the ARM unwind info
for that function, which contains the following data:

    0x16000 <system__concat_2__str_concat_2>: 0x80acb0b0
      Compact model index: 0
      0xac      pop {r4, r5, r6, r7, r8, r14}
      0xb0      finish
      0xb0      finish

But, in fact, using that data is wrong, in this case, because
it mentions a pop of 6 registers, and therefore hints at a frame
size of 24 bytes. The problem is that, because we're at the first
instruction of the function, the 6 registers haven't been pushed
to the stack yet. In other words, using the ARM unwind entry above,
GDB is tricked into thinking that the frame size is 24 bytes, and
that the return address (r14) is available on the stack.

One visible manifestation of this issue can been seen by looking
at the value of the stack pointer, and the frame's base address:

    (gdb) p /x $sp
    $2 = 0xbee427b0
    (gdb) info frame
    Stack level 0, frame at 0xbee427c8:
                            ^^^^^^^^^^
                            ||||||||||

The frame's base address should be equal to the value of the stack
pointer at entry. And you eventually get the correct frame address,
as well as the correct backtrace if you just single-step one additional
instruction, past the push:

    (gdb) x /i $pc
    => 0x16000 <system__concat_2__str_concat_2>:
        push        {r4, r5, r6, r7, r8, lr}
    (gdb) stepi
    (gdb) bt
    #0  0x00016004 in system.concat_2.str_concat_2 ()
    #1  0x00012b6c in q.dump (message=...) at q.adb:11
    #2  0x00012c3c in q () at q.adb:19

Digging further, I found that GDB tries to use the ARM unwind info
only when sure that it is relevant, as explained in the following
comment:

  /* The ARM exception table does not describe unwind information
     for arbitrary PC values, but is guaranteed to be correct only
     at call sites.  We have to decide here whether we want to use
     ARM exception table information for this frame, or fall back [...]

There is one case where it decides that the info is relevant,
described in the following comment:

      /* We also assume exception information is valid if we're currently
         blocked in a system call.  The system library is supposed to
         ensure this, so that e.g. pthread cancellation works.

For that, it just parses the instruction at the address it believes
to be the point of call, and matches it against an "svc" instruction.
For instance, for a non-thumb instruction, it is at...

    get_frame_pc (this_frame) - 4

... and the code checking looks like the following.

              if (safe_read_memory_integer (get_frame_pc (this_frame) - 4, 4,
                                            byte_order_for_code, &insn)
                  && (insn & 0x0f000000) == 0x0f000000 /* svc */)
                exc_valid = 1;

However, the reason why this doesn't work in our case is that
because we are at the first instruction of a function in the innermost
frame. That frame can't possibly be making a call, and therefore
be stuck on a system call.

What the code above ends up doing is checking the instruction
just before the start of our function, which in our case is not
even an actual instruction, but unlucky for us, happens to match
the pattern it is looking for, thus leading GDB to improperly
trust the ARM unwinding data.

gdb/ChangeLog:

        * arm-tdep.c (arm_exidx_unwind_sniffer): Do not check for a frame
        stuck on a system call if the given frame is the innermost frame.
palmer-dabbelt pushed a commit that referenced this issue Nov 30, 2015
This patch fixes the GDB internal error on AArch64 when running
watchpoint-fork.exp

 top?bt 15
 internal_error (file=file@entry=0x79d558 "../../binutils-gdb/gdb/linux-nat.c", line=line@entry=4866, fmt=0x793b20 "%s: Assertion `%s' failed.")
    at ../../binutils-gdb/gdb/common/errors.c:51
 #1  0x0000000000495bc4 in linux_nat_thread_address_space (t=<optimized out>, ptid=<error reading variable: Cannot access memory at address 0x1302>)
    at ../../binutils-gdb/gdb/linux-nat.c:4866
 #2  0x00000000005db2c8 in delegate_thread_address_space (self=<optimized out>, arg1=<error reading variable: Cannot access memory at address 0x1302>)
    at ../../binutils-gdb/gdb/target-delegates.c:2447
 #3  0x00000000005e8c7c in target_thread_address_space (ptid=<error reading variable: Cannot access memory at address 0x1302>)
    at ../../binutils-gdb/gdb/target.c:2727
 #4  0x000000000054eef8 in get_thread_arch_regcache (ptid=..., gdbarch=0xad51e0) at ../../binutils-gdb/gdb/regcache.c:529
 #5  0x000000000054efcc in get_thread_regcache (ptid=...) at ../../binutils-gdb/gdb/regcache.c:546
 #6  0x000000000054f120 in get_thread_regcache_for_ptid (ptid=...) at ../../binutils-gdb/gdb/regcache.c:560
 #7  0x00000000004a2278 in aarch64_point_is_aligned (is_watchpoint=0, addr=34168, len=2) at ../../binutils-gdb/gdb/nat/aarch64-linux-hw-point.c:122
 #8  0x00000000004a2e68 in aarch64_handle_breakpoint (type=hw_execute, addr=34168, len=2, is_insert=0, state=0xae8880)
    at ../../binutils-gdb/gdb/nat/aarch64-linux-hw-point.c:465
 #9  0x000000000048edf0 in aarch64_linux_remove_hw_breakpoint (self=<optimized out>, gdbarch=<optimized out>, bp_tgt=<optimized out>)
    at ../../binutils-gdb/gdb/aarch64-linux-nat.c:657
 #10 0x00000000005da8dc in delegate_remove_hw_breakpoint (self=<optimized out>, arg1=<optimized out>, arg2=<optimized out>)
    at ../../binutils-gdb/gdb/target-delegates.c:492
 #11 0x0000000000536a24 in bkpt_remove_location (bl=<optimized out>) at ../../binutils-gdb/gdb/breakpoint.c:13065
 #12 0x000000000053351c in remove_breakpoint_1 (bl=0xb3fe70, is=is@entry=mark_inserted) at ../../binutils-gdb/gdb/breakpoint.c:4026
 #13 0x000000000053ccc0 in detach_breakpoints (ptid=...) at ../../binutils-gdb/gdb/breakpoint.c:3930
 #14 0x00000000005a3ac0 in handle_inferior_event_1 (ecs=0x7ffffff048) at ../../binutils-gdb/gdb/infrun.c:5042

After the fork, GDB will physically remove the breakpoints from the child
process (in frame #14), but at that time, GDB doesn't create an inferior
yet for child, but inferior_ptid is set to child's ptid (in frame #13).
In aarch64_point_is_aligned, we'll get the regcache of current_lwp_ptid
to determine if the current process is 32-bit or 64-bit, so the inferior
can't be found, and the internal error is caused.

I don't find a better fix other than not checking alignment on removing
breakpoint.

gdb:

2015-11-27  Yao Qi  <[email protected]>

	* nat/aarch64-linux-hw-point.c (aarch64_dr_state_remove_one_point):
	Don't assert on alignment.
	(aarch64_handle_breakpoint): Only check alignment when IS_INSERT
	is true.
palmer-dabbelt pushed a commit that referenced this issue Feb 12, 2016
Hi,
AddressSanitizer reports an error like this,

(gdb) PASS: gdb.base/call-ar-st.exp: continue to tbreak9
print print_long_arg_list(a, b, c, d, e, f, *struct1, *struct2, *struct3, *struct4, *flags, *flags_combo, *three_char, *five_char, *int_char_combo, *d1, *d2, *d3, *f1, *f2, *f3)
=================================================================
==6236==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200008eb50 at pc 0x89e432 bp 0x7fffa3df9080 sp 0x7fffa3df9078
READ of size 5 at 0x60200008eb50 thread T0
    #0 0x89e431 in memory_xfer_partial gdb/target.c:1264
    #1 0x89e6c7 in target_xfer_partial gdb/target.c:1320
    #2 0x89f267 in target_write_partial gdb/target.c:1595^M
    #3 0x8a014b in target_write_with_progress gdb/target.c:1889^M
    #4 0x8a0262 in target_write gdb/target.c:1914^M
    #5 0x89ee59 in target_write_memory gdb/target.c:1492^M
    #6 0x9a1c74 in write_memory gdb/corefile.c:393^M
    #7 0x467ea5 in aarch64_push_dummy_call gdb/aarch64-tdep.c:1388

The problem is that an instance of stack_item_t is created to adjust
stack for alignment, the item.len is correct, but item.data is buf,
which is wrong, because item.len can be greater than the length of
buf.  This patch sets item.data to NULL, and only update sp (no
inferior memory writes on stack for this item).

gdb:

2015-12-17  Yao Qi  <[email protected]>

	* aarch64-tdep.c (struct stack_item_t): Update comments.
	(pass_on_stack): Set item.data to NULL.
	(aarch64_push_dummy_call): Call write_memory if si->data
	isn't NULL.
palmer-dabbelt pushed a commit that referenced this issue Feb 12, 2016
…al is ours

I see a timeout in gdb.base/random-signal.exp,

 Continuing.^M
 PASS: gdb.base/random-signal.exp: continue
 ^CPython Exception <type 'exceptions.KeyboardInterrupt'> <type
 exceptions.KeyboardInterrupt'>: ^M
 FAIL: gdb.base/random-signal.exp: stop with control-c (timeout)

it can be reproduced by running random-signal.exp with native-gdbserver
in a loop, like this, and the fail will be shown in about 20 runs,

$ (set -e; while true; do make check RUNTESTFLAGS="--target_board=native-gdbserver random-signal.exp"; done)

In the test, the program is being single-stepped for software watchpoint,
and in each internal stop, python unwinder sniffer is used,

 #0  pyuw_sniffer (self=<optimised out>, this_frame=<optimised out>, cache_ptr=0xd554f8) at /home/yao/SourceCode/gnu/gdb/git/gdb/python/py-unwind.c:608
 #1  0x00000000006a10ae in frame_unwind_try_unwinder (this_frame=this_frame@entry=0xd554e0, this_cache=this_cache@entry=0xd554f8, unwinder=0xecd540)
     at /home/yao/SourceCode/gnu/gdb/git/gdb/frame-unwind.c:107
 #2  0x00000000006a143f in frame_unwind_find_by_frame (this_frame=this_frame@entry=0xd554e0, this_cache=this_cache@entry=0xd554f8)
     at /home/yao/SourceCode/gnu/gdb/git/gdb/frame-unwind.c:163
 #3  0x000000000069dc6b in compute_frame_id (fi=0xd554e0) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:454
 #4  get_prev_frame_if_no_cycle (this_frame=this_frame@entry=0xd55410) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1781
 #5  0x000000000069fdb9 in get_prev_frame_always_1 (this_frame=0xd55410) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1955
 #6  get_prev_frame_always (this_frame=this_frame@entry=0xd55410) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1971
 #7  0x00000000006a04b1 in get_prev_frame (this_frame=this_frame@entry=0xd55410) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:2213

when GDB goes to python extension, or other language extension, the
SIGINT handler is changed, and is restored when GDB leaves extension
language.  GDB only stays in extension language for a very short period
in this case, but if ctrl-c is pressed at that moment, python extension
will handle the SIGINT, and exceptions.KeyboardInterrupt is shown.

Language extension is used in GDB side rather than inferior side,
so GDB should only change SIGINT handler for extension language when
the terminal is ours (not inferior's).  This is what this patch does.
With this patch applied, I run random-signal.exp in a loop for 18
hours, and no fail is shown.

gdb:

2016-01-08  Yao Qi  <[email protected]>

	* extension.c: Include target.h.
	(set_active_ext_lang): Only call install_gdb_sigint_handler,
	check_quit_flag, and set_quit_flag if target_terminal_is_ours
	returns false.
	(restore_active_ext_lang): Likewise.
	* target.c (target_terminal_is_ours): New function.
	* target.h (target_terminal_is_ours): Declare.
palmer-dabbelt pushed a commit that referenced this issue Feb 12, 2016
The PR threads/19422 patchset added a new regression.

Additionally below it there was already a regression if --with-guile (which is
default if Guile is found) was used.

racy case #1:

(xgdb) PASS: gdb.gdb/selftest.exp: Set xgdb_prompt
^M
Thread 1 "xgdb" received signal SIGINT, Interrupt.^M
0x00007ffff583bfdd in poll () from /lib64/libc.so.6^M
(gdb) FAIL: gdb.gdb/selftest.exp: send ^C to child process
signal SIGINT^M
Continuing with signal SIGINT.^M
^C^M
Thread 1 "xgdb" received signal SIGINT, Interrupt.^M
0x00007ffff5779da0 in sigprocmask () from /lib64/libc.so.6^M
(gdb) PASS: gdb.gdb/selftest.exp: send SIGINT signal to child process
backtrace^M
errstring=errstring@entry=0x7e0e6c "", mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:240^M
errstring=errstring@entry=0x7e0e6c "", mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:240^M
(gdb) PASS: gdb.gdb/selftest.exp: backtrace through signal handler

racy case #2:

(xgdb) PASS: gdb.gdb/selftest.exp: Set xgdb_prompt
^M
Thread 1 "xgdb" received signal SIGINT, Interrupt.^M
0x00007ffff583bfdd in poll () from /lib64/libc.so.6^M
(gdb) FAIL: gdb.gdb/selftest.exp: send ^C to child process
signal SIGINT^M
Continuing with signal SIGINT.^M
^C^M
Thread 2 "xgdb" received signal SIGINT, Interrupt.^M
[Switching to Thread 0x7ffff3b7f700 (LWP 13227)]^M
0x00007ffff6b88b10 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0^M
(gdb) PASS: gdb.gdb/selftest.exp: send SIGINT signal to child process
backtrace^M
(gdb) FAIL: gdb.gdb/selftest.exp: backtrace through signal handler

Pedro Alves:
Not all targets support thread names, and even those that do, not all
use the program name as default thread name -- I think that's only true
for GNU/Linux, actually.  So I think it's best to not expect that, like:
            -re "(Thread .*|Program) received signal SIGINT.*$gdb_prompt $" {

gdb/testsuite/ChangeLog
2016-01-22  Jan Kratochvil  <[email protected]>

	Fix testsuite compatibility with Guile.
	* gdb.gdb/selftest.exp (send ^C to child process): Accept also Thread.
	(thread 1): New test for backtrace through signal handler.
palmer-dabbelt pushed a commit that referenced this issue Feb 12, 2016
I see GDB crashes in dprintf.exp on aarch64-linux testing,

(gdb) PASS: gdb.base/dprintf.exp: agent: break 29
set dprintf-style agent^M
(gdb) PASS: gdb.base/dprintf.exp: agent: set dprintf style to agent
continue^M
Continuing.
ASAN:SIGSEGV
=================================================================
==22475==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000008 (pc 0x000000494820 sp 0x7fff389b83a0 bp 0x62d000082417 T0)
    #0 0x49481f in remote_add_target_side_commands /home/yao/SourceCode/gnu/gdb/git/gdb/remote.c:9190^M
    #1 0x49e576 in remote_add_target_side_commands /home/yao/SourceCode/gnu/gdb/git/gdb/remote.c:9174^M
    #2 0x49e576 in remote_insert_breakpoint /home/yao/SourceCode/gnu/gdb/git/gdb/remote.c:9240^M
    #3 0x5278b7 in insert_bp_location /home/yao/SourceCode/gnu/gdb/git/gdb/breakpoint.c:2734^M
    #4 0x52ac09 in insert_breakpoint_locations /home/yao/SourceCode/gnu/gdb/git/gdb/breakpoint.c:3159^M
    #5 0x52ac09 in update_global_location_list /home/yao/SourceCode/gnu/gdb/git/gdb/breakpoint.c:12686

the root cause of this problem in this case is about linespec and
symtab which produces additional incorrect location and a NULL is added to
bp_tgt->tcommands.  I posted a patch
https://sourceware.org/ml/gdb-patches/2015-12/msg00321.html to fix it
in linespec (the fix causes regression), but GDB still shouldn't add
NULL into bp_tgt->tcommands.  The logic of build_target_command_list
looks odd to me.  If we get something wrong in parse_cmd_to_aexpr (it
returns NULL), we shouldn't continue, instead we should set flag
null_command_or_parse_error.  This is what this patch does.  In the
meantime, we find build_target_condition_list has the same problem, so
fix it too.

gdb:

2016-01-28  Yao Qi  <[email protected]>

	* breakpoint.c (build_target_command_list): Don't call continue
	if aexpr is NULL.
	(build_target_condition_list): Likewise.
zizztux pushed a commit to zizztux/riscv-binutils-gdb that referenced this issue Aug 11, 2016
I see the following error in testing aarch64 GDB debugging arm
program.

(gdb) PASS: gdb.reverse/readv-reverse.exp: set breakpoint at marker2
continue
Continuing.
=================================================================
==32273==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x000000ce4c00 in thread T0
    #0 0x2ba5615645c7 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x545c7)^M
    riscvarchive#1 0x4be8b5 in VEC_CORE_ADDR_cleanup /home/yao/SourceCode/gnu/gdb/git/gdb/common/gdb_vecs.h:34^M
    riscvarchive#2 0x5e6d95 in do_my_cleanups /home/yao/SourceCode/gnu/gdb/git/gdb/common/cleanups.c:154^M
    riscvarchive#3 0x64c99a in fetch_inferior_event /home/yao/SourceCode/gnu/gdb/git/gdb/infrun.c:3975^M
    riscvarchive#4 0x678437 in inferior_event_handler /home/yao/SourceCode/gnu/gdb/git/gdb/inf-loop.c:44^M
    riscvarchive#5 0x5078f6 in remote_async_serial_handler /home/yao/SourceCode/gnu/gdb/git/gdb/remote.c:13223^M
    riscvarchive#6 0x4cecfd in run_async_handler_and_reschedule /home/yao/SourceCode/gnu/gdb/git/gdb/ser-base.c:137^M
    riscvarchive#7 0x676864 in gdb_wait_for_event /home/yao/SourceCode/gnu/gdb/git/gdb/event-loop.c:834^M
    riscvarchive#8 0x676a27 in gdb_do_one_event /home/yao/SourceCode/gnu/gdb/git/gdb/event-loop.c:323^M
    riscvarchive#9 0x676aed in start_event_loop /home/yao/SourceCode/gnu/gdb/git/gdb/event-loop.c:347^M
    riscvarchive#10 0x6706d2 in captured_command_loop /home/yao/SourceCode/gnu/gdb/git/gdb/main.c:318^M
    riscvarchive#11 0x66db8c in catch_errors /home/yao/SourceCode/gnu/gdb/git/gdb/exceptions.c:240^M
    riscvarchive#12 0x6716dd in captured_main /home/yao/SourceCode/gnu/gdb/git/gdb/main.c:1157^M
    riscvarchive#13 0x66db8c in catch_errors /home/yao/SourceCode/gnu/gdb/git/gdb/exceptions.c:240^M
    riscvarchive#14 0x671b7a in gdb_main /home/yao/SourceCode/gnu/gdb/git/gdb/main.c:1165^M
    riscvarchive#15 0x467684 in main /home/yao/SourceCode/gnu/gdb/git/gdb/gdb.c:32^M
    riscvarchive#16 0x2ba563ed7ec4 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21ec4)^M
    riscvarchive#17 0x4676b2 (/scratch/yao/gdb/build-git/aarch64-linux-gnu/gdb/gdb+0x4676b2)

looks we should discard cleanup if function
arm_linux_software_single_step returns early, or create cleanup when
it is needed.

gdb:

2016-02-16  Yao Qi  <[email protected]>

	* arm-linux-tdep.c (arm_linux_software_single_step): Assign
	'old_chain' later.
zizztux pushed a commit to zizztux/riscv-binutils-gdb that referenced this issue Aug 11, 2016
I see the following GDBserver internal error in two cases,

 gdb/gdbserver/linux-low.c:1922: A problem internal to GDBserver has been detected.
 unsuspend LWP 17200, suspended=-1

 1. step over a breakpoint on fork/vfork syscall instruction,
 2. step over a breakpoint on clone syscall instruction and child
    threads hits a breakpoint,

the stack backtrace is

 #0  internal_error (file=file@entry=0x44c4c0 "gdb/gdbserver/linux-low.c", line=line@entry=1922,
    fmt=fmt@entry=0x44c7d0 "unsuspend LWP %ld, suspended=%d\n") at gdb/gdbserver/../common/errors.c:51
 riscvarchive#1  0x0000000000424014 in lwp_suspended_decr (lwp=<optimised out>, lwp=<optimised out>) at gdb/gdbserver/linux-low.c:1922
 riscvarchive#2  0x000000000042403a in unsuspend_one_lwp (entry=<optimised out>, except=0x66e8c0) at gdb/gdbserver/linux-low.c:2885
 riscvarchive#3  0x0000000000405f45 in find_inferior (list=<optimised out>, func=func@entry=0x424020 <unsuspend_one_lwp>, arg=arg@entry=0x66e8c0)
    at gdb/gdbserver/inferiors.c:243
 riscvarchive#4  0x00000000004297de in unsuspend_all_lwps (except=0x66e8c0) at gdb/gdbserver/linux-low.c:2895
 riscvarchive#5  linux_wait_1 (ptid=..., ourstatus=ourstatus@entry=0x665ec0 <last_status>, target_options=target_options@entry=0)
    at gdb/gdbserver/linux-low.c:3632
 riscvarchive#6  0x000000000042a764 in linux_wait (ptid=..., ourstatus=0x665ec0 <last_status>, target_options=0)
    at gdb/gdbserver/linux-low.c:3770
 riscvarchive#7  0x0000000000411163 in mywait (ptid=..., ourstatus=ourstatus@entry=0x665ec0 <last_status>, options=options@entry=0, connected_wait=connected_wait@entry=1)
    at gdb/gdbserver/target.c:214
 riscvarchive#8  0x000000000040b1f2 in resume (actions=0x66f800, num_actions=1) at gdb/gdbserver/server.c:2757
 riscvarchive#9  0x000000000040f660 in handle_v_cont (own_buf=0x66a630 "vCont;c:p45e9.-1") at gdb/gdbserver/server.c:2719

when GDBserver steps over a thread, other threads have been suspended,
the "stepping" thread may create new thread, but GDBserver doesn't set
it suspend count to 1.  When GDBserver unsuspend threads, the child's
suspend count goes to -1, and the assert is triggered.  In fact, GDBserver
has already taken care of suspend count of new thread when GDBserver is
suspending all threads except the one GDBserver wants to step over by
https://sourceware.org/ml/gdb-patches/2015-07/msg00946.html

+	  /* If we're suspending all threads, leave this one suspended
+	     too.  */
+	  if (stopping_threads == STOPPING_AND_SUSPENDING_THREADS)
+	    {
+	      if (debug_threads)
+		debug_printf ("HEW: leaving child suspended\n");
+	      child_lwp->suspended = 1;
+	    }

but that is not enough, because new thread is still can be spawned in
the thread which is being stepped over.  This patch extends the
condition that GDBserver set child's suspend count to one if it is
suspending threads or stepping over the thread.

gdb/gdbserver:

2016-03-03  Yao Qi  <[email protected]>

	PR server/19736
	* linux-low.c (handle_extended_wait): Set child suspended
	if event_lwp->bp_reinsert isn't zero.
zizztux pushed a commit to zizztux/riscv-binutils-gdb that referenced this issue Aug 11, 2016
Fix this GDB crash:

  $ gdb -ex "set architecture mips:10000"
  Segmentation fault (core dumped)

Backtrace:

  Program received signal SIGSEGV, Segmentation fault.
  0x0000000000495b1b in mips_gdbarch_init (info=..., arches=0x0) at /home/pedro/gdb/mygit/cxx-convertion/src/gdb/mips-tdep.c:8436
  8436              if (bfd_get_flavour (info.abfd) == bfd_target_elf_flavour
  (top-gdb) bt
  #0  0x0000000000495b1b in mips_gdbarch_init (info=..., arches=0x0) at .../src/gdb/mips-tdep.c:8436
  riscvarchive#1  0x00000000007348a6 in gdbarch_find_by_info (info=...) at .../src/gdb/gdbarch.c:5155
  riscvarchive#2  0x000000000073563c in gdbarch_update_p (info=...) at .../src/gdb/arch-utils.c:522
  riscvarchive#3  0x0000000000735585 in set_architecture (ignore_args=0x0, from_tty=1, c=0x26bc870) at .../src/gdb/arch-utils.c:496
  riscvarchive#4  0x00000000005f29fd in do_sfunc (c=0x26bc870, args=0x0, from_tty=1) at .../src/gdb/cli/cli-decode.c:121
  riscvarchive#5  0x00000000005fd3f3 in do_set_command (arg=0x7fffffffdcdd "mips:10000", from_tty=1, c=0x26bc870) at .../src/gdb/cli/cli-setshow.c:455
  riscvarchive#6  0x0000000000836157 in execute_command (p=0x7fffffffdcdd "mips:10000", from_tty=1) at .../src/gdb/top.c:460
  riscvarchive#7  0x000000000071abfb in catch_command_errors (command=0x835f6b <execute_command>, arg=0x7fffffffdccc "set architecture mips:10000", from_tty=1)
      at .../src/gdb/main.c:368
  riscvarchive#8  0x000000000071bf4f in captured_main (data=0x7fffffffd750) at .../src/gdb/main.c:1132
  riscvarchive#9  0x0000000000716737 in catch_errors (func=0x71af44 <captured_main>, func_args=0x7fffffffd750, errstring=0x106b9a1 "", mask=RETURN_MASK_ALL)
      at .../src/gdb/exceptions.c:240
  riscvarchive#10 0x000000000071bfe6 in gdb_main (args=0x7fffffffd750) at .../src/gdb/main.c:1164
  riscvarchive#11 0x000000000040a6ad in main (argc=4, argv=0x7fffffffd858) at .../src/gdb/gdb.c:32
  (top-gdb)

We already check whether info.abfd is NULL before all other
bfd_get_flavour calls in the same function.  Just this one case was
missing.

(This was exposed by a WIP test that tries all "set architecture ARCH"
values.)

gdb/ChangeLog:
2016-03-07  Pedro Alves  <[email protected]>

	* mips-tdep.c (mips_gdbarch_init): Check whether info.abfd is NULL
	before calling bfd_get_flavour.
zizztux pushed a commit to zizztux/riscv-binutils-gdb that referenced this issue Aug 11, 2016
immediate_quit used to be necessary back when prompt_for_continue used
blocking fread, but nowadays it uses gdb_readline_wrapper, which is
implemented in terms of a nested event loop, which already knows how
to react to SIGINT:

 #0  throw_it (reason=RETURN_QUIT, error=GDB_NO_ERROR, fmt=0x9d6d7e "Quit", ap=0x7fffffffcb88)
     at .../src/gdb/common/common-exceptions.c:324
 riscvarchive#1  0x00000000007bab5d in throw_vquit (fmt=0x9d6d7e "Quit", ap=0x7fffffffcb88) at .../src/gdb/common/common-exceptions.c:366
 riscvarchive#2  0x00000000007bac9f in throw_quit (fmt=0x9d6d7e "Quit") at .../src/gdb/common/common-exceptions.c:385
 riscvarchive#3  0x0000000000773a2d in quit () at .../src/gdb/utils.c:1039
 riscvarchive#4  0x000000000065d81b in async_request_quit (arg=0x0) at .../src/gdb/event-top.c:893
 riscvarchive#5  0x000000000065c27b in invoke_async_signal_handlers () at .../src/gdb/event-loop.c:949
 riscvarchive#6  0x000000000065aeef in gdb_do_one_event () at .../src/gdb/event-loop.c:280
 riscvarchive#7  0x0000000000770838 in gdb_readline_wrapper (prompt=0x7fffffffcd40 "---Type <return> to continue, or q <return> to quit---")
     at .../src/gdb/top.c:873

The need for the QUIT in stdin_event_handler is then exposed by the
gdb.base/double-prompt-target-event-error.exp test, which has:

	# We're now stopped in a pagination query while handling a
	# target event (printing where the program stopped).  Quitting
	# the pagination should result in only one prompt being
	# output.
	send_gdb "\003p 1\n"

Without that change we'd get:

 Continuing.
 ---Type <return> to continue, or q <return> to quit---PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: continue to pagination
 ^CpQuit
 (gdb)  1
 Undefined command: "1".  Try "help".
 (gdb) PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: first prompt
 ERROR: Undefined command "".
 UNRESOLVED: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: no double prompt

Vs:

 Continuing.
 ---Type <return> to continue, or q <return> to quit---PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: continue to pagination
 ^CQuit
 (gdb) p 1
 $1 = 1
 (gdb) PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: first prompt
 PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: no double prompt

gdb/ChangeLog:
2016-04-12  Pedro Alves  <[email protected]>

	* event-top.c (stdin_event_handler): Call QUIT;
	(prompt_for_continue): Don't run with immediate_quit set.
zizztux pushed a commit to zizztux/riscv-binutils-gdb that referenced this issue Aug 11, 2016
I see the following test fail in arm-linux with -marm and -fomit-frame-pointer,

 step
 callee () at /home/yao/SourceCode/gnu/gdb/git/gdb/testsuite/gdb.reverse/step-reverse.c:27
 27      }                       /* RETURN FROM CALLEE */
 (gdb) step
 main () at /home/yao/SourceCode/gnu/gdb/git/gdb/testsuite/gdb.reverse/step-reverse.c:58
 58         callee();    /* STEP INTO THIS CALL */
 (gdb) FAIL: gdb.reverse/step-precsave.exp: reverse step into fn call

As we can see, the "step" has already stepped into the function callee,
but in the last line.  The second "step" attempts to step to function
body, but it goes out of callee, which isn't expected.

The program is compiled with -marm and -fomit-frame-pointer, the
function callee is prologue-less, because nothing needs to be saved
on stack,

(gdb) disassemble callee
Dump of assembler code for function callee:
   0x00010680 <+0>:	movw	r3, #2364	; 0x93c
   0x00010684 <+4>:	movt	r3, riscvarchive#2
   0x00010688 <+8>:	ldr	r3, [r3]
   0x0001068c <+12>:	add	r2, r3, riscvarchive#1
   0x00010690 <+16>:	movw	r3, #2364	; 0x93c
   0x00010694 <+20>:	movt	r3, riscvarchive#2
   0x00010698 <+24>:	str	r2, [r3]
   0x0001069c <+28>:	mov	r3, #0
   0x000106a0 <+32>:	mov	r0, r3
   0x000106a4 <+36>:	bx	lr

program stops at the 0x106a0 (passed the epilogue) after the first
"step".  When second "step" is executed, the stepping range is
[0x10680-0x106a0], which starts from the first instruction of function
callee (because it doesn't have prologue).

infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [LWP 2461] at 0x1069c^M
infrun: prepare_to_wait^M
infrun: target_wait (-1.0.0, status) =^M
infrun:   2461.2461.0 [LWP 2461],^M
infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP^M
infrun: TARGET_WAITKIND_STOPPED^M
infrun: stop_pc = 0x10698^M
infrun: stepping inside range [0x10680-0x106a0]

When program goes out of the range, it stops at the caller of callee,
and test fails.  IOW, if function callee has prologue, the stepping
range won't start from the first instruction of the function, and
program stops at the prologue and test passes.

IMO, GDB does nothing wrong, but test shouldn't expect the program
stops in callee after the second "step".  I decide to fix test rather
than GDB.  In this patch, I change to test to do one "step", and check
the program is still in callee, then, do multiple "step" until program
goes out of the callee.

gdb/testsuite:

2016-04-22  Yao Qi  <[email protected]>

	* gdb.reverse/step-precsave.exp: Do one step and test program
	stops in "callee" and do multiple steps until program goes out
	of "callee".
	* gdb.reverse/step-reverse.exp: Likewise.
zizztux pushed a commit to zizztux/riscv-binutils-gdb that referenced this issue Aug 11, 2016
Nowadays, read_memory may throw NOT_AVAILABLE_ERROR (it is done by
patch http://sourceware.org/ml/gdb-patches/2013-08/msg00625.html)
however, read_stack and read_code still throws MEMORY_ERROR only.  This
causes PR 19947, that is prologue unwinder is unable unwind because
code memory isn't available, but MEMORY_ERROR is thrown, while unwinder
catches NOT_AVAILABLE_ERROR.

 #0  memory_error (err=err@entry=TARGET_XFER_E_IO, memaddr=memaddr@entry=140737349781158) at /home/yao/SourceCode/gnu/gdb/git/gdb/corefile.c:217
 riscvarchive#1  0x000000000065f5ba in read_code (memaddr=memaddr@entry=140737349781158, myaddr=myaddr@entry=0x7fffffffd7b0 "\340\023<\001", len=len@entry=1)
     at /home/yao/SourceCode/gnu/gdb/git/gdb/corefile.c:288
 riscvarchive#2  0x000000000065f7b5 in read_code_unsigned_integer (memaddr=memaddr@entry=140737349781158, len=len@entry=1, byte_order=byte_order@entry=BFD_ENDIAN_LITTLE)
     at /home/yao/SourceCode/gnu/gdb/git/gdb/corefile.c:363
 riscvarchive#3  0x00000000004717e0 in amd64_analyze_prologue (gdbarch=gdbarch@entry=0x13c13e0, pc=140737349781158, current_pc=140737349781165, cache=cache@entry=0xda0cb0)
     at /home/yao/SourceCode/gnu/gdb/git/gdb/amd64-tdep.c:2267
 riscvarchive#4  0x0000000000471f6d in amd64_frame_cache_1 (cache=0xda0cb0, this_frame=0xda0bf0) at /home/yao/SourceCode/gnu/gdb/git/gdb/amd64-tdep.c:2437
 riscvarchive#5  amd64_frame_cache (this_frame=0xda0bf0, this_cache=<optimised out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/amd64-tdep.c:2508
 riscvarchive#6  0x000000000047214d in amd64_frame_this_id (this_frame=<optimised out>, this_cache=<optimised out>, this_id=0xda0c50)
     at /home/yao/SourceCode/gnu/gdb/git/gdb/amd64-tdep.c:2541
 riscvarchive#7  0x00000000006b94c4 in compute_frame_id (fi=0xda0bf0) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:481
 riscvarchive#8  get_prev_frame_if_no_cycle (this_frame=this_frame@entry=0xda0b20) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1809
 riscvarchive#9  0x00000000006bb6c9 in get_prev_frame_always_1 (this_frame=0xda0b20) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1983
 riscvarchive#10 get_prev_frame_always (this_frame=this_frame@entry=0xda0b20) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1999
 riscvarchive#11 0x00000000006bbe11 in get_prev_frame (this_frame=this_frame@entry=0xda0b20) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:2241
 riscvarchive#12 0x00000000006bc13c in unwind_to_current_frame (ui_out=<optimised out>, args=args@entry=0xda0b20) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1485

The fix is to let read_stack and read_code throw NOT_AVAILABLE_ERROR too,
in order to align with read_memory.

gdb:

2016-05-04  Yao Qi  <[email protected]>

	PR gdb/19947
	* corefile.c (read_memory): Rename it to ...
	(read_memory_object): ... it.  Add parameter object.
	(read_memory): Call read_memory_object.
	(read_stack): Likewise.
	(read_code): Likewise.
zizztux pushed a commit to zizztux/riscv-binutils-gdb that referenced this issue Aug 11, 2016
Nowadays, GDB can't insert breakpoint on the return address of the
exception handler on ARM M-profile, because the address is a magic
one 0xfffffff9,

 (gdb) bt
 #0  CT32B1_IRQHandler () at ../src/timer.c:67
 riscvarchive#1  <signal handler called>
 riscvarchive#2  main () at ../src/timer.c:127

(gdb) info frame
Stack level 0, frame at 0x200ffa8:
 pc = 0x4ec in CT32B1_IRQHandler (../src/timer.c:67); saved pc = 0xfffffff9
 called by frame at 0x200ffc8
 source language c.
 Arglist at 0x200ffa0, args:
 Locals at 0x200ffa0, Previous frame's sp is 0x200ffa8
 Saved registers:
  r7 at 0x200ffa0, lr at 0x200ffa4

(gdb) x/x 0xfffffff9
0xfffffff9:     Cannot access memory at address 0xfffffff9

(gdb) finish
Run till exit from #0  CT32B1_IRQHandler () at ../src/timer.c:67
Ed:15: Target error from Set break/watch: Et:96: Pseudo-address (0xFFFFFFxx) for EXC_RETURN is invalid (GDB error?)

Warning:
Cannot insert hardware breakpoint 0.
Could not insert hardware breakpoints:
You may have requested too many hardware breakpoints/watchpoints.

Command aborted.

even some debug probe can't set hardware breakpoint on the magic
address too,

(gdb) hbreak *0xfffffff9
Hardware assisted breakpoint 2 at 0xfffffff9
(gdb) c
Continuing.
Ed:15: Target error from Set break/watch: Et:96: Pseudo-address (0xFFFFFFxx) for EXC_RETURN is invalid (GDB error?)

Warning:
Cannot insert hardware breakpoint 2.
Could not insert hardware breakpoints:
You may have requested too many hardware breakpoints/watchpoints.

Command aborted.

The problem described above is quite similar to PR 8841, in which GDB
can't set breakpoint on signal trampoline, which is mapped to a read-only
page by kernel.  The rationale of this patch is to skip "unwritable"
frames when looking for caller frames in command "finish", and a new
gdbarch method code_of_frame_writable is added.  This patch fixes
the problem on ARM cortex-m target, but it can be used to fix
PR 8841 too.

gdb:

2016-05-10  Yao Qi  <[email protected]>

	* arch-utils.c (default_code_of_frame_writable): New function.
	* arch-utils.h (default_code_of_frame_writable): Declare.
	* arm-tdep.c (arm_code_of_frame_writable): New function.
	(arm_gdbarch_init): Install gdbarch method
	code_of_frame_writable if the target is M-profile.
	* frame.c (skip_unwritable_frames): New function.
	* frame.h (skip_unwritable_frames): Declare.
	* gdbarch.sh (code_of_frame_writable): New.
	* gdbarch.c, gdbarch.h: Re-generated.
	* infcmd.c (finish_command): Call skip_unwritable_frames.
zizztux pushed a commit to zizztux/riscv-binutils-gdb that referenced this issue Aug 11, 2016
As reported in PR 19998, after type ctrl-c, GDB hang there and does
not send interrupt.  It causes a fail in gdb.base/interrupt.exp.
All targets support remote fileio should be affected.

When we type ctrc-c, SIGINT is handled by remote_fileio_sig_set,
as shown below,

 #0  remote_fileio_sig_set (sigint_func=0x4495d0 <remote_fileio_ctrl_c_signal_handler(int)>) at /home/yao/SourceCode/gnu/gdb/git/gdb/remote-fileio.c:325
 riscvarchive#1  0x00000000004495de in remote_fileio_ctrl_c_signal_handler (signo=<optimised out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/remote-fileio.c:349
 riscvarchive#2  <signal handler called>
 riscvarchive#3  0x00007ffff647ed83 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
 riscvarchive#4  0x00000000005530ce in interruptible_select (n=10, readfds=readfds@entry=0x7fffffffd730, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0,
    timeout=timeout@entry=0x0) at /home/yao/SourceCode/gnu/gdb/git/gdb/event-top.c:1017
 riscvarchive#5  0x000000000061ab20 in stdio_file_read (file=<optimised out>, buf=0x12d02e0 "\n\022-\001", length_buf=16383)
    at /home/yao/SourceCode/gnu/gdb/git/gdb/ui-file.c:577
 riscvarchive#6  0x000000000044a4dc in remote_fileio_func_read (buf=0x12c0360 "") at /home/yao/SourceCode/gnu/gdb/git/gdb/remote-fileio.c:583
 riscvarchive#7  0x0000000000449598 in do_remote_fileio_request (uiout=<optimised out>, buf_arg=buf_arg@entry=0x12c0340)
    at /home/yao/SourceCode/gnu/gdb/git/gdb/remote-fileio.c:1179

we don't set quit_serial_event,

  do
    {
      res = gdb_select (n, readfds, writefds, exceptfds, timeout);
    }
  while (res == -1 && errno == EINTR);

  if (res == 1 && FD_ISSET (fd, readfds))
    {
      errno = EINTR;
      return -1;
    }
  return res;

we can't go out of the loop above, and that is why GDB can't send
interrupt.

Recently, we stop throwing exception from SIGINT handler
(remote_fileio_ctrl_c_signal_handler)
https://sourceware.org/ml/gdb-patches/2016-03/msg00372.html, which
is correct, because gdb_select is interruptible.  However, in the
same patch series, we add interruptible_select later as a wrapper
to gdb_select, https://sourceware.org/ml/gdb-patches/2016-03/msg00375.html
and it is not interruptible (because of the loop in it) unless
select/poll-able file descriptors are marked.

This fix in this patch is to call quit_serial_event_set, so that we can
go out of the loop above, return -1 and set errno to EINTR.

2016-06-01  Yao Qi  <[email protected]>

	PR remote/19998
	* remote-fileio.c (remote_fileio_ctrl_c_signal_handler): Call
	quit_serial_event_set.
zizztux pushed a commit to zizztux/riscv-binutils-gdb that referenced this issue Aug 11, 2016
This patch adds some sanity check that reinsert breakpoints must be
there when doing step-over on software single step target.  The check
triggers an assert when running forking-threads-plus-breakpoint.exp
on arm-linux target,

 gdb/gdbserver/linux-low.c:4714: A problem internal to GDBserver has been detected.^M
 int finish_step_over(lwp_info*): Assertion `has_reinsert_breakpoints ()' failed.

the error happens when GDBserver has already resumed a thread of
process A for step-over (and wait for it hitting reinsert breakpoint),
but receives detach request for process B from GDB, which is shown in
the backtrace below,

 (gdb) bt
 riscvarchive#2  0x000228aa in finish_step_over (lwp=0x12bbd98) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4703
 riscvarchive#3  0x00025a50 in finish_step_over (lwp=0x12bbd98) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4749
 riscvarchive#4  complete_ongoing_step_over () at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4760
 riscvarchive#5  linux_detach (pid=25228) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:1503
 riscvarchive#6  0x00012bae in process_serial_event () at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/server.c:3974
 riscvarchive#7  handle_serial_event (err=<optimized out>, client_data=<optimized out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/server.c:4347
 riscvarchive#8  0x00016d68 in handle_file_event (event_file_desc=<optimized out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/event-loop.c:429
 riscvarchive#9  0x000173ea in process_event () at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/event-loop.c:184
 riscvarchive#10 start_event_loop () at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/event-loop.c:547
 riscvarchive#11 0x0000aa2c in captured_main (argv=<optimized out>, argc=<optimized out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/server.c:3719
 riscvarchive#12 main (argc=<optimized out>, argv=<optimized out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/server.c:3804

the sanity check tries to find the reinsert breakpoint from process B,
but nothing is found.  It is wrong, we need to search in process A,
since we started step-over of a thread of process A.

 (gdb) p lwp->thread->entry.id
 $3 = {pid = 25120, lwp = 25131, tid = 0}
 (gdb) p current_thread->entry.id
 $4 = {pid = 25228, lwp = 25228, tid = 0}

This patch switched current_thread to the thread we are doing step-over
in finish_step_over.

gdb/gdbserver:

2016-06-17  Yao Qi  <[email protected]>

	* linux-low.c (maybe_hw_step): New function.
	(linux_resume_one_lwp_throw): Call maybe_hw_step.
	(finish_step_over): Switch current_thread to lwp temporarily,
	and assert has_reinsert_breakpoints returns true.
	(proceed_one_lwp): Call maybe_hw_step.
	* mem-break.c (has_reinsert_breakpoints): New function.
	* mem-break.h (has_reinsert_breakpoints): Declare.
palmer-dabbelt pushed a commit that referenced this issue Oct 10, 2016
When I run process-dies-while-detaching.exp with GDBserver, I see many
warnings printed by GDBserver,

ptrace(regsets_fetch_inferior_registers) PID=26183: No such process
ptrace(regsets_fetch_inferior_registers) PID=26183: No such process
ptrace(regsets_fetch_inferior_registers) PID=26184: No such process
ptrace(regsets_fetch_inferior_registers) PID=26184: No such process

regsets_fetch_inferior_registers is called when GDBserver resumes each
lwp.

 #2  0x0000000000428260 in regsets_fetch_inferior_registers (regsets_info=0x4690d0 <aarch64_regsets_info>, regcache=0x31832020)
    at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:5412
 #3  0x00000000004070e8 in get_thread_regcache (thread=0x31832940, fetch=fetch@entry=1) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/regcache.c:58
 #4  0x0000000000429c40 in linux_resume_one_lwp_throw (info=<optimized out>, signal=0, step=0, lwp=0x31832830)
    at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4463
 #5  linux_resume_one_lwp (lwp=0x31832830, step=<optimized out>, signal=<optimized out>, info=<optimized out>)
    at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4573

The is the case that threads are disappeared when GDB/GDBserver resumes
them.  We check errno for ESRCH, and don't print error messages, like
what we are doing in regsets_store_inferior_registers.

gdb/gdbserver:

2016-08-04  Yao Qi  <[email protected]>

	* linux-low.c (regsets_fetch_inferior_registers): Check
	errno is ESRCH or not.
palmer-dabbelt pushed a commit that referenced this issue Oct 10, 2016
… out value

With something like:

  struct A { int bitfield:4; } var;

If 'var' ends up wholly-optimized out, printing 'var.bitfield' crashes
gdb here:

 (top-gdb) bt
 #0  0x000000000058b89f in extract_unsigned_integer (addr=0x2 <error: Cannot access memory at address 0x2>, len=2, byte_order=BFD_ENDIAN_LITTLE)
     at /home/pedro/gdb/mygit/src/gdb/findvar.c:109
 #1  0x00000000005a187a in unpack_bits_as_long (field_type=0x16cff70, valaddr=0x0, bitpos=16, bitsize=12) at /home/pedro/gdb/mygit/src/gdb/value.c:3347
 #2  0x00000000005a1b9d in unpack_value_bitfield (dest_val=0x1b5d9d0, bitpos=16, bitsize=12, valaddr=0x0, embedded_offset=0, val=0x1b5d8d0)
     at /home/pedro/gdb/mygit/src/gdb/value.c:3441
 #3  0x00000000005a2a5f in value_fetch_lazy (val=0x1b5d9d0) at /home/pedro/gdb/mygit/src/gdb/value.c:3958
 #4  0x00000000005a10a7 in value_primitive_field (arg1=0x1b5d8d0, offset=0, fieldno=0, arg_type=0x16d04c0) at /home/pedro/gdb/mygit/src/gdb/value.c:3161
 #5  0x00000000005b01e5 in do_search_struct_field (name=0x1727c60 "bitfield", arg1=0x1b5d8d0, offset=0, type=0x16d04c0, looking_for_baseclass=0, result_ptr=0x7fffffffcaf8,
 [...]

unpack_value_bitfield is already optimized-out/unavailable -aware:

   (...) VALADDR points to the contents of VAL.  If the VAL's contents
   required to extract the bitfield from are unavailable/optimized
   out, DEST_VAL is correspondingly marked unavailable/optimized out.

however, it is not considering the case of the value having no
contents buffer at all, as can happen through
allocate_optimized_out_value.

gdb/ChangeLog:
2016-08-09  Pedro Alves  <[email protected]>

	* value.c (unpack_value_bitfield): Skip unpacking if the parent
	has no contents buffer to begin with.

gdb/testsuite/ChangeLog:
2016-08-09  Pedro Alves  <[email protected]>

	* gdb.dwarf2/bitfield-parent-optimized-out.exp: New file.
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue Jul 8, 2021
Currently, on GNU/Linux, if you try to access memory and you have a
running thread selected, GDB fails the memory accesses, like:

 (gdb) c&
 Continuing.
 (gdb) p global_var
 Cannot access memory at address 0x555555558010

Or:

 (gdb) b main
 Breakpoint 2 at 0x55555555524d: file access-mem-running.c, line 59.
 Warning:
 Cannot insert breakpoint 2.
 Cannot access memory at address 0x55555555524d

This patch removes this limitation.  It teaches the native Linux
target to read/write memory even if the target is running.  And it
does this without temporarily stopping threads.  We now get:

 (gdb) c&
 Continuing.
 (gdb) p global_var
 $1 = 123
 (gdb) b main
 Breakpoint 2 at 0x555555555259: file access-mem-running.c, line 62.

(The scenarios above work correctly with current GDBserver, because
GDBserver temporarily stops all threads in the process whenever GDB
wants to access memory (see prepare_to_access_memory /
done_accessing_memory).  Freezing the whole process makes sense when
we need to be sure that we have a consistent view of memory and don't
race with the inferior changing it at the same time as GDB is
accessing it.  But I think that's a too-heavy hammer for the default
behavior.  I think that ideally, whether to stop all threads or not
should be policy decided by gdb core, probably best implemented by
exposing something like gdbserver's prepare_to_access_memory /
done_accessing_memory to gdb core.)

Currently, if we're accessing (reading/writing) just a few bytes, then
the Linux native backend does not try accessing memory via
/proc/<pid>/mem and goes straight to ptrace
PTRACE_PEEKTEXT/PTRACE_POKETEXT.  However, ptrace always fails when
the ptracee is running.  So the first step is to prefer
/proc/<pid>/mem even for small accesses.  Without further changes
however, that may cause a performance regression, due to constantly
opening and closing /proc/<pid>/mem for each memory access.  So the
next step is to keep the /proc/<pid>/mem file open across memory
accesses.  If we have this, then it doesn't make sense anymore to even
have the ptrace fallback, so the patch disables it.

I've made it such that GDB only ever has one /proc/<pid>/mem file open
at any time.  As long as a memory access hits the same inferior
process as the previous access, then we reuse the previously open
file.  If however, we access memory of a different process, then we
close the previous file and open a new one for the new process.

If we wanted, we could keep one /proc/<pid>/mem file open per
inferior, and never close them (unless the inferior exits or execs).
However, having seen bfd patches recently about hitting too many open
file descriptors, I kept the logic to have only one file open tops.
Also, we need to handle memory accesses for processes for which we
don't have an inferior object, for when we need to detach a
fork-child, and we'd probaly want to handle caching the open file for
that scenario (no inferior for process) too, which would probably end
up meaning caching for last non-inferior process, which is very much
what I'm proposing anyhow.  So always having one file open likely ends
up a smaller patch.

The next step is handling the case of GDB reading/writing memory
through a thread that is running and exits.  The access should not
result in a user-visible failure if the inferior/process is still
alive.

Once we manage to open a /proc/<lwpid>/mem file, then that file is
usable for memory accesses even if the corresponding lwp exits and is
reaped.  I double checked that trying to open the same
/proc/<lwpid>/mem path again fails because the lwp is really gone so
there's no /proc/<lwpid>/ entry on the filesystem anymore, but the
previously open file remains usable.  It's only when the whole process
execs that we need to reopen a new file.

When the kernel destroys the whole address space, i.e., when the
process exits or execs, the reads/writes fail with 0 aka EOF, in which
case there's nothing else to do than returning a memory access
failure.  Note this means that when we get an exec event, we need to
reopen the file, to access the process's new address space.

If we need to open (or reopen) the /proc/<pid>/mem file, and the LWP
we're opening it for exits before we open it and before we reap the
LWP (i.e., the LWP is zombie), the open fails with EACCES.  The patch
handles this by just looking for another thread until it finds one
that we can open a /proc/<pid>/mem successfully for.

If we need to open (or reopen) the /proc/<pid>/mem file, and the LWP
we're opening has exited and we already reaped it, which is the case
if the selected thread is in THREAD_EXIT state, the open fails with
ENOENT.  The patch handles this the same way as a zombie race
(EACCES), instead of checking upfront whether we're accessing a
known-exited thread, because that would result in more complicated
code, because we also need to handle accessing lwps that are not
listed in the core thread list, and it's the core thread list that
records the THREAD_EXIT state.

The patch includes two testcases:

riscvarchive#1 - gdb.base/access-mem-running.exp

  This is the conceptually simplest - it is single-threaded, and has
  GDB read and write memory while the program is running.  It also
  tests setting a breakpoint while the program is running, and checks
  that the breakpoint is hit immediately.

riscvarchive#2 - gdb.threads/access-mem-running-thread-exit.exp

  This one is more elaborate, as it continuously spawns short-lived
  threads in order to exercise accessing memory just while threads are
  exiting.  It also spawns two different processes and alternates
  accessing memory between the two processes to exercise the reopening
  the /proc file frequently.  This also ends up exercising GDB reading
  from an exited thread frequently.  I confirmed by putting abort()
  calls in the EACCES/ENOENT paths added by the patch that we do hit
  all of them frequently with the testcase.  It also exits the
  process's main thread (i.e., the main thread becomes zombie), to
  make sure accessing memory in such a corner-case scenario works now
  and in the future.

The tests fail on GNU/Linux native before the code changes, and pass
after.  They pass against current GDBserver, again because GDBserver
supports memory access even if all threads are running, by
transparently pausing the whole process.

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <[email protected]>

	PR mi/15729
	PR gdb/13463
	* linux-nat.c (linux_nat_target::detach): Close the
	/proc/<pid>/mem file if it was open for this process.
	(linux_handle_extended_wait) <PTRACE_EVENT_EXEC>: Close the
	/proc/<pid>/mem file if it was open for this process.
	(linux_nat_target::mourn_inferior): Close the /proc/<pid>/mem file
	if it was open for this process.
	(linux_nat_target::xfer_partial): Adjust.  Do not fall back to
	inf_ptrace_target::xfer_partial for memory accesses.
	(last_proc_mem_file): New.
	(maybe_close_proc_mem_file): New.
	(linux_proc_xfer_memory_partial_pid): New, with bits factored out
	from linux_proc_xfer_partial.
	(linux_proc_xfer_partial): Delete.
	(linux_proc_xfer_memory_partial): New.

gdb/testsuite/ChangeLog
yyyy-mm-dd  Pedro Alves  <[email protected]>

	PR mi/15729
	PR gdb/13463
	* gdb.base/access-mem-running.c: New.
	* gdb.base/access-mem-running.exp: New.
	* gdb.threads/access-mem-running-thread-exit.c: New.
	* gdb.threads/access-mem-running-thread-exit.exp: New.

Change-Id: Ib3c082528872662a3fc0ca9b31c34d4876c874c9
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue Jul 8, 2021
This makes the simulator work the same regardless of the target (bare
metal m32r-elf or Linux m32r-linux-gnu) by unifying the traps code.
It was mostly already the same with the only difference being support
for trap riscvarchive#2 reserved for Linux syscalls.  We can move that logic to
runtime by checking the current environment operating mode instead.
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue Jul 8, 2021
When loading a file using the file command on macOS, we get:

    $ ./gdb -nx --data-directory=data-directory -q -ex "file ./test"
    Reading symbols from ./test...
    Reading symbols from /Users/smarchi/build/binutils-gdb/gdb/test.dSYM/Contents/Resources/DWARF/test...
    /Users/smarchi/src/binutils-gdb/gdb/thread.c:72: internal-error: struct thread_info *inferior_thread(): Assertion `current_thread_ != nullptr' failed.
    A problem internal to GDB has been detected,
    further debugging may prove unreliable.
    Quit this debugging session? (y or n)

The backtrace is:

    * frame #0: 0x0000000101fcb826 gdb`internal_error(file="/Users/smarchi/src/binutils-gdb/gdb/thread.c", line=72, fmt="%s: Assertion `%s' failed.") at errors.cc:52:3
      frame riscvarchive#1: 0x00000001018a2584 gdb`inferior_thread() at thread.c:72:3
      frame riscvarchive#2: 0x0000000101469c09 gdb`get_current_regcache() at regcache.c:421:31
      frame riscvarchive#3: 0x00000001015f9812 gdb`darwin_solib_get_all_image_info_addr_at_init(info=0x0000603000006d00) at solib-darwin.c:464:34
      frame riscvarchive#4: 0x00000001015f7a04 gdb`darwin_solib_create_inferior_hook(from_tty=1) at solib-darwin.c:515:5
      frame riscvarchive#5: 0x000000010161205e gdb`solib_create_inferior_hook(from_tty=1) at solib.c:1200:3
      frame riscvarchive#6: 0x00000001016d8f76 gdb`symbol_file_command(args="./test", from_tty=1) at symfile.c:1650:7
      frame riscvarchive#7: 0x0000000100abab17 gdb`file_command(arg="./test", from_tty=1) at exec.c:555:3
      frame riscvarchive#8: 0x00000001004dc799 gdb`do_const_cfunc(c=0x000061100000c340, args="./test", from_tty=1) at cli-decode.c:102:3
      frame riscvarchive#9: 0x00000001004ea042 gdb`cmd_func(cmd=0x000061100000c340, args="./test", from_tty=1) at cli-decode.c:2160:7
      frame riscvarchive#10: 0x00000001018d4f59 gdb`execute_command(p="t", from_tty=1) at top.c:674:2
      frame riscvarchive#11: 0x0000000100eee430 gdb`catch_command_errors(command=(gdb`execute_command(char const*, int) at top.c:561), arg="file ./test", from_tty=1, do_bp_actions=true)(char const*, int), char const*, int, bool) at main.c:523:7
      frame riscvarchive#12: 0x0000000100eee902 gdb`execute_cmdargs(cmdarg_vec=0x00007ffeefbfeba0 size=1, file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND, ret=0x00007ffeefbfec20) at main.c:618:9
      frame riscvarchive#13: 0x0000000100eed3a4 gdb`captured_main_1(context=0x00007ffeefbff780) at main.c:1322:3
      frame riscvarchive#14: 0x0000000100ee810d gdb`captured_main(data=0x00007ffeefbff780) at main.c:1343:3
      frame riscvarchive#15: 0x0000000100ee8025 gdb`gdb_main(args=0x00007ffeefbff780) at main.c:1368:7
      frame riscvarchive#16: 0x00000001000044f1 gdb`main(argc=6, argv=0x00007ffeefbff8a0) at gdb.c:32:10
      frame riscvarchive#17: 0x00007fff20558f5d libdyld.dylib`start + 1

The solib_create_inferior_hook call in symbol_file_command was added by
commit ea142fb ("Fix breakpoints on file reloads for PIE
binaries").  It causes solib_create_inferior_hook to be called while
the inferior is not running, which darwin_solib_create_inferior_hook
does not expect.  darwin_solib_get_all_image_info_addr_at_init, in
particular, assumes that there is a current thread, as it tries to get
the current thread's regcache.

Fix it by adding a target_has_execution check and returning early.  Note
that there is a similar check in svr4_solib_create_inferior_hook.

gdb/ChangeLog:

	* solib-darwin.c (darwin_solib_create_inferior_hook): Return
	early if no execution.

Change-Id: Ia11dd983a1e29786e5ce663d0fcaa6846dc611bb
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue Jul 22, 2021
Commit 408f668 ("detach in all-stop
with threads running") regressed "detach" with "target remote":

 (gdb) detach
 Detaching from program: target:/any/program, process 3671843
 Detaching from process 3671843
 Ending remote debugging.
 [Inferior 1 (process 3671843) detached]
 In main
 terminate called after throwing an instance of 'gdb_exception_error'
 Aborted (core dumped)

Here's the exception above being thrown:

 (top-gdb) bt
 #0  throw_error (error=TARGET_CLOSE_ERROR, fmt=0x555556035588 "Remote connection closed") at src/gdbsupport/common-exceptions.cc:222
 riscvarchive#1  0x0000555555bbaa46 in remote_target::readchar (this=0x555556a11040, timeout=10000) at src/gdb/remote.c:9440
 riscvarchive#2  0x0000555555bbb9e5 in remote_target::getpkt_or_notif_sane_1 (this=0x555556a11040, buf=0x555556a11058, forever=0, expecting_notif=0, is_notif=0x0) at src/gdb/remote.c:9928
 riscvarchive#3  0x0000555555bbbda9 in remote_target::getpkt_sane (this=0x555556a11040, buf=0x555556a11058, forever=0) at src/gdb/remote.c:10030
 riscvarchive#4  0x0000555555bc0e75 in remote_target::remote_hostio_send_command (this=0x555556a11040, command_bytes=13, which_packet=14, remote_errno=0x7fffffffcfd0, attachment=0x0, attachment_len=0x0) at src/gdb/remote.c:12137
 riscvarchive#5  0x0000555555bc1b6c in remote_target::remote_hostio_close (this=0x555556a11040, fd=8, remote_errno=0x7fffffffcfd0) at src/gdb/remote.c:12455
 riscvarchive#6  0x0000555555bc1bb4 in remote_target::fileio_close (During symbol reading: .debug_line address at offset 0x64f417 is 0 [in module build/gdb/gdb]
 this=0x555556a11040, fd=8, remote_errno=0x7fffffffcfd0) at src/gdb/remote.c:12462
 riscvarchive#7  0x0000555555c9274c in target_fileio_close (fd=3, target_errno=0x7fffffffcfd0) at src/gdb/target.c:3365
 riscvarchive#8  0x000055555595a19d in gdb_bfd_iovec_fileio_close (abfd=0x555556b9f8a0, stream=0x555556b11530) at src/gdb/gdb_bfd.c:439
 riscvarchive#9  0x0000555555e09e3f in opncls_bclose (abfd=0x555556b9f8a0) at src/bfd/opncls.c:599
 riscvarchive#10 0x0000555555e0a2c7 in bfd_close_all_done (abfd=0x555556b9f8a0) at src/bfd/opncls.c:847
 riscvarchive#11 0x0000555555e0a27a in bfd_close (abfd=0x555556b9f8a0) at src/bfd/opncls.c:814
 riscvarchive#12 0x000055555595a9d3 in gdb_bfd_close_or_warn (abfd=0x555556b9f8a0) at src/gdb/gdb_bfd.c:626
 riscvarchive#13 0x000055555595ad29 in gdb_bfd_unref (abfd=0x555556b9f8a0) at src/gdb/gdb_bfd.c:715
 riscvarchive#14 0x0000555555ae4730 in objfile::~objfile (this=0x555556515540, __in_chrg=<optimized out>) at src/gdb/objfiles.c:573
 riscvarchive#15 0x0000555555ae955a in std::_Sp_counted_ptr<objfile*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x555556c20db0) at /usr/include/c++/9/bits/shared_ptr_base.h:377
 riscvarchive#16 0x000055555572b7c8 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x555556c20db0) at /usr/include/c++/9/bits/shared_ptr_base.h:155
 riscvarchive#17 0x00005555557263c3 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x555556bf0588, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
 riscvarchive#18 0x0000555555ae745e in std::__shared_ptr<objfile, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x555556bf0580, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
 riscvarchive#19 0x0000555555ae747e in std::shared_ptr<objfile>::~shared_ptr (this=0x555556bf0580, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
 riscvarchive#20 0x0000555555b1c1dc in __gnu_cxx::new_allocator<std::_List_node<std::shared_ptr<objfile> > >::destroy<std::shared_ptr<objfile> > (this=0x5555564cdd60, __p=0x555556bf0580) at /usr/include/c++/9/ext/new_allocator.h:153
 riscvarchive#21 0x0000555555b1bb1d in std::allocator_traits<std::allocator<std::_List_node<std::shared_ptr<objfile> > > >::destroy<std::shared_ptr<objfile> > (__a=..., __p=0x555556bf0580) at /usr/include/c++/9/bits/alloc_traits.h:497
 riscvarchive#22 0x0000555555b1b73e in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::_M_erase (this=0x5555564cdd60, __position=std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x555556515540}) at /usr/include/c++/9/bits/stl_list.h:1921
 riscvarchive#23 0x0000555555b1afeb in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::erase (this=0x5555564cdd60, __position=std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x555556515540}) at /usr/include/c++/9/bits/list.tcc:158
 riscvarchive#24 0x0000555555b19576 in program_space::remove_objfile (this=0x5555564cdd20, objfile=0x555556515540) at src/gdb/progspace.c:210
 riscvarchive#25 0x0000555555ae4502 in objfile::unlink (this=0x555556515540) at src/gdb/objfiles.c:487
 riscvarchive#26 0x0000555555ae5a12 in objfile_purge_solibs () at src/gdb/objfiles.c:875
 riscvarchive#27 0x0000555555c09686 in no_shared_libraries (ignored=0x0, from_tty=1) at src/gdb/solib.c:1236
 riscvarchive#28 0x00005555559e3f5f in detach_command (args=0x0, from_tty=1) at src/gdb/infcmd.c:2769

So frame riscvarchive#28 already detached the remote process, and then we're
purging the shared libraries.  GDB had opened remote shared libraries
via the target: sysroot, so it tries closing them.  GDBserver is
tearing down already, so remote communication breaks down and we close
the remote target and throw TARGET_CLOSE_ERROR.

Note frame riscvarchive#14:

 riscvarchive#14 0x0000555555ae4730 in objfile::~objfile (this=0x555556515540, __in_chrg=<optimized out>) at src/gdb/objfiles.c:573

That's a dtor, thus noexcept.  That's the reason for the
std::terminate.

Stepping back a bit, why do we still have open remote files if we've
managed to detach already, and, we're debugging with "target remote"?
The reason is that commit 408f668
makes detach_command hold a reference to the target, so the remote
target won't be finally closed until frame riscvarchive#28 returns.  It's closing
the target that invalidates target file I/O handles.

This commit fixes the issue by not relying on target_close to
invalidate the target file I/O handles, instead invalidate them
immediately in remote_unpush_target.  So when GDB purges the solibs,
and we end up in target_fileio_close (frame riscvarchive#7 above), there's nothing
to do, and we don't try to talk with the remote target anymore.

The regression isn't seen when testing with
--target_board=native-gdbserver, because that does "set sysroot" to
disable the "target:" sysroot, for test run speed reasons.  So this
commit adds a testcase that explicitly tests detach with "set sysroot
target:".

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <[email protected]>

	PR gdb/28080
	* remote.c (remote_unpush_target): Invalidate file I/O target
	handles.
	* target.c (fileio_handles_invalidate_target): Make extern.
	* target.h (fileio_handles_invalidate_target): Declare.

gdb/testsuite/ChangeLog:
yyyy-mm-dd  Pedro Alves  <[email protected]>

	PR gdb/28080
	* gdb.base/detach-sysroot-target.exp: New.
	* gdb.base/detach-sysroot-target.c: New.

Reported-By: Jonah Graham <[email protected]>

Change-Id: I851234910172f42a1b30e731161376c344d2727d
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue Jul 22, 2021
…080)

Before PR gdb/28080 was fixed by the previous patch, GDB was crashing
like this:

 (gdb) detach
 Detaching from program: target:/any/program, process 3671843
 Detaching from process 3671843
 Ending remote debugging.
 [Inferior 1 (process 3671843) detached]
 In main
 terminate called after throwing an instance of 'gdb_exception_error'
 Aborted (core dumped)

Here's the exception above being thrown:

 (top-gdb) bt
 #0  throw_error (error=TARGET_CLOSE_ERROR, fmt=0x555556035588 "Remote connection closed") at src/gdbsupport/common-exceptions.cc:222
 riscvarchive#1  0x0000555555bbaa46 in remote_target::readchar (this=0x555556a11040, timeout=10000) at src/gdb/remote.c:9440
 riscvarchive#2  0x0000555555bbb9e5 in remote_target::getpkt_or_notif_sane_1 (this=0x555556a11040, buf=0x555556a11058, forever=0, expecting_notif=0, is_notif=0x0) at src/gdb/remote.c:9928
 riscvarchive#3  0x0000555555bbbda9 in remote_target::getpkt_sane (this=0x555556a11040, buf=0x555556a11058, forever=0) at src/gdb/remote.c:10030
 riscvarchive#4  0x0000555555bc0e75 in remote_target::remote_hostio_send_command (this=0x555556a11040, command_bytes=13, which_packet=14, remote_errno=0x7fffffffcfd0, attachment=0x0, attachment_len=0x0) at src/gdb/remote.c:12137
 riscvarchive#5  0x0000555555bc1b6c in remote_target::remote_hostio_close (this=0x555556a11040, fd=8, remote_errno=0x7fffffffcfd0) at src/gdb/remote.c:12455
 riscvarchive#6  0x0000555555bc1bb4 in remote_target::fileio_close (During symbol reading: .debug_line address at offset 0x64f417 is 0 [in module build/gdb/gdb]
 this=0x555556a11040, fd=8, remote_errno=0x7fffffffcfd0) at src/gdb/remote.c:12462
 riscvarchive#7  0x0000555555c9274c in target_fileio_close (fd=3, target_errno=0x7fffffffcfd0) at src/gdb/target.c:3365
 riscvarchive#8  0x000055555595a19d in gdb_bfd_iovec_fileio_close (abfd=0x555556b9f8a0, stream=0x555556b11530) at src/gdb/gdb_bfd.c:439
 riscvarchive#9  0x0000555555e09e3f in opncls_bclose (abfd=0x555556b9f8a0) at src/bfd/opncls.c:599
 riscvarchive#10 0x0000555555e0a2c7 in bfd_close_all_done (abfd=0x555556b9f8a0) at src/bfd/opncls.c:847
 riscvarchive#11 0x0000555555e0a27a in bfd_close (abfd=0x555556b9f8a0) at src/bfd/opncls.c:814
 riscvarchive#12 0x000055555595a9d3 in gdb_bfd_close_or_warn (abfd=0x555556b9f8a0) at src/gdb/gdb_bfd.c:626
 riscvarchive#13 0x000055555595ad29 in gdb_bfd_unref (abfd=0x555556b9f8a0) at src/gdb/gdb_bfd.c:715
 riscvarchive#14 0x0000555555ae4730 in objfile::~objfile (this=0x555556515540, __in_chrg=<optimized out>) at src/gdb/objfiles.c:573
 riscvarchive#15 0x0000555555ae955a in std::_Sp_counted_ptr<objfile*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x555556c20db0) at /usr/include/c++/9/bits/shared_ptr_base.h:377
 riscvarchive#16 0x000055555572b7c8 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x555556c20db0) at /usr/include/c++/9/bits/shared_ptr_base.h:155
 riscvarchive#17 0x00005555557263c3 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x555556bf0588, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:730
 riscvarchive#18 0x0000555555ae745e in std::__shared_ptr<objfile, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x555556bf0580, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr_base.h:1169
 riscvarchive#19 0x0000555555ae747e in std::shared_ptr<objfile>::~shared_ptr (this=0x555556bf0580, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/shared_ptr.h:103
 riscvarchive#20 0x0000555555b1c1dc in __gnu_cxx::new_allocator<std::_List_node<std::shared_ptr<objfile> > >::destroy<std::shared_ptr<objfile> > (this=0x5555564cdd60, __p=0x555556bf0580) at /usr/include/c++/9/ext/new_allocator.h:153
 riscvarchive#21 0x0000555555b1bb1d in std::allocator_traits<std::allocator<std::_List_node<std::shared_ptr<objfile> > > >::destroy<std::shared_ptr<objfile> > (__a=..., __p=0x555556bf0580) at /usr/include/c++/9/bits/alloc_traits.h:497
 riscvarchive#22 0x0000555555b1b73e in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::_M_erase (this=0x5555564cdd60, __position=std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x555556515540}) at /usr/include/c++/9/bits/stl_list.h:1921
 riscvarchive#23 0x0000555555b1afeb in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::erase (this=0x5555564cdd60, __position=std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x555556515540}) at /usr/include/c++/9/bits/list.tcc:158
 riscvarchive#24 0x0000555555b19576 in program_space::remove_objfile (this=0x5555564cdd20, objfile=0x555556515540) at src/gdb/progspace.c:210
 riscvarchive#25 0x0000555555ae4502 in objfile::unlink (this=0x555556515540) at src/gdb/objfiles.c:487
 riscvarchive#26 0x0000555555ae5a12 in objfile_purge_solibs () at src/gdb/objfiles.c:875
 riscvarchive#27 0x0000555555c09686 in no_shared_libraries (ignored=0x0, from_tty=1) at src/gdb/solib.c:1236
 riscvarchive#28 0x00005555559e3f5f in detach_command (args=0x0, from_tty=1) at src/gdb/infcmd.c:2769

Note frame riscvarchive#14:

 riscvarchive#14 0x0000555555ae4730 in objfile::~objfile (this=0x555556515540, __in_chrg=<optimized out>) at src/gdb/objfiles.c:573

That's a dtor, thus noexcept.  That's the reason for the
std::terminate.

The previous patch fixed things such that the exception above isn't
thrown anymore.  However, it's possible that e.g., the remote
connection drops just while a user types "nosharedlibrary", or some
other reason that leads to objfile::~objfile, and then we end up the
same std::terminate problem.

Also notice that frames riscvarchive#9-riscvarchive#11 are BFD frames:

 riscvarchive#9  0x0000555555e09e3f in opncls_bclose (abfd=0x555556bc27e0) at src/bfd/opncls.c:599
 riscvarchive#10 0x0000555555e0a2c7 in bfd_close_all_done (abfd=0x555556bc27e0) at src/bfd/opncls.c:847
 riscvarchive#11 0x0000555555e0a27a in bfd_close (abfd=0x555556bc27e0) at src/bfd/opncls.c:814

BFD is written in C and thus throwing exceptions over such frames may
either not clean up properly, or, may abort if bfd is not compiled
with -fasynchronous-unwind-tables (x86-64 defaults that on, but not
all GCC ports do).

Thus frame riscvarchive#8 seems like a good place to swallow exceptions.  More so
since in this spot we already ignore target_fileio_close return
errors.  That's what this commit does.  Without the previous fix, we'd
see:

 (gdb) detach
 Detaching from program: target:/any/program, process 2197701
 Ending remote debugging.
 [Inferior 1 (process 2197701) detached]
 warning: cannot close "target:/lib64/ld-linux-x86-64.so.2": Remote connection closed

Note it prints a warning, which would still be a regression compared
to GDB 10, if it weren't for the previous fix.

gdb/ChangeLog:
yyyy-mm-dd  Pedro Alves  <[email protected]>

	PR gdb/28080
	* gdb_bfd.c (gdb_bfd_close_warning): New.
	(gdb_bfd_iovec_fileio_close): Wrap target_fileio_close in
	try/catch and print warning on exception.
	(gdb_bfd_close_or_warn): Use gdb_bfd_close_warning.

Change-Id: Ic7a26ddba0a4444e3377b0e7c1c89934a84545d7
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue Sep 27, 2021
Simon Marchi tried gdb on OpenBSD, and it immediately segfaults when
running a program.  Simon tracked down the problem to x86_dr_low.get_status
being nullptr at this point:

    (lldb) print x86_dr_low.get_status
    (unsigned long (*)()) $0 = 0x0000000000000000
    (lldb) bt
    * thread riscvarchive#1, stop reason = step over
      * frame #0: 0x0000033b64b764aa gdb`x86_dr_stopped_data_address(state=0x0000033d7162a310, addr_p=0x00007f7ffffc5688) at x86-dregs.c:645:12
        frame riscvarchive#1: 0x0000033b64b766de gdb`x86_dr_stopped_by_watchpoint(state=0x0000033d7162a310) at x86-dregs.c:687:10
        frame riscvarchive#2: 0x0000033b64ea5f72 gdb`x86_stopped_by_watchpoint() at x86-nat.c:206:10
        frame riscvarchive#3: 0x0000033b64637fbb gdb`x86_nat_target<obsd_nat_target>::stopped_by_watchpoint(this=0x0000033b65252820) at x86-nat.h:100:12
        frame riscvarchive#4: 0x0000033b64d3ff11 gdb`target_stopped_by_watchpoint() at target.c:468:46
        frame riscvarchive#5: 0x0000033b6469b001 gdb`watchpoints_triggered(ws=0x00007f7ffffc61c8) at breakpoint.c:4790:32
        frame riscvarchive#6: 0x0000033b64a8bb8b gdb`handle_signal_stop(ecs=0x00007f7ffffc61a0) at infrun.c:6072:29
        frame riscvarchive#7: 0x0000033b64a7e3a7 gdb`handle_inferior_event(ecs=0x00007f7ffffc61a0) at infrun.c:5694:7
        frame riscvarchive#8: 0x0000033b64a7c1a0 gdb`fetch_inferior_event() at infrun.c:4090:5
        frame riscvarchive#9: 0x0000033b64a51921 gdb`inferior_event_handler(event_type=INF_REG_EVENT) at inf-loop.c:41:7
        frame riscvarchive#10: 0x0000033b64a827c9 gdb`infrun_async_inferior_event_handler(data=0x0000000000000000) at infrun.c:9384:3
        frame riscvarchive#11: 0x0000033b6465bd4f gdb`check_async_event_handlers() at async-event.c:335:4
        frame riscvarchive#12: 0x0000033b65070917 gdb`gdb_do_one_event() at event-loop.cc:216:10
        frame riscvarchive#13: 0x0000033b64af0db1 gdb`start_event_loop() at main.c:421:13
        frame riscvarchive#14: 0x0000033b64aefe9a gdb`captured_command_loop() at main.c:481:3
        frame riscvarchive#15: 0x0000033b64aed5c2 gdb`captured_main(data=0x00007f7ffffc6470) at main.c:1353:4
        frame riscvarchive#16: 0x0000033b64aed4f2 gdb`gdb_main(args=0x00007f7ffffc6470) at main.c:1368:7
        frame riscvarchive#17: 0x0000033b6459d787 gdb`main(argc=5, argv=0x00007f7ffffc6518) at gdb.c:32:10
        frame riscvarchive#18: 0x0000033b6459d521 gdb`___start + 321

On BSDs, get_status is set in _initialize_x86_bsd_nat, but only if
HAVE_PT_GETDBREGS is defined.  PT_GETDBREGS doesn't exist on OpenBSD, so
get_status (and the other fields of x86_dr_low) are left as nullptr.

OpenBSD doesn't support getting or setting the x86 debug registers, so
fix by omitting debug register support entirely on OpenBSD:

- Change x86bsd_nat_target to only inherit from x86_nat_target if
  PT_GETDBREGS is supported.

- Don't include x86-nat.o and nat/x86-dregs.o for OpenBSD/amd64.  They
  were already omitted for OpenBSD/i386.
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue Sep 27, 2021
In PR28004 the following warning / Internal error is reported:
...
$ gdb -q -batch \
    -iex "set sysroot $(pwd -P)/repro" \
    ./repro/gdb \
    ./repro/core \
    -ex bt
  ...
 Program terminated with signal SIGABRT, Aborted.
 #0  0x00007ff8fe8e5d22 in raise () from repro/usr/lib/libc.so.6
 [Current thread is 1 (LWP 1762498)]
 riscvarchive#1  0x00007ff8fe8cf862 in abort () from repro/usr/lib/libc.so.6
 warning: (Internal error: pc 0x7ff8feb2c21d in read in psymtab, \
           but not in symtab.)
 warning: (Internal error: pc 0x7ff8feb2c218 in read in psymtab, \
           but not in symtab.)
  ...
 riscvarchive#2  0x00007ff8feb2c21e in __gnu_debug::_Error_formatter::_M_error() const \
   [clone .cold] (warning: (Internal error: pc 0x7ff8feb2c21d in read in \
   psymtab, but not in symtab.)

) from repro/usr/lib/libstdc++.so.6
...

The warning is about the following:
- in find_pc_sect_compunit_symtab we try to find the address
  (0x7ff8feb2c218 / 0x7ff8feb2c21d) in the symtabs.
- that fails, so we try again in the partial symtabs.
- we find a matching partial symtab
- however, the partial symtab has a full symtab, so
  we should have found a matching symtab in the first step.

The addresses are:
...
(gdb) info sym 0x7ff8feb2c218
__gnu_debug::_Error_formatter::_M_error() const [clone .cold] in \
  section .text of repro/usr/lib/libstdc++.so.6
(gdb) info sym 0x7ff8feb2c21d
__gnu_debug::_Error_formatter::_M_error() const [clone .cold] + 5 in \
  section .text of repro/usr/lib/libstdc++.so.6
...
which correspond to unrelocated addresses 0x9c218 and 0x9c21d:
...
$ nm -C  repro/usr/lib/libstdc++.so.6.0.29 | grep 000000000009c218
000000000009c218 t __gnu_debug::_Error_formatter::_M_error() const \
  [clone .cold]
...
which belong to function __gnu_debug::_Error_formatter::_M_error() in
/build/gcc/src/gcc/libstdc++-v3/src/c++11/debug.cc.

The partial symtab that is found for the addresses is instead the one for
/build/gcc/src/gcc/libstdc++-v3/src/c++98/bitmap_allocator.cc, which is
incorrect.

This happens as follows.

The bitmap_allocator.cc CU has DW_AT_ranges at .debug_rnglist offset 0x4b50:
...
    00004b50 0000000000000000 0000000000000056
    00004b5a 00000000000a4790 00000000000a479c
    00004b64 00000000000a47a0 00000000000a47ac
...

When reading the first range 0x0..0x56, it doesn't trigger the "start address
of zero" complaint here:
...
      /* A not-uncommon case of bad debug info.
         Don't pollute the addrmap with bad data.  */
      if (range_beginning + baseaddr == 0
          && !per_objfile->per_bfd->has_section_at_zero)
        {
          complaint (_(".debug_rnglists entry has start address of zero"
                       " [in module %s]"), objfile_name (objfile));
          continue;
        }
...
because baseaddr != 0, which seems incorrect given that when loading the
shared library individually in gdb (and consequently baseaddr == 0), we do see
the complaint.

Consequently, we run into this case in dwarf2_get_pc_bounds:
...
  if (low == 0 && !per_objfile->per_bfd->has_section_at_zero)
    return PC_BOUNDS_INVALID;
...
which then results in this code in process_psymtab_comp_unit_reader being
called with cu_bounds_kind == PC_BOUNDS_INVALID, which sets the set_addrmap
argument to 1:
...
      scan_partial_symbols (first_die, &lowpc, &highpc,
                            cu_bounds_kind <= PC_BOUNDS_INVALID, cu);
...
and consequently, the CU addrmap gets build using address info from the
functions.

During that process, addrmap_set_empty is called with a range that includes
0x9c218 and 0x9c21d:
...
(gdb) p /x start
$7 = 0x9989c
(gdb) p /x end_inclusive
$8 = 0xb200d
...
but it's called for a function at DIE 0x54153 with DW_AT_ranges at 0x40ae:
...
    000040ae 00000000000b1ee0 00000000000b200e
    000040b9 000000000009989c 00000000000998c4
    000040c3 <End of list>
...
and neither range includes 0x9c218 and 0x9c21d.

This is caused by this code in partial_die_info::read:
...
            if (dwarf2_ranges_read (ranges_offset, &lowpc, &highpc, cu,
                                    nullptr, tag))
             has_pc_info = 1;
...
which pretends that the function is located at addresses 0x9989c..0xb200d,
which is indeed not the case.

This patch fixes the first problem encountered: fix the "start address of
zero" complaint warning by removing the baseaddr part from the condition.
Same for dwarf2_ranges_process.

The effect is that:
- the complaint is triggered, and
- the warning / Internal error is no longer triggered.

This does not fix the observed problem in partial_die_info::read, which is
filed as PR28200.

Tested on x86_64-linux.

Co-Authored-By: Simon Marchi <[email protected]>

gdb/ChangeLog:

2021-07-29  Simon Marchi  <[email protected]>
	    Tom de Vries  <[email protected]>

	PR symtab/28004
	* gdb/dwarf2/read.c (dwarf2_rnglists_process, dwarf2_ranges_process):
	Fix zero address complaint.
	* gdb/testsuite/gdb.dwarf2/dw2-zero-range-shlib.c: New test.
	* gdb/testsuite/gdb.dwarf2/dw2-zero-range.c: New test.
	* gdb/testsuite/gdb.dwarf2/dw2-zero-range.exp: New file.
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue Sep 27, 2021
While working on the testsuite, I ended up noticing that GDB fails to
produce a full backtrace from a thread waiting in pthread_join.  When
selecting the waiting thread and using the 'bt' command, the following
result can be observed:

	(gdb) bt
	#0  0x0000003ff7fccd20 in __futex_abstimed_wait_common64 () from /lib/riscv64-linux-gnu/libpthread.so.0
	riscvarchive#1  0x0000003ff7fc43da in __pthread_clockjoin_ex () from /lib/riscv64-linux-gnu/libpthread.so.0
	Backtrace stopped: frame did not save the PC

On my platform, I do not have debug symbols for glibc, so I need to rely
on prologue analysis in order to unwind stack.

Here is what the function prologue looks like:

	(gdb) disassemble __pthread_clockjoin_ex
	Dump of assembler code for function __pthread_clockjoin_ex:
	   0x0000003ff7fc42de <+0>:     addi    sp,sp,-144
	   0x0000003ff7fc42e0 <+2>:     sd      s5,88(sp)
	   0x0000003ff7fc42e2 <+4>:     auipc   s5,0xd
	   0x0000003ff7fc42e6 <+8>:     ld      s5,-2(s5) # 0x3ff7fd12e0
	   0x0000003ff7fc42ea <+12>:    ld      a5,0(s5)
	   0x0000003ff7fc42ee <+16>:    sd      ra,136(sp)
	   0x0000003ff7fc42f0 <+18>:    sd      s0,128(sp)
	   0x0000003ff7fc42f2 <+20>:    sd      s1,120(sp)
	   0x0000003ff7fc42f4 <+22>:    sd      s2,112(sp)
	   0x0000003ff7fc42f6 <+24>:    sd      s3,104(sp)
	   0x0000003ff7fc42f8 <+26>:    sd      s4,96(sp)
	   0x0000003ff7fc42fa <+28>:    sd      s6,80(sp)
	   0x0000003ff7fc42fc <+30>:    sd      s7,72(sp)
	   0x0000003ff7fc42fe <+32>:    sd      s8,64(sp)
	   0x0000003ff7fc4300 <+34>:    sd      s9,56(sp)
	   0x0000003ff7fc4302 <+36>:    sd      a5,40(sp)

As far as prologue analysis is concerned, the most interesting part is
done at address 0x0000003ff7fc42ee (<+16>): 'sd ra,136(sp)'. This stores
the RA (return address) register on the stack, which is the information
we are looking for in order to identify the caller.

In the current implementation of the prologue scanner, GDB stops when
hitting 0x0000003ff7fc42e6 (<+8>) because it does not know what to do
with the 'ld' instruction.  GDB thinks it reached the end of the
prologue but have not yet reached the important part, which explain
GDB's inability to unwind past this point.

The section of the prologue starting at <+4> until <+12> is used to load
the stack canary[1], which will then be placed on the stack at <+36> at
the end of the prologue.

In order to have the prologue properly handled, this commit proposes to
add support for the ld instruction in the RISC-V prologue scanner.
I guess riscv32 would use lw in such situation so this patch also adds
support for this instruction.

With this patch applied, gdb is now able to unwind past pthread_join:

	(gdb) bt
	#0  0x0000003ff7fccd20 in __futex_abstimed_wait_common64 () from /lib/riscv64-linux-gnu/libpthread.so.0
	riscvarchive#1  0x0000003ff7fc43da in __pthread_clockjoin_ex () from /lib/riscv64-linux-gnu/libpthread.so.0
	riscvarchive#2  0x0000002aaaaaa88e in bar() ()
	riscvarchive#3  0x0000002aaaaaa8c4 in foo() ()
	riscvarchive#4  0x0000002aaaaaa8da in main ()

I have had a look to see if I could reproduce this easily, but in my
simple testcases using '-fstack-protector-all', the canary is loaded
after the RA register is saved.  I do not have a reliable way of
generating a prologue similar to the problematic one so I forged one
instead.

The testsuite have been run on riscv64 ubuntu 21.01 with no regression
observed.

[1] https://en.wikipedia.org/wiki/Buffer_overflow_protection#Canaries
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue Sep 27, 2021
The original reproducer for PR28030 required use of a specific
compiler version - gcc-c++-11.1.1-3.fc34 is mentioned in the PR,
though it seems probable that other gcc versions might also be able to
reproduce the bug as well.  This commit introduces a test case which,
using the DWARF assembler, provides a reproducer which is independent
of the compiler version.  (Well, it'll work with whatever compilers
the DWARF assembler works with.)

To the best of my knowledge, it's also the first test case which uses
the DWARF assembler to provide debug info for a shared object.  That
being the case, I provided more than the usual commentary which should
allow this case to be used as a template when a combo shared
library / DWARF assembler test case is required in the future.

I provide some details regarding the bug in a comment near the
beginning of locexpr-dml.exp.

This problem was difficult to reproduce; I found myself constantly
referring to the backtrace while trying to figure out what (else) I
might be missing while trying to create a reproducer.  Below is a
partial backtrace which I include for posterity.

 #0  internal_error (
    file=0xc50110 "/ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/gdbtypes.c", line=5575,
    fmt=0xc520c0 "Unexpected type field location kind: %d")
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdbsupport/errors.cc:51
 riscvarchive#1  0x00000000006ef0c5 in copy_type_recursive (objfile=0x1635930,
    type=0x274c260, copied_types=0x30bb290)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/gdbtypes.c:5575
 riscvarchive#2  0x00000000006ef382 in copy_type_recursive (objfile=0x1635930,
    type=0x274ca10, copied_types=0x30bb290)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/gdbtypes.c:5602
 riscvarchive#3  0x0000000000a7409a in preserve_one_value (value=0x24269f0,
    objfile=0x1635930, copied_types=0x30bb290)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/value.c:2529
 riscvarchive#4  0x000000000072012a in gdbscm_preserve_values (
    extlang=0xc55720 <extension_language_guile>, objfile=0x1635930,
    copied_types=0x30bb290)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/guile/scm-value.c:94
 riscvarchive#5  0x00000000006a3f82 in preserve_ext_lang_values (objfile=0x1635930,
    copied_types=0x30bb290)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/extension.c:568
 riscvarchive#6  0x0000000000a7428d in preserve_values (objfile=0x1635930)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/value.c:2579
 riscvarchive#7  0x000000000082d514 in objfile::~objfile (this=0x1635930,
    __in_chrg=<optimized out>)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/objfiles.c:549
 riscvarchive#8  0x0000000000831cc8 in std::_Sp_counted_ptr<objfile*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x1654580)
    at /usr/include/c++/11/bits/shared_ptr_base.h:348
 riscvarchive#9  0x00000000004e6617 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x1654580) at /usr/include/c++/11/bits/shared_ptr_base.h:168
 riscvarchive#10 0x00000000004e1d2f in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x190bb88, __in_chrg=<optimized out>)
    at /usr/include/c++/11/bits/shared_ptr_base.h:705
 riscvarchive#11 0x000000000082feee in std::__shared_ptr<objfile, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x190bb80, __in_chrg=<optimized out>)
    at /usr/include/c++/11/bits/shared_ptr_base.h:1154
 riscvarchive#12 0x000000000082ff0a in std::shared_ptr<objfile>::~shared_ptr (
    this=0x190bb80, __in_chrg=<optimized out>)
    at /usr/include/c++/11/bits/shared_ptr.h:122
 riscvarchive#13 0x000000000085ed7e in __gnu_cxx::new_allocator<std::_List_node<std::shared_ptr<objfile> > >::destroy<std::shared_ptr<objfile> > (this=0x114bc00,
    __p=0x190bb80) at /usr/include/c++/11/ext/new_allocator.h:168
 riscvarchive#14 0x000000000085e88d in std::allocator_traits<std::allocator<std::_List_node<std::shared_ptr<objfile> > > >::destroy<std::shared_ptr<objfile> > (__a=...,
    __p=0x190bb80) at /usr/include/c++/11/bits/alloc_traits.h:531
 riscvarchive#15 0x000000000085e50c in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::_M_erase (this=0x114bc00, __position=
  std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x1635930})
    at /usr/include/c++/11/bits/stl_list.h:1925
 riscvarchive#16 0x000000000085df0e in std::__cxx11::list<std::shared_ptr<objfile>, std::allocator<std::shared_ptr<objfile> > >::erase (this=0x114bc00, __position=
  std::shared_ptr<objfile> (expired, weak count 1) = {get() = 0x1635930})
    at /usr/include/c++/11/bits/list.tcc:158
 riscvarchive#17 0x000000000085c748 in program_space::remove_objfile (this=0x114bbc0,
    objfile=0x1635930)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/progspace.c:210
 riscvarchive#18 0x000000000082d3ae in objfile::unlink (this=0x1635930)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/objfiles.c:487
 riscvarchive#19 0x000000000082e68c in objfile_purge_solibs ()
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/objfiles.c:875
 riscvarchive#20 0x000000000092dd37 in no_shared_libraries (ignored=0x0, from_tty=1)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/solib.c:1236
 riscvarchive#21 0x00000000009a37fe in target_pre_inferior (from_tty=1)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/target.c:2496
 riscvarchive#22 0x00000000007454d6 in run_command_1 (args=0x0, from_tty=1,
    run_how=RUN_NORMAL)
    at /ironwood1/sourceware-git/f34-pr28030/bld/../../worktree-pr28030/gdb/infcmd.c:437

I'll note a few points regarding this backtrace:

Frame riscvarchive#1 is where the internal error occurs.  It's caused by an
unhandled case for FIELD_LOC_KIND_DWARF_BLOCK.  The fix for this bug
adds support for this case.

Frame riscvarchive#22 - it's a partial backtrace - shows that GDB is attempting to
(re)run the program.  You can see the exact command sequence that was
used for reproducing this problem in the PR (at
https://sourceware.org/bugzilla/show_bug.cgi?id=28030), but in a
nutshell, after starting the program and advancing to the appropriate
source line, GDB was asked to step into libstdc++; a "finish" command
was issued, returning a value.  The fact that a value was returned is
very important.  GDB was then used to step back into libstdc++.  A
breakpoint was set on a source line in the library after which a "run"
command was issued.

Frame riscvarchive#19 shows a call to objfile_purge_solibs.  It's aptly named.

Frame riscvarchive#7 is a call to the destructor for one of the objfile solibs; it
turned out to be the one for libstdc++.

Frames riscvarchive#6 thru riscvarchive#3 show various value preservation frames.  If you look
at preserve_values() in gdb/value.c, the value history is preserved
first, followed by internal variables, followed by values for the
extension languages (python and guile).
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue Sep 27, 2021
…es.exp

When running test-case gdb.base/break-probes.exp on ubuntu 18.04.5, we have:
...
 (gdb) bt^M
 #0  0x00007ffff7dd6e12 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 riscvarchive#1  0x00007ffff7dedf50 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 riscvarchive#2  0x00007ffff7dd5128 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 riscvarchive#3  0x00007ffff7dd4098 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 riscvarchive#4  0x0000000000000001 in ?? ()^M
 riscvarchive#5  0x00007fffffffdaac in ?? ()^M
 riscvarchive#6  0x0000000000000000 in ?? ()^M
 (gdb) FAIL: gdb.base/break-probes.exp: ensure using probes
...

The test-case intends to emit an UNTESTED in this case, but fails to do so
because it tries to do it in a regexp clause in a gdb_test_multiple, which
doesn't trigger.  Instead, a default clause triggers which produces the FAIL.

Also the use of UNTESTED is not appropriate, and we should use UNSUPPORTED
instead.

Fix this by silencing the FAIL, and emitting an UNSUPPORTED after the
gdb_test_multiple:
...
 if { ! $using_probes } {
+    unsupported "probes not present on this system"
     return -1
 }
...

Tested on x86_64-linux.
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue Sep 27, 2021
When running test-case gdb.base/break-probes.exp on ubuntu 18.04.5, we have:
...
 (gdb) run^M
 Starting program: break-probes^M
 Stopped due to shared library event (no libraries added or removed)^M
 (gdb) bt^M
 #0  0x00007ffff7dd6e12 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 riscvarchive#1  0x00007ffff7dedf50 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 riscvarchive#2  0x00007ffff7dd5128 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 riscvarchive#3  0x00007ffff7dd4098 in ?? () from /lib64/ld-linux-x86-64.so.2^M
 riscvarchive#4  0x0000000000000001 in ?? ()^M
 riscvarchive#5  0x00007fffffffdaac in ?? ()^M
 riscvarchive#6  0x0000000000000000 in ?? ()^M
 (gdb) UNSUPPORTED: gdb.base/break-probes.exp: probes not present on this system
...

Using the backtrace, the test-case tries to establish that we're stopped in
dl_main, which is used as proof that we're using probes.

However, the backtrace only shows an address, because:
- the dynamic linker contains no minimal symbols and no debug info, and
- gdb is build without --with-separate-debug-dir so it can't find the
  corresponding .debug file, which does contain the mimimal symbols and
  debug info.

Fix this by instead printing the pc and grepping for the value in the
info probes output:
...
(gdb) p /x $pc^M
$1 = 0x7ffff7dd6e12^M
(gdb) info probes^M
Type Provider Name           Where              Object                      ^M
  ...
stap rtld     init_start     0x00007ffff7dd6e12 /lib64/ld-linux-x86-64.so.2 ^M
  ...
(gdb)
...

Tested on x86_64-linux.
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue Sep 27, 2021
When running test-case gdb.base/break-interp.exp on ubuntu 18.04.5, we have:
...
 (gdb) bt^M
 #0  0x00007eff7ad5ae12 in ?? () from break-interp-LDprelinkNOdebugNO^M
 riscvarchive#1  0x00007eff7ad71f50 in ?? () from break-interp-LDprelinkNOdebugNO^M
 riscvarchive#2  0x00007eff7ad59128 in ?? () from break-interp-LDprelinkNOdebugNO^M
 riscvarchive#3  0x00007eff7ad58098 in ?? () from break-interp-LDprelinkNOdebugNO^M
 riscvarchive#4  0x0000000000000002 in ?? ()^M
 riscvarchive#5  0x00007fff505d7a32 in ?? ()^M
 riscvarchive#6  0x00007fff505d7a94 in ?? ()^M
 riscvarchive#7  0x0000000000000000 in ?? ()^M
 (gdb) FAIL: gdb.base/break-interp.exp: ldprelink=NO: ldsepdebug=NO: \
         first backtrace: dl bt
...

Using the backtrace, the test-case tries to establish that we're stopped in
dl_main.

However, the backtrace only shows an address, because:
- the dynamic linker contains no minimal symbols and no debug info, and
- gdb is build without --with-separate-debug-dir so it can't find the
  corresponding .debug file, which does contain the mimimal symbols and
  debug info.

As in "[gdb/testsuite] Improve probe detection in gdb.base/break-probes.exp",
fix this by doing info probes and grepping for the address.

Tested on x86_64-linux.
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue Oct 25, 2021
The gdb.multi/multi-term-settings.exp testcase sometimes fails like so:

 Running /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.multi/multi-term-settings.exp ...
 FAIL: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: stop with control-c (SIGINT)

It's easier to reproduce if you stress the machine at the same time, like e.g.:

  $ stress -c 24

Looking at gdb.log, we see:

 (gdb) attach 60422
 Attaching to program: build/gdb/testsuite/outputs/gdb.multi/multi-term-settings/multi-term-settings, process 60422
 [New Thread 60422.60422]
 Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
 Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.31.so...
 Reading symbols from /lib64/ld-linux-x86-64.so.2...
 (No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
 0x00007f2fc2485334 in __GI___clock_nanosleep (clock_id=<optimized out>, clock_id@entry <mailto:clock_id@entry>=0, flags=flags@entry <mailto:flags@entry>=0, req=req@entry <mailto:req@entry>=0x7ffe23126940, rem=rem@entry <mailto:rem@entry>=0x0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
 78	../sysdeps/unix/sysv/linux/clock_nanosleep.c: No such file or directory.
 (gdb) PASS: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: inf2: attach
 set schedule-multiple on
 (gdb) PASS: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: set schedule-multiple on
 info inferiors
   Num  Description       Connection                         Executable
   1    process 60404     1 (extended-remote localhost:2349) build/gdb/testsuite/outputs/gdb.multi/multi-term-settings/multi-term-settings
 * 2    process 60422     1 (extended-remote localhost:2349) build/gdb/testsuite/outputs/gdb.multi/multi-term-settings/multi-term-settings
 (gdb) PASS: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: info inferiors
 pid=60422, count=46
 pid=60422, count=47
 pid=60422, count=48
 pid=60422, count=49
 pid=60422, count=50
 pid=60422, count=51
 pid=60422, count=52
 pid=60422, count=53
 pid=60422, count=54
 pid=60422, count=55
 pid=60422, count=56
 pid=60422, count=57
 pid=60422, count=58
 pid=60422, count=59
 pid=60422, count=60
 pid=60422, count=61
 pid=60422, count=62
 pid=60422, count=63
 pid=60422, count=64
 pid=60422, count=65
 pid=60422, count=66
 pid=60422, count=67
 pid=60422, count=68
 pid=60422, count=69
 pid=60404, count=54
 pid=60404, count=55
 pid=60404, count=56
 pid=60404, count=57
 pid=60404, count=58
 PASS: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: continue
 Quit
 (gdb) FAIL: gdb.multi/multi-term-settings.exp: inf1_how=attach: inf2_how=attach: stop with control-c (SIGINT)

If you look at the testcase's sources, you'll see that the intention
is to resumes the program with "continue", wait to see a few of those
"pid=..., count=..." lines, and then interrupt the program with
Ctrl-C.  But somehow, that resulted in GDB printing "Quit", instead of
the Ctrl-C stopping the program with SIGINT.

Here's what is happening:

 riscvarchive#1 - those "pid=..., count=..." lines we see above weren't actually
      output by the inferior after it has been continued (see riscvarchive#1).
      Note that "inf1_how" and "inf2_how" are "attach".  What happened
      is that those "pid=..., count=..." lines were output by the
      inferiors _before_ they were attached to.  We see them at that
      point instead of earlier, because that's where the testcase
      reads from the inferiors' spawn_ids.

 riscvarchive#2 - The testcase mistakenly thinks those "pid=..., count=..." lines
      happened after the continue was processed by GDB, meaning it has
      waited enough, and so sends the Ctrl-C.  GDB hasn't yet passed
      the terminal to the inferior, so the Ctrl-C results in that
      Quit.

The fix here is twofold:

 riscvarchive#1 - flush inferior output right after attaching

 riscvarchive#2 - consume the "Continuing" printed by "continue", indicating the
      inferior has the terminal.  This is the same as done throughout
      the testsuite to handle this exact problem of sending Ctrl-C too
      soon.

gdb/testsuite/ChangeLog:
yyyy-mm-dd  Pedro Alves  <[email protected] <mailto:[email protected]>>

	* gdb.multi/multi-term-settings.exp (create_inferior): Flush
	inferior output.
	(coretest): Use $gdb_test_name.  After issuing "continue", wait
	for "Continuing".

Change-Id: Iba7671dfe1eee6b98d29cfdb05a1b9aa2f9defb9
kito-cheng pushed a commit that referenced this issue Nov 23, 2021
On openSUSE Tumbleweed with glibc-debuginfo installed I get:
...
 (gdb) PASS: gdb.threads/linux-dp.exp: continue to breakpoint: thread 5's print
 where^M
 #0  print_philosopher (n=3, left=33 '!', right=33 '!') at linux-dp.c:105^M
 #1  0x0000000000401628 in philosopher (data=0x40537c) at linux-dp.c:148^M
 #2  0x00007ffff7d56b37 in start_thread (arg=<optimized out>) \
                          at pthread_create.c:435^M
 #3  0x00007ffff7ddb640 in clone3 () \
                          at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81^M
 (gdb) PASS: gdb.threads/linux-dp.exp: first thread-specific breakpoint hit
...
while without debuginfo installed I get instead:
...
 (gdb) PASS: gdb.threads/linux-dp.exp: continue to breakpoint: thread 5's print
 where^M
 #0  print_philosopher (n=3, left=33 '!', right=33 '!') at linux-dp.c:105^M
 #1  0x0000000000401628 in philosopher (data=0x40537c) at linux-dp.c:148^M
 #2  0x00007ffff7d56b37 in start_thread () from /lib64/libc.so.6^M
 #3  0x00007ffff7ddb640 in clone3 () from /lib64/libc.so.6^M
 (gdb) FAIL: gdb.threads/linux-dp.exp: first thread-specific breakpoint hit
...

The problem is that the regexp used:
...
  "\(from .*libpthread\|at pthread_create\|in pthread_create\)"
...
expects the 'from' part to match libpthread, but in glibc 2.34 libpthread has
been merged into libc.

Fix this by updating the regexp.

Tested on x86_64-linux.
kito-cheng pushed a commit that referenced this issue Nov 23, 2021
This commit fixes Bug 28308, titled "Strange interactions with
dprintf and break/commands":

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28308

Since creating that bug report, I've found a somewhat simpler way of
reproducing the problem.  I've encapsulated it into the GDB test case
which I've created along with this bug fix.  The name of the new test
is gdb.base/dprintf-execution-x-script.exp, I'll demonstrate the
problem using this test case, though for brevity, I've placed all
relevant files in the same directory and have renamed the files to all
start with 'dp-bug' instead of 'dprintf-execution-x-script'.

The script file, named dp-bug.gdb, consists of the following commands:

dprintf increment, "dprintf in increment(), vi=%d\n", vi
break inc_vi
commands
  continue
end
run

Note that the final command in this script is 'run'.  When 'run' is
instead issued interactively, the  bug does not occur.  So, let's look
at the interactive case first in order to see the correct/expected
output:

$ gdb -q -x dp-bug.gdb dp-bug
... eliding buggy output which I'll discuss later ...
(gdb) run
Starting program: /mesquite2/sourceware-git/f34-master/bld/gdb/tmp/dp-bug
vi=0
dprintf in increment(), vi=0

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	in dprintf-execution-x-script.c
vi=1
dprintf in increment(), vi=1

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	in dprintf-execution-x-script.c
vi=2
dprintf in increment(), vi=2

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	in dprintf-execution-x-script.c
vi=3
[Inferior 1 (process 1539210) exited normally]

In this run, in which 'run' was issued from the gdb prompt (instead
of at the end of the script), there are three dprintf messages along
with three 'Breakpoint 2' messages.  This is the correct output.

Now let's look at the output that I snipped above; this is the output
when 'run' is issued from the script loaded via GDB's -x switch:

$ gdb -q -x dp-bug.gdb dp-bug
Reading symbols from dp-bug...
Dprintf 1 at 0x40116e: file dprintf-execution-x-script.c, line 38.
Breakpoint 2 at 0x40113a: file dprintf-execution-x-script.c, line 26.
vi=0
dprintf in increment(), vi=0

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	dprintf-execution-x-script.c: No such file or directory.
vi=1

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	in dprintf-execution-x-script.c
vi=2

Breakpoint 2, inc_vi () at dprintf-execution-x-script.c:26
26	in dprintf-execution-x-script.c
vi=3
[Inferior 1 (process 1539175) exited normally]

In the output shown above, only the first dprintf message is printed.
The 2nd and 3rd dprintf messages are missing!  However, all three
'Breakpoint 2...' messages are still printed.

Why does this happen?

bpstat_do_actions_1() in gdb/breakpoint.c contains the following
comment and code near the start of the function:

  /* Avoid endless recursion if a `source' command is contained
     in bs->commands.  */
  if (executing_breakpoint_commands)
    return 0;

  scoped_restore save_executing
    = make_scoped_restore (&executing_breakpoint_commands, 1);

Also, as described by this comment prior to the 'async' field
in 'struct ui' in top.h, the main UI starts off in sync mode
when processing command line arguments:

  /* True if the UI is in async mode, false if in sync mode.  If in
     sync mode, a synchronous execution command (e.g, "next") does not
     return until the command is finished.  If in async mode, then
     running a synchronous command returns right after resuming the
     target.  Waiting for the command's completion is later done on
     the top event loop.  For the main UI, this starts out disabled,
     until all the explicit command line arguments (e.g., `gdb -ex
     "start" -ex "next"') are processed.  */

This combination of things, the state of the static global
'executing_breakpoint_commands' plus the state of the async
field in the main UI causes this behavior.

This is a backtrace after hitting the dprintf breakpoint for
the second time when doing 'run' from the script file, i.e.
non-interactively:

Thread 1 "gdb" hit Breakpoint 3, bpstat_do_actions_1 (bsp=0x7fffffffc2b8)
    at /ironwood1/sourceware-git/f34-master/bld/../../worktree-master/gdb/breakpoint.c:4431
4431	  if (executing_breakpoint_commands)

 #0  bpstat_do_actions_1 (bsp=0x7fffffffc2b8)
     at gdb/breakpoint.c:4431
 #1  0x00000000004d8bc6 in dprintf_after_condition_true (bs=0x1538090)
     at gdb/breakpoint.c:13048
 #2  0x00000000004c5caa in bpstat_stop_status (aspace=0x116dbc0, bp_addr=0x40116e, thread=0x137f450, ws=0x7fffffffc718,
     stop_chain=0x1538090) at gdb/breakpoint.c:5498
 #3  0x0000000000768d98 in handle_signal_stop (ecs=0x7fffffffc6f0)
     at gdb/infrun.c:6172
 #4  0x00000000007678d3 in handle_inferior_event (ecs=0x7fffffffc6f0)
     at gdb/infrun.c:5662
 #5  0x0000000000763cd5 in fetch_inferior_event ()
     at gdb/infrun.c:4060
 #6  0x0000000000746d7d in inferior_event_handler (event_type=INF_REG_EVENT)
     at gdb/inf-loop.c:41
 #7  0x00000000007a702f in handle_target_event (error=0, client_data=0x0)
     at gdb/linux-nat.c:4207
 #8  0x0000000000b8cd6e in gdb_wait_for_event (block=block@entry=0)
     at gdbsupport/event-loop.cc:701
 #9  0x0000000000b8d032 in gdb_wait_for_event (block=0)
     at gdbsupport/event-loop.cc:597
 #10 gdb_do_one_event () at gdbsupport/event-loop.cc:212
 #11 0x00000000009d19b6 in wait_sync_command_done ()
     at gdb/top.c:528
 #12 0x00000000009d1a3f in maybe_wait_sync_command_done (was_sync=0)
     at gdb/top.c:545
 #13 0x00000000009d2033 in execute_command (p=0x7fffffffcb18 "", from_tty=0)
     at gdb/top.c:676
 #14 0x0000000000560d5b in execute_control_command_1 (cmd=0x13b9bb0, from_tty=0)
     at gdb/cli/cli-script.c:547
 #15 0x000000000056134a in execute_control_command (cmd=0x13b9bb0, from_tty=0)
     at gdb/cli/cli-script.c:717
 #16 0x00000000004c3bbe in bpstat_do_actions_1 (bsp=0x137f530)
     at gdb/breakpoint.c:4469
 #17 0x00000000004c3d40 in bpstat_do_actions ()
     at gdb/breakpoint.c:4533
 #18 0x00000000006a473a in command_handler (command=0x1399ad0 "run")
     at gdb/event-top.c:624
 #19 0x00000000009d182e in read_command_file (stream=0x113e540)
     at gdb/top.c:443
 #20 0x0000000000563697 in script_from_file (stream=0x113e540, file=0x13bb0b0 "dp-bug.gdb")
     at gdb/cli/cli-script.c:1642
 #21 0x00000000006abd63 in source_gdb_script (extlang=0xc44e80 <extension_language_gdb>, stream=0x113e540,
     file=0x13bb0b0 "dp-bug.gdb") at gdb/extension.c:188
 #22 0x0000000000544400 in source_script_from_stream (stream=0x113e540, file=0x7fffffffd91a "dp-bug.gdb",
     file_to_open=0x13bb0b0 "dp-bug.gdb")
     at gdb/cli/cli-cmds.c:692
 #23 0x0000000000544557 in source_script_with_search (file=0x7fffffffd91a "dp-bug.gdb", from_tty=1, search_path=0)
     at gdb/cli/cli-cmds.c:750
 #24 0x00000000005445cf in source_script (file=0x7fffffffd91a "dp-bug.gdb", from_tty=1)
     at gdb/cli/cli-cmds.c:759
 #25 0x00000000007cf6d9 in catch_command_errors (command=0x5445aa <source_script(char const*, int)>,
     arg=0x7fffffffd91a "dp-bug.gdb", from_tty=1, do_bp_actions=false)
     at gdb/main.c:523
 #26 0x00000000007cf85d in execute_cmdargs (cmdarg_vec=0x7fffffffd1b0, file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND,
     ret=0x7fffffffd18c) at gdb/main.c:615
 #27 0x00000000007d0c8e in captured_main_1 (context=0x7fffffffd3f0)
     at gdb/main.c:1322
 #28 0x00000000007d0eba in captured_main (data=0x7fffffffd3f0)
     at gdb/main.c:1343
 #29 0x00000000007d0f25 in gdb_main (args=0x7fffffffd3f0)
     at gdb/main.c:1368
 #30 0x00000000004186dd in main (argc=5, argv=0x7fffffffd508)
     at gdb/gdb.c:32

There are two frames for bpstat_do_actions_1(), one at frame #16 and
the other at frame #0.  The one at frame #16 is processing the actions
for Breakpoint 2, which is a 'continue'.  The one at frame #0 is attempting
to process the dprintf breakpoint action.  However, at this point,
the value of 'executing_breakpoint_commands' is 1, forcing an early
return, i.e. prior to executing the command(s) associated with the dprintf
breakpoint.

For the sake of comparison, this is what the stack looks like when hitting
the dprintf breakpoint for the second time when issuing the 'run'
command from the GDB prompt.

Thread 1 "gdb" hit Breakpoint 3, bpstat_do_actions_1 (bsp=0x7fffffffccd8)
    at /ironwood1/sourceware-git/f34-master/bld/../../worktree-master/gdb/breakpoint.c:4431
4431	  if (executing_breakpoint_commands)

 #0  bpstat_do_actions_1 (bsp=0x7fffffffccd8)
     at gdb/breakpoint.c:4431
 #1  0x00000000004d8bc6 in dprintf_after_condition_true (bs=0x16b0290)
     at gdb/breakpoint.c:13048
 #2  0x00000000004c5caa in bpstat_stop_status (aspace=0x116dbc0, bp_addr=0x40116e, thread=0x13f0e60, ws=0x7fffffffd138,
     stop_chain=0x16b0290) at gdb/breakpoint.c:5498
 #3  0x0000000000768d98 in handle_signal_stop (ecs=0x7fffffffd110)
     at gdb/infrun.c:6172
 #4  0x00000000007678d3 in handle_inferior_event (ecs=0x7fffffffd110)
     at gdb/infrun.c:5662
 #5  0x0000000000763cd5 in fetch_inferior_event ()
     at gdb/infrun.c:4060
 #6  0x0000000000746d7d in inferior_event_handler (event_type=INF_REG_EVENT)
     at gdb/inf-loop.c:41
 #7  0x00000000007a702f in handle_target_event (error=0, client_data=0x0)
     at gdb/linux-nat.c:4207
 #8  0x0000000000b8cd6e in gdb_wait_for_event (block=block@entry=0)
     at gdbsupport/event-loop.cc:701
 #9  0x0000000000b8d032 in gdb_wait_for_event (block=0)
     at gdbsupport/event-loop.cc:597
 #10 gdb_do_one_event () at gdbsupport/event-loop.cc:212
 #11 0x00000000007cf512 in start_event_loop ()
     at gdb/main.c:421
 #12 0x00000000007cf631 in captured_command_loop ()
     at gdb/main.c:481
 #13 0x00000000007d0ebf in captured_main (data=0x7fffffffd3f0)
     at gdb/main.c:1353
 #14 0x00000000007d0f25 in gdb_main (args=0x7fffffffd3f0)
     at gdb/main.c:1368
 #15 0x00000000004186dd in main (argc=5, argv=0x7fffffffd508)
     at gdb/gdb.c:32

This relatively short backtrace is due to the current UI's async field
being set to 1.

Yet another thing to be aware of regarding this problem is the
difference in the way that commands associated to dprintf breakpoints
versus regular breakpoints are handled.  While they both use a command
list associated with the breakpoint, regular breakpoints will place
the commands to be run on the bpstat chain constructed in
bp_stop_status().  These commands are run later on.  For dprintf
breakpoints, commands are run via the 'after_condition_true' function
pointer directly from bpstat_stop_status().  (The 'commands' field in
the bpstat is cleared in dprintf_after_condition_true().  This
prevents the dprintf commands from being run again later on when other
commands on the bpstat chain are processed.)

Another thing that I noticed is that dprintf breakpoints are the only
type of breakpoint which use 'after_condition_true'.  This suggests
that one possible way of fixing this problem, that of making dprintf
breakpoints work more like regular breakpoints, probably won't work.
(I must admit, however, that my understanding of this code isn't
complete enough to say why.  I'll trust that whoever implemented it
had a good reason for doing it this way.)

The comment referenced earlier regarding 'executing_breakpoint_commands'
states that the reason for checking this variable is to avoid
potential endless recursion when a 'source' command appears in
bs->commands.  We know that a dprintf command is constrained to either
1) execution of a GDB printf command, 2) an inferior function call of
a printf-like function, or 3) execution of an agent-printf command.
Therefore, infinite recursion due to a 'source' command cannot happen
when executing commands upon hitting a dprintf breakpoint.

I chose to fix this problem by having dprintf_after_condition_true()
directly call execute_control_commands().  This means that it no
longer attempts to go through bpstat_do_actions_1() avoiding the
infinite recursion check for potential 'source' commands on the
command chain.  I think it simplifies this code a little bit too, a
definite bonus.

Summary:

	* breakpoint.c (dprintf_after_condition_true): Don't call
	bpstat_do_actions_1().  Call execute_control_commands()
	instead.
Nelson1225 pushed a commit that referenced this issue Feb 10, 2022
The immediate form of MSR has a 4-bit immediate field (in CRm).
However, many forms of MSR require a smaller immediate.  These cases
are identified by value in operand_general_constraint_met_p,
but they're now the common case rather than the exception.

This patch therefore adds the maximum value to the sys_reg
description and gets the range from there.  It also enforces
the minimum of 0, which avoids a situation in which:

  msr dit, #2

would give the expected:

  Error: immediate value out of range 0 to 1

whereas:

  msr dit, #-1

would give:

  Error: immediate value out of range 0 to 15

(from the later UIMM4 checking).

Also:

- we were reporting the first error above against the wrong operand
- TCO takes a single-bit immediate, but we previously allowed
  all 16 values.
  [https://developer.arm.com/documentation/ddi0596/2021-09/Base-Instructions/MSR--immediate---Move-immediate-value-to-Special-Register-?lang=en]

opcodes/
	* aarch64-opc.h (F_REG_MAX_VALUE, F_GET_REG_MAX_VALUE): New macros.
	* aarch64-opc.c (operand_general_constraint_met_p): Read the
	maximum MSR immediate value from aarch64_pstatefields.
	(aarch64_pstatefields): Add the maximum immediate value
	for each register.

gas/
	* testsuite/gas/aarch64/sysreg-4.s: Use an immediate value of 1
	rather than 8 for the TCO test.
	* testsuite/gas/aarch64/sysreg-4.d: Update accordingly.
	* testsuite/gas/aarch64/armv8_2-a-illegal.l: Fix operand number
	in MSR immediate error messages.
	* testsuite/gas/aarch64/diagnostic.l: Likewise.
	* testsuite/gas/aarch64/pan-illegal.l: Likewise.
	* testsuite/gas/aarch64/ssbs-illegal1.l: Likewise.
	* testsuite/gas/aarch64/illegal-sysreg-4b.s,
	* testsuite/gas/aarch64/illegal-sysreg-4b.d,
	* testsuite/gas/aarch64/illegal-sysreg-4b.l: New test.
Nelson1225 pushed a commit that referenced this issue Feb 10, 2022
On Fedora 35,

$ readelf -d /usr/bin/npc

caused readelf to run out of stack since load_separate_debug_info
returned the input main file as the separate debug info:

(gdb) bt
 #0  load_separate_debug_info (
    main_filename=main_filename@entry=0x510f50 "/export/home/hjl/.cache/debuginfod_client/dcc33c51c49e7dafc178fdb5cf8bd8946f965295/debuginfo",
    xlink=xlink@entry=0x4e5180 <debug_displays+4480>,
    parse_func=parse_func@entry=0x431550 <parse_gnu_debuglink>,
    check_func=check_func@entry=0x432ae0 <check_gnu_debuglink>,
    func_data=func_data@entry=0x7fffffffdb60, file=file@entry=0x51d430)
    at /export/gnu/import/git/sources/binutils-gdb/binutils/dwarf.c:11057
 #1  0x000000000043328d in check_for_and_load_links (file=0x51d430,
    filename=0x510f50 "/export/home/hjl/.cache/debuginfod_client/dcc33c51c49e7dafc178fdb5cf8bd8946f965295/debuginfo")
    at /export/gnu/import/git/sources/binutils-gdb/binutils/dwarf.c:11381
 #2  0x00000000004332ae in check_for_and_load_links (file=0x51b070,
    filename=0x518dd0 "/export/home/hjl/.cache/debuginfod_client/dcc33c51c49e7dafc178fdb5cf8bd8946f965295/debuginfo")

Return NULL if the separate debug info is the same as the input main
file to avoid infinite recursion.

	PR binutils/28679
	* dwarf.c (load_separate_debug_info): Don't return the input
	main file.
Nelson1225 pushed a commit that referenced this issue Feb 10, 2022
While working on another patch relating to remote targets, I wanted to
test with 'maint set target-async off' in place.  Unfortunately I ran
into some problems.  This commit is an attempt to fix one of the
issues I hit.

In my particular case I was actually running with:

  maint set target-async off
  maint set target-non-stop off

that is, we're telling GDB to force the targets to operate in
non-async mode, and in all-stop mode.  Here's my GDB session showing
the problem:

  (gdb) maintenance set target-async off
  (gdb) maintenance set target-non-stop off
  (gdb) target extended-remote :54321
  Remote debugging using :54321
  (gdb) attach 2365960
  Attaching to process 2365960
  No unwaited-for children left.
  (gdb)

Notice the 'No unwaited-for children left.' error, this is the
problem.  There's no reason why GDB should not be able to attach to
the process.

The problem is this:

  1. The user runs 'attach PID' and this sends GDB into attach_command
  in infcmd.c.  From here we call the ::attach method on the attach
  target, which will be the extended_remote_target.

  2. In extended_remote_target::attach, we attach to the remote target
  and get the first reply (which is a stop packet).  We put off
  processing the stop packet until the end of ::attach.  We setup the
  inferior and thread to represent the process we attached to, and
  download the target description.  Finally, we process the initial
  stop packet.

  If '!target_is_non_stop_p ()' and '!target_can_async_p ()', which is
  the case for us given the maintenance commands we used, we cache the
  stop packet within the remote_state::buf for later processing.

  3. Back in attach_command, if 'target_is_non_stop_p ()' then we
  request that the target stops.  This will either process any cached
  stop replies, or request that the target stops, and process the stop
  replies.  However, this code is not what we use due to non-stop mode
  being disabled.  So, we skip to the next step which is to call
  validate_exec_file.

  4. Calling validate_exec_file can cause packets to be sent to the
  remote target, and replies received, the first path I hit is the
  call to target_pid_to_exec_file, which calls
  remote_target::pid_to_exec_file, which can then try to read the
  executable from the remote.  Sending an receiving packets will make
  use of the remote_state::buf object.

  5. The attempt to attach continues, but the damage is already done...

So, the problem is that, in step #2 we cache a stop reply in the
remote_state::buf, and then in step #4 we reuse the remote_state::buf
object, discarding any cached stop reply.  As a result, the initial
stop, which is sent when GDB first attaches to the target, is lost.

This problem can clearly be seen, I feel, by looking at the
remote_state::cached_wait_status flag.  This flag tells GDB if there
is a wait status cached in remote_state::buf.  However, in
remote_target::putpkt_binary and remote_target::getpkt_or_notif_sane_1
this flag is just set back to 0, doing this immediately discards any
cached data.

I don't know if this scheme ever made sense,  looking at commit
2d717e4, where the cached_wait_status flag was added, it appears
that there was nothing between where the stop was cached, and where
the stop was consumed, so, I suspect, there never was a situation
where we ended up in putpkt_binary or getpkt_or_notif_sane_1 and
needed to clear to the flag, maybe the clearing was added "just in
case".  Whatever the history, I claim that this clearing this flag is
no longer a good idea.

So, my first step toward fixing this issue was to replace the two
instances of 'rs->cached_wait_status = 0;' in ::putpkt_binary and
::getpkt_or_notif_sane_1 with 'gdb_assert (rs->cached_wait_status ==
0);', this, at least would show me when GDB was doing something
dangerous, and indeed, this assert is now hit in my test case above.

I did play with using some kind of scoped restore to backup, and
restore the remote_state::buf object in all the places within remote.c
that I was hitting where the ::buf was being corrupted.  The first
problem with this is that, where the ::cached_wait_status flag is
reset is _not_ where ::buf is corrupted.  For the ::putpkt_binary
case, by the time we get to the method the buffer has already been
corrupted in many cases, so we end up needing to add the scoped
save/restore within the callers, which means we need the save/restore
in _lots_ of places.

Plus, using this save/restore model feels like the wrong solution.  I
don't think that it's obvious that the buffer might be holding cached
data, and I think it would be too easy for new corruptions of the
buffer to be introduced, which could easily go unnoticed for a long
time.

So, I really wanted a solution that didn't require us to cache data in
the ::buf object.

Luckily, I think we already have such a solution in place, the
remote_state::stop_reply_queue, it seems like this does exactly the
same task, just in a slightly different way.  With the
::stop_reply_queue, the stop packets are processed upon receipt and
the stop_reply object is added to the queue.  With the ::buf cache
solution, the unprocessed stop reply is cached in the ::buf, and
processed later.

So, finally, in this commit, I propose to remove the
remote_state::cached_wait_status flag and to stop using the ::buf to
cache stop replies.  Instead, stop replies will now always be stored
in the ::stop_reply_queue.

There are two places where we use the ::buf to hold a cached stop
reply, the first is in the ::attach method, and the second is in
remote_target::start_remote, however, the second of these cases is far
less problematic, as after caching the stop reply in ::buf we call the
global start_remote function, which does very little work before
calling normal_stop, which processes the cached stop reply.  However,
my plan is to switch both users over to using ::stop_reply_queue so
that the old (unsafe) ::cached_wait_status mechanism can be completely
removed.

The next problem is that the ::stop_reply_queue is currently only used
for async-mode, and so, in remote_target::push_stop_reply, where we
push stop_reply objects into the ::stop_reply_queue, we currently also
mark the async event token.  I've modified this so we only mark the
async event token if 'target_is_async_p ()' - note, _is_, not _can_
here. The ::push_stop_reply method is called in places where async
mode has been temporarily disabled, but, when async mode is switched
back on (see remote_target::async) we will mark the event token if
there are events in the queue.

Another change of interest is in remote_target::remote_interrupt_as.
Previously this code checked ::cached_wait_status, but didn't check
for events in the ::stop_reply_queue.  Now that ::cached_wait_status
has been removed we now check the queue length instead, which should
have the same result.

Finally, in remote_target::wait_as, I've tried to merge the processing
of the ::stop_reply_queue with how we used to handle the
::cached_wait_status flag.

Currently, when processing the ::stop_reply_queue we call
process_stop_reply and immediately return.  However, when handling
::cached_wait_status we run through the whole of ::wait_as, and return
at the end of the function.

If we consider a standard stop packet, the two differences I see are:

  1. Resetting of the remote_state::waiting_for_stop_reply, flag; this
  is not currently done when processing a stop from the
  ::stop_reply_queue.

  2. The final return value has the possibility of being adjusted at
  the end of ::wait_as, as well as there being calls to
  record_currthread, non of which are done if we process a stop from
  the ::stop_reply_queue.

After discussion on the mailing list:

  https://sourceware.org/pipermail/gdb-patches/2021-December/184535.html

it was suggested that, when an event is pushed into the
::stop_reply_queue, the ::waiting_for_stop_reply flag is never going
to be set.  As a result, we don't need to worry about the first
difference.  I have however, added a gdb_assert to validate the
assumption that the flag is never going to be set.  If in future the
situation ever changes, then we should find out pretty quickly.

As for the second difference, I have resolved this by having all stop
packets taken from the ::stop_reply_queue, pass through the return
value adjustment code at the end of ::wait_as.

An example of a test that reveals the benefits of this commit is:

  make check-gdb \
    RUNTESTFLAGS="--target_board=native-extended-gdbserver \
                  GDBFLAGS='-ex maint\ set\ target-async\ off \
                            -ex maint\ set\ target-non-stop\ off' \
                  gdb.base/attach.exp"

For testing I've been running test on x86-64/GNU Linux, and run with
target boards unix, native-gdbserver, and native-extended-gdbserver.
For each board I've run with the default GDBFLAGS, as well as with:

  GDBFLAGS='-ex maint\ set\ target-async\ off \
            -ex maint\ set\ target-non-stop\ off' \

Though running with the above GDBFLAGS is clearly a lot more unstable
both before and after my patch, I'm not seeing any consistent new
failures with my patch, except, with the native-extended-gdbserver
board, where I am seeing new failures, but only because more tests are
now running.  For that configuration alone I see the number of
unresolved go down by 49, the number of passes goes up by 446, and the
number of failures also increases by 144.  All of the failures are new
tests as far as I can tell.
Nelson1225 pushed a commit that referenced this issue Feb 10, 2022
Fedora Rawhide is now using gcc-12.0.  As part of updating to the
gcc-12.0 package set, Rawhide is also now using a version of libgcc_s
which lacks a .data section.  This causes gdb to fail in the following
fashion while debugging a program (such as gdb) which uses libgcc_s:

    (top-gdb) run
    Starting program: rawhide-master/bld/gdb/gdb
    ...
    objfiles.h:467: internal-error: sect_index_data not initialized
    A problem internal to GDB has been detected,
    further debugging may prove unreliable.
    ...

I snipped the backtrace from the above output.  Instead, here's a
portion of a backtrace obtained using GDB's backtrace command.
(Obviously, in order to obtain it, I used a GDB which has been patched
with this commit.)

    #0  internal_error (
	file=0xc6a508 "gdb/objfiles.h", line=467,
	fmt=0xc6a4e8 "sect_index_data not initialized")
	at gdbsupport/errors.cc:51
    #1  0x00000000005f9651 in objfile::data_section_offset (this=0x4fa48f0)
	at gdb/objfiles.h:467
    #2  0x000000000097c5f8 in relocate_address (address=0x17244, objfile=0x4fa48f0)
	at gdb/stap-probe.c:1333
    #3  0x000000000097c630 in stap_probe::get_relocated_address (this=0xa1a17a0,
	objfile=0x4fa48f0)
	at gdb/stap-probe.c:1341
    #4  0x00000000004d7025 in create_exception_master_breakpoint_probe (
	objfile=0x4fa48f0)
	at gdb/breakpoint.c:3505
    #5  0x00000000004d7426 in create_exception_master_breakpoint ()
	at gdb/breakpoint.c:3575
    #6  0x00000000004efcc1 in breakpoint_re_set ()
	at gdb/breakpoint.c:13407
    #7  0x0000000000956998 in solib_add (pattern=0x0, from_tty=0, readsyms=1)
	at gdb/solib.c:1001
    #8  0x00000000009576a8 in handle_solib_event ()
	at gdb/solib.c:1269
    ...

The function 'relocate_address' in gdb/stap-probe.c attempts to do
its "relocation" by using objfile->data_section_offset().  That
method, data_section_offset() is defined as follows in objfiles.h:

  CORE_ADDR data_section_offset () const
  {
    return section_offsets[SECT_OFF_DATA (this)];
  }

The internal error occurs when the SECT_OFF_DATA macro finds that the
'sect_index_data' field is -1:

    #define SECT_OFF_DATA(objfile) \
	 ((objfile->sect_index_data == -1) \
	  ? (internal_error (__FILE__, __LINE__, \
			     _("sect_index_data not initialized")), -1)	\
	  : objfile->sect_index_data)

relocate_address() is obtaining the section offset in order to compute
a relocated address.  For some ABIs, such as the System V ABI, the
section offsets will all be the same.  So for those ABIs, it doesn't
matter which offset is used.  However, other ABIs, such as the FDPIC
ABI, will have different offsets for the various sections.  Thus, for
those ABIs, it is vital that this and other relocation code use the
correct offset.

In stap_probe::get_relocated_address, the address to which to add the
offset (thus forming the relocated address) is obtained via
this->get_address (); get_address is a getter for m_address in
probe.h.  It's documented/defined as follows (also in probe.h):

  /* The address where the probe is inserted, relative to
     SECT_OFF_TEXT.  */
  CORE_ADDR m_address;

(Thanks to Tom Tromey for this observation.)

So, based on this, the current use of data_section_offset /
SECT_OFF_DATA is wrong.  This relocation code should have been using
text_section_offset / SECT_OFF_TEXT all along.  That being the
case, I've adjusted the stap-probe.c relocation code accordingly.

Searching the sources turned up one other use of data_section_offset,
in gdb/dtrace-probe.c, so I've updated that code as well.  The same
reasoning presented above applies to this case too.

Summary:

	* gdb/dtrace-probe.c (dtrace_probe::get_relocated_address):
	Use method text_section_offset instead of data_section_offset.
	* gdb/stap-probe.c (relocate_address): Likewise.
Nelson1225 pushed a commit that referenced this issue Feb 10, 2022
g++ 11.1.0 has a bug where it will emit a negative
DW_AT_data_member_location in some cases:

    $ cat test.cpp
    #include <memory>

    int
    main()
    {
      std::unique_ptr<int> ptr;
    }
    $ g++ -g test.cpp
    $ llvm-dwarfdump -F a.out
    ...
    0x00000964:       DW_TAG_member
                        DW_AT_name [DW_FORM_strp]   ("_M_head_impl")
                        DW_AT_decl_file [DW_FORM_data1]     ("/usr/include/c++/11.1.0/tuple")
                        DW_AT_decl_line [DW_FORM_data1]     (125)
                        DW_AT_decl_column [DW_FORM_data1]   (0x27)
                        DW_AT_type [DW_FORM_ref4]   (0x0000067a "default_delete<int>")
                        DW_AT_data_member_location [DW_FORM_sdata]  (-1)
    ...

This leads to a GDB crash (when built with ASan, otherwise probably
garbage results), since it tries to read just before (to the left, in
ASan speak) of the value's buffer:

    ==888645==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000c52af at pc 0x7f711b239f4b bp 0x7fff356bd470 sp 0x7fff356bcc18
    READ of size 1 at 0x6020000c52af thread T0
        #0 0x7f711b239f4a in __interceptor_memcpy /build/gcc/src/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827
        #1 0x555c4977efa1 in value_contents_copy_raw /home/simark/src/binutils-gdb/gdb/value.c:1347
        #2 0x555c497909cd in value_primitive_field(value*, long, int, type*) /home/simark/src/binutils-gdb/gdb/value.c:3126
        #3 0x555c478f2eaa in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:333
        #4 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513
        #5 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161
        #6 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513
        #7 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161
        #8 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513
        #9 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161
        #10 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383
        #11 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438
        #12 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632
        #13 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048
        #14 0x555c49759b17 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1151
        #15 0x555c478f2fcb in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:335
        #16 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513
        #17 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161
        #18 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383
        #19 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438
        #20 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632
        #21 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048
        #22 0x555c49759b17 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1151
        #23 0x555c478f2fcb in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:335
        #24 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383
        #25 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438
        #26 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632
        #27 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048
        #28 0x555c49759b17 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1151
        #29 0x555c4760f04c in c_value_print(value*, ui_file*, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:587
        #30 0x555c483ff954 in language_defn::value_print(value*, ui_file*, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:614
        #31 0x555c49759f61 in value_print(value*, ui_file*, value_print_options const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1189
        #32 0x555c48950f70 in print_formatted /home/simark/src/binutils-gdb/gdb/printcmd.c:337
        #33 0x555c48958eda in print_value(value*, value_print_options const&) /home/simark/src/binutils-gdb/gdb/printcmd.c:1258
        #34 0x555c48959891 in print_command_1 /home/simark/src/binutils-gdb/gdb/printcmd.c:1367
        #35 0x555c4895a3df in print_command /home/simark/src/binutils-gdb/gdb/printcmd.c:1458
        #36 0x555c4767f974 in do_simple_func /home/simark/src/binutils-gdb/gdb/cli/cli-decode.c:97
        #37 0x555c47692e25 in cmd_func(cmd_list_element*, char const*, int) /home/simark/src/binutils-gdb/gdb/cli/cli-decode.c:2475
        #38 0x555c4936107e in execute_command(char const*, int) /home/simark/src/binutils-gdb/gdb/top.c:670
        #39 0x555c485f1bff in catch_command_errors /home/simark/src/binutils-gdb/gdb/main.c:523
        #40 0x555c485f249c in execute_cmdargs /home/simark/src/binutils-gdb/gdb/main.c:618
        #41 0x555c485f6677 in captured_main_1 /home/simark/src/binutils-gdb/gdb/main.c:1317
        #42 0x555c485f6c83 in captured_main /home/simark/src/binutils-gdb/gdb/main.c:1338
        #43 0x555c485f6d65 in gdb_main(captured_main_args*) /home/simark/src/binutils-gdb/gdb/main.c:1363
        #44 0x555c46e41ba8 in main /home/simark/src/binutils-gdb/gdb/gdb.c:32
        #45 0x7f71198bcb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
        #46 0x555c46e4197d in _start (/home/simark/build/binutils-gdb-one-target/gdb/gdb+0x77f197d)

    0x6020000c52af is located 1 bytes to the left of 8-byte region [0x6020000c52b0,0x6020000c52b8)
    allocated by thread T0 here:
        #0 0x7f711b2b7459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
        #1 0x555c470acdc9 in xcalloc /home/simark/src/binutils-gdb/gdb/alloc.c:100
        #2 0x555c49b775cd in xzalloc(unsigned long) /home/simark/src/binutils-gdb/gdbsupport/common-utils.cc:29
        #3 0x555c4977bdeb in allocate_value_contents /home/simark/src/binutils-gdb/gdb/value.c:1029
        #4 0x555c4977be25 in allocate_value(type*) /home/simark/src/binutils-gdb/gdb/value.c:1040
        #5 0x555c4979030d in value_primitive_field(value*, long, int, type*) /home/simark/src/binutils-gdb/gdb/value.c:3092
        #6 0x555c478f6280 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:501
        #7 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161
        #8 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513
        #9 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161
        #10 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513
        #11 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161
        #12 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383
        #13 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438
        #14 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632
        #15 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048
        #16 0x555c49759b17 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1151
        #17 0x555c478f2fcb in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:335
        #18 0x555c478f63b2 in cp_print_value /home/simark/src/binutils-gdb/gdb/cp-valprint.c:513
        #19 0x555c478f02ca in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:161
        #20 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383
        #21 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438
        #22 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632
        #23 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048
        #24 0x555c49759b17 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) /home/simark/src/binutils-gdb/gdb/valprint.c:1151
        #25 0x555c478f2fcb in cp_print_value_fields(value*, ui_file*, int, value_print_options const*, type**, int) /home/simark/src/binutils-gdb/gdb/cp-valprint.c:335
        #26 0x555c4760d45f in c_value_print_struct /home/simark/src/binutils-gdb/gdb/c-valprint.c:383
        #27 0x555c4760df4c in c_value_print_inner(value*, ui_file*, int, value_print_options const*) /home/simark/src/binutils-gdb/gdb/c-valprint.c:438
        #28 0x555c483ff9a7 in language_defn::value_print_inner(value*, ui_file*, int, value_print_options const*) const /home/simark/src/binutils-gdb/gdb/language.c:632
        #29 0x555c49758b68 in do_val_print /home/simark/src/binutils-gdb/gdb/valprint.c:1048

Since there are some binaries with this in the wild, I think it would be
useful for GDB to work around this.  I did the obvious simple thing, if
the DW_AT_data_member_location's value is -1, replace it with 0.  I
added a producer check to only apply this fixup for GCC 11.  The idea is
that if some other compiler ever uses a DW_AT_data_member_location value
of -1 by mistake, we don't know (before analyzing the bug at least) if
they did mean 0 or some other value.  So I wouldn't want to apply the
fixup in that case.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28063
Change-Id: Ieef3459b0b9bbce8bdad838ba83b4b64e7269d42
Nelson1225 pushed a commit that referenced this issue Feb 10, 2022
Starting with commit

  commit 1da5d0e
  Date:   Tue Jan 4 08:02:24 2022 -0700

    Change how Python architecture and language are handled

we see a failure in gdb.threads/killed-outside.exp:

  ...
  Executing on target: kill -9 16622    (timeout = 300)
  builtin_spawn -ignore SIGHUP kill -9 16622
  continue
  Continuing.
  Couldn't get registers: No such process.
  (gdb) [Thread 0x7ffff77c2700 (LWP 16626) exited]

  Program terminated with signal SIGKILL, Killed.
  The program no longer exists.
  FAIL: gdb.threads/killed-outside.exp: prompt after first continue (timeout)

This is not a regression but a failure due to a change in GDB's
output.  Prior to the aforementioned commit, GDB has been printing the
"Couldn't get registers: No such process." message twice.  The second
one came from

  (top-gdb) bt
  #0  amd64_linux_nat_target::fetch_registers (this=0x555557f31440 <the_amd64_linux_nat_target>, regcache=0x555558805ce0, regnum=16) at /gdb-up/gdb/amd64-linux-nat.c:225
  #1  0x000055555640ac5f in target_ops::fetch_registers (this=0x555557d636d0 <the_thread_db_target>, arg0=0x555558805ce0, arg1=16) at /gdb-up/gdb/target-delegates.c:502
  #2  0x000055555641a647 in target_fetch_registers (regcache=0x555558805ce0, regno=16) at /gdb-up/gdb/target.c:3945
  #3  0x0000555556278e68 in regcache::raw_update (this=0x555558805ce0, regnum=16) at /gdb-up/gdb/regcache.c:587
  #4  0x0000555556278f14 in readable_regcache::raw_read (this=0x555558805ce0, regnum=16, buf=0x555558881950 "") at /gdb-up/gdb/regcache.c:601
  #5  0x00005555562792aa in readable_regcache::cooked_read (this=0x555558805ce0, regnum=16, buf=0x555558881950 "") at /gdb-up/gdb/regcache.c:690
  #6  0x000055555627965e in readable_regcache::cooked_read_value (this=0x555558805ce0, regnum=16) at /gdb-up/gdb/regcache.c:748
  #7  0x0000555556352a37 in sentinel_frame_prev_register (this_frame=0x555558181090, this_prologue_cache=0x5555581810a8, regnum=16) at /gdb-up/gdb/sentinel-frame.c:53
  #8  0x0000555555fa4773 in frame_unwind_register_value (next_frame=0x555558181090, regnum=16) at /gdb-up/gdb/frame.c:1235
  #9  0x0000555555fa420d in frame_register_unwind (next_frame=0x555558181090, regnum=16, optimizedp=0x7fffffffd570, unavailablep=0x7fffffffd574, lvalp=0x7fffffffd57c, addrp=0x7fffffffd580,
      realnump=0x7fffffffd578, bufferp=0x7fffffffd5b0 "") at /gdb-up/gdb/frame.c:1143
  #10 0x0000555555fa455f in frame_unwind_register (next_frame=0x555558181090, regnum=16, buf=0x7fffffffd5b0 "") at /gdb-up/gdb/frame.c:1199
  #11 0x00005555560178e2 in i386_unwind_pc (gdbarch=0x5555587c4a70, next_frame=0x555558181090) at /gdb-up/gdb/i386-tdep.c:1972
  #12 0x0000555555cd2b9d in gdbarch_unwind_pc (gdbarch=0x5555587c4a70, next_frame=0x555558181090) at /gdb-up/gdb/gdbarch.c:3007
  #13 0x0000555555fa3a5b in frame_unwind_pc (this_frame=0x555558181090) at /gdb-up/gdb/frame.c:948
  #14 0x0000555555fa7621 in get_frame_pc (frame=0x555558181160) at /gdb-up/gdb/frame.c:2572
  #15 0x0000555555fa7706 in get_frame_address_in_block (this_frame=0x555558181160) at /gdb-up/gdb/frame.c:2602
  #16 0x0000555555fa77d0 in get_frame_address_in_block_if_available (this_frame=0x555558181160, pc=0x7fffffffd708) at /gdb-up/gdb/frame.c:2665
  #17 0x0000555555fa5f8d in select_frame (fi=0x555558181160) at /gdb-up/gdb/frame.c:1890
  #18 0x0000555555fa5bab in lookup_selected_frame (a_frame_id=..., frame_level=-1) at /gdb-up/gdb/frame.c:1720
  #19 0x0000555555fa5e47 in get_selected_frame (message=0x0) at /gdb-up/gdb/frame.c:1810
  #20 0x0000555555cc9c6e in get_current_arch () at /gdb-up/gdb/arch-utils.c:848
  #21 0x000055555625b239 in gdbpy_before_prompt_hook (extlang=0x555557451f20 <extension_language_python>, current_gdb_prompt=0x555557f4d890 <top_prompt+16> "(gdb) ")
      at /gdb-up/gdb/python/python.c:1063
  #22 0x0000555555f7cfbb in ext_lang_before_prompt (current_gdb_prompt=0x555557f4d890 <top_prompt+16> "(gdb) ") at /gdb-up/gdb/extension.c:922
  #23 0x0000555555f7d442 in std::_Function_handler<void (char const*), void (*)(char const*)>::_M_invoke(std::_Any_data const&, char const*&&) (__functor=...,
      __args#0=@0x7fffffffd900: 0x555557f4d890 <top_prompt+16> "(gdb) ") at /usr/include/c++/7/bits/std_function.h:316
  #24 0x0000555555f752dd in std::function<void (char const*)>::operator()(char const*) const (this=0x55555817d838, __args#0=0x555557f4d890 <top_prompt+16> "(gdb) ")
      at /usr/include/c++/7/bits/std_function.h:706
  #25 0x0000555555f75100 in gdb::observers::observable<char const*>::notify (this=0x555557f49060 <gdb::observers::before_prompt>, args#0=0x555557f4d890 <top_prompt+16> "(gdb) ")
      at /gdb-up/gdb/../gdbsupport/observable.h:150
  #26 0x0000555555f736dc in top_level_prompt () at /gdb-up/gdb/event-top.c:444
  #27 0x0000555555f735ba in display_gdb_prompt (new_prompt=0x0) at /gdb-up/gdb/event-top.c:411
  #28 0x00005555564611a7 in tui_on_command_error () at /gdb-up/gdb/tui/tui-interp.c:205
  #29 0x0000555555c2173f in std::_Function_handler<void (), void (*)()>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/7/bits/std_function.h:316
  #30 0x0000555555e10c20 in std::function<void ()>::operator()() const (this=0x5555580f9028) at /usr/include/c++/7/bits/std_function.h:706
  #31 0x0000555555e10973 in gdb::observers::observable<>::notify() const (this=0x555557f48d20 <gdb::observers::command_error>) at /gdb-up/gdb/../gdbsupport/observable.h:150
  #32 0x00005555560e9b3f in start_event_loop () at /gdb-up/gdb/main.c:438
  #33 0x00005555560e9bcc in captured_command_loop () at /gdb-up/gdb/main.c:481
  #34 0x00005555560eb616 in captured_main (data=0x7fffffffddd0) at /gdb-up/gdb/main.c:1348
  #35 0x00005555560eb67c in gdb_main (args=0x7fffffffddd0) at /gdb-up/gdb/main.c:1363
  #36 0x0000555555c1b6b3 in main (argc=12, argv=0x7fffffffded8) at /gdb-up/gdb/gdb.c:32

Commit 1da5d0e eliminated the call to 'get_current_arch'
in 'gdbpy_before_prompt_hook'.  Hence, the second instance of
"Couldn't get registers: No such process." does not appear anymore.

Fix the failure by updating the regular expression in the test.
pz9115 referenced this issue in plctlab/riscv-binutils-gdb Mar 7, 2022
…ync."

Commit 14b3360 ("do_target_wait_1: Clear
TARGET_WNOHANG if the target isn't async.") broke some multi-target
tests, such as gdb.multi/multi-target-info-inferiors.exp.  The symptom
is that execution just hangs at some point.  What happens is:

1. One remote inferior is started, and now sits stopped at a breakpoint.
   It is not "async" at this point (but it "can async").

2. We run a native inferior, the event loop gets woken up by the native
   target's fd.

3. In do_target_wait, we randomly choose an inferior to call target_wait
   on first, it happens to be the remote inferior.

4. Because the target is currently not "async", we clear
   TARGET_WNOHANG, resulting in synchronous wait.  We therefore block
   here:

  #0  0x00007fe9540dbb4d in select () from /usr/lib/libc.so.6
  #1  0x000055fc7e821da7 in gdb_select (n=15, readfds=0x7ffdb77c1fb0, writefds=0x0, exceptfds=0x7ffdb77c2050, timeout=0x7ffdb77c1f90) at /home/simark/src/binutils-gdb/gdb/posix-hdep.c:31
  #2  0x000055fc7ddef905 in interruptible_select (n=15, readfds=0x7ffdb77c1fb0, writefds=0x0, exceptfds=0x7ffdb77c2050, timeout=0x7ffdb77c1f90) at /home/simark/src/binutils-gdb/gdb/event-top.c:1134
  riscvarchive#3  0x000055fc7eda58e4 in ser_base_wait_for (scb=0x6250002e4100, timeout=1) at /home/simark/src/binutils-gdb/gdb/ser-base.c:240
  riscvarchive#4  0x000055fc7eda66ba in do_ser_base_readchar (scb=0x6250002e4100, timeout=-1) at /home/simark/src/binutils-gdb/gdb/ser-base.c:365
  riscvarchive#5  0x000055fc7eda6ff6 in generic_readchar (scb=0x6250002e4100, timeout=-1, do_readchar=0x55fc7eda663c <do_ser_base_readchar(serial*, int)>) at /home/simark/src/binutils-gdb/gdb/ser-base.c:444
  riscvarchive#6  0x000055fc7eda718a in ser_base_readchar (scb=0x6250002e4100, timeout=-1) at /home/simark/src/binutils-gdb/gdb/ser-base.c:471
  riscvarchive#7  0x000055fc7edb1ecd in serial_readchar (scb=0x6250002e4100, timeout=-1) at /home/simark/src/binutils-gdb/gdb/serial.c:393
  riscvarchive#8  0x000055fc7ec48b8f in remote_target::readchar (this=0x617000038780, timeout=-1) at /home/simark/src/binutils-gdb/gdb/remote.c:9446
  riscvarchive#9  0x000055fc7ec4da82 in remote_target::getpkt_or_notif_sane_1 (this=0x617000038780, buf=0x6170000387a8, forever=1, expecting_notif=1, is_notif=0x7ffdb77c24f0) at /home/simark/src/binutils-gdb/gdb/remote.c:9928
  riscvarchive#10 0x000055fc7ec4f045 in remote_target::getpkt_or_notif_sane (this=0x617000038780, buf=0x6170000387a8, forever=1, is_notif=0x7ffdb77c24f0) at /home/simark/src/binutils-gdb/gdb/remote.c:10037
  riscvarchive#11 0x000055fc7ec354d4 in remote_target::wait_ns (this=0x617000038780, ptid=..., status=0x7ffdb77c33c8, options=...) at /home/simark/src/binutils-gdb/gdb/remote.c:8147
  riscvarchive#12 0x000055fc7ec38aa1 in remote_target::wait (this=0x617000038780, ptid=..., status=0x7ffdb77c33c8, options=...) at /home/simark/src/binutils-gdb/gdb/remote.c:8337
  riscvarchive#13 0x000055fc7f1409ce in target_wait (ptid=..., status=0x7ffdb77c33c8, options=...) at /home/simark/src/binutils-gdb/gdb/target.c:2612
  riscvarchive#14 0x000055fc7e19da98 in do_target_wait_1 (inf=0x617000038080, ptid=..., status=0x7ffdb77c33c8, options=...) at /home/simark/src/binutils-gdb/gdb/infrun.c:3636
  riscvarchive#15 0x000055fc7e19e26b in operator() (__closure=0x7ffdb77c2f90, inf=0x617000038080) at /home/simark/src/binutils-gdb/gdb/infrun.c:3697
  riscvarchive#16 0x000055fc7e19f0c4 in do_target_wait (ecs=0x7ffdb77c33a0, options=...) at /home/simark/src/binutils-gdb/gdb/infrun.c:3716
  riscvarchive#17 0x000055fc7e1a31f7 in fetch_inferior_event () at /home/simark/src/binutils-gdb/gdb/infrun.c:4061

Before the aforementioned commit, we would not have cleared
TARGET_WNOHANG, the remote target's wait would have returned nothing,
and we would have consumed the native target's event.

After applying this revert, the testsuite state looks as good as before
for me on Ubuntu 20.04 amd64.

Change-Id: Ic17a1642935cabcc16c25cb6899d52e12c2f5c3f
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue May 19, 2022
The current zombie leader detection code in linux-nat.c has a race --
if a multi-threaded inferior exits just before check_zombie_leaders
finds that the leader is now zombie via checking /proc/PID/status,
check_zombie_leaders deletes the leader, assuming we won't get an
event for that exit (which we won't in some scenarios, but not in this
one).  That might seem mostly harmless, but it has some downsides:

 - later when we continue pulling events out of the kernel, we will
   collect the exit event of the non-leader threads, and once we see
   the last lwp in our list exit, we return _that_ lwp's exit code as
   whole-process exit code to infrun, instead of the leader's exit
   code.

 - this can cause a hang in stop_all_threads in infrun.c.  Say there
   are 2 threads in the process.  stop_all_threads stops each of those
   threads, and then waits for two stop or exit events, one for each
   thread.  If the whole process exits, and check_zombie_leaders hits
   the false-positive case, linux-nat.c will only return one event to
   GDB (the whole-process exit returned when we see the last thread,
   the non-leader thread, exit), making stop_all_threads hang forever
   waiting for a second event that will never come.

However, in this false-positive scenario, where the whole process is
exiting, as opposed to just the leader (with pthread_exit(), for
example), we _will_ get an exit event shortly for the leader, after we
collect the exit event of all the other non-leader threads.  Or put
another way, we _always_ get an event for the leader after we see it
become zombie.

I tried a number of approaches to fix this:

riscvarchive#1 - My first thought to address the race was to make GDB always
report the whole-process exit status for the leader thread, not for
whatever is the last lwp in the list.  We _always_ get a final exit
(or exec) event for the leader, and when the race triggers, we're not
collecting it.

riscvarchive#2 - My second thought was to try to plug the race in the first place.

I thought of making GDB call waitpid/WNOHANG for all non-leader
threads immediately when the zombie leader is detected, assuming there
would be an exit event pending for each of them waiting to be
collected.  Turns out that that doesn't work -- you can see the leader
become zombie _before_ the kernel kills all other threads.  Waitpid in
that small time window returns 0, indicating no-event.  Thankfully we
hit that race window all the time, which avoided trading one race for
another.  Looking at the non-leader thread's status in /proc doesn't
help either, the threads are still in running state for a bit, for the
same reason.

riscvarchive#3 - My next attempt, which seemed promising, was to synchronously
stop and wait for the stop for each of the non-leader threads.  For
the scenario in question, this will collect all the exit statuses of
the non-leader threads.  Then, if we are left with only the zombie
leader in the lwp list, it means we either have a normal while-process
exit or an exec, in which case we should not delete the leader.  If
_only_ the leader exited, like in gdb.threads/leader-exit.exp, then
after pausing threads, we will still have at least one live non-leader
thread in the list, and so we delete the leader lwp.  I got this
working and polished, and it was only after staring at the kernel code
to convince myself that this would really work (and it would, for the
scenario I considered), that I realized I had failed to account for
one scenario -- if any non-leader thread is _already_ stopped when
some thread triggers a group exit, like e.g., if you have some threads
stopped and then resume just one thread with scheduler-locking or
non-stop, and that thread exits the process.  I also played with
PTRACE_EVENT_EXIT, see if it would help in any way to plug the race,
and I couldn't find a way that it would result in any practical
difference compared to looking at /proc/PID/status, with respect to
having a race.

So I concluded that there's no way to plug the race, we just have to
deal with it.  Which means, going back to approach riscvarchive#1.  That is the
approach taken by this patch.

Change-Id: I6309fd4727da8c67951f9cea557724b77e8ee979
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue May 19, 2022
Bug 28980 shows that trying to value_copy an entirely optimized out
value causes an internal error.  The original bug report involves MI and
some Python pretty printer, and is quite difficult to reproduce, but
another easy way to reproduce (that is believed to be equivalent) was
proposed:

    $ ./gdb -q -nx --data-directory=data-directory -ex "py print(gdb.Value(gdb.Value(5).type.optimized_out()))"
    /home/smarchi/src/binutils-gdb/gdb/value.c:1731: internal-error: value_copy: Assertion `arg->contents != nullptr' failed.

This is caused by 5f8ab46 ("gdb: constify parameter of
value_copy").  It added an assertion that the contents buffer is
allocated if the value is not lazy:

  if (!value_lazy (val))
    {
      gdb_assert (arg->contents != nullptr);

This was based on the comment on value::contents, which suggest that
this is the case:

  /* Actual contents of the value.  Target byte-order.  NULL or not
     valid if lazy is nonzero.  */
  gdb::unique_xmalloc_ptr<gdb_byte> contents;

However, it turns out that it can also be nullptr also if the value is
entirely optimized out, for example on exit of
allocate_optimized_out_value.  That function creates a lazy value, marks
the entire value as optimized out, and then clears the lazy flag.  But
contents remains nullptr.

This wasn't a problem for value_copy before, because it was calling
value_contents_all_raw on the input value, which caused contents to be
allocated before doing the copy.  This means that the input value to
value_copy did not have its contents allocated on entry, but had it
allocated on exit.  The result value had it allocated on exit.  And that
we copied bytes for an entirely optimized out value (i.e. meaningless
bytes).

From here I see two choices:

 1. respect the documented invariant that contents is nullptr only and
    only if the value is lazy, which means making
    allocate_optimized_out_value allocate contents
 2. extend the cases where contents can be nullptr to also include
    values that are entirely optimized out (note that you could still
    have some entirely optimized out values that do have contents
    allocated, it depends on how they were created) and adjust
    value_copy accordingly

Choice riscvarchive#1 is safe, but less efficient: it's not very useful to allocate
a buffer for an entirely optimized out value.  It's even a bit less
efficient than what we had initially, because values coming out of
allocate_optimized_out_value would now always get their contents
allocated.

Choice riscvarchive#2 would be more efficient than what we had before: giving an
optimized out value without allocated contents to value_copy would
result in an optimized out value without allocated contents (and the
input value would still be without allocated contents on exit).  But
it's more risky, since it's difficult to ensure that all users of the
contents (through the various_contents* accessors) are all fine with
that new invariant.

In this patch, I opt for choice riscvarchive#2, since I think it is a better
direction than choice riscvarchive#1.  riscvarchive#1 would be a pessimization, and if we go
this way, I doubt that it will ever be revisited, it will just stay that
way forever.

Add a selftest to test this.  I initially started to write it as a
Python test (since the reproducer is in Python), but a selftest is more
straightforward.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28980
Change-Id: I6e2f5c0ea804fafa041fcc4345d47064b5900ed7
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue May 19, 2022
While working on a different patch, I triggered an assertion from the
initialize_current_architecture code, specifically from one of
the *_gdbarch_init functions in a *-tdep.c file.  This exposes a
couple of issues with GDB.

This is easy enough to reproduce by adding 'gdb_assert (false)' into a
suitable function.  For example, I added a line into i386_gdbarch_init
and can see the following issue.

I start GDB and immediately hit the assert, the output is as you'd
expect, except for the very last line:

  $ ./gdb/gdb --data-directory ./gdb/data-directory/
  ../../src.dev-1/gdb/i386-tdep.c:8455: internal-error: i386_gdbarch_init: Assertion `false' failed.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.
  ----- Backtrace -----
  ... snip ...
  ---------------------
  ../../src.dev-1/gdb/i386-tdep.c:8455: internal-error: i386_gdbarch_init: Assertion `false' failed.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.
  Quit this debugging session? (y or n) ../../src.dev-1/gdb/ser-event.c:212:16: runtime error: member access within null pointer of type 'struct serial'

Something goes wrong when we try to query the user.  Note, I
configured GDB with --enable-ubsan, I suspect that without this the
above "error" would actually just be a crash.

The backtrace from ser-event.c:212 looks like this:

  (gdb) bt 10
  #0  serial_event_clear (event=0x675c020) at ../../src/gdb/ser-event.c:212
  riscvarchive#1  0x0000000000769456 in invoke_async_signal_handlers () at ../../src/gdb/async-event.c:211
  riscvarchive#2  0x000000000295049b in gdb_do_one_event () at ../../src/gdbsupport/event-loop.cc:194
  riscvarchive#3  0x0000000001f015f8 in gdb_readline_wrapper (
      prompt=0x67135c0 "../../src/gdb/i386-tdep.c:8455: internal-error: i386_gdbarch_init: Assertion `false' failed.\nA problem internal to GDB has been detected,\nfurther debugging may prove unreliable.\nQuit this debugg"...)
      at ../../src/gdb/top.c:1141
  riscvarchive#4  0x0000000002118b64 in defaulted_query(const char *, char, typedef __va_list_tag __va_list_tag *) (
      ctlstr=0x2e4eb68 "%s\nQuit this debugging session? ", defchar=0 '\000', args=0x7fffffffa6e0)
      at ../../src/gdb/utils.c:934
  riscvarchive#5  0x0000000002118f72 in query (ctlstr=0x2e4eb68 "%s\nQuit this debugging session? ")
      at ../../src/gdb/utils.c:1026
  riscvarchive#6  0x00000000021170f6 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) (problem=0x6107bc0 <internal_error_problem>, file=0x2b976c8 "../../src/gdb/i386-tdep.c",
      line=8455, fmt=0x2b96d7f "%s: Assertion `%s' failed.", ap=0x7fffffffa8e8) at ../../src/gdb/utils.c:417
  riscvarchive#7  0x00000000021175a0 in internal_verror (file=0x2b976c8 "../../src/gdb/i386-tdep.c", line=8455,
      fmt=0x2b96d7f "%s: Assertion `%s' failed.", ap=0x7fffffffa8e8) at ../../src/gdb/utils.c:485
  riscvarchive#8  0x00000000029503b3 in internal_error (file=0x2b976c8 "../../src/gdb/i386-tdep.c", line=8455,
      fmt=0x2b96d7f "%s: Assertion `%s' failed.") at ../../src/gdbsupport/errors.cc:55
  riscvarchive#9  0x000000000122d5b6 in i386_gdbarch_init (info=..., arches=0x0) at ../../src/gdb/i386-tdep.c:8455
  (More stack frames follow...)

It turns out that the problem is that the async event handler
mechanism has been invoked, but this has not yet been initialized.

If we look at gdb_init (in gdb/top.c) we can indeed see the call to
gdb_init_signals is after the call to initialize_current_architecture.

If I reorder the calls, moving gdb_init_signals earlier, then the
initial error is resolved, however, things are still broken.  I now
see the same "Quit this debugging session? (y or n)" prompt, but when
I provide an answer and press return GDB immediately crashes.

So what's going on now?  The next problem is that the call_readline
field within the current_ui structure is not initialized, and this
callback is invoked to process the reply I entered.

The problem is that call_readline is setup as a result of calling
set_top_level_interpreter, which is called from captured_main_1.
Unfortunately, set_top_level_interpreter is called after gdb_init is
called.

I wondered how to solve this problem for a while, however, I don't
know if there's an easy "just reorder some lines" solution here.
Looking through captured_main_1 there seems to be a bunch of
dependencies between printing various things, parsing config files,
and setting up the interpreter.  I'm sure there is a solution hiding
in there somewhere.... I'm just not sure I want to spend any longer
looking for it.

So.

I propose a simpler solution, more of a hack/work-around.  In utils.c
we already have a function filtered_printing_initialized, this is
checked in a few places within internal_vproblem.  In some of these
cases the call gates whether or not GDB will query the user.

My proposal is to add a new readline_initialized function, which
checks if the current_ui has had readline initialized yet.  If this is
not the case then we should not attempt to query the user.

After this change GDB prints the error message, the backtrace, and
then aborts (including dumping core).  This actually seems pretty sane
as, if GDB has not yet made it through the initialization then it
doesn't make much sense to allow the user to say "no, I don't want to
quit the debug session" (I think).
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue May 19, 2022
The variable right_lib_flags is not being set correctly to define RIGHT.
The value RIGHT is needed to force the address of the library functions
lib1_func3 and lib2_func4 to occur at different address in the wrong and
right libraries.

With RIGHT defined correctly, functions lib1_func3 and lib2_func4 occur
at different addresses the test runs correctly on Powerpc.

The test needs the lib2 addresses to be different in the right and
wrong cases.  That is the point of introducing function lib2_spacer
with the ifdef RIGHT compiler directive.

On Intel, the ARRAY_SIZE of 1 versus 8192 is sufficient to get the
dynamic linker to move the addresses of the library.  You can also get
the same effect on PowerPC but you must use a value much larger than
8192.

The key thing is that the test was not properly setting RIGHT to
defined to get the lib2_spacer function on Intel and Powerpc.

Without the patch, we have the Intel backtrace for the bad libraries:

backtrace
#0  break_here () at /home/ ... /gdb/testsuite/gdb.base/solib-search.c:30
riscvarchive#1  0x00007ffff7fae156 in ?? ()
riscvarchive#2  0x00007fffffffc150 in ?? ()
riscvarchive#3  0x00007ffff7fbb156 in ?? ()
riscvarchive#4  0x00007fffffffc160 in ?? ()
riscvarchive#5  0x00007ffff7fae146 in ?? ()
riscvarchive#6  0x00007fffffffc170 in ?? ()
riscvarchive#7  0x00007ffff7fbb146 in ?? ()
riscvarchive#8  0x00007fffffffc180 in ?? ()
riscvarchive#9  0x0000555555555156 in main () at /home/ ... /binutils-gdb/gdb/testsuite/gdb.base/solib-search.c:23
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) PASS: gdb.base/solib-search.exp: backtrace (with wrong libs) (data collection)

The backtrace on Intel with the good libraries is:

backtrace
#0  break_here () at /.../binutils-gdb/gdb/testsuite/gdb.base/solib-search.c:30
riscvarchive#1  0x00007ffff7fae156 in lib2_func4 () at /.../binutils-gdb/gdb/testsuite/gdb.base/solib-search-lib2.c:49
riscvarchive#2  0x00007ffff7fbb156 in lib1_func3 () at /.../gdb.base/solib-search-lib1.c:49
riscvarchive#3  0x00007ffff7fae146 in lib2_func2 () at /.../testsuite/gdb.base/solib-search-lib2.c:30
riscvarchive#4  0x00007ffff7fbb146 in lib1_func1 () at /.../gdb.base/solib-search-lib1.c:30
riscvarchive#5  0x0000555555555156 in main () at /...solib-search.c:23
(gdb) PASS: gdb.base/solib-search.exp: backtrace (with right libs) (data collection)
PASS: gdb.base/solib-search.exp: backtrace (with right libs)

In one case the backtrace is correct and the other it
is wrong on Intel.  This is due to the fact that the ARRAY_SIZE caused
the dynamic linker to move the library function addresses around.  I
believe it has to do with the default size of the data and code
sections used by the dynamic linker.

So without the patch the backtrace on PowerPC looks like:

 backtrace
#0  break_here () at /.../solib-search.c:30
riscvarchive#1  0x00007ffff7f007f4 in lib2_func4 () at /.../solib-search-lib2.c:49
riscvarchive#2  0x00007ffff7f307f4 in lib1_func3 () at /.../solib-search-lib1.c:49
riscvarchive#3  0x00007ffff7f007ac in lib2_func2 () at /.../solib-search-lib2.c:30
riscvarchive#4  0x00007ffff7f307ac in lib1_func1 () at /.../solib-search-lib1.c:30
riscvarchive#5  0x000000001000074c in main () at /.../solib-search.c:23

for both the good and bad libraries.

The patch fixes defining RIGHT in solib-search-lib1.c and solib-search-
lib2.c.  Note, without the patch the lib1_spacer and lib2_spacer
functions do not show up in the object dump of the Intel or Powerpc
libraries as it should.  The patch fixes that by making sure RIGHT gets
defined.

Now with the patch the backtrace for the bad library on PowerPC looks
like:

backtrace
#0  break_here () at /.../solib-search.c:30
riscvarchive#1  0x00007ffff7f0083c in __glink_PLTresolve () from /.../solib-search-lib2.so
Backtrace stopped: frame did not save the PC

And the backtrace for the good libraries on PowerPC looks like:

backtrace
#0  break_here () at /.../solib-search.c:30
riscvarchive#1  0x00007ffff7f0083c in lib2_func4 () at /.../solib-search-lib2.c:49
riscvarchive#2  0x00007ffff7f3083c in lib1_func3 () at /.../solib-search-lib1.c:49
riscvarchive#3  0x00007ffff7f007cc in lib2_func2 () at /.../solib-search-lib2.c:30
riscvarchive#4  0x00007ffff7f307cc in lib1_func1 () at /.../solib-search-lib1.c:30
riscvarchive#5  0x000000001000074c in main () at /.../solib-search.c:23
(gdb) PASS: gdb.base/solib-search.exp: backtrace (with right libs) (data collection)
PASS: gdb.base/solib-search.exp: backtrace (with right libs)

The issue then is on Power where the ARRAY_SIZE of 1 versus 8192 is not
sufficient to cause the dymanic linker to allocate the libraries at
different addresses.  I don't claim to understand the specifics of how
the dynamic linker works and what the default size is for the data and
code sections are.  My guess is by default PowerPC allocates a larger
data size by default, which is large enough to hold array[8192].  The
default size of the data section allocated by the dynamic linker on
Intel is not large enough to hold array[8192] thus causing the code
section on Intel to have to move when the large array is defined.

Note on PowerPC, if you make ARRAY_SIZE big enough, then you will cause
the library addresses to occur at different addresses as the larger
data section forces the code section to a different address.  That was
actually my original fix for the program until I spoke with Doug Evans
who originally wrote the test.  Doug noticed that RIGHT was not getting
defined as he originally intended in the test.

With the patch to fix the definition of RIGHT, PowerPC has a bad and a
good backtrace because the address of lib1_func3 and lib2_func4 both
move because lib1_spacer and lib2_spacer are now defined
before lib1_func3 and lib2_func4.

Without the patch, the lib1_spacer and lib2_spacer function doesn't show
up in the binary for the correct or incorrect library on Intel or PowerPC.
With the patch, RIGHT gets defined as originally intended for the test on
both architectures and lib1_spacer and lib2_spacer function show up in the
binaries on both architectures changing the other function addresses as
intended thus causing the test work as intended on PowerPC.
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue May 19, 2022
… failing to attach

Running

  $ ../gdbserver/gdbserver --once --attach :1234 539436

with ASan while /proc/sys/kernel/yama/ptrace_scope is set to 1 (prevents
attaching) shows that we fail to free some platform-specific objects
tied to the process_info (process_info_private and arch_process_info):

    Direct leak of 32 byte(s) in 1 object(s) allocated from:
        #0 0x7f6b558b3fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
        riscvarchive#1 0x562eaf15d04a in xcalloc /home/simark/src/binutils-gdb/gdbserver/../gdb/alloc.c:100
        riscvarchive#2 0x562eaf251548 in xcnew<process_info_private> /home/simark/src/binutils-gdb/gdbserver/../gdbsupport/poison.h:122
        riscvarchive#3 0x562eaf22810c in linux_process_target::add_linux_process_no_mem_file(int, int) /home/simark/src/binutils-gdb/gdbserver/linux-low.cc:426
        riscvarchive#4 0x562eaf22d33f in linux_process_target::attach(unsigned long) /home/simark/src/binutils-gdb/gdbserver/linux-low.cc:1132
        riscvarchive#5 0x562eaf1a7222 in attach_inferior /home/simark/src/binutils-gdb/gdbserver/server.cc:308
        riscvarchive#6 0x562eaf1c1016 in captured_main /home/simark/src/binutils-gdb/gdbserver/server.cc:3949
        riscvarchive#7 0x562eaf1c1d60 in main /home/simark/src/binutils-gdb/gdbserver/server.cc:4084
        riscvarchive#8 0x7f6b552f630f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f)

    Indirect leak of 56 byte(s) in 1 object(s) allocated from:
        #0 0x7f6b558b3fb9 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
        riscvarchive#1 0x562eaf15d04a in xcalloc /home/simark/src/binutils-gdb/gdbserver/../gdb/alloc.c:100
        riscvarchive#2 0x562eaf2a0d79 in xcnew<arch_process_info> /home/simark/src/binutils-gdb/gdbserver/../gdbsupport/poison.h:122
        riscvarchive#3 0x562eaf295e2c in x86_target::low_new_process() /home/simark/src/binutils-gdb/gdbserver/linux-x86-low.cc:723
        riscvarchive#4 0x562eaf22819b in linux_process_target::add_linux_process_no_mem_file(int, int) /home/simark/src/binutils-gdb/gdbserver/linux-low.cc:428
        riscvarchive#5 0x562eaf22d33f in linux_process_target::attach(unsigned long) /home/simark/src/binutils-gdb/gdbserver/linux-low.cc:1132
        riscvarchive#6 0x562eaf1a7222 in attach_inferior /home/simark/src/binutils-gdb/gdbserver/server.cc:308
        riscvarchive#7 0x562eaf1c1016 in captured_main /home/simark/src/binutils-gdb/gdbserver/server.cc:3949
        riscvarchive#8 0x562eaf1c1d60 in main /home/simark/src/binutils-gdb/gdbserver/server.cc:4084
        riscvarchive#9 0x7f6b552f630f in __libc_start_call_main (/usr/lib/libc.so.6+0x2d30f)

Those objects are deleted by linux_process_target::mourn, but that is
not called if we fail to attach, we only call remove_process.  I
initially fixed this by making linux_process_target::attach call
linux_process_target::mourn on failure (before calling error).  But this
isn't done anywhere else (including in GDB) so it would just be
confusing to do things differently here.

Instead, add a linux_process_target::remove_linux_process helper method
(which calls remove_process), and call that instead of remove_process in
the Linux target.  Move the free-ing of the extra data from the mourn
method to that new method.

Change-Id: I277059a69d5f08087a7f3ef0b8f1792a1fcf7a85
Nelson1225 pushed a commit to Nelson1225/riscv-binutils-gdb that referenced this issue May 19, 2022
Luis noticed that the recent changes to gdbserver to make it track
process and threads independently regressed a few gdb.multi/*.exp
tests for aarch64-linux.

We started seeing the following internal error for
gdb.multi/multi-target-continue.exp for example:

 Starting program: binutils-gdb/gdb/testsuite/outputs/gdb.multi/multi-target-continue/multi-target-continue ^M
 Error in re-setting breakpoint 2: Remote connection closed^M
 ../../../repos/binutils-gdb/gdb/thread.c:85: internal-error: inferior_thread: Assertion `current_thread_ != nullptr' failed.^M
 A problem internal to GDB has been detected,^M
 further debugging may prove unreliable.

A backtrace looks like:

 #0  thread_regcache_data (thread=thread@entry=0x0) at ../../../repos/binutils-gdb/gdbserver/inferiors.cc:120
 riscvarchive#1  0x0000aaaaaaabf0e8 in get_thread_regcache (thread=0x0, fetch=fetch@entry=0) at ../../../repos/binutils-gdb/gdbserver/regcache.cc:31
 riscvarchive#2  0x0000aaaaaaad785c in is_64bit_tdesc () at ../../../repos/binutils-gdb/gdbserver/linux-aarch64-low.cc:194
 riscvarchive#3  0x0000aaaaaaad8a48 in aarch64_target::sw_breakpoint_from_kind (this=<optimized out>, kind=4, size=0xffffffffef04) at ../../../repos/binutils-gdb/gdbserver/linux-aarch64-low.cc:3226
 riscvarchive#4  0x0000aaaaaaabe220 in bp_size (bp=0xaaaaaab6f3d0) at ../../../repos/binutils-gdb/gdbserver/mem-break.cc:226
 riscvarchive#5  check_mem_read (mem_addr=187649984471104, buf=buf@entry=0xaaaaaab625d0 "\006", mem_len=mem_len@entry=56) at ../../../repos/binutils-gdb/gdbserver/mem-break.cc:1862
 riscvarchive#6  0x0000aaaaaaacc660 in read_inferior_memory (memaddr=<optimized out>, myaddr=0xaaaaaab625d0 "\006", len=56) at ../../../repos/binutils-gdb/gdbserver/target.cc:93
 riscvarchive#7  0x0000aaaaaaac3d9c in gdb_read_memory (len=56, myaddr=0xaaaaaab625d0 "\006", memaddr=187649984471104) at ../../../repos/binutils-gdb/gdbserver/server.cc:1071
 riscvarchive#8  gdb_read_memory (memaddr=187649984471104, myaddr=0xaaaaaab625d0 "\006", len=56) at ../../../repos/binutils-gdb/gdbserver/server.cc:1048
 riscvarchive#9  0x0000aaaaaaac82a4 in process_serial_event () at ../../../repos/binutils-gdb/gdbserver/server.cc:4307
 riscvarchive#10 handle_serial_event (err=<optimized out>, client_data=<optimized out>) at ../../../repos/binutils-gdb/gdbserver/server.cc:4520
 riscvarchive#11 0x0000aaaaaaafbcd0 in gdb_wait_for_event (block=block@entry=1) at ../../../repos/binutils-gdb/gdbsupport/event-loop.cc:700
 riscvarchive#12 0x0000aaaaaaafc0b0 in gdb_wait_for_event (block=1) at ../../../repos/binutils-gdb/gdbsupport/event-loop.cc:596
 riscvarchive#13 gdb_do_one_event () at ../../../repos/binutils-gdb/gdbsupport/event-loop.cc:237
 riscvarchive#14 0x0000aaaaaaacacb0 in start_event_loop () at ../../../repos/binutils-gdb/gdbserver/server.cc:3518
 riscvarchive#15 captured_main (argc=4, argv=<optimized out>) at ../../../repos/binutils-gdb/gdbserver/server.cc:3998
 riscvarchive#16 0x0000aaaaaaab66dc in main (argc=<optimized out>, argv=<optimized out>) at ../../../repos/binutils-gdb/gdbserver/server.cc:4084

This sequence of functions is invoked due to a series of conditions:

 1 - The probe-based breakpoint mechanism failed (for some reason) so ...

 2 - ... gdbserver has to know what type of architecture it is dealing
     with so it can pick the right breakpoint kind, so it wants to
     check if we have a 64-bit target.

 3 - To determine the size of a register, we currently fetch the
     current thread's register cache, and the current thread pointer
     is now nullptr.

In riscvarchive#3, the current thread is nullptr because gdb_read_memory clears it
on purpose, via set_desired_process, exactly to expose code relying on
the current thread when it shouldn't.  It was always possible to end
up in this situation (when the current thread exits), but it was
harder to reproduce before.

This commit fixes it by tweaking is_64bit_tdesc to look at the current
process's tdesc instead of the current thread's tdesc.

Note that the thread's tdesc is itself filled from the process's
tdesc, so this should be equivalent:

 struct regcache *
 get_thread_regcache (struct thread_info *thread, int fetch)
 {
   struct regcache *regcache;

   regcache = thread_regcache_data (thread);

 ...
   if (regcache == NULL)
     {
       struct process_info *proc = get_thread_process (thread);

       gdb_assert (proc->tdesc != NULL);

       regcache = new_register_cache (proc->tdesc);
       set_thread_regcache_data (thread, regcache);
     }
 ...

Change-Id: Ibc809d7345e70a2f058b522bdc5cdbdca97e2cdc
kito-cheng pushed a commit that referenced this issue Jul 7, 2022
Simon reported that the recent change to make GDB and GDBserver avoid
reading shell registers caused a GDBserver regression, caught with
ASan while running gdb.server/non-existing-program.exp:

 $ /home/smarchi/build/binutils-gdb/gdb/testsuite/../../gdb/../gdbserver/gdbserver stdio non-existing-program
 =================================================================
 ==127719==ERROR: AddressSanitizer: heap-use-after-free on address 0x60f0000000e9 at pc 0x55bcbfa301f4 bp 0x7ffd238a7320 sp 0x7ffd238a7310
 WRITE of size 1 at 0x60f0000000e9 thread T0
     #0 0x55bcbfa301f3 in scoped_restore_tmpl<bool>::~scoped_restore_tmpl() /home/smarchi/src/binutils-gdb/gdbserver/../gdbsupport/scoped_restore.h:86
     #1 0x55bcbfa2ffe9 in post_fork_inferior(int, char const*) /home/smarchi/src/binutils-gdb/gdbserver/fork-child.cc:120
     #2 0x55bcbf9c9199 in linux_process_target::create_inferior(char const*, std::__debug::vector<char*, std::allocator<char*> > const&) /home/smarchi/src/binutils-gdb/gdbserver/linux-low.cc:991
     #3 0x55bcbf954549 in captured_main /home/smarchi/src/binutils-gdb/gdbserver/server.cc:3941
     #4 0x55bcbf9552f0 in main /home/smarchi/src/binutils-gdb/gdbserver/server.cc:4084
     #5 0x7ff9d663b0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2)
     #6 0x55bcbf8ef2bd in _start (/home/smarchi/build/binutils-gdb/gdbserver/gdbserver+0x1352bd)

 0x60f0000000e9 is located 169 bytes inside of 176-byte region [0x60f000000040,0x60f0000000f0)
 freed by thread T0 here:
     #0 0x7ff9d6c6f0c7 in operator delete(void*) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:160
     #1 0x55bcbf910d00 in remove_process(process_info*) /home/smarchi/src/binutils-gdb/gdbserver/inferiors.cc:164
     #2 0x55bcbf9c4ac7 in linux_process_target::remove_linux_process(process_info*) /home/smarchi/src/binutils-gdb/gdbserver/linux-low.cc:454
     #3 0x55bcbf9cdaa6 in linux_process_target::mourn(process_info*) /home/smarchi/src/binutils-gdb/gdbserver/linux-low.cc:1599
     #4 0x55bcbf988dc4 in target_mourn_inferior(ptid_t) /home/smarchi/src/binutils-gdb/gdbserver/target.cc:205
     #5 0x55bcbfa32020 in startup_inferior(process_stratum_target*, int, int, target_waitstatus*, ptid_t*) /home/smarchi/src/binutils-gdb/gdbserver/../gdb/nat/fork-inferior.c:515
     #6 0x55bcbfa2fdeb in post_fork_inferior(int, char const*) /home/smarchi/src/binutils-gdb/gdbserver/fork-child.cc:111
     #7 0x55bcbf9c9199 in linux_process_target::create_inferior(char const*, std::__debug::vector<char*, std::allocator<char*> > const&) /home/smarchi/src/binutils-gdb/gdbserver/linux-low.cc:991
     #8 0x55bcbf954549 in captured_main /home/smarchi/src/binutils-gdb/gdbserver/server.cc:3941
     #9 0x55bcbf9552f0 in main /home/smarchi/src/binutils-gdb/gdbserver/server.cc:4084
     #10 0x7ff9d663b0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2)

 previously allocated by thread T0 here:
     #0 0x7ff9d6c6e5a7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
     #1 0x55bcbf910ad0 in add_process(int, int) /home/smarchi/src/binutils-gdb/gdbserver/inferiors.cc:144
     #2 0x55bcbf9c477d in linux_process_target::add_linux_process_no_mem_file(int, int) /home/smarchi/src/binutils-gdb/gdbserver/linux-low.cc:425
     #3 0x55bcbf9c8f4c in linux_process_target::create_inferior(char const*, std::__debug::vector<char*, std::allocator<char*> > const&) /home/smarchi/src/binutils-gdb/gdbserver/linux-low.cc:985
     #4 0x55bcbf954549 in captured_main /home/smarchi/src/binutils-gdb/gdbserver/server.cc:3941
     #5 0x55bcbf9552f0 in main /home/smarchi/src/binutils-gdb/gdbserver/server.cc:4084
     #6 0x7ff9d663b0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2)

Above we see that in the non-existing-program case, the process gets
deleted before the starting_up flag gets restored to false.

This happens because startup_inferior calls target_mourn_inferior
before throwing an error, and in GDBserver, unlike in GDB, mourning
deletes the process.

Fix this by not using a scoped_restore to manage the starting_up flag,
since we should only clear it when startup_inferior doesn't throw.

Change-Id: I67325d6f81c64de4e89e20e4ec4556f57eac7f6c
kito-cheng pushed a commit that referenced this issue Jul 7, 2022
When building gdb with -fsanitize=thread and gcc 12, and running test-case
gdb.dwarf2/dwz.exp, we run into a data race between thread T2 and the main
thread in the same write:
...
Write of size 1 at 0x7b200000300c:^M
    #0 cutu_reader::cutu_reader(dwarf2_per_cu_data*, dwarf2_per_objfile*, \
    abbrev_table*, dwarf2_cu*, bool, abbrev_cache*) gdb/dwarf2/read.c:6252 \
    (gdb+0x82f3b3)^M
...
which is here:
...
         this_cu->dwarf_version = cu->header.version;
...

Both writes are called from the parallel for in dwarf2_build_psymtabs_hard,
this one directly:
...
    #1 process_psymtab_comp_unit gdb/dwarf2/read.c:6774 (gdb+0x8304d7)^M
    #2 operator() gdb/dwarf2/read.c:7098 (gdb+0x8317be)^M
    #3 operator() gdbsupport/parallel-for.h:163 (gdb+0x872380)^M
...
and this via the PU import:
...
    #1 cooked_indexer::ensure_cu_exists(cutu_reader*, dwarf2_per_objfile*, \
    sect_offset, bool,  bool) gdb/dwarf2/read.c:17964 (gdb+0x85c43b)^M
    #2 cooked_indexer::index_imported_unit(cutu_reader*, unsigned char const*, \
    abbrev_info const*) gdb/dwarf2/read.c:18248 (gdb+0x85d8ff)^M
    #3 cooked_indexer::index_dies(cutu_reader*, unsigned char const*, \
    cooked_index_entry const*, bool) gdb/dwarf2/read.c:18302 (gdb+0x85dcdb)^M
    #4 cooked_indexer::make_index(cutu_reader*) gdb/dwarf2/read.c:18443 \
    (gdb+0x85e68a)^M
    #5 process_psymtab_comp_unit gdb/dwarf2/read.c:6812 (gdb+0x830879)^M
    #6 operator() gdb/dwarf2/read.c:7098 (gdb+0x8317be)^M
    #7 operator() gdbsupport/parallel-for.h:171 (gdb+0x8723e2)^M
...

Fix this by setting the field earlier, in read_comp_units_from_section.

The write in cutu_reader::cutu_reader() is still needed, in case
read_comp_units_from_section is not used (run the test-case with say, target
board cc-with-gdb-index).

Make the write conditional, such that it doesn't trigger if the field is
already set by read_comp_units_from_section.  Instead, verify that the
field already has the value that we're trying to set it to.

Move this logic into into a member function set_version (in analogy to the
already present member function version) to make sure it's used consistenly,
and make the field private in order to enforce access through the member
functions, and rename it to m_dwarf_version.

While we're at it, make sure that the version is set before read, to avoid
say returning true for "per_cu.version () < 5" if "per_cu.version () == 0".

Tested on x86_64-linux.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants