forked from riscvarchive/riscv-binutils-gdb
GDB overlay support for RISC-V #9
Open: dmi391 wants to merge 1 commit into sifive:master from dmi391:gdb-riscv-overlay
1. Fixed problem with overlay support for RISC-V

For GDB to support overlay debugging, the pointer `gdbarch->overlay_update` must be initialized with the function pointer `simple_overlay_update(struct obj_section *osect)`. In file `/gdb/riscv-tdep.c`, at the end of the definition of `riscv_gdbarch_init(...)`, `set_gdbarch_overlay_update(gdbarch, simple_overlay_update)` should be called, as is done in file `/gdb/m32r-tdep.c`.

Without this fix GDB-client can't update the overlay table `_ovly_table` from target RAM, and overlay debugging doesn't work:

    (gdb) overlay list
    No sections are mapped.
    (gdb) overlay load
    This target does not know how to read its overlay state.

With this fix GDB-client is able to support overlay debugging in auto-mode (GDB-client updates the overlay table `_ovly_table` from target RAM):

    (gdb) set verbose on
    (gdb) overlay auto
    Automatic overlay debugging enabled.
    ...
    (gdb) overlay list
    Section .ovly1, loaded at 0x10080444 - 0x100805d0, mapped at 0x10000060 - 0x100001ec

2. Fixed problem with output of overlay GDB-commands

The GDB-commands `overlay auto`, `overlay manual` and `overlay off` print an incorrect output message. In file `/gdb/symfile.c`, in functions `overlay_auto_command(...)`, `overlay_manual_command(...)` and `overlay_off_command(...)`, the call `printf_filtered(_("..."))` needs a '\n' added at the end of the string. Otherwise the output message of these GDB-commands is not separated from the output message of the next GDB-command. These messages are displayed only with `set verbose on`.

Without this fix. Incorrect (unseparated):

    (gdb) set verbose on
    (gdb) overlay auto
    <nothing>
    (gdb) overlay list
    Automatic overlay debugging enabled.No sections are mapped.

With this fix (added '\n'). Correct:

    (gdb) set verbose on
    (gdb) overlay auto
    Automatic overlay debugging enabled.
    (gdb) overlay list
    No sections are mapped.
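The role of the `overlay_update` hook in the first fix can be pictured with a small stand-alone model. This is only a sketch, not GDB source: `struct mock_gdbarch` and every `mock_*` function are hypothetical stand-ins for `gdbarch`, `set_gdbarch_overlay_update`, `simple_overlay_update` and the `overlay load` command, and the assumption is that the command fails with the message quoted above whenever the hook is unset.

```c
#include <stddef.h>

/* Simplified stand-ins for GDB's types; these are NOT the real definitions. */
struct obj_section;                                   /* opaque in this sketch */
typedef void (*overlay_update_fn)(struct obj_section *osect);

struct mock_gdbarch {
    overlay_update_fn overlay_update;   /* NULL until the port registers a hook */
};

static int table_reads;                 /* counts simulated reads of _ovly_table */

/* Hypothetical stand-in for simple_overlay_update: pretend to re-read
   _ovly_table from target RAM. */
static void mock_simple_overlay_update(struct obj_section *osect)
{
    (void)osect;
    table_reads++;
}

/* Hypothetical stand-in for set_gdbarch_overlay_update. */
static void mock_set_overlay_update(struct mock_gdbarch *arch, overlay_update_fn fn)
{
    arch->overlay_update = fn;
}

/* Hypothetical stand-in for the `overlay load` command. Returns the error
   text when no hook is registered, or NULL after a successful update. */
static const char *mock_overlay_load(const struct mock_gdbarch *arch)
{
    if (arch->overlay_update == NULL)
        return "This target does not know how to read its overlay state.";
    arch->overlay_update(NULL);
    return NULL;
}
```

In this model, a freshly zeroed `struct mock_gdbarch` behaves like the unpatched RISC-V port: `mock_overlay_load` returns the error text. After one call to `mock_set_overlay_update(&arch, mock_simple_overlay_update)`, the same command reaches the table-reading function, which is what the single added line in `riscv_gdbarch_init(...)` is meant to achieve.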
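The second fix comes down to a missing line terminator. The sketch below is not GDB code: `emit` is a hypothetical helper that appends each message to one buffer, the way successive commands write to the same output stream. It only illustrates why the two messages run together; it does not model why the first message is held back until the next command prints.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append MSG to the transcript in OUT (capacity CAP),
   as successive commands would append to one output stream. */
static void emit(char *out, size_t cap, const char *msg)
{
    size_t used = strlen(out);
    if (used < cap)
        snprintf(out + used, cap - used, "%s", msg);
}
```

Emitting `"Automatic overlay debugging enabled."` followed by `"No sections are mapped.\n"` leaves the two messages joined on one line, as in the incorrect transcript. Adding `'\n'` to the first string puts each message on its own line.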
kito-cheng pushed a commit that referenced this pull request Sep 25, 2023
While working on a later patch, which changes gdb.base/foll-vfork.exp, I noticed that sometimes I would hit this assert:

    x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed.

I eventually tracked it down to a combination of schedule-multiple mode being on, target-non-stop being off, follow-fork-mode being set to child, and some bad timing.

The failing case is pretty simple: a single-threaded application performs a vfork, the child process then execs some other application while the parent process (once the vfork child has completed its exec) just exits.

As best I understand things, here's what happens when things go wrong:

1. The parent process performs a vfork, GDB sees the VFORKED event and creates an inferior and thread for the vfork child,
2. GDB resumes the vfork child process. As schedule-multiple is on and target-non-stop is off, this is translated into a request to start all processes (see user_visible_resume_ptid),
3. In the linux-nat layer we spot that one of the threads we are about to start is a vfork parent, and so don't start that thread (see resume_lwp), the vfork child thread is resumed,
4. GDB waits for the next event, eventually entering linux_nat_target::wait, which in turn calls linux_nat_wait_1,
5. In linux_nat_wait_1 we eventually call resume_stopped_resumed_lwps, this should restart threads that have stopped but don't actually have anything interesting to report.
6. Unfortunately, resume_stopped_resumed_lwps doesn't check for vfork parents like resume_lwp does, so at this point the vfork parent is resumed. This feels like the start of the bug, and this is where I'm proposing to fix things, but, resuming the vfork parent isn't the worst thing in the world because....
7. As the vfork child is still alive the kernel holds the vfork parent stopped,
8. Eventually the child performs its exec and GDB is sent an EXECD event. However, because the parent is resumed, as soon as the child performs its exec the vfork parent also sends a VFORK_DONE event to GDB,
9. Depending on timing both of these events might seem to arrive in GDB at the same time. Normally GDB expects to see the EXECD or EXITED/SIGNALED event from the vfork child before getting the VFORK_DONE in the parent. We know this because it is as a result of the EXECD/EXITED/SIGNALED that GDB detaches from the parent (see handle_vfork_child_exec_or_exit for details). Further, the comment in target/waitstatus.h on TARGET_WAITKIND_VFORK_DONE indicates that when we remain attached to the child (not the parent) we should not expect to see a VFORK_DONE,
10. If both events arrive at the same time then GDB will randomly choose one event to handle first, in some cases this will be the VFORK_DONE. As described above, upon seeing a VFORK_DONE GDB expects that (a) the vfork child has finished, however, in this case this is not completely true, the child has finished, but GDB has not processed the event associated with the completion yet, and (b) upon seeing a VFORK_DONE GDB assumes we are remaining attached to the parent, and so resumes the parent process,
11. GDB now handles the EXECD event. In our case we are detaching from the parent, so GDB calls target_detach (see handle_vfork_child_exec_or_exit),
12. While this has been going on the vfork parent is executing, and might even exit,
13. In linux_nat_target::detach the first thing we do is stop all threads in the process we're detaching from, the result of the stop request will be cached on the lwp_info object,
14. In our case the vfork parent has exited though, so when GDB waits for the thread, instead of a stop due to signal, we instead get a thread exited status,
15. Later in the detach process we try to resume the threads just prior to making the ptrace call to actually detach (see detach_one_lwp), as part of the process to resume a thread we try to touch some registers within the thread, and before doing this GDB asserts that the thread is stopped,
16. An exited thread is not classified as stopped, and so the assert triggers!

So there are two bugs I see here. The first, and most critical one here, is in step 6. I think that resume_stopped_resumed_lwps should not resume a vfork parent, just like resume_lwp doesn't resume a vfork parent.

With this change in place the vfork parent will remain stopped, and GDB will instead only see the EXECD/EXITED/SIGNALLED event. The problems in steps 9 and 10 are therefore skipped and we arrive at step 11, handling the EXECD event. As the parent is still stopped, step 12 doesn't apply, and in step 13 when we try to stop the process we will see that it is already stopped, so there's no risk of the vfork parent exiting before we get to this point. And finally, in step 15 we are safe to poke the process registers because it will not have exited by this point.

However, I did mention two bugs.

The second bug I've not yet managed to actually trigger, but I'm convinced it must exist: if we forget vforks for a moment, in step 13 above, when linux_nat_target::detach is called, we first try to stop all threads in the process GDB is detaching from. If we imagine a multi-threaded inferior with many threads, and GDB running in non-stop mode, then, if the user tries to detach there is a chance that a thread could exit just as linux_nat_target::detach is entered, in which case we should be able to trigger the same assert.

But, like I said, I've not (yet) managed to trigger this second bug, and even if I could, the fix would not belong in this commit, so I'm pointing this out just for completeness.

There's no test included in this commit. In a couple of commits' time I will expand gdb.base/foll-vfork.exp, which is when this bug would be exposed. Unfortunately there are at least two other bugs (separate from the ones discussed above) that need fixing first, these will be fixed in the next commits before the gdb.base/foll-vfork.exp test is expanded.

If you do want to reproduce this failure then you will certainly need to run the gdb.base/foll-vfork.exp test in a loop as the failures are all very timing sensitive. I've found that running multiple copies in parallel makes the failure more likely to appear, I usually run ~6 copies in parallel and expect to see a failure within 10 minutes.
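The first fix described in the commit message above (not restarting a vfork parent when restarting stopped-but-resumed threads) can be sketched as a single predicate. This is a simplified model under stated assumptions, not the linux-nat code: `struct mock_lwp`, its fields and `should_restart` are hypothetical names, and the real `lwp_info` and vfork-parent check look different.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of one thread (LWP) as the debugger tracks it. */
struct mock_lwp {
    bool stopped;            /* the thread is currently stopped */
    bool resumed;            /* the user asked for this thread to run */
    bool has_pending_event;  /* a stop event still has to be reported */
    bool is_vfork_parent;    /* its vfork child has not yet exec'd or exited */
};

/* Decide whether a stopped-but-resumed thread should be set running again.
   The vfork-parent test models the check the commit message proposes adding,
   mirroring what it says resume_lwp already does. */
static bool should_restart(const struct mock_lwp *lp)
{
    if (!lp->stopped || !lp->resumed || lp->has_pending_event)
        return false;
    if (lp->is_vfork_parent)
        return false;        /* leave the parent stopped until the child is done */
    return true;
}
```

With the vfork-parent test in place, the parent in the failing scenario stays stopped at step 6, so it cannot report VFORK_DONE early or exit before the detach in steps 13 to 15.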
Now GDB-client supports overlay debugging in auto-mode.
My overlay demo project for RISC-V: dmi391/overlay_demo
My custom build of GDB-client with these two fixes (it works correctly): release gdb-riscv-ovly