Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API for Bare-Metal OS Support #56

Closed
xobs opened this issue May 25, 2021 · 50 comments
Closed

Add API for Bare-Metal OS Support #56

xobs opened this issue May 25, 2021 · 50 comments
Labels
API-breaking Breaking API change API-ergonomics Nothing's "broken", but the API could be improved new-api Add a new feature to the API (possibly non-breaking)
Milestone

Comments

@xobs
Copy link
Contributor

xobs commented May 25, 2021

gdbstub makes it very easy to add a GDB server to any project. The current design uses long-polling to interact with the target. Conceptually, gdbstub sits between a network Connection and the Target being debugged. When the target is running, gdbstub is blocked.

An API for bare metal could live in an interrupt handler such as a UART. Each time a character is received, it would be passed to gdbstub for collection and processing. This has several nice properties:

  1. It is easy to remove, allowing for feature gating -- no gdbstub means no debugger, and you simply have to remove the interrupt hook.
  2. It is very nonintrusive -- you only need to add code to the interrupt handler without needing to adjust the main loop at all.
  3. Because interrupt handlers run in an interrupt context, the system is necessarily in a stable state with interrupts disabled, meaning things won't be changing as you inspect them.
  4. Being in an interrupt handler allows you to debug even the kernel itself.
@xobs
Copy link
Contributor Author

xobs commented May 25, 2021

Adding more context from #31 -- Specifically the target I have in mind is Xous running on Betrusted.

Xous is an operating system we're working on. It's bare-metal and runs on custom hardware. Currently it is targeting Betrusted, which runs a RISC-V core on a Xilinx FPGA in a keyboard "phone" form factor. Xous is written entirely in Rust, and is a microkernel architecture. We have the beginnings of libstd ported and running on Rust 1.52.1, however debugging it up until now has been a challenge. It would be fantastic if we could get a full debugger working.

The kernel itself is a cooperative microkernel and only has access to a TRNG for generating random server IDs. We're considering adding a timer for certain interruptable mutex operations as an extension. All operating system fundamentals are handled by servers. For example, the console UART is handled by a log server, which acts as stdout.

The current approach to debugging is to use Renode, which has support for acting as a GDB server. Its support for the MMU is still somewhat limited, which makes perfect sense given that it's either used to debug deeply-embedded projects that have no MMU, or to run something heavy like Linux that has support for native debuggers.

gdbstub allows us to either repurpose the console UART, or create an entirely new UART that only speaks GDB. With simple monitor commands we can switch processes and enable arbitrary debugging of processes within the system. Renode supports connecting an emulated UART to a TCP socket, so gdb should be able to talk directly with the emulated kernel, and should work just fine when running on real hardware.

@xobs
Copy link
Contributor Author

xobs commented May 25, 2021

I'm trying to use 297e065 but I have a few questions:

  • What should I return for gdbstub::Connection::read() when there are no bytes? Data is now being injected via pump(), so it would seem that Connection really only needs to implement write(). Additionally, peek() will now always return None since characters are injected
  • What is the return value of MultiThreadOps::resume()? If the stub decides to issue a resume, I'll unblock that process. However, once the process is running, I'll need to return something indicating success.

In my kernel I'm introducing a new process state that indicates a process is being debugged, which allows other processes to interact with it but without allowing it to get scheduled. For example, other processes can send it messages which will pile up in its inbox.

@daniel5151 daniel5151 added API-ergonomics Nothing's "broken", but the API could be improved new-api Add a new feature to the API (possibly non-breaking) labels May 25, 2021
@daniel5151
Copy link
Owner

daniel5151 commented May 25, 2021

Thanks for opening the issue, and for giving my POC branch a try!

Like I said in the other thread, I'm really excited to see what you come up with, since using gdbstub for bare-metal debugging is something I've been eager to validate for a very long time.

To answer your questions about 297e065:

Stubbing out Connection::read and Connection::peek

As you're aware, the current design still requires passing a Connection to GdbStubStateMachine, even though under-the-hood, it'll never actually call Connection::read. This is definitely an API wart that should get buffed out before merging a final version, but for now, it should be reasonable to simply have Connection::read panic! if it is ever called (which it won't).

As for Connection::peek: the only place it is ever called is as part of the call to GdbInterruptNoAsync::pending, which is used to poll for incoming GDB client interrupts. As I mentioned in my original comment, you'll want to bypass this mechanism entirely, and check for interrupt packets (which is just a bare 0x03 byte) as part of your external interrupt handler infrastructure. As such, you can simply return None from Connection::peek, or alternatively, have it panic! as a safeguard against accidental misuse.

Implementing MultiThreadOps::resume

To answer the first part of your question: a full list of stop reasons is enumerated by the ThreadStopReason enum.

To answer the second part of your question, MultiThreadOps::resume()'s API is designed such that an implementor will resume the target, and then block until some kind of stop event occurs. There is no mechanism to return a "okay, the target is now running" message.

This has been totally fine up to now, as gdbstub only implements All-Stop mode, where the GDB client does not send any packets (aside from the 0x3 interrupt packet) while a target is running. This is in contrast to Non-Stop mode, where background threads can execute freely while the gdbstub continues to service packets.

One key thing to note is that All-Stop mode is by far the most common way to implement a gdbstub, and I have yet to find a reasonable implementation of a non-gdbserver GDB stub that implements this mode. This is because implementing Non-Stop mode introduces the challenge of how to handle asynchronously sending notification packets when the target encounters a stop event. I'll be honest, I've been putting off working on this feature since I've always had a sneaking suspicion it'll require some non-trivial rework of large chunks of gdbstub's connection management infrastructure...

In addition, supporting non-stop mode will also require "mode-gating" certain current/future protocol extensions, as not all packets are supported when non-stop mode is active. e.g: Remote File I/O isn't supported in non-stop mode, nor is the O xx stop reply packet, etc... Not a deal-breaker of course, but just some added complexity that'll need to be handled appropriately.

Now, with all the context out of the way, lets pivot back to the problem at hand: how should you implement resume. Well... given that you're already running this code in an interrupt handler, and you've already got the infrastructure to context switch out of a IRQ, if it were me, I would probably have just structured the code as follows:

fn resume(...) {
    resume_thread(tid);
    __context_switch();
    // the process runs, encounters some stop event, and you've context-switched back to here
    let reason = retrieve_stop_reason(tid); // <-- assuming you've stashed away the stop reason somewhere
    return translate_stop_reason(reason);
}

If this isn't feasible, then congratulations, you've opened Pandora's box and forced me to consider how gdbstub could implement Non-Stop mode! Getting that working will likely be quite an endeavor, and not something I could hammer out a POC of in a single evening.
Alternatively, we could also explore a similar state-machine based API for resuming targets, but I'd expect that to also require quite a bit of non-trivial development effort and refactoring...

Let me know what you think!

@xobs
Copy link
Contributor Author

xobs commented May 26, 2021

The problem with that code is that it's running in an interrupt handler, and so has no context to begin with. In this kernel at least, a brand-new stack frame and context is generated every time an interrupt hits, and it's disposed of when the interrupt exits. Things in the .data section persist though, and they can have one word of data passed to them.

The interrupt handler looks something like this:

static mut GDB_SERVER: Option<(GdbStubStateMachine<XousTarget, Uart>, XousTarget)> = None;
pub fn irq(_irq_number: usize, _arg: *mut usize) {
    let b = Uart {}
        .getc()
        .expect("no character queued despite interrupt") as char;

    // GDB server exists, hand character to the GDB server
    unsafe {
        if let Some((gdb, target)) = GDB_SERVER.as_mut() {
            gdb.pump(&mut target, b).unwrap();
            return;
        }
    }

    // Not currently in GDB, process normally.
    match b {
         // ...
        'g' => {
            use gdb_server::*;
            println!("Starting GDB server -- attach your debugger now");
            let xous_target = XousTarget::new();
            match gdbstub::GdbStubBuilder::new(Uart {})
                .with_packet_buffer(unsafe { &mut GDB_BUFFER })
                .build()
            {
                Ok(gdb) => unsafe { GDB_SERVER = Some((gdb, xous_target)) },
                Err(e) => println!("Unable to start GDB server: {}", e),
            }
        }
        'h' => print_help(),
    }
}

Therefore, code execution will flow into gdbstub as part of gdb.pump() but I don't think it will ever exit. Or will it? The state should be stored in the .bss in the GDB_BUFFER variable, but as I said the actual stack frame is thrown away when the interrupt returns.

"blocking" until a stop event occurs is fine, and was the mental model I've been working with. Except we wouldn't block, we'd return. The process in question would be descheduled.

Calling resume(...) would re-schedule the process and then return from the interrupt handler. At this point, the process can stop in one of two ways:

  1. The remote GDB session requests a break, which will be interpreted as a keypress. This will be passed to gdbstub via pump().
  2. The program hits an invalid instruction, because the processor executes an EBREAK call. This will get trapped by the interrupt handler, and will essentially enter the same function. Perhaps that should call something like gdb.request_break(); gdb.pump(None);?

I almost wonder if this isn't a thing that async couldn't help with. I don't know enough about async, but I gather that it makes state machines such as this, and it does seem like the inner function simply needs to yield while it waits. But perhaps that is overkill.

@daniel5151
Copy link
Owner

Ahhh, I understand...

My shoddy little microkernel used stateful interrupt handlers that actually maintained their own separate execution contexts, but it's totally reasonable to keep interrupt handlers lean instead (arguably better tbh).

I almost wonder if this isn't a thing that async couldn't help with.

Kinda. Ideally, Rust would stabilize native generators, since what we're really interested here is some sort of "yield" construct. It would be unfortunate to add a bunch of async complexity to an otherwise quite straightforward API...

That said, I just took a fresh look at the source code, and I realized that past-me may have inadvertently structured gdbstub's implementation in such a way that adding a "return after resume" path wouldn't actually require that much refactoring! To see what I mean, take a look at how target-resume is handled within GdbStub https://github.com/daniel5151/gdbstub/blob/master/src/gdbstub_impl/ext/base.rs#L587-L604

You'll find that the do_vcont method (which is invoked when the GDB client wishes to resume a target) ends up inking a target's resume method, obtaining a stop reason, and then invoking a separate finish_exec function. In other words, the act of calling + returning from resume is already decoupled from the act of reporting a stop reason to the GDB client!

With this in mind, it shouldn't be too difficult to add a new StopReason::Resumed variant, which when returned, will short-circuit the current do_vcont -> finish_exec loop, and manually yield execution back to the callee of gdb.pump() (possibly returning something like DisconnectReason::WaitForResume). At that point, you can safely tear down the interrupt context, switch back to the task, and once you're back in the interrupt handler, you can wire up a direct interface to call finish_exec with the appropriate stop reason.

Now, unfortunately, I am going to be quite busy for the next week or so (work is ramping up + it'll be a long weekend here in the states), so I don't think I'll have time to hack together an implementation until sometime next week. That said, if you're willing to get your hands dirty, I think I've provided enough context such that you can fork gdbstub and hack together a rough POC. If you can pull that off, you should be able to start implementing / validating other bits of gdbstub's functionality, and we can work on getting a more robust implementation upstreamed.

Let me know if what I described makes sense, and if you think that's something you might be able to tackle yourself.

@xobs
Copy link
Contributor Author

xobs commented May 27, 2021

That does make sense, and I'll see if I can hack something together. Unfortunately, I'll also be busy with another project for the next week, so I'm not sure how much I'll be able to complete.

@daniel5151
Copy link
Owner

Sounds good.

I'll set a reminder to circle back on this in a week or so to see how things are going, and in the meantime, I'll try and squeeze in some time to throw something together myself.

@xobs
Copy link
Contributor Author

xobs commented May 27, 2021

Something strange is happening with linking. When I include the call to gdb.pump() it looks like it uses a different atomic primitive than exists. Or maybe I have to manually create that symbol. This is building on riscv32imac, which has atomic instructions:

$ cargo build --target riscv32imac-unknown-none-elf
...
  = note: rust-lld: error: undefined symbol: __atomic_load_4
          >>> referenced by atomic.rs:0 (C:\Users\smcro\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\library/core/src/sync\atomic.rs:0)
          >>>               lto.tmp:(core::sync::atomic::AtomicUsize::load::h6c2d4ab500043b24)
          >>> referenced by atomic.rs:0 (C:\Users\smcro\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\library/core/src/sync\atomic.rs:0)
          >>>               lto.tmp:(core::sync::atomic::atomic_load::hd311338cf6f0be89)


error: aborting due to previous error; 6 warnings emitted

error: could not compile `kernel`

To learn more, run the command again with --verbose.
$

If I comment out the call to gdb.pump() it works. Let me try to dig into it and figure out what sort of atomic calls it's looking for...

@daniel5151
Copy link
Owner

Ahh, how odd...

I know for a fact that gdbstub itself doesn't use any atomics directly (or any synchronization primitives at all for that matter). Indirectly, the only dependency that might be using atomics under-the-hood would be the log crate, which does try and use atomic instructions when available...

@xobs
Copy link
Contributor Author

xobs commented May 27, 2021

Fascinating. I'm pretty sure it's a rust bug. If I build for riscv32i-unknown-none-elf, which as no atomic support, it builds just fine. If I build it for 1.51, it builds just fine. However, Rust 1.52 breaks.

I'll open a bug there. In the meantime, I have workarounds.

@xobs
Copy link
Contributor Author

xobs commented May 27, 2021

I opened a new bug at rust-lang/rust#85736 -- seems like a regression, since it works in the older compiler.

@xobs
Copy link
Contributor Author

xobs commented Jun 1, 2021

It's a regression in lto support, which unfortunately derailed my momentum. Now I'm pondering how to actually implement single-step debugging.

Do you know if gdb is happy to run without single-step support?

@daniel5151
Copy link
Owner

Alright, I'm back from the long weekend, and it's time to get back into the swing of things.
Lets see what's going on here...


Ah, yeah, would you look at that! It seems you really did stumble on a regression in the Rust compiler. Good thing you reported it, and hopefully it'll get fixed ASAP!


As for your question regarding single stepping, I would refer to the GDB docs here: https://sourceware.org/gdb/current/onlinedocs/gdb/Overview.html#Overview

At a minimum, a stub is required to support the ‘?’ command to tell GDB the reason for halting, ‘g’ and ‘G’ commands for register access, and the ‘m’ and ‘M’ commands for memory access. Stubs that only control single-threaded targets can implement run control with the ‘c’ (continue) command, and if the target architecture supports hardware-assisted single-stepping, the ‘s’ (step) command. Stubs that support multi-threading targets should support the ‘vCont’ command. All other commands are optional.

And hey, would you look at that! It turns out that you can get away without supporting single stepping support! I'll be honest, I totally didn't realize that something as seemingly fundamental as single-stepping is actually an optional operation, and as such, gdbstub doesn't actually have a mechanism to signal that the target does not support single stepping...

Thanks for bringing this to my attention! Thankfully, this wouldn't be a difficult option to "plumb" through. My gut feeling is that we can just add a new fn supports_single_step(&self) -> bool method to {Single,Multi}ThreadOps (including a default impl that returns true for backwards-compat reasons), which is then used by gdbstub internally to determine if the response to vCont? should include support for single-stepping.

While this isn't actually the cleanest solution (requiring implementors to add a logically unreachable match arm in their resume implementation) it is backwards compatible and easy to implement. We can land it as part of these other bare-metal API changes, and I'll pop open a tracking issue to clean up the API a bit for 0.6 (which will likely end up looking similar to how the current optional range-stepping API).

I'm expecting to budget some time to hack away at gdbstub sometime later today / tomorrow, but in the meantime, you can "emulate" this functionality by manually modifying this line to omit the s;S variants.


Lastly, I was also planning on taking a crack at implementing support for "deferred resume" sometime later today / tomorrow. I've got some neat ideas on how to structure the API, and I'm excited to see what I'll be able to hack together :)

@daniel5151
Copy link
Owner

Alright, I just pushed up a [hopefully] working version of "deferred resume" to the feature/state-machine-recv-packet branch.

Fair warning: I have not had a chance to test this code outside of "making sure it compiles". That said, given that we are writing Rust, I feel fairly confident that it'll Just Work™️ right off the bat. Lets see if I'm right 😄.

Note that I decided to use a typestate based API to enforce correct GDB protocol sequencing at compile-time, and as such, the API might be a bit confusing if you're not familiar with this pattern. Please refer to the code in example_no_std/src/main.rs for an overview of how the API should be used.

Please give these changes a shot, and let me know how it goes!


I didn't get the chance to work on making single-stepping optional, though I should have some time to finish it up tomorrow. In the meantime, you'll just have to use the workaround I mentioned above.

@daniel5151
Copy link
Owner

daniel5151 commented Jun 3, 2021

I've been thinking about this typestate token based API for a couple days now, and I've realized that to really make the API rock-solid, I think I'll need to make some API breaking changes at some point. That said, I'm not sure if that'll be as part of this WIP PR or as part of some future work, so for now, I'm just going to jot down these ideas before I forget them.

Note that these are not pressing issues, and are only useful as a way to "tighten up" the API and make it impossible to misuse the library at compile time. A truly malicious implementation could unsafely craft valid typestate tokens out of thin air, and there isn't anything we can do about it.

Enforce (at compile time) that StopReason::Defer can only be constructed as part of the resume method.

As it stands, there's nothing stopping the user from constructing a StopReason::Defer and passing to to GdbStubStateMachine::deferred_stop_reason. This is obviously incorrect, as it doesn't make any sense to defer a defer, and right now, this case is handled with a runtime check.

I realized that it'd actually be possible to enforce this property at compile time with some similar typestate token shenanigans:

// Create a new typestate token that is tied to a specific lifetime
#[non_exhaustive]
struct DeferredStopReasonToken<'a> {
    lifetime: core::marker::PhantomData<&'a ()>
}

// update the StopReason::Defer to require an instance of `DeferredStopReasonToken`
enum StopReason<'a, U> { // <-- requires adding a lifetime parameter :'(
    // ...
    Defer(DeferredStopReasonToken<'a>)
}

// update the `resume` method to pass a valid instance of `DeferredStopReasonToken`
    fn resume(
        &mut self,
        default_resume_action: ResumeAction,
        gdb_interrupt: GdbInterrupt<'_>, // <-- i'll probably remove this in the next breaking version
        defer: DeferredStopReasonToken<'_>, // <-- tied to the lifetime of `resume`!
    ) -> Result<ThreadStopReason<<Self::Arch as Arch>::Usize>, Self::Error> 
{
    // if a target wants to defer a stop reason, they must consume the defer token
   return Ok(ThreadStopReason::Defer(defer))
}

With this infrastructure in place, we can observe the following: Back at the top level, in the scope where one would call GdbStubStateMachine::deferred_stop_reason, there is no way to construct a DeferredStopReasonToken out of thin air!

Moreover, since the lifetime of the token was tied to the resume method, a target implementation wouldn't be able to "stash away" the token in an attempt to break this invariant.

Tying typestate tokens to a unique instance of GdbStub

The current system has a flaw whereby there's nothing stopping someone from constructing two-or-more instances of GdbStub simultaneously, and using them to acquire arbitrary out-of-sequence typestate tokens. i.e: it is trivial to construct arbitrary typestate tokens out of sequence by instantiating + running transient GdbStub instances with just enough dummy-data to yield the desired token.

The mitigation would be to somehow "tie" typestate tokens with a particular GdbStub instance - preferably at the type system level. I've got a few ideas on how this might be possible, and the more I think about it, I feel like it ought to be possible to do this without requiring any breaking API changes.

Maybe I'll take a crack at this at some point this weekend...

@daniel5151
Copy link
Owner

FYI, I just pushed up yet another update that addressed the issue of "Tying typestate tokens to a unique instance of GdbStub". The solution? Get rid of tokens entirely, and instead have the GdbStubStateMachine struct itself transition between states.

Shoutout to https://hoverbear.org/blog/rust-state-machine-pattern/ for providing a very easy-to-use example that could be modified with impunity.

As always, check out the example_no_std code for details on how to use this API. It should be pretty straightforward.

Cheers

@xobs
Copy link
Contributor Author

xobs commented Jun 6, 2021

Thanks! That does seem easier. I'm struggling a bit with lifetimes trying to convince Rust to store mutable items inside an Option<T>.

Specifically, my pattern looks something like this:

    pub static mut GDB_SERVER: Option<GdbStubStateMachine<XousTarget, super::Uart>> = None;
    pub static mut GDB_TARGET: Option<XousTarget> = None;
    pub static mut GDB_BUFFER: [u8; 4096] = [0u8; 4096];

pub fn irq(_irq_number: usize, _arg: *mut usize) {
    let b = Uart {}
        .getc()
        .expect("no character queued despite interrupt");

    #[cfg(feature = "gdbserver")]
    unsafe {
        use crate::debug::gdb_server::{GDB_SERVER, GDB_TARGET};
        use gdbstub::state_machine::GdbStubStateMachine;
        use gdbstub::{DisconnectReason, GdbStubError};
        if let Some(gdb) = GDB_SERVER.as_mut().take() {
            let target = GDB_TARGET.as_mut().unwrap();
            let new_gdb = match gdb {
                GdbStubStateMachine::Pump(gdb_state) => match gdb_state.pump(target, b) {
                    // Remote disconnected -- leave the `GDB_SERVER` as `None`.
                    Ok((_, Some(disconnect_reason))) => {
                        match disconnect_reason {
                            DisconnectReason::Disconnect => println!("GDB Disconnected"),
                            DisconnectReason::TargetExited(_) => println!("Target exited"),
                            DisconnectReason::TargetTerminated(_) => println!("Target halted"),
                            DisconnectReason::Kill => println!("GDB sent a kill command"),
                        }
                        return;
                    }
                    Err(GdbStubError::TargetError(_e)) => {
                        println!("Target raised a fatal error");
                        return;
                    }
                    Err(_e) => {
                        println!("gdbstub internal error");
                        return;
                    }
                    Ok((gdb, None)) => gdb,
                },
                // example_no_std stubs out resume, so this will never happen
                GdbStubStateMachine::DeferredStopReason(_) => {
                    panic!("Deferred stop shouldn't happen")
                }
            };
            GDB_SERVER = Some(new_gdb);
            return;
        }
    }
    match b {
        #[cfg(feature = "gdbserver")]
        b'g' => {
            use gdb_server::{XousTarget, GDB_BUFFER, GDB_SERVER, GDB_TARGET};
            println!("Starting GDB server -- attach your debugger now");
            unsafe { GDB_TARGET = Some(XousTarget::new()) };
            match gdbstub::GdbStubBuilder::new(Uart {})
                .with_packet_buffer(unsafe { &mut GDB_BUFFER })
                .build()
            {
                Ok(gdb) => match gdb.run_state_machine() {
                    Ok(state) => unsafe { GDB_SERVER = Some(state) },
                    Err(e) => println!("Unable to start GDB state machine: {}", e),
                },
                Err(e) => println!("Unable to start GDB server: {}", e),
            }
        }
        // Other characters handled here
    }
    // Remainder of `debug` console here, for when a terminal is not connected
}

The problem I'm running into right now is that gdb_state.pump(target, b) consumes gdb_state, which is a member of gdb and is stored in the .data section (as part of gdb I'm guessing?). So I can't convince the compiler to build this:

error[E0507]: cannot move out of `*gdb` which is behind a mutable reference
   --> src\debug.rs:311:57
    |
311 |                 GdbStubStateMachine::Pump(gdb) => match gdb.pump(target, b) {
    |                                                         ^^^ move occurs because `*gdb` has type `GdbStubStateMachineInner<'_, gdbstub::state_machine::state::Pump, XousTarget, Uart>`, which does not implement the `Copy` trait

error: aborting due to previous error; 2 warnings emitted

For more information about this error, try `rustc --explain E0507`.
error: could not compile `kernel`

To learn more, run the command again with --verbose.

@xobs
Copy link
Contributor Author

xobs commented Jun 6, 2021

Apologies, I've managed to at least convince it to compile by removing the .as_mut():

        if let Some(mut gdb) = GDB_SERVER.take() {
            let target = GDB_TARGET.as_mut().unwrap();
            let new_gdb = match gdb {
                GdbStubStateMachine::Pump(gdb_state) => match gdb_state.pump(target, b) {
                    // Remote disconnected -- leave the `GDB_SERVER` as `None`.
                    Ok((_, Some(disconnect_reason))) => {
                        match disconnect_reason {
                            DisconnectReason::Disconnect => println!("GDB Disconnected"),
                            DisconnectReason::TargetExited(_) => println!("Target exited"),
                            DisconnectReason::TargetTerminated(_) => println!("Target halted"),
                            DisconnectReason::Kill => println!("GDB sent a kill command"),
                        }
                        return;
                    }
                    Err(GdbStubError::TargetError(_e)) => {
                        println!("Target raised a fatal error");
                        return;
                    }
                    Err(_e) => {
                        println!("gdbstub internal error");
                        return;
                    }
                    Ok((gdb, None)) => gdb,
                },
                // example_no_std stubs out resume, so this will never happen
                GdbStubStateMachine::DeferredStopReason(_) => {
                    panic!("Deferred stop shouldn't happen")
                }
            };
            GDB_SERVER = Some(new_gdb);
            return;
        }

@xobs
Copy link
Contributor Author

xobs commented Jun 6, 2021

Still working on it, but in case you're curious the [messy] repo is checked in at https://github.com/betrusted-io/xous-core/blob/debugger/kernel/src/debug.rs

I need to add OS-level support for addressing memory and putting processes in a Debug(u32) state, but at least the possibility is there.

@daniel5151
Copy link
Owner

Fantastic! I'm glad to see that you've got at least a "skeleton" of a full-fledged gdbstub integration up and running 😄

As always, I'm looking forward to seeing how your implementation shapes up, and whether you run into any more API papercuts. Once you've got something up and running, we can move on to polishing up the feature branch and merging it into master 🚀


By the way, you should remove those #[inline(never)] calls - they're only there in the example_no_std as a profiling measure to take the size of the dummy code into account. Production code should be annotation free.

Oh, and one more thing: I noticed that you've already got a few OS specific debug features in your code (e.g: dumping page tables, reporting ram usage, etc...) which become inaccessible once you enter GDB mode.

In case you aren't aware, but the GDB RSP actually defines a mechanism by which targets can implement custom client commands, which gdbstub exposes via the MonitorCmd extension. In a nutshell, you can set it up such that when a user types monitor [payload] from the GDB client, the target will receive the payload and have a chance to interpret + send arbitrary data back to the host console. Using this feature, you can define custom commands such as monitor ram to dump ram, monitor page-tables to dump page tables, etc... all while in the middle of a GDB debugging session!

@xobs
Copy link
Contributor Author

xobs commented Jun 6, 2021

Yep! I was planning on adding monitor commands to allow switching processes, since GDB doesn't really have anything like that. Threads will be threads, and processes will be managed by way of the process command.

Right now the big issue is supporting pausing processes on the system, so it's taking a bit longer than I'd like to integrate things.

@daniel5151
Copy link
Owner

I was planning on adding monitor commands to allow switching processes, since GDB doesn't really have anything like that.

Actually, that isn't true. GDB can attach to multiple running processes, and debug all of them simultaneously. This functionality is available when using the RSP when the target supports running in extended-mode, which coincidentally, is something gdbstub already supports via the ExtendedMode IDET.

Unfortunately, while gdbstub already supports running in extended mode, I have never gotten around to adding a MultiProcessOps interface to compliment the existing SingleThreadOps and MultiThreadOps interfaces. As it happens, gdbstub already has almost all of the internal infrastructure required to add a MultiThreadOps interface (notably, gdbstub unconditionally uses the RSP multiprocess extensions under-the-hood), and the only reason this interface hasn't been implemented is that up until this point, no-one has needed to debug multiple processes at once. Implementing support for this has been on my TODO list for a while, but given its relatively high effort+testing to payoff ratio, i'm not sure when I'll get a chance to implement it...

That said, you probably can leverage the current ExtendedMode support alongside the already-implemented MultiThreadOps interface to switch between processes. i.e: whenever you attach to a new process, you immediately detach from the previous process, thereby always having a single active process being debugged. While it would undeniably be a bit janky, it would likely be cleaner than using a custom monitor command to switch processes, as the GDB client will be enlightened to the fact that you've switched processes, and can update its internal model of the remote system accordingly (e.g: switching to a fresh set of breakpoints, using a different set of symbols (if appropriate), etc...)

@daniel5151
Copy link
Owner

daniel5151 commented Jun 11, 2021

Just a heads up, but now that #60 has been reported, I'm almost certainly going to be publishing a 0.6 release of gdbstub at some point in the next few weeks. It'd be nice if we could nail down any lingering warts / issues in the current state machine / broader gdbstub API so I can lump these changes in with other 0.6 changes.

While the changes in feature/state-machine-recv-packet are technically non-breaking and could be punted to a 0.6.x release, it would be nice to merge some of the internal plumbing improvements from that branch into master, as they lay the groundwork for some future features I've been meaning to implement (e.g: support for Non-Stop mode, GDB Agent support, etc...). Moreover, I'm somewhat tempted to remove the current "blocking" GdbStub::run method entirely, and make the state machine interface the primary interface to gdbstub. By doing so, I'd also be able to remove the read method from Connection.

As always, any and all feedback would be much appreciated!

@xobs
Copy link
Contributor Author

xobs commented Jun 18, 2021

I apologise, I was called away to another project. However, I am now able to resume work on this, and I will continue to work to integrate gdbstub into Xous.

I just managed to get process suspension working. That is, I can now move a process from Ready(x) to Debug(x), which means that it will not get scheduled. The rest of the system continues to work, and IPC continues to function. This means I can now implement the debugger.

@xobs
Copy link
Contributor Author

xobs commented Jun 18, 2021

I got it wired up, so now I'm able to peek and poke memory, inspect registers, and list threads. There are a few things to note.

  1. It's not able to continue execution. When I say c it seems to come back immediately for whatever reason:
c continue
w $vCont;c:p1.-1#0f
r <Timeout: 0 seconds>$S05#b8
w $g#67
r $0*$f413060020ff0560*-80ff0560f400*!8000*!60ff0560203106600f00*!eb73cd4837ea903d130d0215015c260d0*52000*!400*!243106600f000*!c0*B98210660f000*!30*$200*!f*$3e1e0600#9a
w $qfThreadInfo#bb
r $mp01.01,p01.02,p01.03,p01.04,p01.05,p01.06,p01.07#2a
w $qsThreadInfo#c8
r $l#6c
  1. The first thing GDB does is list all threads in the process. However, when the debugger is first attached, there is no process selected. I've worked around this in my implementation by arbitrarily saying PID 3 is the process being debugged. It's enough to keep things going, but it's not ideal.

@xobs
Copy link
Contributor Author

xobs commented Jun 18, 2021

Here's an example of a process being debugged:

[4:40:18 PM] ~/Desktop> riscv64-unknown-elf-gdb -ex 'set remotelogfile gdbproxy_logfile.org' -ex 'tar ext :9999' D:\Code\Xous\Core\target\riscv32imac-unknown-none-elf\release\shellchat
GNU gdb (SiFive GDB 8.3.0-2019.08.0) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-w64-mingw32 --target=riscv64-unknown-elf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://github.com/sifive/freedom-tools/issues>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.                                                                                                                                                                                           For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from D:\Code\Xous\Core\target\riscv32imac-unknown-none-elf\release\shellchat...
Remote debugging using :9999
_xous_syscall_rust () at src/asm.S:11
11          lw          t0, 0(sp)
(gdb) info thr
  Id   Target Id         Frame
* 1    Thread 1.1        _xous_syscall_rust () at src/asm.S:11
  2    Thread 1.2        _xous_syscall_rust () at src/asm.S:11
  3    Thread 1.3        _xous_syscall_rust () at src/asm.S:11
  4    Thread 1.4        _xous_syscall_rust () at src/asm.S:11
  5    Thread 1.5        _xous_syscall_rust () at src/asm.S:11
  6    Thread 1.6        _xous_syscall_rust () at src/asm.S:11
  7    Thread 1.7        _xous_syscall_rust () at src/asm.S:11
(gdb) bt
#0  _xous_syscall_rust () at src/asm.S:11
#1  0x000613f4 in xous::syscall::rsyscall (call=...) at xous-rs\src/syscall.rs:1616
#2  xous::syscall::receive_message (server=...) at xous-rs\src/syscall.rs:1263
#3  0x000478f4 in xous_entry () at services\shellchat\src\main.rs:282
#4  0x000494f6 in shellchat::_start (pid=15) at D:\Code\Xous\Core\xous-rs\src/lib.rs:168
Backtrace stopped: frame did not save the PC
(gdb)     

@xobs
Copy link
Contributor Author

xobs commented Jun 18, 2021

Alright, I realized the problem with (1) above where it would get interrupted immediately.

Previously I returning ThreadStopReason::GdbInterrupt from resume(), which would immediately cause it to pause. Now that I return ThreadStopReason::Defer, it properly resumes.

The issue now is that when I hit Control-C, it's in an odd state. Notably, the GdbStubStateMachine is in the DeferredStopReason(_) state, which makes sense. The IRQ handler I'm working on processes one character at a time, so it doesn't currently handle gdb being in any state other than Pump.

What would be the best way to handle this? That is, I have a character that's appeared because gdb has sent a packet, but the state is in DeferredStopReason(_) so it is unable to accept any new characters.

@daniel5151
Copy link
Owner

daniel5151 commented Jun 18, 2021

Alright, I've pushed up a hotfix that should unblock you.

I've updated the code in the armv4t example to use the state machine API, which allowed me to validate + debug the full e2e state machine flow - including how to handle interrupts packets.

The hotfix I've pushed up is a bit gnarly, but it should suffice until I find some time this weekend to hammer out a cleaner API. In a nutshell: that character you're receiving when you hit ctrl-c should be 0x03, which is a special non-ascii packet GDB sends to signal a client interrupt. For now, you ought to assert that the byte you received in the DeferredStopReason state was in-fact 0x03, and then pass the ThreadStopReason::GdbInterrupt state as a deferred stop reason. See the armv4t example code for details. See EDIT below...

Note that technically speaking, you're allowed to pass whatever stop-reason you like when you're interrupted. ThreadStopReason::GdbInterrupt is simply a convenient alias for ThreadStopReason::Signal(5), or "stopped due to trap".

I think the proper fix will involve exposing the call to pump as part of the DeferredStopReason state, and adding a new GdbStubStateMachine::Interrupted state, which will be transitioned into in the case that the packet parser encounters an interrupt packet. I'm still mulling things over, so if you have any suggestions, do let me know.


EDIT: I found some time to play around with the API, and after a bit of iteration and experimentation, I've settled on something I'm fairly happy with.

In a nutshell, I added a new pump method to GdbStubStateMachine::DeferredStopReason. This pump method is similar to the other pump method, except it reports an Event instead of a Option<DisconnectReason>. An Event can be one of None, Disconnect(DisconnectReason), and a new event: CtrlCInterrupt. When you receive an Event::CtrlCInterrupt, you should immediately run whatever interrupt logic you need to interrupt the debugging session (e.g: stop the process being debugged + report the deferred stop reason).

I also introduced a few easy-to-fix breaking API changes, such as removing the read method from the Connection trait, and removing the explicit StopReason::Defer variant in favor of tweaking the resume API to return an Option<StopReason>.

I've updated the armv4t example with some sample code that should make things pretty clear.

@daniel5151
Copy link
Owner

daniel5151 commented Jun 18, 2021

The first thing GDB does is list all threads in the process. However, when the debugger is first attached, there is no process selected. I've worked around this in my implementation by arbitrarily saying PID 3 is the process being debugged. It's enough to keep things going, but it's not ideal.

Unless you're implemented extended-mode support, GDB will expect to be connected to a running process when it is connected. Please refer to my previous comment for more details: #56 (comment)

Here is the relevant line from the GDB documentation:

With target remote mode: You must either specify the program to debug on the gdbserver command line or use the --attach option (see Attaching to a Running Program).

With target extended-remote mode: You may specify the program to debug on the gdbserver command line, or you can load the program or attach to it using GDB commands after connecting to gdbserver.

That said, do make note the disclaimer at the top of the ExtendedMode IDET docs. Extended mode support is something I implemented in gdbstub on a whim, and as far as I know, it hasn't been thoroughly validated by any "production" implementation 😅

@xobs
Copy link
Contributor Author

xobs commented Jun 21, 2021

I just merged initial support for gdbstub. It's very rough, but it's up at https://github.com/betrusted-io/xous-core/blob/main/kernel/src/debug/gdb_server.rs

@daniel5151
Copy link
Owner

daniel5151 commented Jun 21, 2021

Hey, that's fantastic news!
I can't begin to describe how happy I am seeing gdbstub running in a true bare-metal no_std environment 😊

Thanks again for helping test and validate this new state-machine based execution API for gdbstub. I'm actually really excited about all the possibilities this new state-machine based API brings to the table, as aside from enabling gdbstub to run via interrupt handlers, this API might also be the key to solving some of long-standing design questions, such as how to efficiently wait for Ctrl-C interrupts without polling, or how to support non-stop mode.

It seems that we're finally at the point where the broad strokes of the API are working as intended, and as such, I'll start hammering away at polishing up + documenting all these new changes. I intend to merge the current feature/state-machine-recv-packet branch into dev/0.6 sometime in the next week, thereby unifying gdbstub's two concurrent development streams. I'll let to know once that's done, as you'll definitely want to switch over to the dev/0.6.

As a heads up, there will be a few more breaking changes coming down the pipeline in the dev/0.6 branch. Notably for your integration, I will be tweaking the resume API to make single-stepping optional.

As always, please let me know if you run into any other API warts / issues while fleshing out your gdbstub integration. If you have time time, I would really appreciate it if you could flesh out your gdbstub integration over the next month or so, as that would let us catch / remedy any API issues prior to publishing release 0.6.


Lastly, a couple things I noticed in your initial implementation (yes, I know it's very rough, but I'll still point things out regardless 😉):

  • You really ought to remove those #[inline(never)] annotations for your Target implementation! The only reason those are there in example_no_std are because if they weren't, the optimizing compiler would aggressively inline the dummy methods, making it substantially harder to gauge what kind of binary overhead gdbstub introduces into a project. When integrating gdbstub into an actual project, you want those aggressive optimizations, as it'll substantially reduce gdbstub's binary overhead, and can significantly improve performance.
    • On that note, you wouldn't happen to have any empirical numbers on how much binary overhead enabling the gdbserver feature introduces, would you? I'd be curious to see if my ~10kb ballpark estimate was somewhat close.
  • Make sure to properly implement the set/clear_resume_action handlers, even if it's just to assert that the GDB client is only ever sending a single continue command.
  • In read_addrs, instead of doing .unwrap_or(0xff) of a memory read fails, you should either return a non-fatal TargetError, or truncate the response to just the memory that was able to be read.
    • I just realized that the current read_addrs API doesn't support returning only part of the requested address space, even though the GDB RSP supports that feature. Looks like that's another breaking API change I'll be making in release 0.6 😅
  • Similarly, in write_addrs, instead of panicking if a memory write fails, you should return a non-fatal TargetError instead, as this will report the error back to the GDB client.

@xobs
Copy link
Contributor Author

xobs commented Jun 22, 2021

Here's a version with gdbstub enabled:

[10:01:19 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-size -d .\target\riscv32imac-unknown-none-elf\release\kernel
   text    data     bss     dec     hex filename
 102532    8312    4540  115384   1c2b8 .\target\riscv32imac-unknown-none-elf\release\kernel
[10:01:25 AM] D:/Code/Xous/Core/kernel>

And here it is with the gdbserver feature disabled:

[10:04:35 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-size -d .\target\riscv32imac-unknown-none-elf\release\kernel
   text    data     bss     dec     hex filename
  79360    8276     436   88072   15808 .\target\riscv32imac-unknown-none-elf\release\kernel
[10:04:38 AM] D:/Code/Xous/Core/kernel>

It's closer to 20kB, but that may be due to various debug strings. If I remove #[inline(never)], then the size goes up slightly, and nearly everything gets inlined into the ISR:

No gdbserver:

[10:40:30 AM] D:/Code/Xous/Core/kernel> cargo build --release --target riscv32imac-unknown-none-elf --features print-panics --no-default-features
[10:41:50 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-size -d .\target\riscv32imac-unknown-none-elf\release\kernel
   text    data     bss     dec     hex filename
  79360    8276     436   88072   15808 .\target\riscv32imac-unknown-none-elf\release\kernel
[10:41:52 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-nm --size-sort .\target\riscv32imac-unknown-none-elf\release\kernel | rustfilt |rg debug
00000001 b kernel::debug::INITIALIZED.0.0
00000ac2 t kernel::debug::irq
[10:41:56 AM] D:/Code/Xous/Core/kernel>

With gdbserver:

[10:41:56 AM] D:/Code/Xous/Core/kernel> cargo build --release --target riscv32imac-unknown-none-elf
[10:42:17 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-size -d .\target\riscv32imac-unknown-none-elf\release\kernel
   text    data     bss     dec     hex filename
 103196    8312    4540  116048   1c550 .\target\riscv32imac-unknown-none-elf\release\kernel
[10:42:20 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-nm --size-sort .\target\riscv32imac-unknown-none-elf\release\kernel | rustfilt |rg debug
00000001 b kernel::debug::INITIALIZED.0.0
00000002 t core::ptr::drop_in_place<<kernel::debug::gdb_server::XousTarget as gdbstub::target::ext::base::multithread::MultiThreadOps>::is_thread_alive::{{closure}}>
00000024 d kernel::debug::gdb_server::GDB_STATE
0000002c t kernel::debug::irq
00000356 t <kernel::debug::gdb_server::XousTarget as gdbstub::target::ext::base::multithread::MultiThreadOps>::list_active_threads
00001000 b kernel::debug::gdb_server::GDB_BUFFER
0000447a t kernel::debug::process_irq_character
[10:42:22 AM] D:/Code/Xous/Core/kernel>

Keep in mind that I'm building for speed and not for size, though curiously enough if I switch to optimize for size and not speed, it also is around 20 kB:

[10:42:22 AM] D:/Code/Xous/Core/kernel> cargo build --release --target riscv32imac-unknown-none-elf --features print-panics --no-default-features
[10:43:14 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-size -d .\target\riscv32imac-unknown-none-elf\release\kernel
   text    data     bss     dec     hex filename
  64952    8276     436   73664   11fc0 .\target\riscv32imac-unknown-none-elf\release\kernel
[10:43:23 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-nm --size-sort .\target\riscv32imac-unknown-none-elf\release\kernel | rustfilt |rg debug
00000001 b kernel::debug::INITIALIZED.0.0
00000002 t core::ptr::drop_in_place<&mut kernel::debug::Uart>
0000001c t core::fmt::Formatter::debug_struct
00000026 t core::fmt::Formatter::debug_list
00000028 t <kernel::debug::Uart as core::fmt::Write>::write_str
00000678 t kernel::debug::irq
[10:43:26 AM] D:/Code/Xous/Core/kernel> cargo build --release --target riscv32imac-unknown-none-elf
[10:43:50 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-size -d .\target\riscv32imac-unknown-none-elf\release\kernel
   text    data     bss     dec     hex filename
  83748    8312    4540   96600   17958 .\target\riscv32imac-unknown-none-elf\release\kernel
[10:43:52 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-nm --size-sort .\target\riscv32imac-unknown-none-elf\release\kernel | rustfilt |rg debug
00000001 b kernel::debug::INITIALIZED.0.0
00000002 t core::ptr::drop_in_place<<kernel::debug::gdb_server::XousTarget as gdbstub::target::ext::base::multithread::MultiThreadOps>::is_thread_alive::{{closure}}>
00000002 t core::ptr::drop_in_place<gdbstub::gdbstub_impl::ext::base::<impl gdbstub::gdbstub_impl::GdbStubImpl<kernel::debug::gdb_server::XousTarget,kernel::debug::Uart>>::handle_base::{{closure}}>
00000002 t core::ptr::drop_in_place<gdbstub::gdbstub_impl::ext::base::<impl gdbstub::gdbstub_impl::GdbStubImpl<kernel::debug::gdb_server::XousTarget,kernel::debug::Uart>>::get_sane_any_tid::{{closure}}>
00000002 t core::ptr::drop_in_place<&mut kernel::debug::Uart>
0000001c t core::fmt::Formatter::debug_struct
00000022 t <kernel::debug::Uart as gdbstub::connection::Connection>::write
00000024 d kernel::debug::gdb_server::GDB_STATE
00000026 t core::fmt::Formatter::debug_list
00000028 t <kernel::debug::Uart as core::fmt::Write>::write_str
000000c2 t <kernel::debug::gdb_server::XousTarget as gdbstub::target::ext::base::multithread::MultiThreadOps>::list_active_threads
00001000 b kernel::debug::gdb_server::GDB_BUFFER
000023d6 t kernel::debug::irq
[10:43:53 AM] D:/Code/Xous/Core/kernel>

One thing I have not yet figured out is how to trap on a breakpoint, or even what a breakpoint is. My assumption had been that when gdb inserted a breakpoint, it would replace code in RAM with an illegal instruction that would cause an exception. Perhaps it's doing this and my kernel has configured the CPU to ignore such things. But as a result, I'm not able to issue commands such as finish.

Additionally, stepi never returns. Is this the sort of thing that the RSP needs to announce that it doesn't support, so gdb manually handles stepi?

@daniel5151
Copy link
Owner

Thanks for sharing those numbers! In past experience, I've found that it can be pretty tricky measuring the true overhead of just gdbstub code, since it is inextricably tied to the Target implementation being used. And then there's also the consideration that RISC-V code is typically less "dense" than equivalent x86 code, which can also contribute to overhead. In any case, 20kb is still pretty negligible with today's flash memory sizes 😛


My assumption had been that when gdb inserted a breakpoint, it would replace code in RAM with an illegal instruction that would cause an exception.

Err, not exactly. When you set a breakpoint from the GDB client, the only thing the GDB client does is send the target a request to set a breakpoint at the specific address. The specifics of how to implement that breakpoint are left entirely up to the target. In other words, the GDB client will not automatically patch the instruction stream to insert a breakpoint - if that's how you want to implement breakpoints in your target, that's something you'll need to do yourself.

To get breakpoints working, you'll need to implement the appropriate breakpoint IDETs. In your case, you'll probably just want to start off with basic software breakpoints (i.e: patching the instruction stream with a breakpoint instruction + catching the exception), which would fall under the SwBreakpoints IDET.

Additionally, stepi never returns.

This is likely a byproduct of your stubbed out set/clear_resume_action handlers. When you run stepi, the GDB client will request that the target perform a single-step by calling set_resume_action with ResumeAction::Step and the specified TID. You need to move your current "single-step unimplemented" check into set_resume_action, as right now, it only checks if the default resume action (i.e: the resume action all non-specified threads in the process should take) is a single-step, which is mostly likely isn't (IIRC, default_resume_action is always set to ResumeAction::Continue).

Note that once I tweak the resume API to make single-stepping optional (tracked under #59), the gdbstub won't even report that it supports single-stepping, which means the GDB client won't even let you invoke stepi.

@xobs
Copy link
Contributor Author

xobs commented Jun 24, 2021

I'm looking to hook the kernel print!() command to use gdbstub::output!() instead. I've run into two issues:

  1. The output!() macro uses std::fmt instead of core::fmt. Is this intentional, or is it easy enough to switch it to use core::fmt?
  2. How do I create a ConsoleOutput struct? I'll need to pass this to the first argument of output!() but I don't see any way to do that, aside from inside handle_monitor_cmd(). I have something hooked that manually writes to the pipe when it's not halted. Is there another approach I should take?

Regarding breakpoint types, I thought GDB would set its own breakpoints by poking into RAM if it detected that a program was in RAM. At least that's what I recall observing when I was working on my own gdb server targeting bare metal: https://github.com/litex-hub/wishbone-utils/blob/master/wishbone-tool/crates/core/gdb.rs

It only supports two BreakPointType::BreakHard, but I thought I recall that if an address was in RAM gdb would create a breakpoint without sending a Z packet. Probably that wasn't actually the case and I'm just misremembering.


How do I signal to gdbstub that it should halt? Is there something I pass to pump() in order to get it to do that?


How should I fill in set_resume_action? When does it get called? Is it called when the system halts, or when it resumes? Is there an equivalent halt() to the existing resume()?

@daniel5151
Copy link
Owner

daniel5151 commented Jun 24, 2021

  1. The output!() macro uses std::fmt instead of core::fmt

Yikes, good catch 😱
I just pushed a fix up on the feature branch. That's definitely a oversight on my part. Good catch!

  1. How do I create a ConsoleOutput struct? ... I don't see any way to do that, aside from inside handle_monitor_cmd()

See #51. That should give you all the context you need.

Then again, on further thought, now that we have this state machine based model, I bet it'd be pretty easy to add a console_write method to the GdbStubStateMachine::DeferredStopReason state. Maybe I'll look into adding that...

Given that's the state you'll probably be in most of the time, that'd make it pretty easy to log output via GDB. Of course, you would have to be careful not to log from inside any gdbstub methods, as you would need to get a double-reference to the global gdbstub instance, which will result in some serious badness.

Regarding breakpoint types, I thought GDB would set its own breakpoints by poking into RAM

Read through https://sourceware.org/gdb/onlinedocs/gdb/Remote-Stub.html, particularly the "Stub Contents" and "Bootstrapping" sections. Based on my reading + personal observations, GDB has never tried to change memory contents with the intention of setting a breakpoint. Then again, I've never tried to set a breakpoint without having the breakpoint IDETs implemented, so maybe there's a fallback path somewhere in the GDB client, but I highly doubt it.

If you're feeling adventurous, consider reading through the GDB client source and seeing if you can find any of the logic you're describing. A good starting point might be somewhere in https://github.com/bminor/binutils-gdb/blob/master/gdb/remote.c

How do I signal to gdbstub that it should halt?

AFAIK, the GDB RSP doesn't define a server-to-client disconnect packet. You've basically got two options when you want to end a debugging session:

  • detach the client from the server (i.e: typing quit or detach in the GDB client). this will result in a DisconnectReason::Disconnect event, which you can handle appropriately.
  • define a custom fatal Target::Error, and return it whenever you want to stop the debugging session. It won't be a clean disconnect, but hey, it'll work.

Please let me know if I've missed some obscure feature of the RSP that enables GDB targets to cleanly end a debugging session on their terms.

How should I fill in set_resume_action

The docs and examples are there for a reason 😏

Is there an equivalent halt() to the existing resume()?

I'm not sure what you mean? Are you thinking of something like ExtendedMode::kill?

@xobs
Copy link
Contributor Author

xobs commented Jun 25, 2021

See #51. That should give you all the context you need.

Thanks! That does seem better than my current hamfisted approach. I'll track that and integrate it when it's ready.

If you're feeling adventurous, consider reading through the GDB client source and seeing if you can find any of the logic you're describing.

The very end of remote.c https://github.com/bminor/binutils-gdb/blob/master/gdb/remote.c#L10535-L10593 calls return memory_insert_breakpoint (this, gdbarch, bp_tgt); under certain circumstances that I don't currently understand.

This hands the call off to gdbarch_memory_insert_breakpoint() for the given target. On all platforms except m32r, this calls default_memory_insert_breakpoint() which is at https://github.com/BeMg/riscv-bintuils/blob/6a07aff56dac4440ae02f9df22aaa1960a1add1e/gdb/mem-break.c#L36-L70

This function figures out what a breakpoint looks like on this platform by calling gdbarch_sw_breakpoint_from_kind(). On RISC-V this is one of two byte patterns: https://github.com/BeMg/riscv-bintuils/blob/6a07aff56dac4440ae02f9df22aaa1960a1add1e/gdb/riscv-tdep.c#L262-L278 and on ARM it's more complicated because they also have endianness to deal with: https://github.com/BeMg/riscv-bintuils/blob/6a07aff56dac4440ae02f9df22aaa1960a1add1e/gdb/arm-linux-tdep.c#L78-L95

After it has the breakpoint value and size, default_memory_insert_breakpoint() reads the contents of memory, stashes it in a shadow value, and writes the target's breakpoint value to RAM.

Please let me know if I've missed some obscure feature of the RSP that enables GDB targets to cleanly end a debugging session on their terms.

At this moment, I'm more interested in telling the client that the server has hit a breakpoint. What do I pass to pump() to tell gdbstub that I've hit a breakpoint? As-is, there are two ways to go from Run mode to Stop mode: The user hits Control-C, or the target hits a breakpoint for some reason. The former case is handled with pump(). How do I handle the latter case?

The docs and examples are there for a reason 😏

The example I'm reading has this, which doesn't give much of an idea what it does or when it gets called:

#[inline(never)]
fn set_resume_action(&mut self, _tid: Tid, _action: ResumeAction) -> Result<(), Self::Error> {
    print_str("> set_resume_action");
    Ok(())
}

I'll try to understand what the function is for by reading the code, but the documentation isn't really helping me to understand it. What is it used for? The docs say "A simple implementation of this method would simply update an internal HashMap<Tid, ResumeAction>.", but why wouldit do that? When is this HashMap consulted? Is this function called when the target is halted due to a breakpoint? When it's halted due to the user hitting Control-C? When the target is resumed? Once when the host connects? And when is the value consulted? What's the difference between resume() and set_resume_action()?

I'm not sure what you mean? Are you thinking of something like ExtendedMode::kill?

The resume() Trait method gets called when the target is going to be resumed. Is there an equivalent Trait method that will get called when the target is halting? For example, when gdbstub wants to halt the target due to the user hitting Control-C. Is that indicated by e.g. having the state machine in the GdbStubStateMachine::DeferredStopReason() state? And what value should I be passing to gdb_state.deferred_stop_reason()? Right now I seem to be passing ThreadStopReason::DoneStep which I feel may not be correct, since it may be GdbInterrupt.

Overall I think I'm still subtly misusing gdbstub, but it does seem almost all there.

@daniel5151
Copy link
Owner

Wow, thanks for digging into the GDB client code!

It seems that memory_insert_breakpoint is hit when the target doesn't support the Z0 packet (i.e: the SwBreakpoints IDET). If I'm reading this right, then if you sniff the data being sent to/from gdbstub after setting a breakpoint, you should be seeing the target set up the breakpoint instructions appropriately...

If that's the case, then you'll need to make sure you've set up your interrupt / exception handlers correctly to intercept the breakpoint.

What do I pass to pump() to tell gdbstub that I've hit a breakpoint?

You don't call pump() if you've hit a breakpoint. If you've hit a breakpoint, then you should be inside the breakpoint exception handler, at which point you wouldn't even have a byte to pass to pump(). In that case, you'd call deferred_stop_reason with ThreadStopReason::SwBreakpoint.

See the big-picture summary at the end of this comment if you're still confused.

I'll try to understand what the function is for by reading the code, but the documentation isn't really helping me to understand it.

Fair enough! In this case, the documentation you actually want to read is resume's documentation, since it explains the "lifecycle" of resume, set_resume_action, and clear_resume_actions:

Resume execution on the target.

Prior to calling resume, gdbstub will call clear_resume_actions, followed by zero or more calls to set_resume_action, specifying any thread-specific resume actions.

The default_action parameter specifies the “fallback” resume action for any threads that did not have a specific resume action set via set_resume_action. The GDB client typically sets this to ResumeAction::Continue, though this is not guaranteed. <-- This isn't true, it'll always be set to ResumeAction::Continue. I'll be removing this entirely as part of the "support optional single threading" work.

I'll add a pointer to refer to the resume docs from set_resume_action and clear_resume_actions, to make sure folks don't miss it.

As for the example code... there's a reason I explicitly linked you to the armv4t_multicore example, as opposed to the example_no_std example. That code sample you linked isn't descriptive because it's example_no_std doesn't implement an actual target - it's just a barebones framework for testing / validating the gdbstub runs in no_std environment.

When you're working on implementing actual functions, you'll want to refer to the significantly more fleshed out armv4t and armv4t_multicore examples.

For example, when gdbstub wants to halt the target due to the user hitting Control-C.

Hitting Ctrl-C does not halt the target. The only thing hitting Ctrl-C does is send a "interrupt" packet to the target, which is then processed via pump, and eventually results in a Event::CtrlCInterrupt. At that point, you'll want to handle the interrupt request however you need to (e.g: by pausing the current process), and then passing an appropriate stop reason to deferred_stop_reason.

Please re-read my earlier comment: #56 (comment)
I edited it adding more context, which you may have missed.


Taking a step back, I think I need to explain what the GdbStubStateMachine::DeferredStopReason state entails.

In a nutshell, the first time you start up gdbstub, it will be in the Pump state, as it hasn't established a connection yet, and it doesn't know the runtime state of the target. At some point, the GDB client will request the run the target, at which point, your implementation will return None from resume, indicating that the stub should transition into the DeferredStopReason state. At this point the GDB stub is waiting for a Stop Reply Packet

deferred_stop_reason is how you report that stop reply packet.

In your implementation, you'll end up calling deferred_stop_reason in both your UART handler's GDB loop (i.e: the whole "Ctrl-C stop reason business), and also from the breakpoint exception handler!

Let me know if that clears things up.

@xobs
Copy link
Contributor Author

xobs commented Jun 25, 2021

With the current design, it seems as though GdbStubError::TargetError(_) is fatal.

This is due to the fact that a gdb_state_machine is an enum. If the enum is in the state GdbStubStateMachine::Pump(inner), then we must call inner.pump(&mut target, incoming_byte). However, inner.pump() takes self, and therefore consumes inner, which moves it out of GdbStubStateMachine.

To work around this, I basically consume and reconstitute the state machine on every loop.

I.e.

    if let Some(gdb_state_machine) = unsafe { GDB_STATE.take() }
    {
        let new_gdb = match gdb_state_machine {
            GdbStubStateMachine::Pump(gdb_state) => match gdb_state.pump(&mut target, b) {
                Ok((gdb, None)) => gdb,
                Err(e) => {
                    cleanup();
                    println!("gdbstub error: {}", e);
                    return true;
                }
            GdbStubStateMachine::DeferredStopReason(gdb_state) => {
                match gdb_state.deferred_stop_reason(&mut target, ThreadStopReason::DoneStep) {
                    Ok((gdb, None)) => gdb,
                    Err(e) => {
                        cleanup();
                        println!("deferred_stop_reason_error: {:?}", e);
                        return true;
                    }

                }
            }
        };
        unsafe { GDB_STATE = Some(new_gdb) };
    }

The problem is that the happy path returns (GdbStubStateMachineInner, $something), but the error path simply returns Error<T>. So whenever an error hits, the state machine's Inner is consumed and I can no longer replace GDB_STATE.

@daniel5151
Copy link
Owner

To work around this, I basically consume and reconstitute the state machine on every loop.

Yes, that is the intended way to work with the API.

The problem is that the happy path returns (GdbStubStateMachineInner, $something), but the error path simply returns Error<T>. So whenever an error hits, the state machine's Inner is consumed and I can no longer replace GDB_STATE.

Ah, indeed, good catch. I should tweak the API to return a (GdbStubStateMachine, Result<..>), instead of a Result<(GdbStubStateMachine, ..), ..>. It ought to be possible to report a TargetError, but then re-enter the GDB debugging loop to perform "post mortem" debugging. I'll probably push a fix up sometime tomorrow / over the weekend.


...so remember when you asked "How do I signal to gdbstub that it should halt" and I said:

AFAIK, the GDB RSP doesn't define a server-to-client disconnect packet.

yeah, that was pre-morning-coffee me talking.

The way to signal a "halt" is to pass ThreadStopReason::Terminated to deferred_stop_reason, and then perform whatever cleanup logic you need to when handling the resulting DisconnectReason::TargetTerminated.

Theoretically, I can add a new signal_terminate() method that provides a shortcut to this logic.

Reporting ThreadStopReason::Terminated will inform the GDB client that the target has "halted", which should close the debugging session on the client side.

@daniel5151
Copy link
Owner

Hahahaha, I just tested it, and GDB totally rewrites memory to insert a breakpoint if I disable the Breakpoints IDET in the armvt4 example!

Time to update my docs...

@daniel5151 daniel5151 added the API-breaking Breaking API change label Jun 25, 2021
@daniel5151
Copy link
Owner

Heads up, I've merged the feature/state-machine-recv-packet branch into dev/0.6, and will be deleting the feature branch shortly. All subsequent development will be happening on the dev/0.6 branch, so make sure to switch over!

@gz
Copy link
Contributor

gz commented Sep 14, 2021

Sorry for hijacking this, but I'm basing my implementation directly on the dev/0.6 branch because of this issue here and I have some basic question regarding the GdbStubStateMachine and GdbStub:

  • When should I have just a GdbStub and at what point during the lifetime of a program/debug session am I supposed to call run_state_machine() to get to a GdbStubStateMachine? It seems as if as soon as I get a GdbStub I can immediately call run_state_machine and forget about GdbStub, but then I don't understand why I'd need to keep a GdbStub around ever in the first place?

  • In your code-examples the state machine struct is either on the stack or lives in some static Option<> type. On the stack is rarely feasible, and a static Option<> type is a bit unsafe, especially in a multi-core scenario. However, sticking it in say a spin::Mutex seems difficult to me given the API (needs self instead of &mut self in several places). Is the idea here to always have an Option<> or Mutex<Option<>> and take it out and put it back every time you interact with the STM, or am I missing something obvious?

@daniel5151
Copy link
Owner

Hey @gz, no need to apologize, these are perfectly relevant questions :)

Plus, it serves as yet another reminder that I really need to find some time to polish up and push out 0.6, since you're far from the only one who's currently working directly off the dev/0.6 branch 😅


  • It seems as if as soon as I get a GdbStub I can immediately call run_state_machine and forget about GdbStub

Yep, that's exactly right.

When you spell it out like this, I realize that it does seem a bit silly to have this be a two-step operation... I think a better long-term approach would be to update GdbStubBuilder to have a build_state_machine method that bypasses this "transient" GdbStub.

  • Is the idea here to always have an Option<> or Mutex<Option<>> and take it out and put it back every time you interact with the STM, or am I missing something obvious?

With the current API, yes, that would indeed be how you'd have to use it. It's been a while since I hacked on this code, but IIRC I couldn't find a good way to implement the typestate state machine API without requiring the implementer to play "hot potato" with the various states.

Consequently, if you've got a suggestion on how you might tweak the API to maintain type-system enforced sequencing/correctness while using &mut self, I would be more than happy to tweak the current approach.


Also, if you're at liberty to share more details, I'm very interested to hear more about what context you're using gdbstub in.

@gz
Copy link
Contributor

gz commented Sep 14, 2021

Hi @daniel5151 thanks for the clarification! I hacked a bit more on this yesterday and I think I was able to wrap my head around the typed state-machine. IMO the organization you have made a lot of sense once I grasped it.

Re sharing my use-case: Sure, it's all open-source, I'm currently adding gdb remote debugging support for an experimental x86-64 multicore kernel here: vmware-labs/node-replicated-kernel#52
(still in pretty rough shape but hopefully I get it in something more polished within the next few weeks :))

@daniel5151
Copy link
Owner

Ahh, very cool! Thanks for sharing!

If you have any more questions / suggestions about the state machine API (or any other gdbstub API for that matter), please let me know.

@gz
Copy link
Contributor

gz commented Sep 14, 2021

Actually, I do have a more general question where you might be able to know how to best express this with gdbstub. The kernel binary in my case is relocated at a random address (e.g., changes every time the system boots, similar to Linux with KASLR). GDB is usually confused with this as it looks for the symbols at the offsets given in the ELF file and so one needs to go and manually set this offset when loading the binary (e.g., using the symbol-file command).

I was trying to find if I can somehow send the symbol-file command or something equivalent from the kernel directly to gdb with the right offset information. Or, if this isn't possible (which is what I suspect) what would some other good way be to pass this offset back to gdb in an automated fashion (combination of custom monitor command and .gdbinit scripting etc.?).

PS: And thanks btw for all the hard and amazing work on gdbstub, it definitely saved me a ton of work if I had to implement all this by myself!

@daniel5151
Copy link
Owner

daniel5151 commented Sep 14, 2021

Would SectionOffsets be what you're looking for?
https://docs.rs/gdbstub/0.5.0/gdbstub/target/ext/section_offsets/index.html

Also check out #20, which might be related.

dev/0.6 also includes the MemoryMap IDET, which could also be useful.

As a meta note: gdbstub implements protocol extensions on an as-needed basis, so if you find that gdbstub is missing a particular protocol feature, you can check if the feature is implemented at the protocol level, and if it is, it should be easy to open a PR and add it to gdbstub :)

https://sourceware.org/gdb/onlinedocs/gdb/General-Query-Packets.html#General-Query-Packets

@gz
Copy link
Contributor

gz commented Sep 14, 2021

Just wanted to report section offsets works great, thanks a lot for the help!

@daniel5151
Copy link
Owner

To give a quick update regarding this feature:

I'm pretty happy with the current state of the state machine API, and once @gz has a chance to integrate the latest changes, I'll try to find some time to properly document everything + push out a proper 0.6 release.

@xobs, not sure if you're still tracking this, but if you get the chance, it would be super useful if you could update your implementation as well, and let me know if there are any issues / feature gaps.

Cheers!

@daniel5151
Copy link
Owner

I believe that the time has finally come close out this issue.

The goal of this issue was to add an bare-metal API to gdbstub, and as far as I can tell, the current iteration of the state machine API has indeed done that.

If you encounter any issues with this API down the line, please open a new issue instead of re-opening / commenting on this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API-breaking Breaking API change API-ergonomics Nothing's "broken", but the API could be improved new-api Add a new feature to the API (possibly non-breaking)
Projects
None yet
Development

No branches or pull requests

3 participants