Add API for Bare-Metal OS Support #56
Adding more context from #31 -- Specifically the target I have in mind is Xous running on Betrusted. Xous is an operating system we're working on. It's bare-metal and runs on custom hardware. Currently it is targeting Betrusted, which runs a RISC-V core on a Xilinx FPGA in a keyboard "phone" form factor. Xous is written entirely in Rust, and is a microkernel architecture. We have the beginnings of libstd ported and running on The kernel itself is a cooperative microkernel and only has access to a TRNG for generating random server IDs. We're considering adding a timer for certain interruptible mutex operations as an extension. All operating system fundamentals are handled by servers. For example, the console UART is handled by a log server, which acts as stdout. The current approach to debugging is to use Renode, which has support for acting as a GDB server. Its support for the MMU is still somewhat limited, which makes perfect sense given that it's either used to debug deeply-embedded projects that have no MMU, or to run something heavy like Linux that has support for native debuggers.
|
I'm trying to use 297e065 but I have a few questions:
In my kernel I'm introducing a new process state that indicates a process is being debugged, which allows other processes to interact with it but without allowing it to get scheduled. For example, other processes can send it messages which will pile up in its inbox. |
Thanks for opening the issue, and for giving my POC branch a try! Like I said in the other thread, I'm really excited to see what you come up with, since using To answer your questions about 297e065: Stubbing out
|
The problem with that code is that it's running in an interrupt handler, and so has no context to begin with. In this kernel at least, a brand-new stack frame and context is generated every time an interrupt hits, and it's disposed of when the interrupt exits. Things in the The interrupt handler looks something like this:

static mut GDB_SERVER: Option<(GdbStubStateMachine<XousTarget, Uart>, XousTarget)> = None;

pub fn irq(_irq_number: usize, _arg: *mut usize) {
    let b = Uart {}
        .getc()
        .expect("no character queued despite interrupt") as char;
    // GDB server exists, hand character to the GDB server
    unsafe {
        // `as_mut()` already yields `&mut` references to both tuple fields
        if let Some((gdb, target)) = GDB_SERVER.as_mut() {
            gdb.pump(target, b).unwrap();
            return;
        }
    }
    // Not currently in GDB, process normally.
    match b {
        // ...
        'g' => {
            use gdb_server::*;
            println!("Starting GDB server -- attach your debugger now");
            let xous_target = XousTarget::new();
            match gdbstub::GdbStubBuilder::new(Uart {})
                .with_packet_buffer(unsafe { &mut GDB_BUFFER })
                .build()
            {
                Ok(gdb) => unsafe { GDB_SERVER = Some((gdb, xous_target)) },
                Err(e) => println!("Unable to start GDB server: {}", e),
            }
        }
        'h' => print_help(),
    }
}

Therefore, code execution will flow into "blocking" until a stop event occurs is fine, and was the mental model I've been working with. Except we wouldn't block, we'd return. The process in question would be descheduled. Calling
I almost wonder if this isn't a thing that |
Ahhh, I understand... My shoddy little microkernel used stateful interrupt handlers that actually maintained their own separate execution contexts, but it's totally reasonable to keep interrupt handlers lean instead (arguably better tbh).
Kinda. Ideally, Rust would stabilize native generators, since what we're really interested in here is some sort of "yield" construct. It would be unfortunate to add a bunch of That said, I just took a fresh look at the source code, and I realized that past-me may have inadvertently structured You'll find that the With this in mind, it shouldn't be too difficult to add a new Now, unfortunately, I am going to be quite busy for the next week or so (work is ramping up + it'll be a long weekend here in the States), so I don't think I'll have time to hack together an implementation until sometime next week. That said, if you're willing to get your hands dirty, I think I've provided enough context such that you can fork Let me know if what I described makes sense, and if you think that's something you might be able to tackle yourself. |
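To make the "no yield, just return" idea concrete, here is a minimal sketch of a packet framer that is fed one byte per interrupt and keeps all of its state in a struct between calls. It is not gdbstub's implementation: `Framer`, `Event`, and `push` are hypothetical names made up for this illustration, and the sketch ignores escaping, run-length encoding, and acks. It only relies on the basic GDB RSP framing (`$payload#xx`, where `xx` is the modulo-256 sum of the payload bytes in hex) and on the interrupt request arriving as a raw `0x03` byte.

```rust
/// What the framer reports after each received byte (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Event {
    /// Nothing complete yet; return from the interrupt and wait.
    Pending,
    /// The client sent the out-of-band interrupt byte (0x03, i.e. Ctrl-C).
    Interrupt,
    /// A whole packet arrived; `len` payload bytes are buffered.
    Packet { len: usize, checksum_ok: bool },
    /// The payload did not fit in the buffer.
    Overflow,
}

#[derive(Clone, Copy)]
enum State {
    Idle,
    Body,
    ChecksumHi,
    ChecksumLo(u8), // carries the first (raw) checksum digit
}

pub struct Framer<const N: usize> {
    state: State,
    buf: [u8; N],
    len: usize,
    sum: u8,
}

fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

impl<const N: usize> Framer<N> {
    pub const fn new() -> Self {
        Framer { state: State::Idle, buf: [0; N], len: 0, sum: 0 }
    }

    /// Payload of the most recently completed packet.
    pub fn payload(&self) -> &[u8] {
        &self.buf[..self.len]
    }

    /// Feed exactly one byte; never blocks.
    pub fn push(&mut self, byte: u8) -> Event {
        match (self.state, byte) {
            (State::Idle, 0x03) => Event::Interrupt,
            (State::Idle, b'$') => {
                self.len = 0;
                self.sum = 0;
                self.state = State::Body;
                Event::Pending
            }
            // Acks and line noise between packets are ignored in this sketch.
            (State::Idle, _) => Event::Pending,
            (State::Body, b'#') => {
                self.state = State::ChecksumHi;
                Event::Pending
            }
            (State::Body, _) if self.len == N => {
                self.state = State::Idle;
                Event::Overflow
            }
            (State::Body, _) => {
                self.buf[self.len] = byte;
                self.len += 1;
                self.sum = self.sum.wrapping_add(byte);
                Event::Pending
            }
            (State::ChecksumHi, _) => {
                self.state = State::ChecksumLo(byte);
                Event::Pending
            }
            (State::ChecksumLo(hi), lo) => {
                self.state = State::Idle;
                let checksum_ok = match (hex_val(hi), hex_val(lo)) {
                    (Some(h), Some(l)) => ((h << 4) | l) == self.sum,
                    _ => false,
                };
                Event::Packet { len: self.len, checksum_ok }
            }
        }
    }
}
```

An interrupt handler would call `push` once per received character and only act when it gets back something other than `Event::Pending`, which is the same shape as handing each UART byte to the stub and returning immediately.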
That does make sense, and I'll see if I can hack something together. Unfortunately, I'll also be busy with another project for the next week, so I'm not sure how much I'll be able to complete. |
Sounds good. I'll set a reminder to circle back on this in a week or so to see how things are going, and in the meantime, I'll try and squeeze in some time to throw something together myself. |
Something strange is happening with linking. When I include the call to
If I comment out the call to |
Ahh, how odd... I know for a fact that |
Fascinating. I'm pretty sure it's a Rust bug. If I build for I'll open a bug there. In the meantime, I have workarounds. |
I opened a new bug at rust-lang/rust#85736 -- seems like a regression, since it works in the older compiler. |
It's a regression in LTO support, which unfortunately derailed my momentum. Now I'm pondering how to actually implement single-step debugging. Do you know if GDB is happy to run without single-step support? |
Alright, I'm back from the long weekend, and it's time to get back into the swing of things. Ah, yeah, would you look at that! It seems you really did stumble on a regression in the Rust compiler. Good thing you reported it, and hopefully it'll get fixed ASAP! As for your question regarding single stepping, I would refer to the GDB docs here: https://sourceware.org/gdb/current/onlinedocs/gdb/Overview.html#Overview
And hey, would you look at that! It turns out that you can get away without single-stepping support! I'll be honest, I totally didn't realize that something as seemingly fundamental as single-stepping is actually an optional operation, and as such, Thanks for bringing this to my attention! Thankfully, this wouldn't be a difficult option to "plumb" through. My gut feeling is that we can just add a new While this isn't actually the cleanest solution (requiring implementors to add a logically unreachable match arm in their I'm expecting to budget some time to hack away at Lastly, I was also planning on taking a crack at implementing support for "deferred resume" sometime later today / tomorrow. I've got some neat ideas on how to structure the API, and I'm excited to see what I'll be able to hack together :) |
Alright, I just pushed up a [hopefully] working version of "deferred resume" to the Fair warning: I have not had a chance to test this code outside of "making sure it compiles". That said, given that we are writing Rust, I feel fairly confident that it'll Just Work™️ right off the bat. Let's see if I'm right 😄. Note that I decided to use a typestate based API to enforce correct GDB protocol sequencing at compile-time, and as such, the API might be a bit confusing if you're not familiar with this pattern. Please refer to the code in Please give these changes a shot, and let me know how it goes! I didn't get the chance to work on making single-stepping optional, though I should have some time to finish it up tomorrow. In the meantime, you'll just have to use the workaround I mentioned above. |
I've been thinking about this typestate token based API for a couple days now, and I've realized that to really make the API rock-solid, I think I'll need to make some API breaking changes at some point. That said, I'm not sure if that'll be as part of this WIP PR or as part of some future work, so for now, I'm just going to jot down these ideas before I forget them. Note that these are not pressing issues, and are only useful as a way to "tighten up" the API and make it impossible to misuse the library at compile time. A truly malicious implementation could unsafely craft valid typestate tokens out of thin air, and there isn't anything we can do about it. Enforce (at compile time) that
|
FYI, I just pushed up yet another update that addressed the issue of "Tying typestate tokens to a unique instance of GdbStub". The solution? Get rid of tokens entirely, and instead have the Shoutout to https://hoverbear.org/blog/rust-state-machine-pattern/ for providing a very easy-to-use example that could be modified with impunity. As always, check out the Cheers |
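For readers unfamiliar with the pattern, the following is a toy sketch of a typestate state machine expressed as an enum of state structs, in the spirit of the linked blog post. It is not gdbstub's API: `Session`, `Idle`, `Running`, `on_byte`, and `on_stop` are all hypothetical names, and the rule "a `c` byte means resume" is a deliberate oversimplification. The point is only that each transition consumes the old state and returns the next one, so reporting a stop while idle, or reusing a stale state, fails to compile.

```rust
/// State: waiting for bytes from the debugger (hypothetical).
pub struct Idle {
    pub last_stop: Option<u8>,
}

/// State: the target has been resumed and a stop report is owed.
pub struct Running;

/// The state machine is just "whichever state we are currently in".
pub enum Session {
    Idle(Idle),
    Running(Running),
}

impl Session {
    pub fn start() -> Session {
        Session::Idle(Idle { last_stop: None })
    }
}

impl Idle {
    /// Consumes `self`; in this toy protocol a `c` byte resumes the target.
    pub fn on_byte(self, byte: u8) -> Session {
        if byte == b'c' {
            Session::Running(Running)
        } else {
            Session::Idle(self)
        }
    }
}

impl Running {
    /// Only callable while running, so a stop cannot be reported twice.
    pub fn on_stop(self, signal: u8) -> Session {
        Session::Idle(Idle { last_stop: Some(signal) })
    }
}
```

Because every method takes `self` by value, the caller has to `match` on the enum, call the one method that state offers, and store whatever comes back. That is the "hot potato" feel that comes up later in this thread.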
Thanks! That does seem easier. I'm struggling a bit with lifetimes trying to convince Rust to store mutable items inside an Specifically, my pattern looks something like this:

pub static mut GDB_SERVER: Option<GdbStubStateMachine<XousTarget, super::Uart>> = None;
pub static mut GDB_TARGET: Option<XousTarget> = None;
pub static mut GDB_BUFFER: [u8; 4096] = [0u8; 4096];

pub fn irq(_irq_number: usize, _arg: *mut usize) {
    let b = Uart {}
        .getc()
        .expect("no character queued despite interrupt");
    #[cfg(feature = "gdbserver")]
    unsafe {
        use crate::debug::gdb_server::{GDB_SERVER, GDB_TARGET};
        use gdbstub::state_machine::GdbStubStateMachine;
        use gdbstub::{DisconnectReason, GdbStubError};
        if let Some(gdb) = GDB_SERVER.as_mut().take() {
            let target = GDB_TARGET.as_mut().unwrap();
            let new_gdb = match gdb {
                GdbStubStateMachine::Pump(gdb_state) => match gdb_state.pump(target, b) {
                    // Remote disconnected -- leave the `GDB_SERVER` as `None`.
                    Ok((_, Some(disconnect_reason))) => {
                        match disconnect_reason {
                            DisconnectReason::Disconnect => println!("GDB Disconnected"),
                            DisconnectReason::TargetExited(_) => println!("Target exited"),
                            DisconnectReason::TargetTerminated(_) => println!("Target halted"),
                            DisconnectReason::Kill => println!("GDB sent a kill command"),
                        }
                        return;
                    }
                    Err(GdbStubError::TargetError(_e)) => {
                        println!("Target raised a fatal error");
                        return;
                    }
                    Err(_e) => {
                        println!("gdbstub internal error");
                        return;
                    }
                    Ok((gdb, None)) => gdb,
                },
                // example_no_std stubs out resume, so this will never happen
                GdbStubStateMachine::DeferredStopReason(_) => {
                    panic!("Deferred stop shouldn't happen")
                }
            };
            GDB_SERVER = Some(new_gdb);
            return;
        }
    }
    match b {
        #[cfg(feature = "gdbserver")]
        b'g' => {
            use gdb_server::{XousTarget, GDB_BUFFER, GDB_SERVER, GDB_TARGET};
            println!("Starting GDB server -- attach your debugger now");
            unsafe { GDB_TARGET = Some(XousTarget::new()) };
            match gdbstub::GdbStubBuilder::new(Uart {})
                .with_packet_buffer(unsafe { &mut GDB_BUFFER })
                .build()
            {
                Ok(gdb) => match gdb.run_state_machine() {
                    Ok(state) => unsafe { GDB_SERVER = Some(state) },
                    Err(e) => println!("Unable to start GDB state machine: {}", e),
                },
                Err(e) => println!("Unable to start GDB server: {}", e),
            }
        }
        // Other characters handled here
    }
    // Remainder of `debug` console here, for when a terminal is not connected
}

The problem I'm running into right now is that
|
Apologies, I've managed to at least convince it to compile by removing the

if let Some(mut gdb) = GDB_SERVER.take() {
    let target = GDB_TARGET.as_mut().unwrap();
    let new_gdb = match gdb {
        GdbStubStateMachine::Pump(gdb_state) => match gdb_state.pump(target, b) {
            // Remote disconnected -- leave the `GDB_SERVER` as `None`.
            Ok((_, Some(disconnect_reason))) => {
                match disconnect_reason {
                    DisconnectReason::Disconnect => println!("GDB Disconnected"),
                    DisconnectReason::TargetExited(_) => println!("Target exited"),
                    DisconnectReason::TargetTerminated(_) => println!("Target halted"),
                    DisconnectReason::Kill => println!("GDB sent a kill command"),
                }
                return;
            }
            Err(GdbStubError::TargetError(_e)) => {
                println!("Target raised a fatal error");
                return;
            }
            Err(_e) => {
                println!("gdbstub internal error");
                return;
            }
            Ok((gdb, None)) => gdb,
        },
        // example_no_std stubs out resume, so this will never happen
        GdbStubStateMachine::DeferredStopReason(_) => {
            panic!("Deferred stop shouldn't happen")
        }
    };
    GDB_SERVER = Some(new_gdb);
    return;
}
|
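The difference between the two snippets above appears to come down to `Option::take`. Calling `.as_mut().take()` only takes from a temporary `Option<&mut T>`, so the value is still borrowed and a method that consumes `self` cannot be called on it. Calling `.take()` directly moves the value out and leaves `None` behind. A minimal sketch of that take-then-put-back pattern follows, using a hypothetical `Server` type and `feed` function rather than anything from gdbstub, and a plain `&mut Option<_>` instead of a `static mut`:

```rust
/// Hypothetical stand-in for a state machine whose step consumes `self`.
pub struct Server {
    pub bytes: u32,
}

impl Server {
    /// Consuming step: returns the next state, or an error that ends the session.
    /// In this toy, the byte 0xff ends the session.
    pub fn pump(self, byte: u8) -> Result<Server, &'static str> {
        if byte == 0xff {
            Err("session ended")
        } else {
            Ok(Server { bytes: self.bytes + 1 })
        }
    }
}

/// Feed one byte to whatever is stored in `slot`.
/// Returns `true` if a server is still installed afterwards.
pub fn feed(slot: &mut Option<Server>, byte: u8) -> bool {
    // `slot.as_mut().take()` would produce `Option<&mut Server>`, and
    // `pump(self)` cannot be called through a `&mut`. `slot.take()` moves
    // the `Server` out and leaves `None` in the slot.
    let Some(server) = slot.take() else {
        return false;
    };
    match server.pump(byte) {
        Ok(next) => {
            // Put the new state back for the next interrupt.
            *slot = Some(next);
            true
        }
        // On error the slot stays `None`, i.e. the session is torn down.
        Err(_) => false,
    }
}
```

A side effect of this shape is that every early `return` leaves the slot empty, which is convenient for disconnects but easy to trip over if an error is meant to be recoverable.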
Still working on it, but in case you're curious the [messy] repo is checked in at https://github.com/betrusted-io/xous-core/blob/debugger/kernel/src/debug.rs I need to add OS-level support for addressing memory and putting processes in a |
Fantastic! I'm glad to see that you've got at least a "skeleton" of a full-fledged As always, I'm looking forward to seeing how your implementation shapes up, and whether you run into any more API papercuts. Once you've got something up and running, we can move on to polishing up the feature branch and merging it into By the way, you should remove those Oh, and one more thing: I noticed that you've already got a few OS-specific debug features in your code (e.g: dumping page tables, reporting RAM usage, etc...) which become inaccessible once you enter GDB mode. In case you aren't aware, the GDB RSP actually defines a mechanism by which targets can implement custom client commands, which |
Yep! I was planning on adding Right now the big issue is supporting pausing processes on the system, so it's taking a bit longer than I'd like to integrate things. |
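As background on the custom-command mechanism mentioned above: on the wire, GDB's `monitor <text>` command arrives as a `qRcmd,` packet whose argument is the command text encoded as hex. The sketch below decodes such a payload into a caller-supplied buffer without allocating. `decode_rcmd` and `hex_val` are hypothetical helpers written for this illustration, not part of gdbstub, which handles this decoding itself.

```rust
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decode the hex-encoded command text of a `qRcmd,<hex>` packet body into `out`.
/// Returns `None` if the packet is not `qRcmd`, the hex is malformed,
/// or the decoded text does not fit in `out`.
pub fn decode_rcmd<'a>(packet: &[u8], out: &'a mut [u8]) -> Option<&'a [u8]> {
    let hex = packet.strip_prefix(b"qRcmd,")?;
    if hex.len() % 2 != 0 || hex.len() / 2 > out.len() {
        return None;
    }
    for (i, pair) in hex.chunks_exact(2).enumerate() {
        out[i] = (hex_val(pair[0])? << 4) | hex_val(pair[1])?;
    }
    Some(&out[..hex.len() / 2])
}
```

With something like this, typing `monitor help` in the GDB client would reach the target as the bytes `help`, which could then be dispatched to the same code that handles the existing debug console characters.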
Actually, that isn't true. GDB can Unfortunately, while That said, you probably can leverage the current |
Just a heads up, but now that #60 has been reported, I'm almost certainly going to be publishing a While the changes in As always, any and all feedback would be much appreciated! |
I apologise, I was called away to another project. However, I am now able to resume work on this, and I will continue to work to integrate I just managed to get process suspension working. That is, I can now move a process from |
I got it wired up, so now I'm able to peek and poke memory, inspect registers, and list threads. There are a few things to note.
|
Here's an example of a process being debugged:
|
Alright, I realized the problem with (1) above where it would get interrupted immediately. Previously I was returning The issue now is that when I hit Control-C, it's in an odd state. Notably, the What would be the best way to handle this? That is, I have a character that's appeared because gdb has sent a packet, but the state is in |
Alright, I've pushed up a hotfix that should unblock you. I've updated the code in the
Note that technically speaking, you're allowed to pass whatever stop-reason you like when you're interrupted. I think the proper fix will involve exposing the call to EDIT: I found some time to play around with the API, and after a bit of iteration and experimentation, I've settled on something I'm fairly happy with. In a nutshell, I added a new I also introduced a few easy-to-fix breaking API changes, such as removing the I've updated the |
Unless you've implemented Here is the relevant line from the GDB documentation:
That said, do make note of the disclaimer at the top of the |
I just merged initial support for gdbstub. It's very rough, but it's up at https://github.com/betrusted-io/xous-core/blob/main/kernel/src/debug/gdb_server.rs |
Hey, that's fantastic news! Thanks again for helping test and validate this new state-machine based execution API for It seems that we're finally at the point where the broad strokes of the API are working as intended, and as such, I'll start hammering away at polishing up + documenting all these new changes. I intend to merge the current As a heads up, there will be a few more breaking changes coming down the pipeline in the As always, please let me know if you run into any other API warts / issues while fleshing out your Lastly, a couple things I noticed in your initial implementation (yes, I know it's very rough, but I'll still point things out regardless 😉):
|
Here's a version with gdbstub enabled:
And here it is with the
It's closer to 20kB, but that may be due to various debug strings. If I remove No gdbserver:
With gdbserver:
Keep in mind that I'm building for speed and not for size, though curiously enough if I switch to optimize for size and not speed, it also is around 20 kB:
One thing I have not yet figured out is how to trap on a breakpoint, or even what a breakpoint is. My assumption had been that when gdb inserted a breakpoint, it would replace code in RAM with an illegal instruction that would cause an exception. Perhaps it's doing this and my kernel has configured the CPU to ignore such things. But as a result, I'm not able to issue commands such as Additionally, |
Thanks for sharing those numbers! In past experience, I've found that it can be pretty tricky measuring the true overhead of just
Err, not exactly. When you set a breakpoint from the GDB client, the only thing the GDB client does is send the target a request to set a breakpoint at the specific address. The specifics of how to implement that breakpoint are left entirely up to the target. In other words, the GDB client will not automatically patch the instruction stream to insert a breakpoint - if that's how you want to implement breakpoints in your target, that's something you'll need to do yourself. To get breakpoints working, you'll need to implement the appropriate breakpoint IDETs. In your case, you'll probably just want to start off with basic software breakpoints (i.e: patching the instruction stream with a breakpoint instruction + catching the exception), which would fall under the
This is likely a byproduct of your stubbed out Note that once I tweak the resume API to make single-stepping optional (tracked under #59), the |
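As a rough illustration of the "patch the instruction stream and catch the exception" approach described above, here is a sketch of the bookkeeping a target might keep for software breakpoints. Everything here is hypothetical (`Breakpoints`, `insert`, `remove`), and a byte slice stands in for the debugged process's memory. The only hard fact used is the RISC-V encoding of `ebreak`, `0x00100073`. A real kernel would also need to write through the process's address space, flush the instruction cache (`fence.i`), and use the 16-bit `c.ebreak` (`0x9002`) when the patched instruction is a compressed one.

```rust
/// RISC-V `ebreak`, little-endian, as it appears in memory.
const EBREAK: [u8; 4] = 0x0010_0073u32.to_le_bytes();

/// Fixed-capacity table of (address, original bytes) pairs.
pub struct Breakpoints<const N: usize> {
    slots: [Option<(usize, [u8; 4])>; N],
}

impl<const N: usize> Breakpoints<N> {
    pub fn new() -> Self {
        Self { slots: [None; N] }
    }

    /// Patch `ebreak` into `mem` at `addr`, remembering the original bytes.
    /// Returns `false` if out of range, already set, or the table is full.
    pub fn insert(&mut self, mem: &mut [u8], addr: usize) -> bool {
        let Some(end) = addr.checked_add(4) else {
            return false;
        };
        if end > mem.len() || self.slots.iter().flatten().any(|(a, _)| *a == addr) {
            return false;
        }
        let Some(slot) = self.slots.iter_mut().find(|s| s.is_none()) else {
            return false;
        };
        let mut saved = [0u8; 4];
        saved.copy_from_slice(&mem[addr..end]);
        mem[addr..end].copy_from_slice(&EBREAK);
        *slot = Some((addr, saved));
        true
    }

    /// Restore the original instruction at `addr`, if a breakpoint is set there.
    pub fn remove(&mut self, mem: &mut [u8], addr: usize) -> bool {
        for slot in self.slots.iter_mut() {
            if let Some((a, saved)) = *slot {
                if a == addr {
                    mem[a..a + 4].copy_from_slice(&saved);
                    *slot = None;
                    return true;
                }
            }
        }
        false
    }
}
```

The `bool` return values map naturally onto "breakpoint set / could not be set" replies. When the patched instruction later traps, the exception handler is the place to deschedule the process and report a stop to the debugger.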
I'm looking to hook the kernel
Regarding breakpoint types, I thought GDB would set its own breakpoints by poking into RAM if it detected that a program was in RAM. At least that's what I recall observing when I was working on my own gdb server targeting bare metal: https://github.com/litex-hub/wishbone-utils/blob/master/wishbone-tool/crates/core/gdb.rs It only supports two How do I signal to How should I fill in |
Yikes, good catch 😱
See #51. That should give you all the context you need. Then again, on further thought, now that we have this state machine based model, I bet it'd be pretty easy to add a Given that's the state you'll probably be in most of the time, that'd make it pretty easy to log output via GDB. Of course, you would have to be careful not to log from inside any gdbstub methods, as you would need to get a double-reference to the global gdbstub instance, which will result in some serious badness.
Read through https://sourceware.org/gdb/onlinedocs/gdb/Remote-Stub.html, particularly the "Stub Contents" and "Bootstrapping" sections. Based on my reading + personal observations, GDB has never tried to change memory contents with the intention of setting a breakpoint. Then again, I've never tried to set a breakpoint without having the breakpoint IDETs implemented, so maybe there's a fallback path somewhere in the GDB client, but I highly doubt it. If you're feeling adventurous, consider reading through the GDB client source and seeing if you can find any of the logic you're describing. A good starting point might be somewhere in https://github.com/bminor/binutils-gdb/blob/master/gdb/remote.c
AFAIK, the GDB RSP doesn't define a server-to-client disconnect packet. You've basically got two options when you want to end a debugging session:
Please let me know if I've missed some obscure feature of the RSP that enables GDB targets to cleanly end a debugging session on their terms.
The docs and examples are there for a reason 😏
I'm not sure what you mean? Are you thinking of something like |
Thanks! That does seem better than my current hamfisted approach. I'll track that and integrate it when it's ready.
The very end of This hands the call off to This function figures out what a breakpoint looks like on this platform by calling After it has the breakpoint value and size,
At this moment, I'm more interested in telling the client that the server has hit a breakpoint. What do I pass to
The example I'm reading has this, which doesn't give much of an idea what it does or when it gets called:

#[inline(never)]
fn set_resume_action(&mut self, _tid: Tid, _action: ResumeAction) -> Result<(), Self::Error> {
    print_str("> set_resume_action");
    Ok(())
}

I'll try to understand what the function is for by reading the code, but the documentation isn't really helping me to understand it. What is it used for? The docs say "A simple implementation of this method would simply update an internal HashMap<Tid, ResumeAction>.", but why would it do that? When is this
The Overall I think I'm still subtly misusing |
Wow, thanks for digging into the GDB client code! It seems that If that's the case, then you'll need to make sure you've set up your interrupt / exception handlers correctly to intercept the breakpoint.
You don't call See the big-picture summary at the end of this comment if you're still confused.
Fair enough! In this case, the documentation you actually want to read is
I'll add a pointer to refer to the As for the example code... there's a reason I explicitly linked you to the When you're working on implementing actual functions, you'll want to refer to the significantly more fleshed out
Hitting Ctrl-C does not halt the target. The only thing hitting Ctrl-C does is send an "interrupt" packet to the target, which is then processed via Please re-read my earlier comment: #56 (comment) Taking a step back, I think I need to explain what the In a nutshell, the first time you start up
In your implementation, you'll end up calling Let me know if that clears things up. |
With the current design, it seems as though This is due to the fact that a To work around this, I basically consume and reconstitute the state machine on every loop. I.e.

if let Some(gdb_state_machine) = unsafe { GDB_STATE.take() } {
    let new_gdb = match gdb_state_machine {
        GdbStubStateMachine::Pump(gdb_state) => match gdb_state.pump(&mut target, b) {
            Ok((gdb, None)) => gdb,
            Err(e) => {
                cleanup();
                println!("gdbstub error: {}", e);
                return true;
            }
        },
        GdbStubStateMachine::DeferredStopReason(gdb_state) => {
            match gdb_state.deferred_stop_reason(&mut target, ThreadStopReason::DoneStep) {
                Ok((gdb, None)) => gdb,
                Err(e) => {
                    cleanup();
                    println!("deferred_stop_reason_error: {:?}", e);
                    return true;
                }
            }
        }
    };
    unsafe { GDB_STATE = Some(new_gdb) };
}

The problem is that the happy path returns |
Yes, that is the intended way to work with the API.
Ah, indeed, good catch. I should tweak the API to return a ...so remember when you asked "How do I signal to gdbstub that it should halt" and I said:
yeah, that was pre-morning-coffee me talking. The way to signal a "halt" is to pass Theoretically, I can add a new Reporting |
Hahahaha, I just tested it, and GDB totally rewrites memory to insert a breakpoint if I disable the Time to update my docs... |
Heads up, I've merged the |
Sorry for hijacking this, but I'm basing my implementation directly on the dev/0.6 branch because of this issue here and I have some basic question regarding the
|
Hey @gz, no need to apologize, these are perfectly relevant questions :) Plus, it serves as yet another reminder that I really need to find some time to polish up and push out 0.6, since you're far from the only one who's currently working directly off the
Yep, that's exactly right. When you spell it out like this, I realize that it does seem a bit silly to have this be a two-step operation... I think a better long-term approach would be to update
With the current API, yes, that would indeed be how you'd have to use it. It's been a while since I hacked on this code, but IIRC I couldn't find a good way to implement the typestate state machine API without requiring the implementer to play "hot potato" with the various states. Consequently, if you've got a suggestion on how you might tweak the API to maintain type-system enforced sequencing/correctness while using Also, if you're at liberty to share more details, I'm very interested to hear more about what context you're using |
Hi @daniel5151 thanks for the clarification! I hacked a bit more on this yesterday and I think I was able to wrap my head around the typed state-machine. IMO the organization you have made a lot of sense once I grasped it. Re sharing my use-case: Sure, it's all open-source, I'm currently adding gdb remote debugging support for an experimental x86-64 multicore kernel here: vmware-labs/node-replicated-kernel#52 |
Ahh, very cool! Thanks for sharing! If you have any more questions / suggestions about the state machine API (or any other |
Actually, I do have a more general question that you might know how best to express with gdbstub. The kernel binary in my case is relocated at a random address (i.e., it changes every time the system boots, similar to Linux with KASLR). GDB is usually confused by this, as it looks for the symbols at the offsets given in the ELF file, so one needs to go and manually set this offset when loading the binary (e.g., using the symbol-file command). I was trying to find if I can somehow send the PS: And thanks btw for all the hard and amazing work on gdbstub, it definitely saved me a ton of work compared to implementing all this by myself! |
Would Also check out #20, which might be related.
As a meta note: https://sourceware.org/gdb/onlinedocs/gdb/General-Query-Packets.html#General-Query-Packets |
Just wanted to report section offsets works great, thanks a lot for the help! |
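For anyone landing here with the same relocation problem, this is roughly what a section-offsets answer looks like at the protocol level. The GDB remote protocol lets the client ask `qOffsets`, and one of the documented reply forms is `Text=xxx;Data=yyy`, with the offsets in hex. The helper below is only a sketch of formatting that reply. `write_offsets_reply` is a hypothetical name, and the assumption that the whole image slides by one offset (so both values are equal) is mine rather than something stated in this thread.

```rust
use core::fmt::{self, Write};

/// Write a `qOffsets` reply body of the form `Text=<hex>;Data=<hex>` into `out`.
/// `text` and `data` are the amounts by which the text and data sections were
/// relocated relative to the addresses in the ELF file.
pub fn write_offsets_reply<W: Write>(out: &mut W, text: u64, data: u64) -> fmt::Result {
    write!(out, "Text={:x};Data={:x}", text, data)
}
```

Since the function only needs `core::fmt::Write`, it can target a fixed-size buffer wrapper in a `no_std` kernel just as easily as a `String` on a host.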
To give a quick update regarding this feature: I'm pretty happy with the current state of the state machine API, and once @gz has a chance to integrate the latest changes, I'll try to find some time to properly document everything + push out a proper 0.6 release. @xobs, not sure if you're still tracking this, but if you get the chance, it would be super useful if you could update your implementation as well, and let me know if there are any issues / feature gaps. Cheers! |
I believe that the time has finally come to close out this issue. The goal of this issue was to add a bare-metal API to If you encounter any issues with this API down the line, please open a new issue instead of re-opening / commenting on this one. |
gdbstub makes it very easy to add a GDB server to any project. The current design uses long-polling to interact with the target. Conceptually, gdbstub sits between a network Connection and the Target being debugged. When the target is running, gdbstub is blocked.
An API for bare metal could live in an interrupt handler such as a UART. Each time a character is received, it would be passed to gdbstub for collection and processing. This has several nice properties:
gdbstub means no debugger, and you simply have to remove the interrupt hook.