runtime: gdb stepping behavior when debugging Go program on ppc64le changed when split-stack prologue marked non-preemptible #37126
Comments
/cc @aclements @randall77 per owners.
If I set GODEBUG=asyncpreemptoff=1, the problem doesn't happen. I'm not entirely sure it only happens at the beginning of a function.
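As a convenience for the GODEBUG=asyncpreemptoff=1 workaround, here is a minimal sketch that starts gdb with that setting added to the environment, so the debugged program inherits it. It assumes gdb is on PATH and passes its own environment to the inferior (its default behavior); the helper names withAsyncPreemptOff and gdbCommand and the ./math.test path are hypothetical. Existing GODEBUG settings are kept, since GODEBUG takes a comma-separated list of name=value pairs.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withAsyncPreemptOff returns a copy of env in which GODEBUG includes
// asyncpreemptoff=1, keeping any GODEBUG settings that are already present.
func withAsyncPreemptOff(env []string) []string {
	const setting = "asyncpreemptoff=1"
	out := make([]string, 0, len(env)+1)
	found := false
	for _, kv := range env {
		if strings.HasPrefix(kv, "GODEBUG=") {
			found = true
			val := strings.TrimPrefix(kv, "GODEBUG=")
			if val == "" {
				kv = "GODEBUG=" + setting
			} else {
				kv = "GODEBUG=" + val + "," + setting // GODEBUG is comma-separated
			}
		}
		out = append(out, kv)
	}
	if !found {
		out = append(out, "GODEBUG="+setting)
	}
	return out
}

// gdbCommand builds (but does not start) an interactive gdb session for
// binary. The program run under gdb inherits this environment.
func gdbCommand(binary string) *exec.Cmd {
	cmd := exec.Command("gdb", binary)
	cmd.Env = withAsyncPreemptOff(os.Environ())
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd
}

func main() {
	cmd := gdbCommand("./math.test") // hypothetical test binary path
	fmt.Println(cmd.Args)
	// Calling cmd.Run() here would start the interactive session.
}
```

Setting the variable in the shell before launching gdb has the same effect. The helper only matters when the debugger is started from a Go tool.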
I don't see how this could be related specifically to marking the split-stack prologue non-preemptible (/cc @cherrymui), but I could see asynchronous preemption affecting this. My hunch is that this is just an extension of a problem we've had for a long time: if you try to step over a split-stack prologue and it actually grows the stack, GDB gets very confused, because the SP changes dramatically and GDB assumes the SP is a stable indication of which frame you're in. I'm not quite sure why asynchronous preemption would make that worse, though, because it won't actually grow the stack (it can't). It may be that the preemption gets injected and switches to a different goroutine, and the dramatic SP change of the goroutine switch confuses GDB. It wouldn't surprise me if GDB sees the SP increase dramatically from the goroutine switch, assumes you've exited the frame you were trying to "next" in, and just gives up on nexting.
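To make that hypothesis concrete, here is a toy model of an SP-only frame-identity check, assuming the stack grows toward lower addresses as on ppc64le and amd64. It is not GDB's actual algorithm: real GDB compares frame IDs derived from the CFA and the function's start address, not raw SP. It only shows why jumping to another goroutine's stack at a higher address would look like "the frame returned". The addresses are made up.

```go
package main

import "fmt"

// stepOutcome is what a purely SP-based stepper would conclude after one
// machine-level step.
type stepOutcome int

const (
	sameFrame   stepOutcome = iota // SP unchanged: still in the frame being stepped
	inCallee                       // SP lower: looks like a call into a new frame
	frameExited                    // SP higher: looks like the frame returned
)

func (o stepOutcome) String() string {
	switch o {
	case sameFrame:
		return "same frame"
	case inCallee:
		return "in callee"
	default:
		return "frame exited"
	}
}

// classify compares the SP recorded when the step began with the SP observed
// after it, assuming a downward-growing stack.
func classify(frameSP, newSP uintptr) stepOutcome {
	switch {
	case newSP == frameSP:
		return sameFrame
	case newSP < frameSP:
		return inCallee
	default:
		return frameExited
	}
}

func main() {
	// Hypothetical addresses: the goroutine being stepped, and a different
	// goroutine whose stack happens to sit at a higher address.
	const stepping uintptr = 0x40f00
	const other uintptr = 0x100f00
	fmt.Println(classify(stepping, stepping-0x20)) // in callee
	fmt.Println(classify(stepping, other))         // frame exited
}
```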
Here is what I just tried, using math.test built from the math package. I set a breakpoint at the start of the sin function.
If I set GODEBUG=asyncpreemptoff=1, this doesn't happen. If I use Go 1.13, this doesn't happen. If I break at a later spot, the ni doesn't hang either, at least in this case. The last instruction it tried was the blt. I don't think it is related to morestack, because it happens on any function and I don't think morestack is called that often. I also don't think this is a problem that has been around for a long time; I use gdb on Go programs all the time and have never seen this before.
My guess is that when one thread is stopped in the prologue, which is non-preemptible, other threads may keep sending signals to try to preempt it. Maybe the signals keep gdb busy? Maybe try
I was not familiar with set scheduler-locking, but it does seem to fix the problem. In any case, when it hangs, here is the output from the active threads and their backtraces. I don't see anything that would be sending signals. I also tried the same experiment with Go 1.13, and if I stop at the same location, the threads and their stacks look the same.
Thanks, @laboger. Okay, it seems other threads are not actively sending signals (which is expected), or even actively running. So maybe gdb itself is stuck? I guess the next step is to see what gdb is doing at this point. Maybe run gdb inside gdb to find out? Also, does this happen at other non-preemptible places? For example, if you single step in
Normally, when single stepping in one thread, other threads are still running.
Today was kind of strange: I found some systems where I didn't hit the error, when at one point I thought it happened all the time. I am fairly sure that in Go code it only happens within the code that checks the stack and calls morestack, and I can still reproduce that on some systems. I thought I was hitting this a lot while debugging an asm function called update in poly1305, which I now see does not have the morestack sequence, but today I can't reproduce it there. I got into the habit of setting GODEBUG=asyncpreemptoff=1 while debugging because the problem was so annoying, so I didn't really keep track of when and where it did and didn't happen. I can't make it happen today in other asm files either. I don't know what's different, although I do see that some gdb installations have scheduler-locking set to replay by default while others have it set to step. I can't find the documentation that explains what those two values mean, so I'm not sure whether it makes a difference. If this only affects Power, I don't really expect a fix; probably no one else does this type of debugging except me. (When you set a breakpoint at a symbol, gdb places it after the morestack sequence.) I don't know if there is documentation for using gdb with Go, but the workaround could perhaps be mentioned there.
So, if it is set to step, other threads are suspended while single stepping. If it is set to replay, other threads can still run (unless gdb is replaying a recorded execution). I tried to reproduce this on linux/amd64 with math.test but couldn't. Do the systems you run on have the same version of GDB? Does it only happen if it actually calls morestack, or can it happen regardless of whether it actually grows the stack?
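The scheduler-locking modes can be summarized as a small lookup. This is a sketch based on the mode descriptions in the GDB manual, where "on" means only the current thread runs, "step" locks only during stepping commands, "replay" behaves like "on" while replaying a recording and like "off" otherwise, and "off" does no locking. The function name othersRunDuringStep is hypothetical. In gdb itself, show scheduler-locking prints the current mode and set scheduler-locking step changes it.

```go
package main

import "fmt"

// othersRunDuringStep reports whether GDB lets threads other than the current
// one run while a stepping command (step, next, stepi, nexti) executes.
// replaying says whether GDB is replaying a recorded execution.
func othersRunDuringStep(mode string, replaying bool) (bool, error) {
	switch mode {
	case "off":
		return true, nil // no locking: all threads may run
	case "on":
		return false, nil // only the current thread runs when resumed
	case "step":
		return false, nil // locked while stepping; others run on continue
	case "replay":
		return !replaying, nil // like "on" when replaying, like "off" otherwise
	default:
		return false, fmt.Errorf("unknown scheduler-locking mode %q", mode)
	}
}

func main() {
	for _, mode := range []string{"off", "on", "step", "replay"} {
		run, _ := othersRunDuringStep(mode, false)
		fmt.Printf("%-6s other threads run during ni: %v\n", mode, run)
	}
}
```

Outside of record/replay, "replay" and "off" behave the same during ni, which fits the observation that other threads keep running on the systems where replay is the default.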
OK, the scheduler-locking values make sense. On the system where the default is step, it does not hang, but all the other systems default to replay, and on those it hangs. The systems I tried have different versions of gdb. Once it gets into this code sequence, the stepping delays happen pretty consistently. I don't think it is related to the morestack call, because the delay in stepping starts before it reaches the call to morestack, on either the first cmp or the blt. I also tried breaking at the call to math/big.mulAddVWW, which is asm without the morestack sequence, and I do see some stepping delays there too on the systems where replay is the default. As mentioned above, once scheduler-locking is set to step or off, the delays are gone.
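For reference, the cmp/blt pair mentioned above is the split-stack check. Here is a simplified Go model of the decision it makes: compare the stack pointer against the goroutine's stack guard (stackguard0) and call morestack if the frame would not fit. The exact comparison, instruction sequence, and thresholds vary by architecture and frame size. The stackSmall value of 128 is an assumption, the huge-frame case is omitted, and needsMorestack is a hypothetical name.

```go
package main

import "fmt"

// stackSmall is an assumed threshold: frames up to this size are checked by
// comparing SP against the guard directly.
const stackSmall = 128

// needsMorestack models the prologue's compare-and-branch: report whether the
// function must call morestack before running its body. The huge-frame case
// (and its underflow handling) is deliberately left out.
func needsMorestack(sp, stackguard0, framesize uintptr) bool {
	if framesize <= stackSmall {
		return sp <= stackguard0
	}
	return sp-(framesize-stackSmall) <= stackguard0
}

func main() {
	const guard uintptr = 0x8000
	fmt.Println(needsMorestack(0x9000, guard, 64))   // false: plenty of room
	fmt.Println(needsMorestack(0x8000, guard, 64))   // true: at the guard
	fmt.Println(needsMorestack(0x8300, guard, 1024)) // true: big frame, little room
	// The runtime can also set stackguard0 to a very large value so the check
	// always fails; this is how it requests cooperative preemption.
	fmt.Println(needsMorestack(0x9000, ^uintptr(0), 64)) // true
}
```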
I have not seen this behavior in a while so I am closing it. |
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (go env)?
What did you do?
I tried to debug a Go program on ppc64le with gdb by setting a breakpoint at a function's address and then stepping through the first several instructions.
What did you expect to see?
Normal gdb behavior: the gdb prompt returns within a reasonable amount of time (usually immediately) when stepping through code at the beginning of a function. This type of stepping works normally for programs compiled with Go 1.13.
I found that this behavior started with b2482e4. Knowing that, I found that if I break after the split-stack prologue code, the hang/delay doesn't happen; it only happens in functions that have a split-stack prologue.
What did you see instead?
I set a breakpoint by address at the beginning of a function that has a split-stack prologue. After I type 'ni' to go to the next instruction, gdb hangs: the prompt can take a long time to return and sometimes does not return at all (or I didn't wait long enough).