-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FreeBSD initial support #1480
FreeBSD initial support #1480
Conversation
In bpcountstest.go (the fixture used by TestBreakpointCounts) the main goroutine specifically waits for the two child goroutines before exiting. If TestBreakpointCounts doesn't pass either there's a bug in your backend implementation or in Go. I think it's more likely that there's a bug here. TestBreakpointCounts actually checks that the backend correctly detects when the same breakpoint is hit simultaneously on two different threads. Not doing this correctly is a common mistake, it was made by different people on the darwin, linux and windows native backend. It's likely that you have the same problem. |
// OSProcessDetails contains FreeBSD specific | ||
// process details. | ||
type OSProcessDetails struct { | ||
comm string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find where this is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems it's not used anywhere. I've tried to use tid (main process thread id) in singleStep mimicking comparison in line 59 from linux_threads.go:
if (status == nil || status.Exited()) && wpid == t.dbp.pid
But later replaced all logic in singleStep with a single call to trapWait(). Btw can we do the same for Linux? The problem with Linux's singleStep implementation is that it doesn't handle threads add\remove properly. I see that we are waiting for a TRAP from "this" thread ignoring signals from other threads. Maybe we can just pass tid into trapWait as additional param to simulate the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw can we do the same for Linux?
see the comment in (*Process).wait in proc_linux.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only comment there is about a workaround for lock when main thread exits and leaves zombies. Could you clarify how does that rely to replacing logic in singleStep() with a call to waitTrap()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The full wait is slow because of the workaround. Also all threads are stopped and we are only executing one instruction of threads stopped on a breakpoint, so it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On FreeBSD, even when all threads are stopped, wait() in cycle may return multiple queued signals from multiple stopped threads (and even thread creation\exit events queued before trap).
I'm preparing a detailed step by step explanation of why "TestBreakpointCounts" fails, hopefully someone can help with a fix.
So far I was able to reproduce "TRAP events disappear" issue with a pure C code, without go stuff involved. Under some circumstances, app is exiting without delivering queued TRAP events to debugger. Preparing a message to freebsd-hackers group. |
FreeBSD bug created: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236035 |
Thank you, let me know how it goes. |
Finished a cursory review this morning, will be doing a more in depth review tomorrow or the following day. In the meantime, I saw some commented out code, I can just add a blanket statement now to clean all that up. By the way, thanks for submitting this patch! |
Regarding the discussion around TestBreakpointCounts I'm ok with disabling that test on freebsd and merging this now to prevent this PR from coderotting. There's also a free CI service that has FreeBSD servers: https://cirrus-ci.org/guide/FreeBSD/. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits and changes. Aside from this I agree with @aarzilli we should disable that test in the meantime only for FreeBSD and merge this soon.
One thing this PR has made apparent is the amount of duplication that must go into supporting another OS. I think we have an issue of code duplication anyways between arches, and that's something we need to fix swiftly. I think a set smaller, surgical refactoring PRs would be ideal and the least disruptive. I'm mostly thinking out loud with this last comment, but wanted to voice it. I will handle refactoring after this PR lands.
pkg/proc/native/threads_freebsd.go
Outdated
} | ||
|
||
t.dbp.trapWait(t.dbp.pid) | ||
/*_, status, err := t.dbp.waitFast(t.dbp.pid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why comment this code out instead of implementing similar to how is being done elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is that this logic seems to be broken in any existing implementation, e.g. in Linux waitFast is used, then exiting conditions checked, but there is no proper handling of thread creation\death in this place. trapWait handles all these cases properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, understood and makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I got all the cases, but the PTRACE API has limitations regarding the thread that called the initial attach must be the thread issuing subsequent PTRACE calls, meaning they must be made from the goroutine which is locked to a thread.
@derekparker, wrapped all places except PtraceDetach - it doesn't work that way. Thanks. |
@rayrapetyan thanks for the prompt changes! The last thing I have is to disable any concurrent tests that are failing due to the FreeBSD issue with an informative |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see some changes to vendored code, for example vendor/golang.org/x/sys/unix/dirent.go, but no changes to go.mod/go.sum. How did you update the vendored code and is it going to break the next time we run go mod vendor
?
@derekparker, there are few weird test cases. When running:
But whenever you run each of these two individually (e.g. go test -run TestAttachDetach) - they always PASS. Any thoughts of why "go test" may perform like that? Thanks. |
The TestAttachDetach could happen if you are running tests in parallel (like |
I've tried |
It could be that Detach(true) isn't actually killing the target process, one instance of testnextnethttp stays running and makes TestAttachDetach fail. |
"Skipped" concurrent tests. Please let me know if I can help to setup FreeBSD CI in Travis. |
@rayrapetyan thanks! You should just have to run |
@derekparker, hm, |
@rayrapetyan are you making manual changes to the vendor directory? Yeah, that won't work for the reason you described. If you've made any changes to any dependency of Delve, you must commit that change upstream and then re-vendor it in Delve (from upstream). I didn't take a close look at what changes were made in the vendor dir, but is it possible to implement without any code changes to our dependencies? |
@derekparker, @aarzilli, I've cleared vendor-related changes, the only artifacts left are constants for ptrace and registers-related structs, I believe they should stay in vendors. Please suggest how to proceed. I've never committed into golang core, it requires a special procedure (https://golang.org/doc/contribute.html), so if you have accounts there could you commit these changes? Thanks. |
I'm taking a look at the tests you have disabled. It's quite a lot of tests, if they all need to be disabled it seems that there is something that's pretty thoroughly broken here. As for the changes to golang.org/x/sys/unix, I don't have a freebsd install and I don't know pretty much anything about it, I can't make them for you. For the constants it's easier to copy them to our side. The changes to Stat_t, Dirent, etc are worrying however, that's an ABI change, I don't know that's handled on golang.org/x/sys/unix. It's not that hard, you just have to follow steps 0 through 4 at https://golang.org/doc/contribute.html. Ignore the rest of that page, then you make a branch for your changes and mail them with |
@aarzilli, these are all parallel_next and concurrentstep tests, there are also two other tests I will investigate. Will also proceed with commits into sys/unix, thanks! |
My guess is that when a thread executes an INT 3 freebsd will send us immediately a signal: if other threads are running on different cores they will keep running until their quantum exipres, but the signal will be sent immediately. Which means that when (*Process).stop is called some of the threads could still be running and that's why we are missing some breakpoints in all those tests. |
golang.org: commit for ptrace support on FreeBSD: https://go-review.googlesource.com/c/go/+/166423 |
@rayrapetyan what's the status of the above patch? Looks like there might be some outstanding review comments to address before it's considered for merging. |
@derekparker, I was stuck with committing changes into golang - along with i386 changes which I provided they require changes for CPU architectures I don't own (ARM) and therefore can't test... |
@rayrapetyan It looks like all you need to do is generate the files for ARM. I think if you do that and submit it to the patch, the burden for review and testing will be on them. They can use their own internal builders for testing. |
Pushed "generated" ARM files. Awaiting merge. |
@rayrapetyan looks like that PR was closed because that stdlib package is locked, shoulda caught that earlier. In any event, can you open a PR against |
Yes, it passes all tests locally in FreeBSD. I can reproduce issue with x/crypto in Ubuntu VM. Do you know how to make x/crypto build correctly? |
@rayrapetyan the vendor has been updated with your changes via #1600 thanks to @aarzilli. You should be able to just remove your last commit and rebase on master. |
3714727
to
b90e4e1
Compare
@derekparker, @aarzilli, thanks guys, I've spent several hours trying to make latest x/crypto to work with x/sys with no luck. Hopefully we can merge my branch now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more small things, otherwise looks good.
Exposes some areas which could benefit from some refactoring, but should be done in a follow up.
pkg/proc/fbsdutil/regs.go
Outdated
return | ||
} | ||
|
||
const ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of duplicating this, let's reuse what already exists in linutil
.
pkg/proc/gdbserial/gdbserver.go
Outdated
@@ -1327,6 +1327,9 @@ func (p *Process) loadGInstr() []byte { | |||
case "darwin": | |||
// mov rcx,QWORD PTR gs:{uint32(off)} | |||
op = []byte{0x65, 0x48, 0x8B, 0x0C, 0x25} | |||
case "freebsd": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's clean this up a bit by just merging this into the case above, e.g. case "darwin", "freebsd":
pkg/proc/native/proc_freebsd.go
Outdated
var err error | ||
dbp.execPtraceFunc(func() { err = sys.PtraceLwpEvents(dbp.pid, 1) }) | ||
if err == syscall.ESRCH { | ||
// XXX why do we wait here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still an open question?
@derekparker, updated. An interesting finding - with fixed bitmasks another concurrent test is constantly failing, so I've added it to skipped for now. Definitely bitmasks affect other concurrent tests also. I will look into all skipped tests after July 20. |
@rayrapetyan hmm, that's odd. That change didn't break any tests on any currently supported OS. I have one last request before merging. The |
@aarzilli I'm planning on merging this by EOD today even if the above comment isn't addressed (and just do my own follow up commit if need be). Is there any concern you have with this PR or are you good with it at this point? I have questions on why some of the tests are failing and we need to chase those down (@rayrapetyan has mentioned starting after July 20). I'm eager to get this merged, but as we are also really close to a release and I want to make sure we're not releasing something big like a new OS without very much vetting. The upside of merging soon however is we can start getting early adopters to test and provide feedback. |
if err != nil { | ||
return nil, fmt.Errorf("wait err %s %d", err, pid) | ||
} | ||
if status.Killed() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this has anything to do with the failing concurrent tests, since we're effectively ignoring status.Killed
, yes we send SIGKILL
to the process in process.kill()
. I have a freebsd VM I can test this theory out on tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Merging then creating an issue to track test failures.
@rayrapetyan a huge thanks to you for this port! Now looking forward to getting the concurrency issues sorted out. |
* FreeBSD initial support * first code review fixes * regs slice upd * execPtraceFunc wrap * disabled concurrency tests fixed kill() issue * disabled concurrency tests fixed kill() issue * cleanup vendor related code * cleanup ptrace calls * vendoring latest changes * Revert "vendoring latest changes" This reverts commit 833cb87 * vendoring latest changes * requested changes
* FreeBSD initial support * first code review fixes * regs slice upd * execPtraceFunc wrap * disabled concurrency tests fixed kill() issue * disabled concurrency tests fixed kill() issue * cleanup vendor related code * cleanup ptrace calls * vendoring latest changes * Revert "vendoring latest changes" This reverts commit 833cb87 * vendoring latest changes * requested changes
This pull request adds initial support for FreeBSD.
It's primarily based on fbsd branch from: github.com:asomers/delve.git, just fixes few critical bugs and adds few new functions.
It passes almost all tests (281 of 288), except few "Concurrency" ones, but I believe it's not a bug, but a natural behavior for FreeBSD and these tests should be slightly changed.
E.g. on "TestBreakpointCounts" two threads are expected to hit same breakpoint 100 times each. In FreeBSD, when main thread exits, all "child" threads are not triggering TRAP signals anymore, process just finishes. That's why when main thread reaches 100 breakpoint hits and exits, child thread will never reach 100 and stops somewhere around 95. Sometimes, when child thread is faster and completes BP race first, we can see this test as PASSED.
Other than that it works fine.
ref: #213
Thanks.