FreeBSD initial support #1480

Merged: 12 commits, Jul 13, 2019

rayrapetyan
Contributor

This pull request adds initial support for FreeBSD.
It's primarily based on the fbsd branch from github.com:asomers/delve.git; it just fixes a few critical bugs and adds a few new functions.

It passes almost all tests (281 of 288), except a few "Concurrency" ones, but I believe that's not a bug but natural behavior for FreeBSD, and these tests should be slightly changed.

E.g. in "TestBreakpointCounts" two threads are expected to hit the same breakpoint 100 times each. On FreeBSD, when the main thread exits, the "child" threads no longer trigger TRAP signals; the process just finishes. That's why, when the main thread reaches 100 breakpoint hits and exits, the child thread never reaches 100 and stops somewhere around 95. Sometimes, when the child thread is faster and completes the BP race first, the test shows as PASSED.

Other than that it works fine.

ref: #213

Thanks.

@aarzilli
Member

In bpcountstest.go (the fixture used by TestBreakpointCounts) the main goroutine specifically waits for the two child goroutines before exiting. If TestBreakpointCounts doesn't pass, either there's a bug in your backend implementation or in Go. I think it's more likely that there's a bug here.

TestBreakpointCounts actually checks that the backend correctly detects when the same breakpoint is hit simultaneously on two different threads. Not doing this correctly is a common mistake: it was made by different people on the darwin, linux and windows native backends. It's likely that you have the same problem.
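
For readers without the fixture at hand, the shape being described can be sketched roughly as follows. This is not the actual bpcountstest.go; the function name `runWorkers` and the counting slice are made up for illustration. The point is only that the main goroutine blocks until both workers have finished all of their iterations, so it cannot exit first.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers starts `workers` goroutines that each loop `iterations` times
// and returns how many iterations each one completed. In a debugging
// session the loop body would be the line carrying the breakpoint.
// (Hypothetical helper, not taken from the Delve test fixtures.)
func runWorkers(workers, iterations int) []int {
	counts := make([]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				counts[id]++ // both goroutines pass through this line
			}
		}(w)
	}
	// The main goroutine waits here, so it cannot exit before the workers.
	wg.Wait()
	return counts
}

func main() {
	fmt.Println(runWorkers(2, 100)) // prints [100 100]
}
```

With this structure, a debugger that reports fewer than 100 hits for one worker has lost breakpoint events somewhere between the kernel and the backend.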

pkg/proc/native/proc_freebsd.go
// OSProcessDetails contains FreeBSD specific
// process details.
type OSProcessDetails struct {
comm string
Member

I can't find where this is used.

Contributor Author

Seems it's not used anywhere. I've tried to use tid (the main process thread id) in singleStep, mimicking the comparison on line 59 of linux_threads.go:
if (status == nil || status.Exited()) && wpid == t.dbp.pid
But I later replaced all the logic in singleStep with a single call to trapWait(). Btw can we do the same for Linux? The problem with Linux's singleStep implementation is that it doesn't handle thread creation/removal properly. I see that we are waiting for a TRAP from "this" thread, ignoring signals from other threads. Maybe we can just pass tid into trapWait as an additional param to simulate the same?

Member

Btw can we do the same for Linux?

see the comment in (*Process).wait in proc_linux.go

Contributor Author

The only comment there is about a workaround for a lock when the main thread exits and leaves zombies. Could you clarify how that relates to replacing the logic in singleStep() with a call to trapWait()?

Member

The full wait is slow because of the workaround. Also all threads are stopped and we are only executing one instruction of threads stopped on a breakpoint, so it's fine.

Contributor Author

@rayrapetyan rayrapetyan Feb 15, 2019

On FreeBSD, even when all threads are stopped, calling wait() in a loop may return multiple queued signals from multiple stopped threads (and even thread creation/exit events queued before the trap).

I'm preparing a detailed step by step explanation of why "TestBreakpointCounts" fails, hopefully someone can help with a fix.
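
The queued-event situation discussed in this thread can be illustrated with a small simulation. Nothing here talks to the kernel or uses Delve's code: the `event` type, the event kinds and `waitForTrap` are all hypothetical names, and the queue is just a slice. The sketch only shows why a wait loop has to process thread creation and exit events it encounters on the way to the trap, instead of discarding them.

```go
package main

import "fmt"

// eventKind and event are hypothetical stand-ins for what a wait() call
// might report; they are not Delve or FreeBSD types.
type eventKind int

const (
	threadCreated eventKind = iota
	threadExited
	trapped
)

type event struct {
	kind eventKind
	tid  int
}

// waitForTrap consumes queued events in order, keeping the set of known
// threads up to date, and returns the id of the first thread that trapped.
// It returns false if the queue runs out without a trap.
func waitForTrap(queue []event, threads map[int]bool) (int, bool) {
	for _, ev := range queue {
		switch ev.kind {
		case threadCreated:
			threads[ev.tid] = true
		case threadExited:
			delete(threads, ev.tid)
		case trapped:
			return ev.tid, true
		}
	}
	return 0, false
}

func main() {
	threads := map[int]bool{100: true}
	queue := []event{{threadCreated, 101}, {threadExited, 100}, {trapped, 101}}
	tid, ok := waitForTrap(queue, threads)
	fmt.Println(tid, ok, len(threads)) // prints 101 true 1
}
```

A loop that only looked for a trap from one specific thread would leave its thread list stale in this scenario, which is the concern raised about singleStep above.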

@rayrapetyan
Contributor Author

So far I've been able to reproduce the "TRAP events disappear" issue in pure C code, with no Go involved. Under some circumstances, the app exits without delivering queued TRAP events to the debugger. I'm preparing a message to the freebsd-hackers group.

@rayrapetyan
Contributor Author

@aarzilli
Member

Thank you, let me know how it goes.

@derekparker
Member

Finished a cursory review this morning, will be doing a more in depth review tomorrow or the following day. In the meantime, I saw some commented out code, I can just add a blanket statement now to clean all that up.

By the way, thanks for submitting this patch!

@aarzilli
Member

Regarding the discussion around TestBreakpointCounts I'm ok with disabling that test on freebsd and merging this now to prevent this PR from coderotting. There's also a free CI service that has FreeBSD servers: https://cirrus-ci.org/guide/FreeBSD/.

Member

@derekparker derekparker left a comment

A few nits and changes. Aside from this I agree with @aarzilli we should disable that test in the meantime only for FreeBSD and merge this soon.

One thing this PR has made apparent is the amount of duplication that must go into supporting another OS. I think we have an issue of code duplication between arches anyway, and that's something we need to fix swiftly. I think a set of smaller, surgical refactoring PRs would be ideal and the least disruptive. I'm mostly thinking out loud with this last comment, but wanted to voice it. I will handle refactoring after this PR lands.

pkg/proc/native/threads_freebsd.go (outdated)
}

t.dbp.trapWait(t.dbp.pid)
/*_, status, err := t.dbp.waitFast(t.dbp.pid)
Member

Why comment this code out instead of implementing it similarly to how it is done elsewhere?

Contributor Author

The problem here is that this logic seems to be broken in the existing implementations. E.g. on Linux waitFast is used and then the exit conditions are checked, but there is no proper handling of thread creation/death at this point. trapWait handles all these cases properly.

Member

Ack, understood and makes sense.

Member

@derekparker derekparker left a comment

I think I got all the cases, but the PTRACE API has a limitation: the thread that made the initial attach call must also be the thread that issues all subsequent PTRACE calls, meaning they must be made from the goroutine that is locked to that thread.

@rayrapetyan
Contributor Author

@derekparker, wrapped all places except PtraceDetach - it doesn't work that way. Thanks.

@derekparker
Member

@rayrapetyan thanks for the prompt changes! The last thing I have is to disable any concurrent tests that are failing due to the FreeBSD issue with an informative t.Skip(...) and then we should be able to merge this and get FreeBSD CI set up.
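
A per-OS skip of this kind could be organized as in the sketch below. The table `knownBroken`, the helper `skipReason` and the message text are all hypothetical; in a real test the returned message would be passed to `t.Skip`, for example `if msg := skipReason(runtime.GOOS, t.Name()); msg != "" { t.Skip(msg) }`.

```go
package main

import (
	"fmt"
	"runtime"
)

// knownBroken maps GOOS to test name to the reason the test is skipped.
// (Hypothetical table; the entry and its wording are illustrative.)
var knownBroken = map[string]map[string]string{
	"freebsd": {
		"TestBreakpointCounts": "test is valid but currently fails on FreeBSD; see the tracking issue",
	},
}

// skipReason returns a non-empty message if the named test should be
// skipped on the given OS, and the empty string otherwise.
func skipReason(goos, testName string) string {
	// Indexing a missing (nil) inner map yields the zero value "".
	return knownBroken[goos][testName]
}

func main() {
	if msg := skipReason(runtime.GOOS, "TestBreakpointCounts"); msg != "" {
		fmt.Println("skip:", msg)
	} else {
		fmt.Println("run")
	}
}
```

Keeping the reasons in one table makes it easy to see which tests are disabled on which OS and to remove entries as the underlying issues are fixed.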

Member

@aarzilli aarzilli left a comment

I see some changes to vendored code, for example vendor/golang.org/x/sys/unix/dirent.go, but no changes to go.mod/go.sum. How did you update the vendored code and is it going to break the next time we run go mod vendor?

@rayrapetyan
Contributor Author

@derekparker, there are a few weird test cases. When running:
go test
from pkg/proc folder, two tests always fail:

--- FAIL: TestAttachDetach (0.08s)
    proc_test.go:136: Did not continue to correct location, expected line :11 got /usr/local/go/src/runtime/panic.go:681
        	at /ara/devel/blockchain/go/src/github.com/go-delve/delve/pkg/proc/proc_test.go:2887
--- FAIL: TestIssue844 (0.14s)
    proc_test.go:113: failed assertion at proc_test.go:3204: Continue - ptraceGetLwpInfo err no such process -1
i: 0 -> equalsTwo: false 
i: 1 -> equalsTwo: false 
0
fatal error: all goroutines are asleep - deadlock!

But whenever you run each of these two individually (e.g. go test -run TestAttachDetach), they always PASS. Any thoughts on why "go test" might behave like that? Thanks.

@aarzilli
Member

aarzilli commented Mar 7, 2019

The TestAttachDetach could happen if you are running tests in parallel (like go test ./...), the other one I don't know, it needs to be debugged.

@rayrapetyan
Contributor Author

I've tried cd pkg/proc; go test -parallel 1 - it also fails...

@aarzilli
Member

aarzilli commented Mar 7, 2019

It could be that Detach(true) isn't actually killing the target process, one instance of testnextnethttp stays running and makes TestAttachDetach fail.

@rayrapetyan
Contributor Author

"Skipped" concurrent tests. Please let me know if I can help to setup FreeBSD CI in Travis.
@aarzilli, sorry, just found your "go mod" related question. Please let me know how to perform vendor code update procedure properly and I'll commit missed fixes. Thanks.

@derekparker
Member

@rayrapetyan thanks! You should just have to run go mod vendor.

@rayrapetyan
Contributor Author

@derekparker, hm, go mod vendor just re-creates the original vendor directory, wiping out all my changes. Does that mean I should commit everything into the original repo first (e.g. into golang.org/x/sys)? Thanks!

@derekparker
Member

@rayrapetyan are you making manual changes to the vendor directory? Yeah, that won't work for the reason you described.

If you've made any changes to any dependency of Delve, you must commit that change upstream and then re-vendor it in Delve (from upstream). I didn't take a close look at what changes were made in the vendor dir, but is it possible to implement without any code changes to our dependencies?

@rayrapetyan
Contributor Author

@derekparker, @aarzilli, I've cleared vendor-related changes, the only artifacts left are constants for ptrace and registers-related structs, I believe they should stay in vendors. Please suggest how to proceed. I've never committed into golang core, it requires a special procedure (https://golang.org/doc/contribute.html), so if you have accounts there could you commit these changes? Thanks.

@aarzilli
Member

aarzilli commented Mar 9, 2019

I'm taking a look at the tests you have disabled. It's quite a lot of tests, if they all need to be disabled it seems that there is something that's pretty thoroughly broken here.

As for the changes to golang.org/x/sys/unix, I don't have a freebsd install and I don't know pretty much anything about it, I can't make them for you.

For the constants it's easier to copy them to our side. The changes to Stat_t, Dirent, etc. are worrying, however: that's an ABI change, and I don't know how that's handled in golang.org/x/sys/unix.

It's not that hard, you just have to follow steps 0 through 4 at https://golang.org/doc/contribute.html. Ignore the rest of that page, then you make a branch for your changes and mail them with git codereview mail.

@rayrapetyan
Contributor Author

@aarzilli, these are all parallel_next and concurrentstep tests, there are also two other tests I will investigate. Will also proceed with commits into sys/unix, thanks!

@aarzilli
Member

these are all parallel_next and concurrentstep tests, there are also two other tests I will investigate

My guess is that when a thread executes an INT 3, freebsd will immediately send us a signal: if other threads are running on different cores they will keep running until their quantum expires, but the signal will be sent immediately. Which means that when (*Process).stop is called some of the threads could still be running, and that's why we are missing some breakpoints in all those tests.

@rayrapetyan
Contributor Author

golang.org: commit for ptrace support on FreeBSD: https://go-review.googlesource.com/c/go/+/166423

@derekparker
Member

@rayrapetyan what's the status of the above patch? Looks like there might be some outstanding review comments to address before it's considered for merging.

@rayrapetyan
Contributor Author

@derekparker, I got stuck committing the changes into golang: along with the i386 changes I provided, they require changes for CPU architectures I don't own (ARM) and therefore can't test...

@derekparker
Member

@rayrapetyan It looks like all you need to do is generate the files for ARM. I think if you do that and submit it to the patch, the burden for review and testing will be on them. They can use their own internal builders for testing.

@rayrapetyan
Contributor Author

Pushed "generated" ARM files. Awaiting merge.
golang/go#30704

@derekparker
Member

@rayrapetyan looks like that PR was closed because that stdlib package is locked, shoulda caught that earlier. In any event, can you open a PR against x/sys/unix if you haven't already? We are already vendoring that so it would be easier to vendor your changes once merged.

@rayrapetyan
Contributor Author

Yes, it passes all tests locally in FreeBSD. I can reproduce issue with x/crypto in Ubuntu VM. Do you know how to make x/crypto build correctly?

@derekparker
Member

@rayrapetyan the vendor has been updated with your changes via #1600 thanks to @aarzilli. You should be able to just remove your last commit and rebase on master.

@rayrapetyan
Contributor Author

@derekparker, @aarzilli, thanks guys, I've spent several hours trying to make latest x/crypto to work with x/sys with no luck. Hopefully we can merge my branch now.

Member

@derekparker derekparker left a comment

A few more small things, otherwise looks good.

Exposes some areas which could benefit from some refactoring, but should be done in a follow up.

pkg/proc/fbsdutil/regs.go
return
}

const (
Member

Instead of duplicating this, let's reuse what already exists in linutil.

@@ -1327,6 +1327,9 @@ func (p *Process) loadGInstr() []byte {
case "darwin":
// mov rcx,QWORD PTR gs:{uint32(off)}
op = []byte{0x65, 0x48, 0x8B, 0x0C, 0x25}
case "freebsd":
Member

Let's clean this up a bit by just merging this into the case above, e.g. case "darwin", "freebsd":
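
The suggested cleanup relies on Go allowing several values in one `case`. A minimal sketch follows, assuming (as the review comment implies, but the quoted diff does not show) that the FreeBSD branch uses the same bytes as the darwin branch. The function `loadGPrefix` is a hypothetical standalone helper, not the actual `loadGInstr` method, and other operating systems are deliberately left out.

```go
package main

import "fmt"

// loadGPrefix returns the instruction prefix for the given GOOS, or nil if
// the OS is not covered by this sketch.
// ASSUMPTION: "freebsd" shares the darwin bytes; the review thread implies
// this but the FreeBSD branch body is not shown in the diff.
func loadGPrefix(goos string) []byte {
	switch goos {
	case "darwin", "freebsd":
		// mov rcx,QWORD PTR gs:{uint32(off)}
		return []byte{0x65, 0x48, 0x8B, 0x0C, 0x25}
	default:
		return nil
	}
}

func main() {
	fmt.Printf("% x\n", loadGPrefix("freebsd")) // prints 65 48 8b 0c 25
}
```

Listing both values in one `case` keeps a single copy of the byte sequence, so a later fix cannot be applied to one OS and forgotten for the other.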

var err error
dbp.execPtraceFunc(func() { err = sys.PtraceLwpEvents(dbp.pid, 1) })
if err == syscall.ESRCH {
// XXX why do we wait here?
Member

Is this still an open question?

@rayrapetyan
Contributor Author

@derekparker, updated. An interesting finding - with fixed bitmasks another concurrent test is constantly failing, so I've added it to skipped for now. Definitely bitmasks affect other concurrent tests also. I will look into all skipped tests after July 20.

@derekparker
Member

@rayrapetyan hmm, that's odd. That change didn't break any tests on any currently supported OS.

I have one last request before merging. The t.Skip(...) message isn't exactly correct. The tests are valid, just broken on the OS. Please reword and link to relevant issues (like the upstream freebsd issue you created) in a comment. After that I will merge promptly and create some follow up issues to track fixing the broken tests.

@derekparker
Member

@aarzilli I'm planning on merging this by EOD today even if the above comment isn't addressed (and just do my own follow up commit if need be). Is there any concern you have with this PR or are you good with it at this point?

I have questions on why some of the tests are failing and we need to chase those down (@rayrapetyan has mentioned starting after July 20). I'm eager to get this merged, but we are also really close to a release and I want to make sure we're not releasing something big like a new OS without very much vetting. The upside of merging soon, however, is that we can start getting early adopters to test and provide feedback.

if err != nil {
return nil, fmt.Errorf("wait err %s %d", err, pid)
}
if status.Killed() {
Member

I wonder if this has anything to do with the failing concurrent tests, since we're effectively ignoring status.Killed, yet we send SIGKILL to the process in process.kill(). I have a freebsd VM I can test this theory out on tomorrow.

Member

@derekparker derekparker left a comment

LGTM

Merging then creating an issue to track test failures.

@derekparker derekparker merged commit df65be4 into go-delve:master Jul 13, 2019
@derekparker
Member

@rayrapetyan a huge thanks to you for this port! Now looking forward to getting the concurrency issues sorted out.

cgxxv pushed a commit to cgxxv/delve that referenced this pull request Mar 25, 2022
* FreeBSD initial support

* first code review fixes

* regs slice upd

* execPtraceFunc wrap

* disabled concurrency tests
fixed kill() issue

* disabled concurrency tests
fixed kill() issue

* cleanup vendor related code

* cleanup ptrace calls

* vendoring latest changes

* Revert "vendoring latest changes"

This reverts commit 833cb87

* vendoring latest changes

* requested changes
abner-chenc pushed a commit to loongson/delve that referenced this pull request Mar 1, 2024