Fix Deadlock in Flush Function Due to ENOBUFS #286

Open: wants to merge 2 commits into main


@patryk4815 commented Nov 25, 2024

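Hi.
This PR fixes a deadlock in the Flush function that occurs when recvmsg on the netlink socket fails with ENOBUFS.
ENOBUFS means the socket receive buffer overflowed and the kernel dropped some replies. Flush records the error but then calls Receive again to wait for acknowledgements that will never arrive, so the goroutine stays parked in recvmsg indefinitely (in the debugger session below, errs already holds the ENOBUFS error while Flush is still blocked in receiveAckAware).
We have hit this in production 🙃 (cc: @Ignatella )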

Changes:

  • Fix the deadlock in Flush
  • Add a test that simulates the issue using a reduced read/write buffer, to verify the fix and prevent regressions

Debugger (dlv) output:

Switched from 56906 to 1 (thread 2855201)
(dlv) bt
0  0x000000000043de2e in runtime.gopark
    at runtime/proc.go:399
1  0x00000000004368b7 in runtime.netpollblock
    at runtime/netpoll.go:564
2  0x0000000000468425 in internal/poll.runtime_pollWait
    at runtime/netpoll.go:343
3  0x00000000004dbe67 in internal/poll.(*pollDesc).wait
    at internal/poll/fd_poll_runtime.go:84
4  0x00000000004e1fca in internal/poll.(*pollDesc).waitRead
    at internal/poll/fd_poll_runtime.go:89
5  0x00000000004e1fca in internal/poll.(*FD).RawRead
    at internal/poll/fd_unix.go:708
6  0x00000000004eb20a in os.(*rawConn).Read
    at os/rawconn.go:31
7  0x000000000079a56b in syscall.RawConn.Read-fm
    at <autogenerated>:1
8  0x0000000000798c49 in github.com/mdlayher/socket.rwT[go.shape.struct { github.com/mdlayher/socket.n int; github.com/mdlayher/socket.oobn int; github.com/mdlayher/socket.recvflags int; github.com/mdlayher/socket.from golang.org/x/sys/unix.Sockaddr }]
    at github.com/mdlayher/socket@…/conn.go:795
9  0x00000000007984f2 in github.com/mdlayher/socket.readT[go.shape.struct { github.com/mdlayher/socket.n int; github.com/mdlayher/socket.oobn int; github.com/mdlayher/socket.recvflags int; github.com/mdlayher/socket.from golang.org/x/sys/unix.Sockaddr }]
    at github.com/mdlayher/socket@…/conn.go:666
10  0x0000000000791eb4 in github.com/mdlayher/socket.(*Conn).Recvmsg
    at github.com/mdlayher/socket@…/conn.go:572
11  0x000000000079f3f6 in github.com/mdlayher/netlink.(*conn).Receive
    at github.com/mdlayher/netlink@…/conn_linux.go:130
12  0x000000000079d9c2 in github.com/mdlayher/netlink.(*Conn).receive
    at github.com/mdlayher/netlink@…/conn.go:279
13  0x000000000079d747 in github.com/mdlayher/netlink.(*Conn).lockedReceive
    at github.com/mdlayher/netlink@…/conn.go:238
14  0x000000000079d62d in github.com/mdlayher/netlink.(*Conn).Receive
    at github.com/mdlayher/netlink@…/conn.go:231
15  0x00000000007ac35e in github.com/google/nftables.receiveAckAware
    at github.com/google/nftables@…/conn.go:94
16  0x00000000007acec5 in github.com/google/nftables.(*Conn).Flush
.......
    at runtime/proc.go:267
24  0x000000000046dbc1 in runtime.goexit
    at runtime/asm_amd64.s:1650
 
(dlv) frame 16

(dlv) p errs
error(*errors.joinError) *{
	errs: []error len: 1, cap: 1, [
		...,
	],}

(dlv) p errs.errs
[]error len: 1, cap: 1, [
	*github.com/mdlayher/netlink.OpError {
		Op: "receive",
		Err: error(*os.SyscallError) ...,
		Message: "",
		Offset: 0,},
]

(dlv) p errs.errs[0]
error(*github.com/mdlayher/netlink.OpError) *{
	Op: "receive",
	Err: error(*os.SyscallError) *{
		Syscall: "recvmsg",
		Err: error(syscall.Errno) *(*error)(0xc000034090),},
	Message: "",
	Offset: 0,}

(dlv) p errs.errs[0].Err
error(*os.SyscallError) *{
	Syscall: "recvmsg",
	Err: error(syscall.Errno) ENOBUFS (105),}

(dlv) p errs.errs[0].Err.Err
error(syscall.Errno) ENOBUFS (105)


google-cla bot commented Nov 25, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.
