ZeroMQ send might be interrupted by system call after Go v1.14 #76

Closed
cloudxxx8 opened this issue Nov 23, 2020 · 5 comments
Labels
3-high priority denoting release-blocking issues bug Something isn't working hanoi Hanoi release

Comments

@cloudxxx8
Member

Starting with Go 1.14, on Unix-like systems, you will get a lot of interrupted system calls (EINTR). See the top of the package documentation for a fix.

This issue can cause events to be lost at runtime.

Error log example:
level=ERROR ts=2020-11-23T05:44:17.994238292Z app=edgex-core-data source=event.go:303 msg="Unable to send message for event: {...} interrupted system call"

Reference:
https://pkg.go.dev/github.com/pebbe/zmq4#section-documentation

There are two options to prevent this.

The first option is to run your program with this environment variable set (GODEBUG is read by the Go runtime when the program starts, not at build time):
GODEBUG=asyncpreemptoff=1
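If a deployment relies on the first option, a service could check at startup whether the setting is actually in effect, for example to log a warning. This is a minimal sketch, assuming only the documented `asyncpreemptoff=1` form matters; the function name `asyncPreemptDisabled` is hypothetical and not part of zmq4 or EdgeX.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// asyncPreemptDisabled reports whether a GODEBUG value (a comma-separated
// list of name=value pairs) sets asyncpreemptoff=1. Only the exact value "1"
// documented by zmq4 is recognized, and if the name appears more than once
// this sketch lets the last occurrence win.
func asyncPreemptDisabled(godebug string) bool {
	disabled := false
	for _, pair := range strings.Split(godebug, ",") {
		name, value, ok := strings.Cut(pair, "=")
		if ok && name == "asyncpreemptoff" {
			disabled = value == "1"
		}
	}
	return disabled
}

func main() {
	if asyncPreemptDisabled(os.Getenv("GODEBUG")) {
		fmt.Println("async preemption disabled via GODEBUG")
	} else {
		fmt.Println("async preemption enabled: blocking cgo calls may return EINTR")
	}
}
```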

The second option is to let the program retry after an interrupted system call.

@cloudxxx8 cloudxxx8 added 3-high priority denoting release-blocking issues bug Something isn't working labels Nov 23, 2020
@cloudxxx8 cloudxxx8 changed the title ZeroMQ send might be intrrupted by system call after Go v1.14 ZeroMQ send might be interrupted by system call after Go v1.14 Nov 23, 2020
@lenny-goodell
Member

@cloudxxx8 , @jpwhitemn, Is this serious enough to require a point release for Hanoi?

@cloudxxx8
Member Author

I think the answer is yes, if we can resolve it.
Maybe we should investigate whether https://github.com/go-zeromq/zmq4 can resolve it.

@jpwhitemn jpwhitemn added the hanoi Hanoi release label Dec 2, 2020
@tonyespy
Member

tonyespy commented Dec 7, 2020

The issue description is lacking some key details...

First, as described on recent TSC calls, this issue was discovered due to our new TAF-based Modbus scalability tests. Although not explicitly stated in the description, I'm assuming based on the date this issue was reported that the testing was performed against the Hanoi (1.3) release of EdgeX, and most likely the 1.3.0 version of device-modbus?

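As mentioned in the description, the root cause is a change in Go 1.14: goroutines are now asynchronously preemptible on most platforms, so loops without function calls no longer risk deadlocking the scheduler or significantly delaying garbage collection. The runtime implements this preemption with signals, and as the Go 1.14 release notes point out, programs therefore receive more signals than before and slow system calls fail with EINTR more often; this is how blocking libzmq calls made through cgo end up returning "interrupted system call". (Go 1.14 also improved performance, in particular for most uses of defer, but that is a separate change.)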

As @cloudxxx8 mentions, the zmq4 README includes a warning at the top of the page about the potential impact of this change, and offers two solutions:

  • turn off the behavior in Go 1.14, which could result in performance degradation
  • let the program retry after an interrupted system call

That said, the sentence immediately following these two options basically implies that zmq4 does this for you, unless explicitly disabled:

Initially, this is set to true, for the global context, and for contexts created with NewContext().

But I have to admit, it's not worded particularly well. I had to read it a few times myself and then review the code before I understood what it meant.

The catch is that go-mod-messaging uses v1.0.0 of the zmq4 package, and the behavior described in the zmq4 README wasn't introduced until v1.1.0 (see this commit). From that version on, zmq4 checks for EINTR on every cgo call into the underlying libzmq library and automatically retries it.

So we should consider upgrading to v1.1.0 or better yet v1.1.1 (the latest patch release of 1.1).

Also, after poking through the zmq4 code, I noticed that this is really a single-developer project (https://github.com/pebbe/zmq4/graphs/contributors). I'm not sure whether we reviewed it as part of our Geneva import review, but it might be worth revisiting.

As for whether we should put out a patch/maintenance release of Hanoi, I would say most definitely, as this could result in data loss.

We also should survey our code for use of the syscall or golang.org/x/sys/unix packages as they could also be impacted (see the Go 1.14 release notes for more details).

@cloudxxx8
Member Author

@tonyespy Bruce has verified that v1.2.2 works without this issue, so I suggest merging the following PR and closing this issue:
#79

@lenny-goodell
Member

@cloudxxx8 , excellent. PR merged. Closing this issue.
