ZeroMQ send might be interrupted by system call after Go v1.14 #76
Comments
@cloudxxx8, @jpwhitemn, is this serious enough to require a point release for Hanoi?
I think the answer is yes, if we can resolve it.
The issue description is lacking some key details...

First, as described on recent TSC calls, this issue was discovered due to our new TAF-based Modbus scalability tests. Although not explicitly stated in the description, I'm assuming based on the date this issue was reported that the testing was performed against the Hanoi (1.3) release of EdgeX, and most likely the 1.3.0 version of device-modbus?

As mentioned in the description, the root cause is Go 1.14, where goroutines are now asynchronously preemptible on most platforms. As a result, loops without function calls no longer potentially deadlock the scheduler or significantly delay garbage collection. Per the Go 1.14 release notes this results in a performance boost, in particular for most use cases of

As @cloudxxx8 mentions, the zmq4 README includes a warning at the top of the page about the potential impact of this change, and offers two solutions:
That said, the sentence immediately following these two options basically implies that zmq4 does this for you, unless explicitly disabled: "Initially, this is set to true, for the global context, and for contexts created with NewContext()." But I have to admit, it's not worded particularly well. I had to read it a few times myself and then review the code before I understood what it meant.

The catch is that go-mod-messaging is using v1.0.0 of the zmq4 package, and the behavior described in the zmq4 README wasn't introduced until v1.1.0 (see this commit). The code checks for So we should consider upgrading to v1.1.0 or, better yet, v1.1.1 (the latest patch release of 1.1).

Also, after poking through the zmq4 code, I noticed that this is really only a single-developer project (https://github.com/pebbe/zmq4/graphs/contributors). Not sure if we reviewed it as part of our Geneva import review, but it might be worth revisiting.

As for whether or not we should consider releasing a patch/maintenance release of Hanoi, I would say most definitely, as this could result in data loss. We also should survey our code for use of the
@cloudxxx8, excellent. PR merged. Closing this issue.
Starting with Go 1.14, on Unix-like systems, you will get a lot of interrupted signal calls. See the top of a package documentation for a fix.
This issue can cause events to be lost at runtime.
Error log example:
level=ERROR ts=2020-11-23T05:44:17.994238292Z app=edgex-core-data source=event.go:303 msg="Unable to send message for event: {...} interrupted system call"
Reference:
https://pkg.go.dev/github.com/pebbe/zmq4#section-documentation
There are two options to prevent this.
The first option is to build your program with the environment variable:
GODEBUG=asyncpreemptoff=1
The second option is to let the program retry after an interrupted system call.