-
Notifications
You must be signed in to change notification settings - Fork 14
Conversation
We did this by making the reporting of status messages non-blocking
First, thanks a lot for all the work you have put into this. However, the tests are failing with your changes. I even re-run them, but for both Go 1.13 as well as Go 1.16 the following happens:
|
The error occurs only for the last testcase As to why this might be happening: in the test logs you can find many instances of this error message. If you increase the buffer size from 100 to 200, the test now passes, which also points to it being an issue of excessive load. As to why this is happening for the test case with 64 WebSocket CLAs but not for 64 TCP CLAs, I honestly have no idea... |
As can be seen in recent issues and PRs (#39, #40, #41, #42, #43, #45), there is an issue in dtn7-go that results in a deadlock after some time, making it impossible to sent bundles. To be able to test and debug this issue, some more extensive tests are required. Thus, we developed a test infrastructure to simulate multiple virtual nodes using the [CORE emulator](https://github.com/coreemu/core). This tests environment is located in the [dtn7-playground](https://github.com/dtn7/dtn7-playground) repository. To be able to trigger these tests, this PR adds a workflow file, that uses the [repository_dispatch event](https://docs.github.com/en/actions/reference/events-that-trigger-workflows#repository_dispatch) sent to the dtn7-playground repo.
* Workflow for extended tests As can be seen in recent issues and PRs (#39, #40, #41, #42, #43, #45), there is an issue in dtn7-go that results in a deadlock after some time, making it impossible to sent bundles. To be able to test and debug this issue, some more extensive tests are required. Thus, we developed a test infrastructure to simulate multiple virtual nodes using the [CORE emulator](https://github.com/coreemu/core). This tests environment is located in the [dtn7-playground](https://github.com/dtn7/dtn7-playground) repository. To be able to trigger these tests, this PR adds a workflow file, that uses the [repository_dispatch event](https://docs.github.com/en/actions/reference/events-that-trigger-workflows#repository_dispatch) sent to the dtn7-playground repo. * Fix copyright header Co-authored-by: Artur Sterz <[email protected]>
Please fix/merge this ... this great library is basically useless without this patch |
Sooo, it appears that this was actually an issue with the network-emulator we were using... since we haven't seen this problem in a while. |
While using
dtn7-go
in a large-ish setup, we found that the daemon would regularly lock up completely. We eventually traced the problem to the interaction between the cla-manager, manager-elem and cla itself (in our case mtcp, but the issue would potentially exist for any cla with synchronisation problems).In the end it comes down to the fact that both unbuffered channel writes and the respective
Close()
-methods of the manager-elem and cla are blocking. The created a situation where the manager would call the manager-elem'sClose()
-method and block, while the manager-elem could not react to it, since it was blocked by a channel send to the manager which would also never resolve - thus a global deadlock.Out first-order solution to this issue was to make the channel-writes for the reporting channels non-blocking, using a select-guard. In order to prevent message loss if there is not currently someone listening on the channel, we turned the channels from unbuffered to buffered channels (sizer of the buffer configurable using the
ReportChannelBuffer
const inmanager.go
.C&C welcome.