Startup problem on slow/degraded networks (the "3/4" problem) #2547
Comments
Now that is how you enter a bug report. Thanks!
Thanks @devinbileck, glad to help in whatever way I can. :) I did some more testing on this issue and did some interesting math as well. It seems that, if the On the math side of things: since the initial transfer requires a push of ~1.5 MB through a hard window of 60 seconds, the application can't start up if the outbound channel has less than ~250 kbps of available capacity. This is consistent with the results of my follow-up tests. If this is indeed the case (someone with knowledge of the code can confirm), then a solution may be to increase the hardcoded timeout constant, or to replace it with a runtime variable based on the expected data transfer size, since Bisq should know the volume of data being pushed before initiating the transfer.
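The arithmetic behind that threshold can be sketched as follows. This is only an illustration: the function name and the 25% overhead figure are assumptions (the overhead is chosen to show how the raw number could reach the ~250 kbps quoted above, and is not a measured value).

```python
def min_bandwidth_kbps(payload_bytes: int, timeout_s: float, overhead: float = 0.0) -> float:
    """Lowest sustained uplink rate (kbit/s) that moves payload_bytes within timeout_s.

    overhead is a fractional allowance for protocol framing (hypothetical value).
    """
    bits_on_wire = payload_bytes * 8 * (1.0 + overhead)
    return bits_on_wire / timeout_s / 1000.0


PAYLOAD_BYTES = 1_500_000  # ~1.5 MB initial request, as reported in the thread
TIMEOUT_S = 60             # hard window observed in the log

# Raw payload only, no protocol overhead.
print(min_bandwidth_kbps(PAYLOAD_BYTES, TIMEOUT_S))
# Same payload with an assumed 25% overhead.
print(min_bandwidth_kbps(PAYLOAD_BYTES, TIMEOUT_S, overhead=0.25))
```

The raw payload alone needs 200 kbps; any uplink below that cannot finish the push inside the window, and overhead only raises the bar.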
@agb19 Do you have the latest version? We ship some data so the request from the seed requires less bandwidth. But thanks for your analysis; it is certainly an aspect we need to handle better, both by reducing the amount of data requested and by making sure the timeout does not trigger.
Ah, ignore the log request. I see it in the above logs; it seems to be the SocketTimeoutException.
The socket timeout is 2 minutes, so that is already quite long. I think increasing it is probably not a good idea.
Thanks @ManfredKarrer, I saw the 2-minute timeout constant in the source code, but it doesn't match my practical observations, where the timeout message occurs after only 60 seconds (12:16:38 -> 12:17:38 in my log). It might be a different timeout condition. Agreed that a socket timeout of 2 minutes makes perfect sense; a data transfer completion timeout (which is what seems to be happening here), on the other hand, makes more sense as a function of the data size. I use release v0.9.5 in my tests.
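A size-dependent completion timeout along those lines might look like the sketch below. It is not Bisq code: the function name, the assumed worst-case uplink rate, and the floor and ceiling values are all hypothetical placeholders.

```python
def transfer_timeout_s(
    payload_bytes: int,
    assumed_min_rate_kbps: float = 100.0,  # hypothetical slowest uplink to support
    floor_s: float = 60.0,                 # never go below the observed 60 s window
    ceiling_s: float = 300.0,              # hypothetical cap so a dead peer is dropped eventually
) -> float:
    """Timeout that grows with the payload instead of being a fixed constant."""
    needed_s = payload_bytes * 8 / (assumed_min_rate_kbps * 1000.0)
    # Clamp the estimate between the floor and the ceiling.
    return min(max(needed_s, floor_s), ceiling_s)


print(transfer_timeout_s(100_000))     # small payload: the floor applies
print(transfer_timeout_s(1_500_000))   # ~1.5 MB request at the assumed rate
print(transfer_timeout_s(10_000_000))  # very large payload: the ceiling applies
```

The clamp keeps small transfers from timing out too eagerly while still bounding how long a client waits on an unresponsive peer.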
Ah, there is a 60-second timeout in GetDataRequestHandler. That will be the one that triggered here. I will change it to 90 or 120 sec.
See discussion at bisq-network#2547
Hi @ManfredKarrer, the fix in PR #2583 works, with some limitations. In token-bucket simulations, the resistance to network degradation has improved by ~50% as expected, which confirms that the fix does indeed act on the source of the issue. In real-life conditions, and specifically in my "worst case scenario" network, the improvement is unfortunately insufficient to push the startup sequence over the initial I tried gradually increasing the timeout numbers from PR #2583 and found a sweet spot around 300 seconds (5 minutes). This setting seems to work well to achieve the "expected result" above: the application takes a while to start up, but it does so eventually, even on my degraded network, and once the initial hurdle is cleared, the operation is smooth.
Thanks for your research! Very long timeouts like 5 minutes might become problematic for other reasons. It will help in that particular case, but if the seed node has connection issues you are stuck with an unresponsive seed for too long before you try to connect to another one. We connect to several seeds in parallel anyway, so that should not be too problematic either. But I think the solution should go more in the direction of reducing network traffic to a level where, even under bad network conditions, it does not fail with 1-2 min timeouts. Please feel free to file a compensation request for your detailed testing and investigations! Very much appreciated! See https://docs.bisq.network/dao.html about the Bisq DAO and how to file a request.
Thanks for the clarification, glad I was able to help. Yes, I realize that a bad seed may delay the startup for a client that uses the long timeout setting. I was just unsure whether this setting could potentially have a wider impact on the entire network (like a DoS vector) that I'm not aware of. Agreed, a better approach would be to eliminate the bandwidth peaks by protocol design, but that sounds like a lot of work :)
Yes, better than an option (as non-technical users will not be able to deal with that) would be to detect network conditions and then increase the timeout. But yes, changes in the P2P network are all very complex and have to be considered very carefully. It is easy to screw things up unintentionally...
I had this very issue during vacation on a boat as well (= poor internet connection). The issue most likely is that the initial data requests to the seednodes now exceed a size of 2 MB and Bisq queries 2 seednodes simultaneously. If that amount of data cannot be transferred to the seed node before the connection timeout, Bisq will never sync up. I fear that is not something we can easily fix. I thought about changing the syncing process to reduce the request size, but that would require a massive change in the code base... I will think about doing something to alert users to the issue if it happens (#2549) and maybe retry with only a single seednode request, thus reducing the payload by 50%. Other than that, if you run into this issue, there is no quick way of fixing it.
I'm facing a similar issue, but on a high speed network. Here's the full log:
(Standard advice: back up before deleting anything.) Your log suggests Tor connectivity issues. Try clearing out the Tor cache files. (Close Bisq first.) All files can be deleted from
It worked! Thanks! :)
Possibly related to: #2474, #2327. Uses #2278.
This is a slightly edited re-post of the issue reported at:
https://bisq.community/t/issue-bisq-startup-problem-on-slow-networks-3-4-problem/7233
Summary:
On slow/congested networks, Bisq never progresses past step 3/4 of the startup sequence, due to a timeout occurring in the initial handshake to seed nodes.
Details:
One of the compulsory steps in the Bisq client startup sequence is a special handshake with a seed node, consisting of an exchange of large data packets in the 1.5 MB size range (PreliminaryGetDataRequest in the debug log). On slow networks, specifically when the network output channel is degraded, this handshake fails systematically with a timeout error, making it impossible to start up (the "3/4" problem).
Bisq versions tested: 0.8.1, 0.9.3, 0.9.4, 0.9.5
Test platform: Tails 3.10, 3.11, 3.12 (Debian 9); Compatibility workarounds from #2278
To reproduce the bug, use a working Bisq installation and limit the network output bandwidth to a small value (100 kbps in my example) using the Linux TBF qdisc.
Steps to reproduce:
sudo tc qdisc add dev eth0 root tbf rate 100kbit burst 1540 latency 50ms
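To see why this limit is enough to trigger the problem, the token bucket can be approximated with a small simulation. This is a simplified sketch, not a model of the kernel's TBF implementation: it ignores the latency parameter, Tor and TCP overhead, and any competing traffic, and the function name is made up for illustration.

```python
def tbf_transfer_time_s(payload_bytes: int, rate_bps: float, burst_bytes: int,
                        tick_s: float = 0.001) -> float:
    """Simulate pushing payload_bytes through a token bucket; return elapsed seconds."""
    tokens = float(burst_bytes)             # the bucket starts full
    refill = rate_bps / 8.0 * tick_s        # bytes of tokens added per tick
    remaining = float(payload_bytes)
    ticks = 0
    while True:
        sent = min(tokens, remaining)       # send as much as the tokens allow
        remaining -= sent
        tokens -= sent
        if remaining <= 0:
            break
        tokens = min(float(burst_bytes), tokens + refill)  # refill, capped at burst size
        ticks += 1
    return ticks * tick_s


# Values from the reproduction step: rate 100kbit, burst 1540 bytes, ~1.5 MB payload.
elapsed = tbf_transfer_time_s(1_500_000, rate_bps=100_000, burst_bytes=1540)
print(f"estimated upload time: {elapsed:.1f} s")
```

Under these assumptions the upload takes about two minutes, which is well beyond a 60-second window, so the request cannot complete before the timeout fires.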
Expected result:
Bisq should complete the startup sequence and operate (with some lag due to network bandwidth limitation).
Actual result:
Bisq never progresses past step 3/4 of the startup sequence, leaving the application completely inoperable.
Log fragment