Disconnect nodes in two steps in the full node #3050
Merged
The full node currently can't really sync because the following happens repeatedly: we connect to a peer, start a request, then the peer disconnects us, and we cancel the request.
Why do peers disconnect us? Because they're actually full. Substrate unfortunately doesn't implement the networking protocol as was intended, and instead of refusing peers ahead of time if it is full, it accepts them then disconnects them. See paritytech/polkadot-sdk#556.
Due to the easy-to-use-but-prone-to-race-conditions API, it is not possible to know whether the request might already have a response at the time of the disconnect. If it already has a response, then canceling the request simply discards it, which wastes a response we could have used.
Also, peers might still answer our request even after disconnecting us. It is at their discretion. In fact, Substrate does this. "Disconnecting" in this context doesn't mean "closing the TCP connection" but "notifying that we don't want peering anymore". The TCP connection is only closed after 10 seconds of inactivity.
What this PR does is: when a peer disconnects, we don't cancel the requests that target this peer and are still in progress. Instead, we simply stop sending any new requests to it, and clean up that peer only once all of its requests have finished.
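To illustrate the two-step disconnect described above, here is a minimal sketch. It is not the code from this PR: `PeerId`, `PeerState`, `Peers`, and all method names are hypothetical, and it assumes that a per-peer counter of in-progress requests is enough to decide when cleanup is safe.

```rust
use std::collections::HashMap;

/// Hypothetical peer identifier; a real node would use its own peer ID type.
type PeerId = u32;

#[derive(Debug, Default)]
struct PeerState {
    /// Number of requests sent to this peer that have not finished yet.
    in_flight: usize,
    /// Set once the peer has notified us that it no longer wants peering.
    draining: bool,
}

/// Hypothetical bookkeeping structure, for illustration only.
#[derive(Debug, Default)]
struct Peers {
    peers: HashMap<PeerId, PeerState>,
}

impl Peers {
    fn on_connected(&mut self, peer: PeerId) {
        self.peers.insert(peer, PeerState::default());
    }

    /// Returns `true` if a new request may be sent to `peer`.
    /// Unknown or draining peers never get new requests.
    fn try_start_request(&mut self, peer: PeerId) -> bool {
        match self.peers.get_mut(&peer) {
            Some(state) if !state.draining => {
                state.in_flight += 1;
                true
            }
            _ => false,
        }
    }

    /// Step one: stop sending new requests, but keep in-progress ones alive.
    fn on_disconnected(&mut self, peer: PeerId) {
        if let Some(state) = self.peers.get_mut(&peer) {
            state.draining = true;
        }
        self.cleanup_if_done(peer);
    }

    /// Called when a request finishes, whether with a response or an error.
    fn on_request_finished(&mut self, peer: PeerId) {
        if let Some(state) = self.peers.get_mut(&peer) {
            state.in_flight = state.in_flight.saturating_sub(1);
        }
        self.cleanup_if_done(peer);
    }

    /// Step two: forget the peer once it is draining and has no pending request.
    fn cleanup_if_done(&mut self, peer: PeerId) {
        let done = matches!(
            self.peers.get(&peer),
            Some(s) if s.draining && s.in_flight == 0
        );
        if done {
            self.peers.remove(&peer);
        }
    }

    fn is_known(&self, peer: PeerId) -> bool {
        self.peers.contains_key(&peer)
    }
}
```

With this shape, a peer that disconnects while a request is in progress stays known until `on_request_finished` is called for that request, so a response that arrives after the disconnect is not discarded.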