[libp2p-swarm] Permit more controlled backpressure. #1586
Establish an upper bound for now; this is a separate issue.
As I mentioned in the other PR, I'm very reluctant about this change. To be clear, In other words, in an ideal world, the various I do believe that this PR considerably complicates the overall design, and I don't really understand what making
I find this problematic for the reasons mentioned in the PR.
Interesting, because I think it offers a good trade-off between allowing opt-in improvements on the potential
This is a more elaborate continuation of #1585 that proposes some first backward-compatible backpressure facilities between the `Network` and a `NetworkBehaviour` via the `Swarm`. Please note that the size of the diff is blown a bit out of proportion only because of an indented match block (and a lot of added commentary).

`Network` to `NetworkBehaviour`
When network connections become slow to process events, which can currently happen either because `ConnectionHandler::inject_event` is slow or because the task scheduler is busy or overwhelmed, the `Swarm` is no longer able to deliver new events emitted by the `NetworkBehaviour` to the respective connection(s). When that happens, it first tries to propagate backpressure to the `NetworkBehaviour` by calling `NetworkBehaviour::inject_connections_busy`, i.e. by giving the event back to the behaviour to be emitted again at a later time. If the behaviour rejects that event (i.e. does not implement backpressure), the `Swarm` buffers it as the `pending_event` and, despite the buffered event, still lets network I/O and the consumption of connection events by the `NetworkBehaviour` (via `NetworkBehaviour::inject_event`) continue. This is subject to the configured `max_event_lead`, which controls the maximum number of calls to `NetworkBehaviour::inject_event` that may be made before a call to `NetworkBehaviour::poll` must happen. When `max_event_lead` is reached and the `pending_event` is still not consumed, the `Swarm` waits for the connection (handler) to signal readiness to consume the event, i.e. it exerts backpressure on the entire `Network`. The point where `max_event_lead` is reached is essentially the point at which the `NetworkBehaviour` in turn exerts backpressure on the `Network`, because it cannot consume more events without being allowed to make progress via `NetworkBehaviour::poll`.
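To make the delivery path above more concrete, here is a minimal Rust sketch of the decision the `Swarm` makes when a connection cannot accept an event. It is not the libp2p implementation: `Behaviour`, `Delivery`, `SwarmState`, `Retrying`, `Legacy`, `deliver` and `may_poll_behaviour` are hypothetical names, and a boolean `ready` flag stands in for the connection's real readiness signal. Only `inject_connections_busy`, its default of returning `Err(event)`, and `pending_event` come from this PR. The rule that the behaviour is not polled while an event is buffered is an assumption, based on the `Swarm` no longer polling the behaviour once connections signal backpressure.

```rust
use std::collections::VecDeque;

/// What happened to an event the behaviour wanted to send to a connection.
#[derive(Debug, PartialEq)]
pub enum Delivery {
    /// The connection had capacity and took the event.
    Sent,
    /// The connection was busy and the behaviour took the event back.
    ReturnedToBehaviour,
    /// The connection was busy and the behaviour refused, so the swarm buffers it.
    Buffered,
}

/// Hypothetical stand-in for the relevant slice of `NetworkBehaviour`.
pub trait Behaviour {
    type Event;

    /// Modelled on `NetworkBehaviour::inject_connections_busy`. The default
    /// hands the event straight back, i.e. "no backpressure support".
    fn inject_connections_busy(&mut self, event: Self::Event) -> Result<(), Self::Event> {
        Err(event)
    }
}

/// Opts in: keeps rejected events so it can emit them again later.
pub struct Retrying {
    pub retry: VecDeque<u32>,
}

impl Behaviour for Retrying {
    type Event = u32;
    fn inject_connections_busy(&mut self, event: u32) -> Result<(), u32> {
        self.retry.push_back(event);
        Ok(())
    }
}

/// Keeps the backward-compatible default.
pub struct Legacy;

impl Behaviour for Legacy {
    type Event = u32;
}

/// Hypothetical swarm-side state: at most one buffered event.
pub struct SwarmState<E> {
    pub pending_event: Option<E>,
}

impl<E> SwarmState<E> {
    pub fn new() -> Self {
        SwarmState { pending_event: None }
    }

    /// Try to hand `event` to a connection that is (or is not) `ready`.
    pub fn deliver<B: Behaviour<Event = E>>(&mut self, behaviour: &mut B, event: E, ready: bool) -> Delivery {
        if ready {
            return Delivery::Sent; // a real swarm would forward `event` here
        }
        match behaviour.inject_connections_busy(event) {
            Ok(()) => Delivery::ReturnedToBehaviour,
            Err(event) => {
                self.pending_event = Some(event);
                Delivery::Buffered
            }
        }
    }

    /// In this sketch the behaviour is not polled while an event is buffered.
    pub fn may_poll_behaviour(&self) -> bool {
        self.pending_event.is_none()
    }
}
```

A `Legacy` behaviour ends up with its event in `pending_event`, while a `Retrying` behaviour keeps it in its own queue and the swarm stays free of buffered events.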
Important: By default, `max_event_lead` is `1` and the default implementation of `NetworkBehaviour::inject_connections_busy` returns `Err(event)`. These changes are therefore fully backward-compatible: without changing the new `max_event_lead` configuration option or implementing `NetworkBehaviour::inject_connections_busy`, the existing behaviour is preserved, allowing a gradual transition and experimentation.

Missing: What I think is really missing to make this backpressure pipeline complete is a backward-incompatible change to `ConnectionHandler::inject_event` that allows it to outright return (i.e. reject) the given event and thus exert backpressure on the background task. The task would in turn propagate backpressure back through the `Network` to the `Swarm` by not consuming new events from its inbound channel until `inject_event` has accepted the event.

`NetworkBehaviour` to `Network`
The `NetworkBehaviour` has no direct means of exerting backpressure on `NetworkBehaviour::inject_event`, as the contract of the trait implies that whenever the behaviour returns `Poll::Pending`, it is ready to receive more events. The `max_event_lead` bounds by how much the number of calls to `NetworkBehaviour::inject_event` may exceed those to `NetworkBehaviour::poll`. Once `max_event_lead` is reached, backpressure propagates back to the `Network`. In this way, `max_event_lead` configures the threshold for backpressure from the `NetworkBehaviour` towards the `Network`.
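The bound can be modelled as a counter of `inject_event` calls since the last `poll`. The sketch below is a hypothetical illustration, not libp2p code: `LeadGate`, `try_inject`, `polled` and `accepted_while_stalled` are invented names, and the simulation assumes the behaviour cannot be polled at all while network events keep arriving (for example because a `pending_event` is stuck).

```rust
/// Hypothetical gate counting `inject_event` calls since the last `poll`.
#[derive(Debug)]
pub struct LeadGate {
    lead: usize,
    max_event_lead: usize,
}

impl LeadGate {
    pub fn new(max_event_lead: usize) -> Self {
        LeadGate { lead: 0, max_event_lead }
    }

    /// `true`: the swarm may deliver one more network event via `inject_event`.
    /// `false`: the bound is reached, so the swarm stops reading from the
    /// network, i.e. backpressure reaches the `Network`.
    pub fn try_inject(&mut self) -> bool {
        if self.lead < self.max_event_lead {
            self.lead += 1;
            true
        } else {
            false
        }
    }

    /// A call to `poll` lets the behaviour make progress and resets the lead.
    pub fn polled(&mut self) {
        self.lead = 0;
    }
}

/// How many of `incoming` network events reach a behaviour that is never polled.
pub fn accepted_while_stalled(max_event_lead: usize, incoming: usize) -> usize {
    let mut gate = LeadGate::new(max_event_lead);
    (0..incoming).filter(|_| gate.try_inject()).count()
}
```

With `max_event_lead` set to `1`, only a single event gets through before the behaviour has to be polled; raising it lets the behaviour absorb a burst of that size before the network is held back.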
Important: I changed the polling behaviour of the `Swarm` so that it always polls the `NetworkBehaviour` until it returns `Poll::Pending`, which signals that the behaviour is waiting for progress and input from the `Network` (or until the network connections signal backpressure, in which case the behaviour is also no longer polled). This is meant to be an improvement in two ways. Firstly, if a single call to `NetworkBehaviour::inject_event` results in more than one (possibly many) events being ready to be emitted from `NetworkBehaviour::poll`, the behaviour is drained of these events before the next call to `inject_event`. This prevents potentially unbounded buffer growth in the behaviour if calls to `inject_event` and `poll` always happen at roughly the same frequency. Secondly, polling the behaviour until it is `Pending` allows it to catch up after a previous increase of the `event_lead` due to some network slowness. I clarified the documentation to state that a behaviour that, from some point onwards, never again returns `Poll::Pending` is usually misbehaving, since such a behaviour no longer needs any input or progress from the `Network`.
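The revised polling order can be sketched as a loop that keeps polling the behaviour until it returns `Poll::Pending` and stops early when the connections can take no more events. This is a simplified, hypothetical model: `QueueBehaviour` and `drain` are invented names, the real `NetworkBehaviour::poll` takes a `Context` and returns richer actions than a `u32`, and a plain `capacity` counter stands in for the connections' backpressure signal.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Hypothetical behaviour: a single `inject_event` may have queued several events.
pub struct QueueBehaviour {
    pub ready: VecDeque<u32>,
}

impl QueueBehaviour {
    /// Simplified `poll`: `Pending` means "waiting for input from the network".
    pub fn poll(&mut self) -> Poll<u32> {
        match self.ready.pop_front() {
            Some(event) => Poll::Ready(event),
            None => Poll::Pending,
        }
    }
}

/// Polls `behaviour` until it is `Pending`, forwarding events into `sent`.
/// `capacity` models how many more events the connections accept before
/// signalling backpressure. Returns `true` if the behaviour reached `Pending`.
pub fn drain(behaviour: &mut QueueBehaviour, mut capacity: usize, sent: &mut Vec<u32>) -> bool {
    loop {
        if capacity == 0 {
            // Connections are busy: stop polling, leave remaining events queued.
            return false;
        }
        match behaviour.poll() {
            Poll::Ready(event) => {
                sent.push(event);
                capacity -= 1;
            }
            // The behaviour needs network input; only now is the next
            // `inject_event` due.
            Poll::Pending => return true,
        }
    }
}
```

Draining a behaviour that queued three events with ample capacity empties its queue before the next `inject_event`; with a capacity of two, the third event stays inside the behaviour instead of piling up in the swarm.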
I also think that this yields relatively easy-to-understand semantics for how the `Swarm` does the polling, as it makes clear that the `NetworkBehaviour` takes the lead and the underlying `Network` is only an input/output facility driven on demand. There may be unusual cases where a behaviour produces a lot of output but needs little input, in which case it may be beneficial to intentionally sprinkle `Poll::Pending` into its `poll` output to permit network progress. Even if it doesn't, there is still the safeguard that the `Swarm` stops polling the behaviour when the network connections signal backpressure.
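For the output-heavy case, a behaviour can yield voluntarily. A minimal sketch, assuming a simplified `poll` that only takes a `Context`: the hypothetical `Chatty` behaviour always has another event ready but returns `Poll::Pending` after `budget` events. Because something that returns `Pending` is only guaranteed to be polled again once its waker is woken, it wakes its own waker before yielding, so it is rescheduled after the caller has had a chance to drive the network. `Chatty`, `budget` and `CountingWaker` are illustrative names, not part of libp2p.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical output-heavy behaviour that yields every `budget` events.
pub struct Chatty {
    next: u64,
    emitted_since_yield: usize,
    budget: usize,
}

impl Chatty {
    pub fn new(budget: usize) -> Self {
        Chatty { next: 0, emitted_since_yield: 0, budget }
    }

    pub fn poll(&mut self, cx: &mut Context<'_>) -> Poll<u64> {
        if self.emitted_since_yield == self.budget {
            self.emitted_since_yield = 0;
            // Request another poll, but let the caller make network progress first.
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        self.emitted_since_yield += 1;
        self.next += 1;
        Poll::Ready(self.next)
    }
}

/// Demonstration waker that only counts how often it was woken.
pub struct CountingWaker(pub AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
    fn wake_by_ref(self: &Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Example: build a `Context` around the counting waker and poll a few times.
pub fn demo() -> (Vec<Poll<u64>>, usize) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);
    let mut behaviour = Chatty::new(2);
    let results = (0..4).map(|_| behaviour.poll(&mut cx)).collect();
    (results, counter.0.load(Ordering::SeqCst))
}
```

With a budget of two, four polls yield two events, one self-waking `Pending`, and then the next event, so the behaviour never starves the network yet is never forgotten.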