
Swarm does not honour max_negotiating_inbound_streams setting #3041

Closed · rkuhn opened this issue Oct 18, 2022 · 9 comments · Fixed by #3071
Comments

rkuhn (Contributor) commented Oct 18, 2022

```rust
loop {
    let inbound_stream = ready!(this.poll_inner(cx))?;

    if this.inbound_stream_buffer.len() >= MAX_BUFFERED_INBOUND_STREAMS {
        log::warn!("dropping {inbound_stream} because buffer is full");
        drop(inbound_stream);
        continue;
    }

    this.inbound_stream_buffer.push_back(inbound_stream);
}
```

With MAX_BUFFERED_INBOUND_STREAMS == 25 this code places a limit of 25 incoming substream requests at any given time, effectively removing the utility of SwarmBuilder’s max_negotiating_inbound_streams setting. The muxer should be refactored such that only the configurable setting is used and the constant removed.

In combination with #3039 this means that currently it is very difficult to implement a working bitswap implementation.
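For reference, a minimal sketch of how the setting in question is applied when building a swarm, assuming the rust-libp2p 0.49 `SwarmBuilder` API (the transport, behaviour and peer id are left to the caller; the value 1000 is just an example):

```rust
use libp2p::core::{muxing::StreamMuxerBox, transport::Boxed};
use libp2p::swarm::{NetworkBehaviour, Swarm, SwarmBuilder};
use libp2p::PeerId;

// Sketch: raise the limit on concurrently negotiating inbound streams when
// building the swarm. As described above, the hard-coded
// MAX_BUFFERED_INBOUND_STREAMS (25) still caps buffered, not-yet-negotiated
// streams regardless of this setting.
fn build_swarm<B: NetworkBehaviour>(
    transport: Boxed<(PeerId, StreamMuxerBox)>,
    behaviour: B,
    local_peer_id: PeerId,
) -> Swarm<B> {
    SwarmBuilder::new(transport, behaviour, local_peer_id)
        .max_negotiating_inbound_streams(1000)
        .build()
}
```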

rkuhn (Contributor, Author) commented Oct 18, 2022

Another thought: this loop can be kept busy for arbitrarily long times by a potent network connection, starving other activities. It should return Poll::Pending from time to time. /cc @thomaseizinger

mxinden (Member) commented Oct 18, 2022

> With MAX_BUFFERED_INBOUND_STREAMS == 25 this code places a limit of 25 incoming substream requests at any given time, effectively removing the utility of SwarmBuilder’s max_negotiating_inbound_streams setting. The muxer should be refactored such that only the configurable setting is used and the constant removed.

Note that MAX_BUFFERED_INBOUND_STREAMS and max_negotiating_inbound_streams are two different limits.

The former limits the number of buffered inbound (non-negotiated) streams; the latter limits the number of negotiating inbound streams, i.e. the streams that are running multistream-select plus the UpgradeInbound logic of the corresponding ConnectionHandlers.

thomaseizinger (Contributor) commented:

> Another thought: this loop can be kept busy for arbitrarily long times by a potent network connection, starving other activities. It should return Poll::Pending from time to time. /cc @thomaseizinger

I am not sure how that would work. Are you thinking of a timer that fires at some point and interrupts the loop?


rkuhn (Contributor, Author) commented Oct 19, 2022

@mxinden Thinking more about it, libp2p 0.49 broke network protocol compatibility with earlier versions (which is the concrete and acute problem I have with libp2p-bitswap): file sync now fails because old clients will interpret dropped substreams as “that peer doesn’t have what we want”. Plus I have trouble figuring out how new clients should behave differently, but that is a separate issue.

Regarding this issue’s title: I think it is correct, because there is only one setting in the swarm regulating inbound substreams, and it currently doesn’t do anything useful. When I set that setting to 1000 I expect to be able to open 1000 substreams (i.e. fire 1000 requests) in a burst and not lose a single one, which worked in 0.48 but doesn’t work any longer.

@thomaseizinger In general yes, I’d use Instant::elapsed to trigger a cx.waker().wake_by_ref() and return Poll::Pending.

In this particular case the problem is that Connection::poll() has a loop that polls the muxer once and then calls poll_inbound once, starting over if that resulted in successfully beginning negotiations for a substream. This leads to dropping all those valuable new substreams that go beyond 25 in a row. Why not simply return Poll::Pending after the first one? It will be picked up by Connection::poll and my problem would be solved.

So in this case, since this poll is within a larger loop, that loop needs to be broken at regular intervals lest a single client monopolise a thread from the connection thread pool.
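A minimal sketch of the yield pattern described here, under a fixed time budget. The names `drain_inbound`, `poll_next_inbound` and `InboundStream` are made up for the example and stand in for the real muxer types; this is not the actual libp2p code:

```rust
use std::collections::VecDeque;
use std::task::{Context, Poll};
use std::time::{Duration, Instant};

struct InboundStream;

/// Illustrative budget for how long one poll call may keep draining streams.
const POLL_BUDGET: Duration = Duration::from_millis(1);

fn drain_inbound(
    cx: &mut Context<'_>,
    poll_next_inbound: &mut dyn FnMut(&mut Context<'_>) -> Poll<InboundStream>,
    buffer: &mut VecDeque<InboundStream>,
) -> Poll<()> {
    let start = Instant::now();
    loop {
        if start.elapsed() >= POLL_BUDGET {
            // Not done yet, but give other tasks a chance: schedule another
            // poll immediately, then yield the thread.
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        match poll_next_inbound(cx) {
            Poll::Ready(stream) => buffer.push_back(stream),
            // The inner poll has registered the waker, so propagating
            // Pending here upholds the Poll contract.
            Poll::Pending => return Poll::Pending,
        }
    }
}
```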

thomaseizinger (Contributor) commented:

> Regarding this issue’s title: I think it is correct, because there is only one setting in the swarm regulating inbound substreams, and it currently doesn’t do anything useful. When I set that setting to 1000 I expect to be able to open 1000 substreams (i.e. fire 1000 requests) in a burst and not lose a single one, which worked in 0.48 but doesn’t work any longer.

Well, the fact that this worked was more of a bug than a feature. Not having any form of back-pressure doesn't work at scale. It also means we wouldn't be able to make use of QUIC's back-pressure mechanism for the number of streams.

I have a proposal open for improving this situation: #2878

It deprecates max_negotiating_inbound_streams and allows individual ConnectionHandlers to decide how many substreams they can open.

> Why not simply return Poll::Pending after the first one? It will be picked up by Connection::poll and my problem would be solved.

That will violate the Poll contract. You shouldn't return Poll::Pending without registering a waker. We could perhaps change it to an Option return type.

Moving forward with libp2p/rust-yamux#142 should help here too I think. We should be able to provide a better interface there once all the Control API cruft is gone.

rkuhn (Contributor, Author) commented Oct 19, 2022

> That will violate the Poll contract.

That’s why I mentioned cx.waker().wake_by_ref() earlier.


May I ask that we separate two things?

  1. libp2p 0.49 is quite troublesome since it breaks existing code and, over the network, interoperability with peers running older versions. This is an issue because the new behaviour cannot be configured; currently my best-looking option is to maintain a fork that patches MAX_BUFFERED_INBOUND_STREAMS to some huge value (yes, “best-looking” is not the same as “good-looking”).
  2. proceeding work on backpressure (as per the ideas you referenced): is it possible to do this in a fashion that maintains compatibility by default and requires opt-in for stricter behaviour?

thomaseizinger (Contributor) commented:

> > That will violate the Poll contract.
>
> That’s why I mentioned cx.waker().wake_by_ref() earlier.

Are you suggesting to immediately wake and thus schedule another call to poll? That is interesting, I had not considered that, thank you!

It is a bit hacky but should work, as long as another call in the same poll function can eventually return Poll::Pending, so we avoid an endless loop.

rkuhn (Contributor, Author) commented Oct 20, 2022

This is a common pattern in my parts of the woods: relinquish the thread to give another task its chance to run, while stating that we’re not done just yet. Rust’s approach to asynchrony needs this extra care due to the inverted control flow, whereas other runtimes (JS, Java, …) schedule continuations as new tasks and thus get these break points automatically.
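A hand-rolled sketch of that pattern, assuming nothing beyond the standard library (many executors ship an equivalent, e.g. tokio::task::yield_now):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

/// A future that yields the thread exactly once: it wakes itself, returns
/// Pending, and completes on the next poll.
struct YieldNow {
    yielded: bool,
}

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            // Tell the executor we want to run again, then relinquish the thread.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

async fn long_running_work() {
    loop {
        // ... do a bounded chunk of work ...

        // Insert a break point so a single busy task cannot monopolise the thread.
        YieldNow { yielded: false }.await;
    }
}
```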
