
Swarm does not honour max_negotiating_inbound_streams setting #3041

Closed · rkuhn opened this issue Oct 18, 2022 · 9 comments · Fixed by #3071
Comments

rkuhn (Contributor) commented Oct 18, 2022

```rust
loop {
    let inbound_stream = ready!(this.poll_inner(cx))?;

    if this.inbound_stream_buffer.len() >= MAX_BUFFERED_INBOUND_STREAMS {
        log::warn!("dropping {inbound_stream} because buffer is full");
        drop(inbound_stream);
        continue;
    }

    this.inbound_stream_buffer.push_back(inbound_stream);
}
```

With MAX_BUFFERED_INBOUND_STREAMS == 25 this code places a limit of 25 incoming substream requests at any given time, effectively removing the utility of SwarmBuilder’s max_negotiating_inbound_streams setting. The muxer should be refactored such that only the configurable setting is used and the constant removed.

In combination with #3039 this means that currently it is very difficult to implement a working bitswap implementation.
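For reference, a minimal sketch of how the setting in question is applied when building a swarm, assuming the rust-libp2p 0.49 `SwarmBuilder` API (the transport, behaviour and peer id are left to the caller; the value 1000 is just an example):

```rust
use libp2p::core::{muxing::StreamMuxerBox, transport::Boxed};
use libp2p::swarm::{NetworkBehaviour, Swarm, SwarmBuilder};
use libp2p::PeerId;

// Sketch: raise the limit on concurrently negotiating inbound streams when
// building the swarm. As described above, the hard-coded
// MAX_BUFFERED_INBOUND_STREAMS (25) still caps buffered, not-yet-negotiated
// streams regardless of this setting.
fn build_swarm<B: NetworkBehaviour>(
    transport: Boxed<(PeerId, StreamMuxerBox)>,
    behaviour: B,
    local_peer_id: PeerId,
) -> Swarm<B> {
    SwarmBuilder::new(transport, behaviour, local_peer_id)
        .max_negotiating_inbound_streams(1000)
        .build()
}
```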

rkuhn (Contributor, Author) commented Oct 18, 2022

Another thought: this loop can be kept busy for arbitrarily long times by a potent network connection, starving other activities. It should return Poll::Pending from time to time. /cc @thomaseizinger

mxinden (Member) commented Oct 18, 2022

> With MAX_BUFFERED_INBOUND_STREAMS == 25 this code places a limit of 25 incoming substream requests at any given time, effectively removing the utility of SwarmBuilder’s max_negotiating_inbound_streams setting. The muxer should be refactored such that only the configurable setting is used and the constant removed.

Note that MAX_BUFFERED_INBOUND_STREAMS and max_negotiating_inbound_streams are two different limits.

The former limits the number of buffered inbound (non-negotiated) streams; the latter limits the number of negotiating inbound streams, i.e. the streams that are running multistream-select plus the UpgradeInbound logic of the corresponding ConnectionHandlers.

thomaseizinger (Contributor) commented:

> Another thought: this loop can be kept busy for arbitrarily long times by a potent network connection, starving other activities. It should return Poll::Pending from time to time. /cc @thomaseizinger

I am not sure how that would work. Are you thinking of a timer that fires at some point and interrupts the loop?


rkuhn (Contributor, Author) commented Oct 19, 2022

@mxinden Thinking more about it, libp2p 0.49 broke network protocol compatibility with earlier versions (which is the concrete and acute problem I have with libp2p-bitswap): file sync now fails because old clients will interpret dropped substreams as “that peer doesn’t have what we want”. Plus I have trouble figuring out how new clients should behave differently, but that is a separate issue.

Regarding this issue’s title: I think it is correct, because there is only one setting in the swarm regulating inbound substreams, and it currently doesn’t do anything useful. When I set that setting to 1000 I expect to be able to open 1000 substreams (i.e. fire 1000 requests) in a burst and not lose a single one, which worked in 0.48 but doesn’t work any longer.

@thomaseizinger In general yes, I’d use Instant::elapsed to trigger a cx.waker().wake_by_ref() and return Poll::Pending.

In this particular case the problem is that Connection::poll() has a loop that polls the muxer once and then calls poll_inbound once, starting over if that resulted in successfully beginning negotiations for a substream. This leads to dropping all those valuable new substreams that go beyond 25 in a row. Why not simply return Poll::Pending after the first one? It will be picked up by Connection::poll and my problem would be solved.

So in this case, since this poll is within a larger loop, that loop needs to be broken at regular intervals lest a single client monopolise a thread from the connection thread pool.
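A minimal sketch of the yield pattern described here, under a fixed time budget. The names `drain_inbound`, `poll_next_inbound` and `InboundStream` are made up for the example and stand in for the real muxer types; this is not the actual libp2p code:

```rust
use std::collections::VecDeque;
use std::task::{Context, Poll};
use std::time::{Duration, Instant};

struct InboundStream;

/// Illustrative budget for how long one poll call may keep draining streams.
const POLL_BUDGET: Duration = Duration::from_millis(1);

fn drain_inbound(
    cx: &mut Context<'_>,
    poll_next_inbound: &mut dyn FnMut(&mut Context<'_>) -> Poll<InboundStream>,
    buffer: &mut VecDeque<InboundStream>,
) -> Poll<()> {
    let start = Instant::now();
    loop {
        if start.elapsed() >= POLL_BUDGET {
            // Not done yet, but give other tasks a chance: schedule another
            // poll immediately, then yield the thread.
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        match poll_next_inbound(cx) {
            Poll::Ready(stream) => buffer.push_back(stream),
            // The inner poll has registered the waker, so propagating
            // Pending here upholds the Poll contract.
            Poll::Pending => return Poll::Pending,
        }
    }
}
```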

thomaseizinger (Contributor) commented:

> Regarding this issue’s title: I think it is correct, because there is only one setting in the swarm regulating inbound substreams, and it currently doesn’t do anything useful. When I set that setting to 1000 I expect to be able to open 1000 substreams (i.e. fire 1000 requests) in a burst and not lose a single one, which worked in 0.48 but doesn’t work any longer.

Well, the fact that this worked was more of a bug than a feature. Not having any form of back-pressure doesn't work at scale. It also means we wouldn't be able to make use of QUIC's back-pressure mechanism for the number of streams.

I have a proposal open for improving this situation: #2878

It deprecates max_negotiating_inbound_streams and allows individual ConnectionHandlers to decide how many substreams they can open.

> Why not simply return Poll::Pending after the first one? It will be picked up by Connection::poll and my problem would be solved.

That will violate the Poll contract. You shouldn't return Poll::Pending without registering a waker. We could perhaps change it to an Option return type.

Moving forward with libp2p/rust-yamux#142 should help here too I think. We should be able to provide a better interface there once all the Control API cruft is gone.

rkuhn (Contributor, Author) commented Oct 19, 2022

> That will violate the Poll contract.

That’s why I mentioned cx.waker().wake_by_ref() earlier.


May I ask that we separate two things?

  1. libp2p 0.49 is quite troublesome since it breaks existing code and, over the network, interoperability with peers running older versions. This is an issue because the new behaviour cannot be configured; currently my best-looking option is to maintain a fork that patches MAX_BUFFERED_INBOUND_STREAMS to some huge value (yes, “best-looking” is not the same as “good-looking”).
  2. proceeding work on backpressure (as per the ideas you referenced): is it possible to do this in a fashion that maintains compatibility by default and requires opt-in for stricter behaviour?

thomaseizinger (Contributor) commented:

> > That will violate the Poll contract.
>
> That’s why I mentioned cx.waker().wake_by_ref() earlier.

Are you suggesting to immediately wake and thus schedule another call to poll? That is interesting, I had not considered that, thank you!

It is a bit hacky but should work, as long as another call in the same poll function can eventually return Poll::Pending, so we avoid an endless loop.

rkuhn (Contributor, Author) commented Oct 20, 2022

This is a common pattern in my parts of the woods: relinquish the thread to give another task its chance to run, while stating that we’re not done just yet. Rust’s approach to asynchrony needs this extra care due to the inverted control flow, whereas other runtimes (JS, Java, …) schedule continuations as new tasks and thus get these break points automatically.
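A hand-rolled sketch of that pattern, assuming nothing beyond the standard library (many executors ship an equivalent, e.g. tokio::task::yield_now):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

/// A future that yields the thread exactly once: it wakes itself, returns
/// Pending, and completes on the next poll.
struct YieldNow {
    yielded: bool,
}

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            // Tell the executor we want to run again, then relinquish the thread.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

async fn long_running_work() {
    loop {
        // ... do a bounded chunk of work ...

        // Insert a break point so a single busy task cannot monopolise the thread.
        YieldNow { yielded: false }.await;
    }
}
```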
