Set max concurrent uni streams accordingly -- do not over allocate open uni streams #1060

Open
lijunwangs wants to merge 7 commits into master from do_not_over_allocate_streams

Conversation

lijunwangs

Problem

We can over-allocate the maximum number of concurrent open uni streams relative to the total number of streams allowed within a throttle window.
For example, an unstaked node might be eligible for only 2 uni streams per throttle window, but we might allocate 128 concurrent uni streams for it.

This causes two problems: it allocates resources on the server side unnecessarily, and it lets the client open concurrent uni streams that will be throttled on the server side anyway, leading to more timeout errors on the client side and more load on the server.

Summary of Changes

Do not allow more open concurrent uni streams than the number permitted in a throttle window.
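
For illustration, a minimal sketch of the intended clamping. The helper name and signature below are simplified assumptions; in the actual change the two inputs come from compute_max_allowed_uni_streams and available_load_capacity_in_throttling_duration (see the diff).

// Sketch only: names are illustrative, bodies simplified.
fn clamped_max_concurrent_uni_streams(
    max_allowed_uni_streams: u64,         // per-peer default, e.g. 128 for unstaked
    max_streams_per_throttle_window: u64, // capacity in the current throttle window
) -> u64 {
    // Never advertise more concurrent streams than the throttle window permits.
    max_allowed_uni_streams.min(max_streams_per_throttle_window)
}

// For the unstaked example above: min(128, 2) == 2.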

Fixes #

@lijunwangs lijunwangs force-pushed the do_not_over_allocate_streams branch 2 times, most recently from 00d93ce to 1f6108a Compare April 26, 2024 08:05

@pgarg66 pgarg66 left a comment

Can we split the new functionality into its own struct, and add unit tests for the math?

@lijunwangs lijunwangs force-pushed the do_not_over_allocate_streams branch from 1c5fcf3 to d21dab8 Compare May 8, 2024 22:29
@codecov-commenter

codecov-commenter commented May 9, 2024

Codecov Report

Attention: Patch coverage is 97.29730% with 2 lines in your changes missing coverage. Please review.

Project coverage is 82.1%. Comparing base (2bc026d) to head (9b5c06c).
Report is 10 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##           master    #1060     +/-   ##
=========================================
- Coverage    82.1%    82.1%   -0.1%     
=========================================
  Files         893      893             
  Lines      236600   236677     +77     
=========================================
+ Hits       194429   194451     +22     
- Misses      42171    42226     +55     

@lijunwangs
Author

Can we split the new functionality into its own struct, and add unit tests for the math?

Done.

    ) -> u64 {
        let max_streams_per_throttle_window =
            ema.available_load_capacity_in_throttling_duration(peer_type, total_stake);
        (UniStreamQosUtil::compute_max_allowed_uni_streams(peer_type, total_stake) as u64)

nit: UniStreamQosUtil::compute_max_allowed_uni_streams could be replaced with Self::compute_max_allowed_uni_streams

Author

Done


/// Given the max_streams_per_throttling_interval, derive the streams per throttle window.
/// Do not allow concurrent streams more than the max streams per throttle window.
pub fn max_concurrent_uni_streams_per_throttling_interval(

Do we really need a function for this? It's just a wrapper around min(). Why not use min() directly where we call this?

Author

I think this makes the design goal more explicit and the logic easier to test.

@@ -811,8 +844,15 @@ async fn handle_connection(
        stats.total_streams.load(Ordering::Relaxed),
        stats.total_connections.load(Ordering::Relaxed),
    );
    connection.set_max_concurrent_uni_streams(max_uni_streams);
    if let Some(receive_window) = receive_window {

Any benefit to moving the receive_window setting into this function?

Author

It encapsulates things better to keep the QoS-related config for connections in one place (receive window and max concurrent uni streams).
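
For illustration, a hedged sketch of that encapsulation, assuming the quinn::Connection setters the surrounding diff appears to use (set_max_concurrent_uni_streams and set_receive_window); the helper name is hypothetical, not the PR's exact code.

use quinn::{Connection, VarInt};

// Hypothetical helper: apply all per-connection QoS config in one place.
fn apply_connection_qos(
    connection: &Connection,
    max_uni_streams: VarInt,
    receive_window: Option<VarInt>,
) {
    connection.set_max_concurrent_uni_streams(max_uni_streams);
    if let Some(receive_window) = receive_window {
        connection.set_receive_window(receive_window);
    }
}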

@@ -856,6 +896,20 @@ async fn handle_connection(
                sleep(throttle_duration).await;
            }
        }
        let max_concurrent_uni_streams =

nit: max_concurrent_uni_streams is very overloaded here. Can we simplify the code? Maybe if we use min() instead of UniStreamQosUtil::max_concurrent_uni_streams_per_throttling_interval(), the code could be compressed and simplified.

Author

I renamed the variables a little to clarify

@pgarg66

pgarg66 commented May 9, 2024

Some nits. Otherwise the logic looks good to me.
Let's wait for @alessandrod to also take a look.

@pgarg66 pgarg66 left a comment

LGTM. Please give @alessandrod a chance to look at it before merging.

@lijunwangs
Author

@alessandrod -- this is the formal change for the experiments we did on '3gG', where we hard-coded the concurrent stream count to 1 for unstaked connections based on the calculation of the max allowed stream count per throttle window. This change makes the stream count limit take the minimum of (original max concurrent uni streams per stake, max allowed streams per throttle window).

@alessandrod

Oops sorry I had missed this! I'll take a look in the morning

@lijunwangs
Author

Oops sorry I had missed this! I'll take a look in the morning

Hi @alessandrod, any further comments on this PR? I'd like to wrap it up.

@alessandrod

From what I understand, I don't think this code is needed; it adds complexity and probably some round trips to communicate the new limit to peers. (And locking and wakeups of the connection task, but admittedly those should be infrequent.) I could be wrong of course, but I haven't seen any plausible explanation of why this is needed (or why multiple streams are needed to begin with).

@lijunwangs
Author

From what I understand, I don't think this code is needed; it adds complexity and probably some round trips to communicate the new limit to peers. (And locking and wakeups of the connection task, but admittedly those should be infrequent.) I could be wrong of course, but I haven't seen any plausible explanation of why this is needed (or why multiple streams are needed to begin with).

Can you clarify why it is not needed? I think you would agree that over-allocating is bad. If you are questioning why we have multiple streams in the first place: as I mentioned, it is based on 1. reducing the head-of-line blocking issue and 2. better performance through parallelism. And I mentioned in the Slack channel that with 1 stream vs. the current default there is at least a 3x difference in the bench-tps test. Your point that multiple streams may cause fragmentation among the streams is valid, but it is orthogonal to this PR; I am not making it worse. If you are proposing to just change the stream count to 1 for everything, I think that is too drastic a change. Please be explicit if you are proposing something different.

@lijunwangs lijunwangs requested a review from sakridge May 18, 2024 03:09
@alessandrod

alessandrod commented May 18, 2024

From what I understand, I don't think this code is needed; it adds complexity and probably some round trips to communicate the new limit to peers. (And locking and wakeups of the connection task, but admittedly those should be infrequent.) I could be wrong of course, but I haven't seen any plausible explanation of why this is needed (or why multiple streams are needed to begin with).

Can you clarify why it is not needed? I think you would agree that over-allocating is bad.

Over-allocate what? From what I understand - and again I could be wrong - we're not allocating anything more on the server side whether we allow 1 stream or 1000 streams. The max streams limit is a protocol limit that is communicated to the client; the server doesn't pre-reserve anything for it. The client will simply stop opening streams once it runs out of allowed streams. Stream id tracking is done in the client and requires no synchronization with the server. We have one task per connection. We read one stream at a time - not in parallel: we pop the next stream from the connection and process it. What gets over-allocated?

The thing that changes how much the server allocates is the receive window, and we already bound that.
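
For illustration, a minimal sketch of the server-side pattern described above, assuming the quinn API (accept_uni, set_max_concurrent_uni_streams): the stream limit is only advertised to the peer as a transport parameter, and the per-connection task pops and reads one incoming stream at a time. This is a sketch, not the actual handle_connection.

use quinn::{Connection, VarInt};

// Sketch only: one task per connection, nothing reserved per allowed stream.
async fn connection_task_sketch(connection: Connection) {
    // Advertise the limit to the peer; this is flow-control bookkeeping,
    // not a server-side allocation of 128 streams.
    connection.set_max_concurrent_uni_streams(VarInt::from_u32(128));

    // Pop the next incoming uni stream and read it to completion.
    while let Ok(mut stream) = connection.accept_uni().await {
        let mut buf = [0u8; 1024];
        // Only streams the client actually opens consume server-side state.
        while let Ok(Some(_n)) = stream.read(&mut buf).await {
            // process the chunk...
        }
    }
}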

If you are questioning why we have multiple streams in the first place: as I mentioned, it is based on 1. reducing the head-of-line blocking issue and 2. better performance through parallelism.

I thought we agreed on slack that there's no HOL issue? If you don't agree, can you explain to me where the HOL issue is exactly?

And can you explain where the parallelism is, if the server has one task per connection and pops one stream at a time? How is parallelism increased exactly?

And I mentioned in the Slack channel that with 1 stream vs. the current default there is at least a 3x difference in the bench-tps test. Your point that multiple streams may cause fragmentation among the streams is valid, but it is orthogonal to this PR; I am not making it worse.

You are adding code that I don't think is necessary. More code is always bad: more complexity and more bugs. In that sense, it's worse. Obviously I can be wrong, but I'd like to know how I'm wrong if you want me to approve the PR.

If you are proposing to just change the stream count to 1 for everything, I think that is too drastic a change. Please be explicit if you are proposing something different.

I'm not proposing to set streams to 1 in the context of this PR. I do think we should do it, but as you said it's orthogonal to this PR and likely requires fixes in the client before we can do it.

@lijunwangs
Author

From what I understand, I don't think this code is needed; it adds complexity and probably some round trips to communicate the new limit to peers. (And locking and wakeups of the connection task, but admittedly those should be infrequent.) I could be wrong of course, but I haven't seen any plausible explanation of why this is needed (or why multiple streams are needed to begin with).

Can you clarify why it is not needed? I think you would agree that over-allocating is bad.

Over-allocate what? From what I understand - and again I could be wrong - we're not allocating anything more on the server side whether we allow 1 stream or 1000 streams. The max streams limit is a protocol limit that is communicated to the client; the server doesn't pre-reserve anything for it. The client will simply stop opening streams once it runs out of allowed streams. Stream id tracking is done in the client and requires no synchronization with the server. We have one task per connection. We read one stream at a time - not in parallel: we pop the next stream from the connection and process it. What gets over-allocated?

Lijun: To follow up, I mentioned in the PR description:
This causes two problems: it allocates resources on the server side unnecessarily, and it lets the client open more concurrent uni streams that will be throttled on the server side anyway, leading to more timeout errors on the client side and more load on the server. My simple intuition was: if we allow at most N streams in a throttle window, do not allow the max concurrent open streams to be more than N, as the ones exceeding that will be throttled anyway. Open uni streams do require allocating resources to maintain their state.
