Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Networking] set TCP buffer overrides to None by default #10176

Merged
merged 5 commits into from
Sep 22, 2023

Conversation

bchocho
Copy link
Contributor

@bchocho bchocho commented Sep 21, 2023

Description

Tests show that overriding the socket options for send and recv buffer actually hurts performance (even with default linux kernel settings). Logically, this rolls back #4649 but keeping the configs and code around in case this or other socket options are revisited.

Test Plan

New results:

  • net_bench_two_region_env: 800 KB/s -> 4 MB/s
  • pfn_performance_with_realistic_env: 4K -> 6K TPS

@bchocho bchocho added the CICD:build-images when this label is present github actions will start build+push rust images from the PR. label Sep 21, 2023
@bchocho
Copy link
Contributor Author

bchocho commented Sep 22, 2023

Forge result links:

} else {
4500
};
let min_expected_tps = 4500;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping this pretty conservative for now. I've seen a range of 5-6.2K TPS.

@bchocho bchocho marked this pull request as ready for review September 22, 2023 05:44
@bchocho bchocho requested a review from zekun000 September 22, 2023 05:44
@bchocho bchocho changed the title [Networking] remove TCP window overrides and configs [Networking] remove TCP buffer overrides and configs Sep 22, 2023
@@ -157,10 +149,6 @@ impl NetworkConfig {
inbound_rate_limit_config: None,
outbound_rate_limit_config: None,
max_message_size: MAX_MESSAGE_SIZE,
inbound_rx_buffer_size_bytes: Some(INBOUND_TCP_RX_BUFFER_SIZE),
Copy link
Contributor

@JoshLind JoshLind Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silly question, but, instead of removing support for this entirely, is there a way to just set the default to not override the values (i.e., default to None in the config and have the code do nothing)? e.g., I'm just wondering if there's ever a scenario where we want to set these things?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-adding tcp socket option setting would have the annoying work of figuring out where it goes, so leaving this on but disabled (it already had Option if-None-do-nothing) makes sense to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't think we want to mess with this and so might as well clean up the configs now. For this to be effective the user would have to bump up the linux configs, otherwise it has a detrimental effect.

If you feel strongly I'll leave the config but add a comment on the ways it shouldn't be used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave it in or at least leave a comment // TODO: if needed, set connection socket options here; someone is going to want to tinker with this again, even if Linux has pretty good defaults

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JoshLind @brianolson . Makes sense, I guess there are other socket options that someone might want to play with in the future as well. I've changed the PR to just set the defaults as None and put up a big comment to proceed with caution.

@bchocho bchocho requested a review from JoshLind September 22, 2023 16:01
@bchocho bchocho changed the title [Networking] remove TCP buffer overrides and configs [Networking] set TCP buffer overrides to None by default Sep 22, 2023
@bchocho bchocho enabled auto-merge (squash) September 22, 2023 16:43
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

✅ Forge suite compat success on aptos-node-v1.6.2 ==> 8ed6877357f5d59cf33eddaf46d1e9ad2cd1dcad

Compatibility test results for aptos-node-v1.6.2 ==> 8ed6877357f5d59cf33eddaf46d1e9ad2cd1dcad (PR)
1. Check liveness of validators at old version: aptos-node-v1.6.2
compatibility::simple-validator-upgrade::liveness-check : committed: 4631 txn/s, latency: 6849 ms, (p50: 6300 ms, p90: 9600 ms, p99: 19000 ms), latency samples: 175980
2. Upgrading first Validator to new version: 8ed6877357f5d59cf33eddaf46d1e9ad2cd1dcad
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1838 txn/s, latency: 15565 ms, (p50: 18000 ms, p90: 21800 ms, p99: 22300 ms), latency samples: 91940
3. Upgrading rest of first batch to new version: 8ed6877357f5d59cf33eddaf46d1e9ad2cd1dcad
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1720 txn/s, latency: 16397 ms, (p50: 17700 ms, p90: 21700 ms, p99: 24800 ms), latency samples: 89460
4. upgrading second batch to new version: 8ed6877357f5d59cf33eddaf46d1e9ad2cd1dcad
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3536 txn/s, latency: 9055 ms, (p50: 9900 ms, p90: 12700 ms, p99: 13000 ms), latency samples: 141460
5. check swarm health
Compatibility test for aptos-node-v1.6.2 ==> 8ed6877357f5d59cf33eddaf46d1e9ad2cd1dcad passed
Test Ok

@github-actions
Copy link
Contributor

✅ Forge suite realistic_env_max_load success on 8ed6877357f5d59cf33eddaf46d1e9ad2cd1dcad

two traffics test: inner traffic : committed: 6118 txn/s, latency: 6399 ms, (p50: 6000 ms, p90: 8300 ms, p99: 12900 ms), latency samples: 2649520
two traffics test : committed: 100 txn/s, latency: 2675 ms, (p50: 2500 ms, p90: 2900 ms, p99: 8200 ms), latency samples: 1800
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.232, avg: 0.215", "QsPosToProposal: max: 0.231, avg: 0.172", "ConsensusProposalToOrdered: max: 0.652, avg: 0.630", "ConsensusOrderedToCommit: max: 0.577, avg: 0.545", "ConsensusProposalToCommit: max: 1.219, avg: 1.175"]
Max round gap was 1 [limit 4] at version 1034857. Max no progress secs was 3.668509 [limit 10] at version 1034857.
Test Ok

@bchocho bchocho merged commit fee28ac into main Sep 22, 2023
81 checks passed
@bchocho bchocho deleted the brian/remove-window-cfg branch September 22, 2023 17:13
@github-actions
Copy link
Contributor

✅ Forge suite framework_upgrade success on aptos-node-v1.5.1 ==> 8ed6877357f5d59cf33eddaf46d1e9ad2cd1dcad

Compatibility test results for aptos-node-v1.5.1 ==> 8ed6877357f5d59cf33eddaf46d1e9ad2cd1dcad (PR)
Upgrade the nodes to version: 8ed6877357f5d59cf33eddaf46d1e9ad2cd1dcad
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 5323 txn/s, latency: 6037 ms, (p50: 6000 ms, p90: 9300 ms, p99: 9800 ms), latency samples: 196980
5. check swarm health
Compatibility test for aptos-node-v1.5.1 ==> 8ed6877357f5d59cf33eddaf46d1e9ad2cd1dcad passed
Test Ok

bchocho added a commit that referenced this pull request Nov 2, 2023
Tests show that overriding the socket options for send and recv buffer actually hurts performance (even with default linux kernel settings). Logically, this rolls back #4649 but keeping the configs and code around in case this or other socket options are revisited.

New results:
* `net_bench_two_region_env`: 800 KB/s -> 4 MB/s
* `pfn_performance_with_realistic_env`: 4K -> 6K TPS
bchocho added a commit that referenced this pull request Nov 2, 2023
…0775)

Tests show that overriding the socket options for send and recv buffer actually hurts performance (even with default linux kernel settings). Logically, this rolls back #4649 but keeping the configs and code around in case this or other socket options are revisited.

New results:
* `net_bench_two_region_env`: 800 KB/s -> 4 MB/s
* `pfn_performance_with_realistic_env`: 4K -> 6K TPS
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:build-images when this label is present github actions will start build+push rust images from the PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants