
feat(netbench) enable jumbo frame probing by default #1648

Merged

merged 10 commits into aws:main on Mar 3, 2023

Conversation

jmayclin (Contributor) commented Mar 2, 2023

Description of changes:

The main semantic change in this PR is switching the s2n-quic netbench driver to probe for jumbo frames by default. The TCP stack already does this by default, so this should make comparisons between QUIC, TCP, and TLS a bit more accurate.
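As a quick illustration, here is a hedged sketch of overriding the new default with the MAX_MTU env variable described in the commit message below; the binary path is an assumption for illustration, not the exact invocation from this PR:

# Jumbo frame probing is on by default; setting MAX_MTU caps the probed
# MTU at a standard Ethernet size instead. The binary path is illustrative.
MAX_MTU=1500 ./target/release/netbench-driver-s2n-quic-server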

This PR also expands our netbench READMEs and adds an example script that shows a working end-to-end use of netbench.

Call-outs:

Cleanup of child processes: I don't know that we have a "best practice" way to do this yet. The cleanest approach would be to refactor the collectors to use a SIGTERM handler that kills their child processes. That would require a larger code change, so instead my script uses process groups to send SIGTERM to multiple processes at once (see the sketch below).
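A minimal sketch of that process-group approach, with hypothetical binary names standing in for the real collector and driver:

#!/usr/bin/env bash
# Launch the collector in a new session so it leads its own process group;
# any child processes it spawns inherit that group.
setsid ./netbench-collector ./netbench-driver-server &
SERVER_PID=$!

# A negative PID signals the whole process group, so the collector and its
# child driver receive SIGTERM together when the script exits.
trap 'kill -TERM -- "-$SERVER_PID" 2>/dev/null || true' EXIT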

Loopback funkiness: I disable GSO in my script, because when it is enabled there are excessively large packet sizes. Example tshark output:

1093975 26.194205764    127.0.0.1 → 127.0.0.1    QUIC 12044 Protected Payload (KP0)
1093976 26.194221320    127.0.0.1 → 127.0.0.1    QUIC 12044 Protected Payload (KP0)
1093977 26.194222387    127.0.0.1 → 127.0.0.1    QUIC 97 Protected Payload (KP0)
1093978 26.194238773    127.0.0.1 → 127.0.0.1    QUIC 12044 Protected Payload (KP0)
1093979 26.194264256    127.0.0.1 → 127.0.0.1    QUIC 12044 Protected Payload (KP0)
1093980 26.194281223    127.0.0.1 → 127.0.0.1    QUIC 12044 Protected Payload (KP0)
1093981 26.194302019    127.0.0.1 → 127.0.0.1    QUIC 12044 Protected Payload (KP0)
1093982 26.194331389    127.0.0.1 → 127.0.0.1    QUIC 12044 Protected Payload (KP0)
1093983 26.194333966    127.0.0.1 → 127.0.0.1    QUIC 97 Protected Payload (KP0)
1093984 26.194346436    127.0.0.1 → 127.0.0.1    QUIC 4844 Protected Payload (KP0)
1093985 26.194396362    127.0.0.1 → 127.0.0.1    QUIC 8910 Protected Payload (KP0)
1093986 26.194439364    127.0.0.1 → 127.0.0.1    QUIC 97 Protected Payload (KP0)
1093987 26.194527024    127.0.0.1 → 127.0.0.1    QUIC 97 Protected Payload (KP0)

I also saw quite a few 60,000+ byte packets at the start of the run. From my research this looks to be expected with loopback devices, but if anyone wants me to dig into this more, let me know.
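For reference, one common way to toggle GSO on the loopback interface is ethtool; this is an illustrative command, not necessarily the mechanism my script uses:

# Disable Generic Segmentation Offload on loopback (requires root),
# and re-enable it once the run is finished.
sudo ethtool -K lo gso off
sudo ethtool -K lo gso on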

Failure to probe jumbo frames: when running netbench with tshark, I occasionally saw QUIC fail to successfully probe the jumbo frames, instead sticking with the minimum MTU. My expectation is that this happens due to packet drops, but I would expect the drop rate over the loopback interface to be very low, so this was surprising. I think this is worth investigating further, but I'm curious to hear others' opinions.

Testing:

In one terminal I ran sudo tshark -f "udp port 4433" -i any to capture the packets sent during the netbench run.

In another terminal I ran run-netbench-request-response.sh.

This is an example of the output from the terminal running tshark:

823879 530.238617349    127.0.0.1 → 127.0.0.1    QUIC 8967 Protected Payload (KP0)
823880 530.238624905    127.0.0.1 → 127.0.0.1    QUIC 8967 Protected Payload (KP0)
823881 530.238631693    127.0.0.1 → 127.0.0.1    QUIC 8967 Protected Payload (KP0)
823882 530.238637910    127.0.0.1 → 127.0.0.1    QUIC 97 Protected Payload (KP0)
823883 530.238638832    127.0.0.1 → 127.0.0.1    QUIC 8967 Protected Payload (KP0)
823884 530.238646083    127.0.0.1 → 127.0.0.1    QUIC 8967 Protected Payload (KP0)
823885 530.238650239    127.0.0.1 → 127.0.0.1    QUIC 287 Protected Payload (KP0)
823886 530.238682702    127.0.0.1 → 127.0.0.1    QUIC 100 Protected Payload (KP0)
823887 530.238710821    127.0.0.1 → 127.0.0.1    QUIC 8967 Protected Payload (KP0)
823888 530.238721978    127.0.0.1 → 127.0.0.1    QUIC 8967 Protected Payload (KP0)
823889 530.238729310    127.0.0.1 → 127.0.0.1    QUIC 8967 Protected Payload (KP0)
823890 530.238736491    127.0.0.1 → 127.0.0.1    QUIC 8967 Protected Payload (KP0)
823891 530.238741533    127.0.0.1 → 127.0.0.1    QUIC 3095 Protected Payload (KP0)
823892 530.238744232    127.0.0.1 → 127.0.0.1    QUIC 97 Protected Payload (KP0)
823893 530.263846138    127.0.0.1 → 127.0.0.1    QUIC 97 Protected Payload (KP0)
823894 530.263907179    127.0.0.1 → 127.0.0.1    QUIC 97 Protected Payload (KP0)

We can see that jumbo frames are being used.
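As a quick sanity check, tshark's display filter can isolate the jumbo-sized frames in a saved capture (the capture file name here is hypothetical):

# Show only frames larger than a standard 1500-byte Ethernet MTU.
tshark -r netbench-capture.pcapng -Y "frame.len > 1500"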

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

This commit enables jumbo frames by default for the quic drivers. They
can be disabled by setting the env variable MAX_MTU to some smaller
value, like 1500.

Currently the way to understand netbench is to reverse engineer our code
or decipher the netbench CI. I had a lot of difficulty getting a minimal
netbench script running, so I ended up adding a bit of documentation
describing my difficulties and updating some out-of-date content.

Disable GSO in the example script; otherwise massive packet sizes are
observed over the loopback interface.
jmayclin requested a review from camshaft March 2, 2023 04:42
Comment on lines 28 to 31
# run the server while collecting metrics. Generic Segmentation Offload (GSO)
# is disabled, because it results in production-dissimilar behaviors when
# running over the loopback interface. Specifically, packets larger than the
# max supported MTU can be observed.
Spoke offline: this is to be expected. We hand the OS massive payload buffers and it defers splitting them up into individual packets for as long as possible. I think it's best to test with it enabled.

Also include some light edits in the readme.
camshaft previously approved these changes Mar 3, 2023

camshaft previously approved these changes Mar 3, 2023
jmayclin enabled auto-merge (squash) March 3, 2023 21:51
jmayclin disabled auto-merge March 3, 2023 21:54
jmayclin enabled auto-merge (squash) March 3, 2023 22:17
jmayclin merged commit 02660c8 into aws:main Mar 3, 2023