Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various performance improvements #11

Merged
merged 17 commits into from
Oct 9, 2023
Merged

Conversation

awelzel
Copy link

@awelzel awelzel commented Oct 5, 2023

Rough summary:

  • &try/backtrack() removal
  • remove one packet copy on the analyzer level and pass an iterator into the decryption function (still one copy in the decryption code)
  • pass std::vector as const ref when possible
  • skip optimization for padding and bytes fields that are otherwise not used.

On a pcap created with Python's aioquic package containing roughly 12k quic connections, this PR reduces runtime from ~18.5seconds to ~12seconds. By far the largest impact had removal of the previous &try / backtrack() approach - see also zeek/spicy#1565.

Improves performance processing pure QUIC traffic by ~20%

Relates to zeek/spicy#1565.
There's still a full packet copy within the decrypt_crypto_payload(),
but it's one less now.
There should not be a need for the extra copying. hilti::rt::Bytes
are mostly std::string and we can pass by const reference as well.
As before, avoid unnecessary copies of std::vector instances.
...and return hilti::rt::Bytes directly.
@awelzel awelzel force-pushed the topic/awelzel/perf-improvements branch from d8c6210 to 6408f4d Compare October 5, 2023 16:12
Now that we do not buffer the packet anymore explicitly, we do not need
a should_buffer() method.
I suspect the structure here can be improved, but given we're only
interested in the form, replace with an anonymous uint8 field.
There's not much point accumulating it in fields if we're never
using it, anyhow.
We only need to copy out the buffer, no need to be overly safe.
Think previously we exported all the symbols :-/
We're not actually using any of the fields, so may as well use skip.
@awelzel awelzel requested a review from bbannier October 6, 2023 12:25
analyzer/decrypt_crypto.cc Outdated Show resolved Hide resolved
analyzer/decrypt_crypto.cc Outdated Show resolved Hide resolved
analyzer/decrypt_crypto.cc Outdated Show resolved Hide resolved
analyzer/decrypt_crypto.cc Outdated Show resolved Hide resolved
analyzer/decrypt_crypto.cc Show resolved Hide resolved
analyzer/decrypt_crypto.cc Outdated Show resolved Hide resolved
analyzer/QUIC.spicy Outdated Show resolved Hide resolved
analyzer/decrypt_crypto.cc Outdated Show resolved Hide resolved
analyzer/decrypt_crypto.cc Show resolved Hide resolved
analyzer/decrypt_crypto.cc Outdated Show resolved Hide resolved
This removes the iterator usage but removes the explicit copy
into std::vector<> in favor of using the hilti::rt::Bytes::data()
content directly. Hide the reinterpret_cast<> behind a small
helper function.

And further feedback from Benjamin.
@awelzel awelzel marked this pull request as ready for review October 6, 2023 16:52
@awelzel awelzel requested a review from bbannier October 6, 2023 16:52
@awelzel
Copy link
Author

awelzel commented Oct 6, 2023

@bbannier - I think I adapted to most you suggested. Mind adding a :shipit: ? I'll post some hyperfine numbers in a bit, too.

The top two is the current analyzer version in main, the bottom two are the new ones. pcap's are 1) 16 connections downloading 50MB each and 2) ~12k small QUIC connections in a single pcap.

Command Mean [s] Min [s] Max [s]
zeek -b -C ../build-old/quic.hlto ../scripts base/protocols/ssl -r ../perf/16-50000000.pcap 30.358 ± 0.286 29.976 30.627
zeek -b -C ../build-old/quic.hlto ../scripts base/protocols/ssl -r ../perf/many-requests-12000.pcap 31.781 ± 0.248 31.521 32.187
zeek -b -C ./quic.hlto ../scripts base/protocols/ssl -r ../perf/16-50000000.pcap 9.553 ± 0.232 9.238 9.730
zeek -b -C ./quic.hlto ../scripts base/protocols/ssl -r ../perf/many-requests-12000.pcap 17.509 ± 0.263 17.337 17.974

It takes ~0.8 seconds for many-requests-12000.pcap and ~1.5 seconds for 16-50000000.pcap without the analyzer enabled.

@awelzel awelzel merged commit 021cf07 into main Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants