Various performance improvements #11

awelzel · 2023-10-05T15:59:36Z

Rough summary:

- `&try`/`backtrack()` removal
- remove one packet copy on the analyzer level and pass an iterator into the decryption function (still one copy in the decryption code)
- pass `std::vector` as const ref when possible
- `skip` optimization for padding and bytes fields that are otherwise not used

On a pcap created with Python's aioquic package containing roughly 12k QUIC connections, this PR reduces runtime from ~18.5 seconds to ~12 seconds. By far the largest impact came from removing the previous `&try` / `backtrack()` approach; see also zeek/spicy#1565.

This removes the iterator usage and also drops the explicit copy into std::vector<>, in favor of using the hilti::rt::Bytes::data() content directly. The reinterpret_cast<> is hidden behind a small helper function. It also addresses further feedback from Benjamin.
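A minimal sketch of what such a helper could look like follows. It is not the code from this PR: `BytesLike` is a `std::string` stand-in for `hilti::rt::Bytes` so the example compiles without the HILTI runtime, and the name `data_as_uint8` is hypothetical.

```cpp
#include <cstdint>
#include <string>

namespace {

// Stand-in for hilti::rt::Bytes in this sketch. This is an assumption made so
// the example builds without the HILTI runtime headers.
using BytesLike = std::string;

// Hypothetical helper: the single place where the char buffer is viewed as
// unsigned bytes. No data is copied; the returned pointer aliases the buffer
// owned by `b` and is only valid while `b` is alive and unmodified.
inline const uint8_t* data_as_uint8(const BytesLike& b) {
    return reinterpret_cast<const uint8_t*>(b.data());
}

} // namespace
```

Keeping the cast in one function means call sites that hand the buffer to a crypto API taking `const uint8_t*` stay free of casts, and the anonymous namespace keeps the helper out of the exported symbols.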

awelzel · 2023-10-06T16:52:57Z

@bbannier - I think I adapted to most of what you suggested. Mind adding a ? I'll post some hyperfine numbers in a bit, too.

The top two rows are the current analyzer version in main; the bottom two are the new version. The pcaps are 1) 16 connections downloading 50MB each and 2) ~12k small QUIC connections in a single pcap.

| Command | Mean [s] | Min [s] | Max [s] |
|:---|---:|---:|---:|
| `zeek -b -C ../build-old/quic.hlto ../scripts base/protocols/ssl -r ../perf/16-50000000.pcap` | 30.358 ± 0.286 | 29.976 | 30.627 |
| `zeek -b -C ../build-old/quic.hlto ../scripts base/protocols/ssl -r ../perf/many-requests-12000.pcap` | 31.781 ± 0.248 | 31.521 | 32.187 |
| `zeek -b -C ./quic.hlto ../scripts base/protocols/ssl -r ../perf/16-50000000.pcap` | 9.553 ± 0.232 | 9.238 | 9.730 |
| `zeek -b -C ./quic.hlto ../scripts base/protocols/ssl -r ../perf/many-requests-12000.pcap` | 17.509 ± 0.263 | 17.337 | 17.974 |

It takes ~0.8 seconds for many-requests-12000.pcap and ~1.5 seconds for 16-50000000.pcap without the analyzer enabled.
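To read the numbers with the baseline taken out, one can subtract the no-analyzer runtime from both means before dividing. The sketch below only restates the figures reported above; the helper name `analyzer_speedup` and the constant names are hypothetical.

```cpp
// Hypothetical helper for reading the benchmark: ratio of old to new mean
// runtime after subtracting the time Zeek needs for the same pcap without
// the analyzer enabled.
constexpr double analyzer_speedup(double old_mean_s, double new_mean_s, double baseline_s) {
    return (old_mean_s - baseline_s) / (new_mean_s - baseline_s);
}

// Means and baselines as reported in this comment.
constexpr double kLargeTransfers = analyzer_speedup(30.358, 9.553, 1.5); // 16-50000000.pcap
constexpr double kManyRequests = analyzer_speedup(31.781, 17.509, 0.8);  // many-requests-12000.pcap
```

With these inputs the analyzer's own share of the runtime shrinks by a factor of roughly 3.6 for the large transfers and roughly 1.85 for the many small connections.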

awelzel added 8 commits October 5, 2023 17:20

analyzer: Replace try/backtrack with parse-at

437a0cd

Improves performance processing pure QUIC traffic by ~20%. Relates to zeek/spicy#1565.

decrypt_crypto_payload: Pass stream iterator instead of AllData trick

c5a2a56

There's still a full packet copy within the decrypt_crypto_payload(), but it's one less now.

hkdf_extract: Pass hilti::rt::Bytes directly

20b11d2

There should not be a need for the extra copying. hilti::rt::Bytes are mostly std::string and we can pass by const reference as well.

hkdf_expand: Pass vectors by const-reference

9a3d655

No need for the copy.

remove_header_protection: Avoid copies

360f910

As before, avoid unnecessary copies of std::vector instances.

calculate_nonce: Pass std::vector by const-reference

1430a49
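The pattern shared by this and the neighbouring commits can be sketched as follows. This is an illustration under assumptions, not the analyzer's code: the function name and signature `calculate_nonce_sketch` are hypothetical, and the nonce construction follows RFC 9001, section 5.3 (the packet number, in network byte order and left-padded with zeros, is XORed into the IV).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical signature, not the analyzer's actual one. The IV is taken by
// const reference, so the caller's vector is not copied just to be read.
std::vector<uint8_t> calculate_nonce_sketch(const std::vector<uint8_t>& iv, uint64_t packet_number) {
    // The only copy made is the result itself, which has to be a new buffer.
    std::vector<uint8_t> nonce(iv);

    // XOR the packet number into the trailing bytes of the IV, least
    // significant byte last. IVs shorter than 8 bytes consume fewer bytes.
    const std::size_t n = nonce.size() < 8 ? nonce.size() : 8;
    for ( std::size_t i = 0; i < n; ++i )
        nonce[nonce.size() - 1 - i] ^= static_cast<uint8_t>(packet_number >> (8 * i));

    return nonce;
}
```

Passing by value would compile just as well, but it would copy the IV once for the parameter and again for the result; the const reference removes the first of those copies on every packet.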

decrypt: Some more std::vector copy reduction

93f7f12

...and return hilti::rt::Bytes directly.

analyzer: Eat padding more efficiently

6408f4d

awelzel force-pushed the topic/awelzel/perf-improvements branch from d8c6210 to 6408f4d Compare October 5, 2023 16:12

awelzel added 8 commits October 6, 2023 11:13

should_buffer/can_decrypt: Unify

ea4c2de

Now that we do not buffer the packet anymore explicitly, we do not need a should_buffer() method.

Remove InitialByte

40d6f17

I suspect the structure here can be improved, but given we're only interested in the form, replace with an anonymous uint8 field.

Use "skip" for encrypted payload values

552256e

There's not much point accumulating it in fields if we're never using it, anyhow.

decrypt_crypto: Switch to std::array

17d3074

decrypt_crypto: Switch to UnsafeConstIterator

1e12739

We only need to copy out the buffer, no need to be overly safe.

decrypt_crypto: Move most everything into anonymous namespace

713794a

Think previously we exported all the symbols :-/

decrypt_crypto: Remove redundant protected_header copy

22fc672

analyzer: Some more skip annotations

6cc078b

We're not actually using any of the fields, so may as well use skip.

awelzel requested a review from bbannier October 6, 2023 12:25

bbannier reviewed Oct 6, 2023

awelzel marked this pull request as ready for review October 6, 2023 16:52

awelzel requested a review from bbannier October 6, 2023 16:52

bbannier approved these changes Oct 9, 2023

awelzel merged commit 021cf07 into main Oct 9, 2023
