Regression test for firewall #783

Open: wants to merge 1 commit into main from LLT-5338_firewall_regression_test
Conversation

@Hasan6979 (Contributor) commented Aug 19, 2024

Problem

We have a benchmark test for the firewall, but we don't check future changes for regressions against a baseline.

Solution

Add a regression test for the firewall, using the critcmp crate with v5.0.0-rc3 as the baseline. critcmp lists benchmarks alphabetically rather than by which run is faster, so a speed-up would also show up as a difference and fail the check; awk is therefore used to filter out speed-ups so that only slowdowns fail. A sketch of the overall flow is shown below.
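
For context, a minimal sketch of the flow this adds. The exact step layout is an assumption pieced together from the workflow snippets reviewed below; the -t 5 threshold and baseline names match those snippets:

# Assumed shape of the regression check (not the exact workflow in this PR).

# 1. Benchmark the baseline tag and save it under the name "baseline".
git checkout v5.0.0-rc3
cargo bench --features test_utils --bench firewall_bench "64" -- --save-baseline baseline

# 2. Benchmark the current commit and save it as "current".
git checkout -
cargo bench --features test_utils --bench firewall_bench "64" -- --save-baseline current

# 3. Compare the two, ignoring differences below 5%.
critcmp current baseline -t 5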

☑️ Definition of Done checklist

  • Commit history is clean (requirements)
  • README.md is updated
  • Functionality is covered by unit or integration tests

@Hasan6979 force-pushed the LLT-5338_firewall_regression_test branch 4 times, most recently from 36d5a98 to 107acaf on August 20, 2024 12:16
@Hasan6979 force-pushed the LLT-5338_firewall_regression_test branch from 107acaf to 6ec0272 on August 20, 2024 12:26
@Hasan6979 marked this pull request as ready for review on August 20, 2024 12:36
@Hasan6979 changed the title from "Draft: Regression test for firewall" to "Regression test for firewall" on Aug 20, 2024
if [ -z "$output" ]; then
  echo "No performance regression detected."
else
  echo "Performance regression detected! Failing the build."
  exit 1
fi
Contributor:

This runs on a public GitHub-owned runner. We have no control over how busy the machine is at any point in time, so I'm not sure those numbers are meaningful.

Contributor Author:

Small differences might not show up, but a clear slowdown should still be detected; today it would slip through entirely, since we don't check for regressions at all.
Increasing the number of runs and taking an average, or self-hosting the runner, would be alternatives; a rough sketch of the averaging idea follows.
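
As a rough sketch of the multiple-runs idea (the baseline names and run count are hypothetical, not part of this PR):

# Hypothetical: repeat the benchmark into separately named baselines and
# let critcmp compare the baseline against all of them, so a single noisy
# run is less likely to decide the outcome.
for i in 1 2 3; do
  cargo bench --features test_utils --bench firewall_bench "64" -- --save-baseline "current-$i"
done
critcmp baseline current-1 current-2 current-3 -t 5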

@tomaszklak (Contributor) commented Aug 20, 2024:

Maybe, but it also means this check is prone to being flaky. IMO, without a physical runner machine that is not being used for anything else while the benchmarks run*, this shouldn't be merged in.

[*] assuming the machine is also configured correctly to limit thermal throttling and other things that can impact benchmarks; a sketch of that kind of configuration follows
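
For illustration, the kind of configuration meant in [*], assuming a Linux self-hosted runner with an Intel CPU using the intel_pstate driver (the paths are driver-specific and not part of this PR):

# Keep every core on a fixed frequency policy and disable turbo boost so
# thermal state has less influence on benchmark timings.
for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  echo performance | sudo tee "$gov"
done
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo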

Contributor Author:

We should think about how to do that then, as the same problem would come up for any other performance tests, even the boringtun ones.

working-directory: crates/telio-firewall
run: |
  # critcmp prints two header rows, then one row per benchmark comparing
  # "current" against "baseline". The awk filter prints only the rows it
  # classifies as regressions, so speed-ups alone cannot fail the check
  # and $output stays empty when nothing got slower.
  result=$(critcmp current baseline -t 5)
  output=$(echo "$result" | awk '{if (NR > 2) { if ($10 == 1.00 || $14 == 1.00) { if ($10 < $14) {print} } else { if ($13 < $17) {print}}}}')
Contributor:

nit: this is very unclear. I think a comment explaining the format would be a good addition; for instance, something like the annotated version below.
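
For reference, an annotated expansion of the one-liner. The field numbers reflect one reading of critcmp's table layout and are an assumption; they would shift if a benchmark name contained spaces:

# Skip the two header rows, then look at the ratio and timing fields that
# critcmp prints for each of the two compared runs.
echo "$result" | awk '
  NR > 2 {
    if ($10 == 1.00 || $14 == 1.00) {  # one run is the reference (ratio 1.00)
      if ($10 < $14) { print }         # row classified as a regression
    } else {
      if ($13 < $17) { print }         # fall back to comparing timing fields
    }
  }'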

  - name: Telio firewall benchmarks
    working-directory: crates/telio-firewall
-   run: cargo bench --features test_utils --bench firewall_bench "64" -- --warm-up-time 1 --measurement-time 1
+   run: cargo bench --features test_utils --bench firewall_bench "64" -- --warm-up-time 1 --measurement-time 1 --save-baseline current
@tomaszklak (Contributor) commented Aug 20, 2024:

I think this benchmark run (and the other one) should be pinned to a single physical core, and run only on that core, to avoid noise from scheduling (can be done with taskset). But that only makes sense when we run on a known machine where we can prevent other processes from being scheduled on the selected core.

Overall, I think this is a good source of ideas for running benchmarks in CI: https://manuel.bernhardt.io/posts/2023-11-16-core-pinning/ and/or https://easyperf.net/blog/2019/08/02/Perf-measurement-environment-on-Linux#7-disable-address-space-randomization
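
A sketch of what that pinning could look like (the core number is arbitrary, and setarch -R, which disables address space randomization, follows the second link; none of this is in the PR):

# Pin the benchmark to a single core and disable ASLR for more repeatable
# timings; on a dedicated machine that core would also be isolated from
# the scheduler (e.g. with the isolcpus kernel parameter).
setarch "$(uname -m)" -R taskset -c 2 \
  cargo bench --features test_utils --bench firewall_bench "64" -- --warm-up-time 1 --measurement-time 1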
