From 41e090af7efd50540e6bdc7531e3236c75ef3e05 Mon Sep 17 00:00:00 2001
From: Saylor Berman
Date: Mon, 18 Mar 2024 14:36:49 -0600
Subject: [PATCH 1/4] Reconfig test results for 1.2.0

Add reconfig test results for the 1.2.0 release
---
 tests/reconfig/results/1.2.0/1.2.0.md   | 87 +++++++++++++++++++++++++
 tests/reconfig/scripts/cafe-routes.yaml |  2 +-
 2 files changed, 88 insertions(+), 1 deletion(-)
 create mode 100644 tests/reconfig/results/1.2.0/1.2.0.md

diff --git a/tests/reconfig/results/1.2.0/1.2.0.md b/tests/reconfig/results/1.2.0/1.2.0.md
new file mode 100644
index 0000000000..f0752ec068
--- /dev/null
+++ b/tests/reconfig/results/1.2.0/1.2.0.md
@@ -0,0 +1,87 @@
+# Reconfiguration testing Results
+
+
+- [Reconfiguration testing Results](#reconfiguration-testing-results)
+  - [Summary](#summary)
+  - [Test environment](#test-environment)
+  - [Results Tables](#results-tables)
+    - [NGINX Reloads and Time to Ready](#nginx-reloads-and-time-to-ready)
+    - [Event Batch Processing](#event-batch-processing)
+  - [NumResources to Total Resources](#numresources-to-total-resources)
+  - [Observations](#observations)
+  - [Future Improvements](#future-improvements)
+
+
+## Summary
+
+- Time to ready stayed consistent, if not slightly faster.
+- Reload time has slightly increased in some instances.
+- Number of batch events has reduced, subsequently increasing the size and time of each batch.
+
+## Test environment
+
+GKE cluster:
+
+- Node count: 3
+- Instance Type: e2-medium
+- k8s version: 1.27.8-gke.1067004
+- Zone: us-west2-a
+- Total vCPUs: 6
+- Total RAM: 12GB
+- Max pods per node: 110
+
+NGF deployment:
+
+- NGF version: edge - git commit 96a44240d317875406a8aef8fd1e424f2fb906eb
+- NGINX Version: 1.25.4
+
+## Results Tables
+
+> Note: After fixing the `cafe-tls-redirect` to point to the proper Gateway, tests that created 150 namespaces failed due to https://github.com/nginxinc/nginx-gateway-fabric/issues/1107. Therefore, those tests were re-run after reverting the `cafe-tls-redirect` issue to maintain consistency with the previous release tests. Going forward, results should look different once the above bug is fixed.
+
+### NGINX Reloads and Time to Ready
+
+| Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms) | <= 500ms | <= 1000ms |
+|-------------|--------------|----------------------|--------------------------|---------------|----------------------------|----------|-----------|
+| 1 | 30 | 2 | <1 | 2 | 189.5 | 100% | 100% |
+| 1 | 150 | 2 | <1 | 2 | 389 | 100% | 100% |
+| 2 | 30 | 30 | <1 | 94 | 161 | 100% | 100% |
+| 2 | 150 | 154 | <1 | 387 | 267.48 | 100% | 100% |
+| 3 | 30 | <1 | <1 | 94 | 127.91 | 100% | 100% |
+| 3 | 150 | <1 | <1 | 454 | 128 | 100% | 100% |
+
+### Event Batch Processing
+
+| Test number | NumResources | Event Batch Total | Event Batch Processing avg time (ms) | <= 500ms | <= 1000ms | <= 5000ms | <= 10000ms | <= 30000ms |
+|-------------|--------------|-------------------|--------------------------------------|----------|-----------|-----------|------------|------------|
+| 1 | 30 | 5 | 733.6 | 80% | 80% | 100% | 100% | 100% |
+| 1 | 150 | 5 | 2967 | 40% | 40% | 40% | 40% | 40% |
+| 2 | 30 | 371 | 57.32 | 100% | 100% | 100% | 100% | 100% |
+| 2 | 150 | 1743 | 75.87 | 98.45% | 100% | 100% | 100% | 100% |
+| 3 | 30 | 370 | 37.48 | 99.73% | 99.73% | 100% | 100% | 100% |
+| 3 | 150 | 1808 | 40.18 | 99.94% | 99.94% | 99.94% | 99.94% | 100% |
+
+## NumResources to Total Resources
+
+| NumResources | Gateways | Secrets | ReferenceGrants | Namespaces | application Pods | application Services | HTTPRoutes | Attached HTTPRoutes | Total Resources |
+|--------------|----------|---------|-----------------|------------|------------------|----------------------|------------|---------------------|-----------------|
+| x | 1 | 1 | 1 | x+1 | 2x | 2x | 3x | 2x | |
+| 30 | 1 | 1 | 1 | 31 | 60 | 60 | 90 | 60 | 244 |
+| 150 | 1 | 1 | 1 | 151 | 300 | 300 | 450 | 300 | 1204 |
+
+> Note: Only 2x HTTPRoutes attach to the Gateway because the parentRef name in the `cafe-tls-redirect` HTTPRoute is incorrect. This has been fixed, but until https://github.com/nginxinc/nginx-gateway-fabric/issues/1107 is fixed we can't actually run the test successfully.
+
+## Observations
+
+1. Reload time seems to have a increased slightly in a few instances, though time to ready is consistent if not faster.
+
+2. We appear to be processing fewer batches, especially for Test 1 where the resources exist before NGF. This subsequently increased the time to process each
+batch since each batch was much larger. I don't think we changed any of this logic recently, so not really sure why it's happening. Test 1 number seems very different
+from past results, while the other tests are only slightly different from past results.
+
+3. No errors in the logs.
+
+
+## Future Improvements
+
+Fix https://github.com/nginxinc/nginx-gateway-fabric/issues/1107 to allow for 150 resource tests to properly run.
diff --git a/tests/reconfig/scripts/cafe-routes.yaml b/tests/reconfig/scripts/cafe-routes.yaml
index f4d9823da9..006a8eba92 100644
--- a/tests/reconfig/scripts/cafe-routes.yaml
+++ b/tests/reconfig/scripts/cafe-routes.yaml
@@ -4,7 +4,7 @@ metadata:
   name: cafe-tls-redirect
 spec:
   parentRefs:
-  - name: gateway.networking.k8s.io/v1
+  - name: gateway
     namespace: default
     sectionName: http
   hostnames:
From ac0a18b3ec81da340071be4d3c2ad786944c85e1 Mon Sep 17 00:00:00 2001
From: Saylor Berman
Date: Mon, 18 Mar 2024 14:56:06 -0600
Subject: [PATCH 2/4] Clear up summary

---
 tests/reconfig/results/1.2.0/1.2.0.md | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/tests/reconfig/results/1.2.0/1.2.0.md b/tests/reconfig/results/1.2.0/1.2.0.md
index f0752ec068..fcb0f71f8a 100644
--- a/tests/reconfig/results/1.2.0/1.2.0.md
+++ b/tests/reconfig/results/1.2.0/1.2.0.md
@@ -16,7 +16,7 @@
 
 - Time to ready stayed consistent, if not slightly faster.
 - Reload time has slightly increased in some instances.
-- Number of batch events has reduced, subsequently increasing the size and time of each batch.
+- Number of batch events has reduced, subsequently increasing the average time of each batch.
 
 ## Test environment
 
@@ -75,9 +75,7 @@ NGF deployment:
 
 1. Reload time seems to have a increased slightly in a few instances, though time to ready is consistent if not faster.
 
-2. We appear to be processing fewer batches, especially for Test 1 where the resources exist before NGF. This subsequently increased the time to process each
-batch since each batch was much larger. I don't think we changed any of this logic recently, so not really sure why it's happening. Test 1 number seems very different
-from past results, while the other tests are only slightly different from past results.
+2. Processing fewer batches overall due to improvements in resource event tracking. Overall processing time didn't change much, so the average increased due to fewer batches.
 
 3. No errors in the logs.
 
From 5dcc9a17b06209592a5ecbf2089e8dde9086f948 Mon Sep 17 00:00:00 2001
From: Saylor Berman
Date: Mon, 18 Mar 2024 14:58:25 -0600
Subject: [PATCH 3/4] Fix bug phrasing

---
 tests/reconfig/results/1.2.0/1.2.0.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/reconfig/results/1.2.0/1.2.0.md b/tests/reconfig/results/1.2.0/1.2.0.md
index fcb0f71f8a..ac0a604be2 100644
--- a/tests/reconfig/results/1.2.0/1.2.0.md
+++ b/tests/reconfig/results/1.2.0/1.2.0.md
@@ -37,7 +37,7 @@ NGF deployment:
 
 ## Results Tables
 
-> Note: After fixing the `cafe-tls-redirect` to point to the proper Gateway, tests that created 150 namespaces failed due to https://github.com/nginxinc/nginx-gateway-fabric/issues/1107. Therefore, those tests were re-run after reverting the `cafe-tls-redirect` issue to maintain consistency with the previous release tests. Going forward, results should look different once the above bug is fixed.
+> Note: After fixing the `cafe-tls-redirect` to point to the proper Gateway, tests that created 450 HTTPRoutes failed due to https://github.com/nginxinc/nginx-gateway-fabric/issues/1107. Therefore, those tests were re-run after reverting the `cafe-tls-redirect` issue to maintain consistency with the previous release tests. Going forward, results should look different once the above bug is fixed.
 
 ### NGINX Reloads and Time to Ready
 
From c98da1ec0fcd79e76cf7d6add6b4616a851264aa Mon Sep 17 00:00:00 2001
From: Saylor Berman
Date: Mon, 18 Mar 2024 16:04:02 -0600
Subject: [PATCH 4/4] Add plus results

---
 tests/reconfig/results/1.2.0/1.2.0.md | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/tests/reconfig/results/1.2.0/1.2.0.md b/tests/reconfig/results/1.2.0/1.2.0.md
index ac0a604be2..543c40e6ab 100644
--- a/tests/reconfig/results/1.2.0/1.2.0.md
+++ b/tests/reconfig/results/1.2.0/1.2.0.md
@@ -33,14 +33,17 @@ GKE cluster:
 NGF deployment:
 
 - NGF version: edge - git commit 96a44240d317875406a8aef8fd1e424f2fb906eb
-- NGINX Version: 1.25.4
+- NGINX OSS Version: 1.25.4
+- NGINX Plus Version: R31
 
 ## Results Tables
 
-> Note: After fixing the `cafe-tls-redirect` to point to the proper Gateway, tests that created 450 HTTPRoutes failed due to https://github.com/nginxinc/nginx-gateway-fabric/issues/1107. Therefore, those tests were re-run after reverting the `cafe-tls-redirect` issue to maintain consistency with the previous release tests. Going forward, results should look different once the above bug is fixed.
+> Note: After fixing the `cafe-tls-redirect` to point to the proper Gateway, tests that created 450 HTTPRoutes failed due to https://github.com/nginxinc/nginx-gateway-fabric/issues/1107. Therefore, those tests were re-run after reverting the `cafe-tls-redirect` issue to maintain consistency with the previous release tests. Going forward, results should look different once the above bug is fixed. Added N+ tests, but without testing 150 since it has the bug mentioned above.
 
 ### NGINX Reloads and Time to Ready
 
+#### OSS
+
 | Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms) | <= 500ms | <= 1000ms |
 |-------------|--------------|----------------------|--------------------------|---------------|----------------------------|----------|-----------|
 | 1 | 30 | 2 | <1 | 2 | 189.5 | 100% | 100% |
@@ -50,8 +53,18 @@ NGF deployment:
 | 3 | 30 | <1 | <1 | 94 | 127.91 | 100% | 100% |
 | 3 | 150 | <1 | <1 | 454 | 128 | 100% | 100% |
 
+#### Plus
+
+| Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms) | <= 500ms | <= 1000ms |
+|-------------|--------------|----------------------|--------------------------|---------------|----------------------------|----------|-----------|
+| 1 | 30 | 1 | <1 | 2 | 151.5 | 100% | 100% |
+| 2 | 30 | 30 | <1 | 94 | 157 | 100% | 100% |
+| 3 | 30 | <1 | <1 | 94 | 128 | 100% | 100% |
+
 ### Event Batch Processing
 
+#### OSS
+
 | Test number | NumResources | Event Batch Total | Event Batch Processing avg time (ms) | <= 500ms | <= 1000ms | <= 5000ms | <= 10000ms | <= 30000ms |
 |-------------|--------------|-------------------|--------------------------------------|----------|-----------|-----------|------------|------------|
 | 1 | 30 | 5 | 733.6 | 80% | 80% | 100% | 100% | 100% |
@@ -61,6 +74,14 @@ NGF deployment:
 | 3 | 30 | 370 | 37.48 | 99.73% | 99.73% | 100% | 100% | 100% |
 | 3 | 150 | 1808 | 40.18 | 99.94% | 99.94% | 99.94% | 99.94% | 100% |
 
+#### Plus
+
+| Test number | NumResources | Event Batch Total | Event Batch Processing avg time (ms) | <= 500ms | <= 1000ms | <= 5000ms | <= 10000ms | <= 30000ms |
+|-------------|--------------|-------------------|--------------------------------------|----------|-----------|-----------|------------|------------|
+| 1 | 30 | 3 | 1170 | 66% | 66% | 100% | 100% | 100% |
+| 2 | 30 | 370 | 58.79 | 100% | 100% | 100% | 100% | 100% |
+| 3 | 30 | 370 | 41.32 | 99.73% | 99.73% | 100% | 100% | 100% |
+
 ## NumResources to Total Resources
 
 | NumResources | Gateways | Secrets | ReferenceGrants | Namespaces | application Pods | application Services | HTTPRoutes | Attached HTTPRoutes | Total Resources |