Add benchmark testing framework #3599

Merged (32 commits, Jun 27, 2024)

Commits:
9bc6e99 initial implementation of benchmark test suite (shawnh2, Jun 12, 2024)
32ecca3 implement benchmark test run (shawnh2, Jun 13, 2024)
6ea7c91 add benchmark in ci and spawn a job for benchmark test run (shawnh2, Jun 14, 2024)
c2cfb96 fix lint (shawnh2, Jun 14, 2024)
83abe3f fix ci config (shawnh2, Jun 14, 2024)
bb8f6ee save benchmark test result into report (shawnh2, Jun 15, 2024)
c9c6be7 add control-plane metrics to benchmark test report (shawnh2, Jun 16, 2024)
d4710a3 change httproutes scale number to perform the benchmark test (shawnh2, Jun 16, 2024)
63b70bd increase poll timeout (shawnh2, Jun 17, 2024)
2b5b8a1 Merge branch 'main' of github.com:envoyproxy/gateway into benchmark-ci (shawnh2, Jun 18, 2024)
baf9be9 add longer timeout for go-test, collect reports in suite and change s… (shawnh2, Jun 18, 2024)
88d2204 update resource limits for both envoyproxy and envoygateway pod (shawnh2, Jun 18, 2024)
8add6f5 fix github action unit problem (shawnh2, Jun 18, 2024)
a7eb9f4 add scale-down test case support (shawnh2, Jun 18, 2024)
9314a81 fix according to benchmark-result (shawnh2, Jun 19, 2024)
0e70867 increase memory to 2GiB to see the difference (shawnh2, Jun 19, 2024)
32a7c34 return report for every benchmark test run (shawnh2, Jun 19, 2024)
6d724a7 right memory setup for github ci (shawnh2, Jun 19, 2024)
8ac9073 update report util methods (shawnh2, Jun 20, 2024)
526627a add export benchmark report support (shawnh2, Jun 20, 2024)
4d874f2 view benchmark report in github comment (shawnh2, Jun 20, 2024)
891d2d4 correct github comment ci (shawnh2, Jun 21, 2024)
a0e1497 fix env setting table (shawnh2, Jun 21, 2024)
d1a7cde correct ci args (shawnh2, Jun 21, 2024)
5322f0b add benchmark report (shawnh2, Jun 21, 2024)
f31f3f5 simplify the report (shawnh2, Jun 21, 2024)
ed07329 add pull_request_target to benchmark report commenter (shawnh2, Jun 25, 2024)
4453b48 resolve conflicts (shawnh2, Jun 25, 2024)
3fad16c add envoyproxy metrics report and render support (shawnh2, Jun 25, 2024)
19ae3d8 grant benchmark-test job with write access (shawnh2, Jun 25, 2024)
970f5f7 upload latest benchmark report and remove commenter ci (shawnh2, Jun 25, 2024)
3dc4472 fix lint and address comments (shawnh2, Jun 27, 2024)
53 changes: 53 additions & 0 deletions .github/workflows/benchmark.yaml
@@ -0,0 +1,53 @@
name: Benchmarking Tests at Scale
> shawnh2 (Contributor, Author): question: should we schedule this CI as a cron job, or run it with every PR?

> shawnh2 (Contributor, Author): or only run this if someone comments /benchmark?

> Contributor: I'd vote to only make it run on push to main and release/v*. Let's raise a follow-up issue to support running on PRs automatically (if it doesn't increase CI time) or using /benchmark.

> shawnh2 (Contributor, Author): sounds good!

on:
  pull_request:
> shawnh2 (Contributor, Author): change it to push once this PR is good to go

> Member: Not sure if we need to run this on every push, or on a schedule.

> shawnh2 (Contributor, Author): as suggested in #3599 (comment), we can run this with the /benchmark command.

> Member: I think it is good to run it on pull/push. In general, I like all major testing/linting/etc. CI suites to run on every push even if there is no PR. It makes it easy to get your branches in order without having a PR that goes through a bunch of edits to get things working. I dislike the idea of only ever running it when users comment /benchmark. The general idea should be for CI to alert us when incoming changes degrade (or improve) performance.

    branches:
    - "main"
    - "release/v*"
  workflow_dispatch:
    inputs:
      rps:
        description: "The target requests-per-second rate. Default: 1000"
        default: '1000'
        type: string
        required: false
      connections:
        description: "The maximum allowed number of concurrent connections per event loop. HTTP/1 only. Default: 100."
        default: '100'
        type: string
        required: false
      duration:
        description: "The number of seconds that the test should run. Default: 90."
        default: '90'
        type: string
        required: false
      cpu_limits:
        description: "The CPU resource limits for the envoy gateway, in unit 'm'. Default: 1000."
        default: '1000'
        type: string
        required: false
      memory_limits:
        description: "The memory resource limits for the envoy gateway, in unit 'Mi'. Default: 1024."
        default: '1024'
        type: string
        required: false

jobs:
  benchmark-test:
    name: Benchmark Test
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2

    - uses: ./tools/github-actions/setup-deps

    - name: Run Benchmark tests
      env:
        KIND_NODE_TAG: v1.28.0
        IMAGE_PULL_POLICY: IfNotPresent
        BENCHMARK_RPS: ${{ github.event.inputs.rps || 1000 }}
        BENCHMARK_CONNECTIONS: ${{ github.event.inputs.connections || 100 }}
        BENCHMARK_DURATION: ${{ github.event.inputs.duration || 90 }}
        BENCHMARK_CPU_LIMITS: ${{ github.event.inputs.cpu_limits || 1000 }}
        BENCHMARK_MEMORY_LIMITS: ${{ github.event.inputs.memory_limits || 2048 }}
      run: make benchmark
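For a local run outside CI, the same knobs can be supplied as environment variables to `make benchmark`. Purely as illustration, here is a minimal Go sketch of how `BENCHMARK_*` variables could back the suite's flag defaults; the real wiring goes through the Makefile target and the suite package, neither fully shown in this diff, so the helper below is an assumption.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// envOr returns the named env var's value, or def if it is unset.
// Hypothetical helper; the PR's actual plumbing is in the Makefile/suite.
func envOr(name, def string) string {
	if v := os.Getenv(name); v != "" {
		return v
	}
	return def
}

func main() {
	// Defaults mirror the workflow_dispatch inputs above.
	rps := flag.String("rps", envOr("BENCHMARK_RPS", "1000"), "target requests per second")
	connections := flag.String("connections", envOr("BENCHMARK_CONNECTIONS", "100"), "max concurrent connections per event loop")
	duration := flag.String("duration", envOr("BENCHMARK_DURATION", "90"), "test duration in seconds")
	flag.Parse()
	fmt.Println("rps:", *rps, "connections:", *connections, "duration:", *duration)
}
```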
2 changes: 1 addition & 1 deletion go.mod
@@ -164,7 +164,7 @@ require (
  github.com/peterbourgon/diskv v2.0.1+incompatible // indirect
  github.com/pkg/errors v0.9.1
  github.com/pmezard/go-difflib v1.0.0 // indirect
- github.com/prometheus/client_model v0.6.1 // indirect
+ github.com/prometheus/client_model v0.6.1
  github.com/prometheus/procfs v0.15.0 // indirect
  github.com/russross/blackfriday/v2 v2.1.0 // indirect
  github.com/sirupsen/logrus v1.9.3 // indirect
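This change promotes github.com/prometheus/client_model from an indirect to a direct dependency: the benchmark report (commit c9c6be7) records control-plane metrics, which means decoding Prometheus exposition-format output into client_model types. A minimal sketch of that kind of decoding, not the PR's actual collection code:

```go
package main

import (
	"fmt"
	"strings"

	dto "github.com/prometheus/client_model/go"
	"github.com/prometheus/common/expfmt"
)

// describe prints one decoded metric family; *dto.MetricFamily is the
// client_model type the module now depends on directly.
func describe(name string, mf *dto.MetricFamily) {
	fmt.Printf("%s = %v\n", name, mf.GetMetric()[0].GetCounter().GetValue())
}

func main() {
	// Example exposition-format text; a real run would scrape the
	// Envoy Gateway metrics endpoint instead.
	scraped := "# TYPE example_requests_total counter\nexample_requests_total 42\n"

	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(strings.NewReader(scraped))
	if err != nil {
		panic(err)
	}
	for name, mf := range families {
		describe(name, mf)
	}
}
```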
925 changes: 925 additions & 0 deletions test/benchmark/benchmark_report.md

Large diffs are not rendered by default.

56 changes: 56 additions & 0 deletions test/benchmark/benchmark_test.go
@@ -0,0 +1,56 @@
// Copyright Envoy Gateway Authors
// SPDX-License-Identifier: Apache-2.0
// The full text of the Apache license is available in the LICENSE file at
// the root of the repo.

//go:build benchmark
// +build benchmark

package benchmark

import (
	"flag"
	"testing"

	"github.com/stretchr/testify/require"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/config"

	"github.com/envoyproxy/gateway/test/benchmark/suite"
	"github.com/envoyproxy/gateway/test/benchmark/tests"
)

func TestBenchmark(t *testing.T) {
	cfg, err := config.GetConfig()
	require.NoError(t, err)

	cli, err := client.New(cfg, client.Options{})
	require.NoError(t, err)

	// Install all the schemes for the kubernetes client.
	suite.CheckInstallScheme(t, cli)

	// Parse benchmark options.
	flag.Parse()
	options := suite.NewBenchmarkOptions(
		*suite.RPS,
		*suite.Connections,
		*suite.Duration,
		*suite.Concurrency,
	)

	bSuite, err := suite.NewBenchmarkTestSuite(
		cli,
		options,
		"config/gateway.yaml",
		"config/httproute.yaml",
		"config/nighthawk-client.yaml",
		*suite.ReportSavePath,
	)
	if err != nil {
		t.Fatalf("Failed to create BenchmarkTestSuite: %v", err)
	}

	t.Logf("Running %d benchmark tests", len(tests.BenchmarkTests))
	bSuite.Run(t, tests.BenchmarkTests)
}
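tests.BenchmarkTests is the registry the suite iterates over. The tests package itself is not included in this excerpt; purely for illustration, a registered test might look like the following sketch, in which every type and helper is an assumed shape rather than the PR's real API.

```go
// Hypothetical sketch only: the tests package referenced by benchmark_test.go
// is not part of this excerpt, so everything below is an assumed shape.
package tests

import "testing"

// BenchmarkTestSuite stands in for suite.BenchmarkTestSuite (assumed).
type BenchmarkTestSuite struct{}

// ScaleUpHTTPRoutes stands in for a suite helper that applies n templated
// HTTPRoutes and runs the nighthawk client against them (assumed).
func (s *BenchmarkTestSuite) ScaleUpHTTPRoutes(t *testing.T, n int) {
	t.Logf("scaling to %d HTTPRoutes and running the benchmark client", n)
}

// BenchmarkTest pairs a name with a function run by the suite (assumed shape).
type BenchmarkTest struct {
	Name string
	Test func(*testing.T, *BenchmarkTestSuite)
}

// BenchmarkTests is the registry consumed by bSuite.Run in benchmark_test.go.
var BenchmarkTests = []BenchmarkTest{
	{
		Name: "scale-up-httproutes",
		Test: func(t *testing.T, s *BenchmarkTestSuite) {
			// Step through increasing route counts, recording a report
			// at each step (commit d4710a3 tunes these scale numbers).
			for _, n := range []int{10, 50, 100} {
				s.ScaleUpHTTPRoutes(t, n)
			}
		},
	},
}
```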
14 changes: 14 additions & 0 deletions test/benchmark/config/gateway.yaml
@@ -0,0 +1,14 @@
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: "{GATEWAY_NAME}"
  namespace: benchmark-test
spec:
  gatewayClassName: envoy-gateway
  listeners:
  - name: http
    port: 8081
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: Same
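The {GATEWAY_NAME} placeholder suggests the suite renders this manifest per test case before applying it. A minimal rendering sketch, assuming plain string substitution (the suite's actual helper is not shown in this diff):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// renderGateway substitutes the {GATEWAY_NAME} placeholder in the template.
// Sketch only; the suite's real rendering code is not part of this excerpt.
func renderGateway(templatePath, name string) (string, error) {
	raw, err := os.ReadFile(templatePath)
	if err != nil {
		return "", err
	}
	return strings.ReplaceAll(string(raw), "{GATEWAY_NAME}", name), nil
}

func main() {
	manifest, err := renderGateway("config/gateway.yaml", "benchmark-gateway-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(manifest)
}
```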
82 changes: 82 additions & 0 deletions test/benchmark/config/gatewayclass.yaml
@@ -0,0 +1,82 @@
kind: GatewayClass
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: envoy-gateway
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: proxy-config
    namespace: envoy-gateway-system
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: proxy-config
  namespace: envoy-gateway-system
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        container:
          resources:
            limits:
              memory: "1024Mi"
              cpu: "1000m"
            requests:
              memory: "256Mi"
              cpu: "500m"
  telemetry:
    metrics:
      prometheus: {}
      sinks:
      - type: OpenTelemetry
        openTelemetry:
          backendRefs:
          - name: otel-collector
            namespace: monitoring
            port: 4317
    accessLog:
      settings:
      - format:
          type: Text
          text: |
            [%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%" %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION% "%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%" "%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%"
        sinks:
        - type: File
          file:
            path: /dev/stdout
        - type: OpenTelemetry
          openTelemetry:
            backendRefs:
            - name: otel-collector
              namespace: monitoring
              port: 4317
            resources:
              k8s.cluster.name: "envoy-gateway"
    tracing:
      provider:
        backendRefs:
        - name: otel-collector
          namespace: monitoring
          port: 4317
      customTags:
        "k8s.cluster.name":
          type: Literal
          literal:
            value: "envoy-gateway"
        "k8s.pod.name":
          type: Environment
          environment:
            name: ENVOY_POD_NAME
            defaultValue: "-"
        "k8s.namespace.name":
          type: Environment
          environment:
            name: ENVOY_GATEWAY_NAMESPACE
            defaultValue: "envoy-gateway-system"
  shutdown:
    drainTimeout: 5s
    minDrainDuration: 1s
22 changes: 22 additions & 0 deletions test/benchmark/config/httproute.yaml
@@ -0,0 +1,22 @@
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: "{HTTPROUTE_NAME}"
  namespace: benchmark-test
spec:
  parentRefs:
  - name: "{REF_GATEWAY_NAME}"
  hostnames:
  - "www.benchmark.com"
> Contributor: can we also template out the hostname for this test, so each HTTPRoute gets a unique hostname? Else these routes won't reach Programmed and will bloat Status.

> Contributor: Maybe we can make this a bit more realistic and control num-routes-per-host? This way, we don't have one huge route table or many small ones... anyway, not critical for this time.

> shawnh2 (Contributor, Author): makes sense, will do it as a follow-up

  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: nighthawk-test-server
      namespace: benchmark-test
      port: 8080
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /
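Because the route name, parent gateway, and (per the review suggestion above) hostname are all templated, scaling a test to N routes reduces to stamping out N rendered manifests. A hypothetical sketch, not the PR's code:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// renderRoutes stamps out n HTTPRoute manifests from the template above,
// giving each a unique name and hostname. Sketch only; names are assumed.
func renderRoutes(templatePath string, n int) ([]string, error) {
	raw, err := os.ReadFile(templatePath)
	if err != nil {
		return nil, err
	}
	routes := make([]string, 0, n)
	for i := 0; i < n; i++ {
		r := strings.ReplaceAll(string(raw), "{HTTPROUTE_NAME}", fmt.Sprintf("benchmark-route-%d", i))
		r = strings.ReplaceAll(r, "{REF_GATEWAY_NAME}", "benchmark-gateway")
		// A unique hostname lets every route reach Programmed status.
		r = strings.ReplaceAll(r, "www.benchmark.com", fmt.Sprintf("www.benchmark-%d.com", i))
		routes = append(routes, r)
	}
	return routes, nil
}

func main() {
	routes, err := renderRoutes("config/httproute.yaml", 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(routes, "---\n"))
}
```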
18 changes: 18 additions & 0 deletions test/benchmark/config/nighthawk-client.yaml
@@ -0,0 +1,18 @@
### Nighthawk test client job template
apiVersion: batch/v1
kind: Job
metadata:
  name: "{NIGHTHAWK_CLIENT_NAME}"
  namespace: benchmark-test
  labels:
    benchmark-test/client: "true"
spec:
  template:
    spec:
      containers:
      - name: nighthawk-client
        image: envoyproxy/nighthawk-dev:latest
        imagePullPolicy: IfNotPresent
        args: ["nighthawk_client"] # Args are filled in at runtime
      restartPolicy: Never
  backoffLimit: 3
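The template deliberately leaves args at just ["nighthawk_client"]; the suite fills in the real flags per test run. A sketch of that step, assuming the template has been decoded into a batchv1.Job: --rps, --connections, and --duration are genuine nighthawk_client options, while the helper and target URL are illustrative.

```go
package main

import (
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
)

// fillClientArgs appends nighthawk_client flags to the templated Job before
// it is created. The flag names are nighthawk's; the rest is assumed.
func fillClientArgs(job *batchv1.Job, rps, connections, duration, targetURL string) {
	c := &job.Spec.Template.Spec.Containers[0]
	c.Args = append(c.Args,
		"--rps", rps,
		"--connections", connections,
		"--duration", duration,
		targetURL,
	)
}

func main() {
	job := &batchv1.Job{}
	job.Spec.Template.Spec.Containers = []corev1.Container{
		{Name: "nighthawk-client", Args: []string{"nighthawk_client"}},
	}
	// The target URL is an assumption for illustration.
	fillClientArgs(job, "1000", "100", "90", "http://benchmark-gateway.benchmark-test.svc:8081")
	fmt.Println(job.Spec.Template.Spec.Containers[0].Args)
}
```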
44 changes: 44 additions & 0 deletions test/benchmark/config/nighthawk-test-server-config.yaml
@@ -0,0 +1,44 @@
static_resources:
  listeners:
  # Define an origin server on :8080 that returns a small static response.
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 8080
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          generate_request_id: false
          codec_type: AUTO
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: service
              domains:
              - "*"
          http_filters:
          - name: dynamic-delay
            typed_config:
              "@type": type.googleapis.com/nighthawk.server.DynamicDelayConfiguration
              static_delay: 0s
          - name: test-server # before envoy.router because order matters!
            typed_config:
              "@type": type.googleapis.com/nighthawk.server.ResponseOptions
              response_body_size: 10
              v3_response_headers:
              - {header: {key: "foo", value: "bar"}}
              - {header: {key: "foo", value: "bar2"}, append: true}
              - {header: {key: "x-nh", value: "1"}}
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
              dynamic_stats: false
admin:
  access_log_path: /tmp/envoy.log
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 8081
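A quick sanity check of the config above: with the nighthawk-test-server Service port-forwarded to localhost:8080 (an assumption for this sketch), a plain HTTP request should come back with the configured 10-byte body and response headers.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Assumes: kubectl port-forward svc/nighthawk-test-server 8080:8080 -n benchmark-test
	resp, err := http.Get("http://localhost:8080/")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	// Expect len(body) == 10 (response_body_size) and the v3_response_headers
	// values ("x-nh: 1", two "foo" entries) from the config above.
	fmt.Printf("status=%d bytes=%d x-nh=%q foo=%v\n",
		resp.StatusCode, len(body), resp.Header.Get("x-nh"), resp.Header.Values("foo"))
}
```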
53 changes: 53 additions & 0 deletions test/benchmark/config/nighthawk-test-server.yaml
@@ -0,0 +1,53 @@
### Nighthawk test server deployment & service
apiVersion: apps/v1
kind: Deployment
metadata:
name: nighthawk-test-server
namespace: benchmark-test
spec:
replicas: 1
selector:
matchLabels:
app: nighthawk-test-server
template:
metadata:
labels:
app: nighthawk-test-server
spec:
serviceAccountName: default
containers:
- name: nighthawk-server
Copy link
Contributor Author

@shawnh2 shawnh2 Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can replace this test server with a much simpler one, like echo server in a follow-up PR.

image: envoyproxy/nighthawk-dev:latest
imagePullPolicy: IfNotPresent
args: ["nighthawk_test_server", "-c", "/etc/test-server-config/nighthawk-test-server-config.yaml"]
ports:
- containerPort: 8080
volumeMounts:
- name: test-server-config
mountPath: "/etc/test-server-config"
env:
- name: PORT
value: "8080"
resources:
requests:
cpu: "2"
limits:
cpu: "2"
volumes:
- name: test-server-config
configMap:
name: test-server-config # Created directly from file
---
apiVersion: v1
kind: Service
metadata:
name: nighthawk-test-server
namespace: benchmark-test
spec:
type: ClusterIP
selector:
app: nighthawk-test-server
ports:
- name: http
port: 8080
targetPort: 8080
32 changes: 32 additions & 0 deletions test/benchmark/suite/client.go
@@ -0,0 +1,32 @@
// Copyright Envoy Gateway Authors
// SPDX-License-Identifier: Apache-2.0
// The full text of the Apache license is available in the LICENSE file at
// the root of the repo.

//go:build benchmark
// +build benchmark

package suite

import (
	"testing"

	"github.com/stretchr/testify/require"
	batchv1 "k8s.io/api/batch/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	gwapiv1 "sigs.k8s.io/gateway-api/apis/v1"
	gwapiv1a2 "sigs.k8s.io/gateway-api/apis/v1alpha2"
	gwapiv1a3 "sigs.k8s.io/gateway-api/apis/v1alpha3"
	gwapiv1b1 "sigs.k8s.io/gateway-api/apis/v1beta1"

	egv1a1 "github.com/envoyproxy/gateway/api/v1alpha1"
)

// CheckInstallScheme registers all the API types the benchmark suite reads
// and writes on the client's scheme.
func CheckInstallScheme(t *testing.T, c client.Client) {
	require.NoError(t, gwapiv1a3.Install(c.Scheme()))
	require.NoError(t, gwapiv1a2.Install(c.Scheme()))
	require.NoError(t, gwapiv1b1.Install(c.Scheme()))
	require.NoError(t, gwapiv1.Install(c.Scheme()))
	require.NoError(t, egv1a1.AddToScheme(c.Scheme()))
	require.NoError(t, batchv1.AddToScheme(c.Scheme()))
}
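Once CheckInstallScheme has run, the controller-runtime client can decode Gateway API, Envoy Gateway, and batch/v1 resources. A small usage sketch, with the namespace and test context assumed:

```go
package suite_test

import (
	"context"
	"testing"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
	gwapiv1 "sigs.k8s.io/gateway-api/apis/v1"

	"github.com/envoyproxy/gateway/test/benchmark/suite"
)

func TestListRoutes(t *testing.T) {
	cfg, err := config.GetConfig()
	if err != nil {
		t.Fatal(err)
	}
	cli, err := client.New(cfg, client.Options{})
	if err != nil {
		t.Fatal(err)
	}
	suite.CheckInstallScheme(t, cli)

	// With the Gateway API scheme installed, typed lists decode cleanly.
	var routes gwapiv1.HTTPRouteList
	if err := cli.List(context.Background(), &routes, client.InNamespace("benchmark-test")); err != nil {
		t.Fatal(err)
	}
	t.Logf("found %d HTTPRoutes", len(routes.Items))
}
```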