Loadtesting metrics, updated #737

Merged
33 changes: 33 additions & 0 deletions itest/loadtest/README.md
@@ -0,0 +1,33 @@
## Description

This directory (`itest/loadtest`) includes all files and data related to running
the loadtesting suite for the Taproot Assets daemon. These tests use the
existing itest framework to run against real, externally running daemons.

The configuration file must be named `loadtest.conf` and placed in the working
directory in order for the loadtest executable to detect it. A sample
configuration can be found in `loadtest-sample.conf`, which includes all the
fields required for the tests to run successfully: connection credentials for
the tapd & lnd nodes, as well as a bitcoind backend.

For further tracking and metrics, a Prometheus PushGateway is configured and
used by the loadtests to submit any desired data in-flight.

## Building

To create the loadtest executable, run `make build-loadtest`. This creates a
`loadtest` binary in your working directory which you can run, provided you
have a correct `loadtest.conf` in the same directory.

The executable reads the relevant fields of `loadtest.conf` and runs the
defined test case with the corresponding configuration.

Example: to run a mint loadtest which mints batches of `450` assets, set
`test-case="mint"` and `mint-test-batch-size=450` in your `loadtest.conf`.
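For instance, the test-case fields of such a `loadtest.conf` could look like
the sketch below (values are illustrative; the connection sections shown in
`loadtest-sample.conf` are still required):

```
# Test-case fields only; [bitcoin], [alice] and [bob] sections omitted.
test-case="mint"
mint-test-batch-size=450
test-suite-timeout=120m
test-timeout=10m
```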

## Using dev-resources docker setup

You can use any kind of externally running daemon, as long as it's reachable.
The easiest way to spin up some nodes from scratch for the purpose of the
loadtests is to run the `dev-resources/docker-regtest` setup and use `alice`,
`alice-tapd`, `bob`, `bob-tapd` and the single `bitcoind` instance.
44 changes: 44 additions & 0 deletions itest/loadtest/config.go
@@ -58,6 +58,17 @@ type BitcoinConfig struct {
TLSPath string `long:"tlspath" description:"Path to btcd's TLS certificate, if TLS is enabled"`
}

// PrometheusGatewayConfig defines exported config options for connecting to the
// Prometheus PushGateway.
type PrometheusGatewayConfig struct {
// nolint: lll
Enabled bool `long:"enabled" description:"Enable pushing metrics to Prometheus PushGateway"`
// nolint: lll
Host string `long:"host" description:"Prometheus PushGateway host address"`
Port int `long:"port" description:"Prometheus PushGateway port"`
	// PushURL is the derived "host:port" endpoint used when pushing
	// metrics to the gateway.
	PushURL string
}

// Config holds the main configuration for the performance testing binary.
type Config struct {
// TestCases is a comma separated list of test cases that will be
@@ -97,6 +108,12 @@ type Config struct {

// TestTimeout is the timeout for each test.
TestTimeout time.Duration `long:"test-timeout" description:"the timeout for each test"`

// PrometheusGateway is the configuration for the Prometheus
// PushGateway.
//
// nolint: lll
PrometheusGateway *PrometheusGatewayConfig `group:"prometheus-gateway" namespace:"prometheus-gateway" description:"Prometheus PushGateway configuration"`
}

// DefaultConfig returns the default configuration for the performance testing
@@ -120,6 +137,11 @@ func DefaultConfig() Config {
SendType: taprpc.AssetType_COLLECTIBLE,
TestSuiteTimeout: defaultSuiteTimeout,
TestTimeout: defaultTestTimeout,
PrometheusGateway: &PrometheusGatewayConfig{
Enabled: false,
Host: "localhost",
Port: 9091,
},
}
}

@@ -156,6 +178,28 @@ func LoadConfig() (*Config, error) {
// of it with sane defaults.
func ValidateConfig(cfg Config) (*Config, error) {
// TODO (positiveblue): add validation logic.

// Validate Prometheus PushGateway configuration.
if cfg.PrometheusGateway.Enabled {
gatewayHost := cfg.PrometheusGateway.Host
gatewayPort := cfg.PrometheusGateway.Port

if gatewayHost == "" {
return nil, fmt.Errorf(
"gateway hostname may not be empty",
)
}

if gatewayPort == 0 {
return nil, fmt.Errorf("gateway port is not set")
}

// Construct the endpoint for Prometheus PushGateway.
		cfg.PrometheusGateway.PushURL = fmt.Sprintf(
			"%s:%d", gatewayHost, gatewayPort,
		)
	}

Review comment (Member): Perhaps this can be part of the config creation? Otherwise, something meant to validate is actually mutating the underlying config.

return &cfg, nil
}
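As a sketch of the reviewer's suggestion above (not part of the PR), the
endpoint construction could live in a small helper invoked during config
creation, keeping validation free of mutation. The helper name `buildPushURL`
is hypothetical:

```go
package main

import "fmt"

// buildPushURL validates the gateway host and port and derives the
// "host:port" endpoint. Hypothetical helper: the PR builds this value
// inside ValidateConfig instead.
func buildPushURL(host string, port int) (string, error) {
	if host == "" {
		return "", fmt.Errorf("gateway hostname may not be empty")
	}
	if port == 0 {
		return "", fmt.Errorf("gateway port is not set")
	}
	return fmt.Sprintf("%s:%d", host, port), nil
}

func main() {
	url, err := buildPushURL("localhost", 9091)
	if err != nil {
		panic(err)
	}
	fmt.Println(url)
}
```

Called from `DefaultConfig` or `LoadConfig`, this would leave `ValidateConfig`
purely read-only.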

49 changes: 49 additions & 0 deletions itest/loadtest/load_test.go
@@ -4,11 +4,30 @@ package loadtest

import (
"context"
"fmt"
"testing"
"time"

"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/push"
"github.com/stretchr/testify/require"
)

var (
testDuration = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
			Name: "test_duration_seconds",
			Help: "Duration of the test execution, in seconds",
		},
		[]string{"test_name"},
	)
)

Review comment (Member): We should make this a histogram metric. Then we'll be able to do percentile plots, and heat maps, etc.

Reply (Member Author): So I looked into this, and a histogram won't persist over multiple runs (it will be overwritten across runs). There are some tricks we can try to get the gateway to persist, but really not worth the time & diff. Given the frequency at which we run the loadtests, we can produce a histogram from the GaugeVec with PromQL directly in Grafana. This is not as performant as directly using a histogram, but will give us the same insights on percentiles etc.

Reply (Member): Why wouldn't it persist? Isn't it the same as any other metric? I've never run into this restriction myself; is it just for push metrics? We have the histogram metrics for proof size in the other PR.

Reply (Member): Are you referring to this? IIUC that means that if the pushgateway is restarted before it's scraped, the metrics won't persist, but IIUC we have the system always running.

Reply (Member Author): Yeah, in our setup pushgateway should always be alive. I was referring to the part where the client side restarts (by design) and we push a fresh instance of "histogram". See here. By pushing a fresh histogram to the pushgateway we're overwriting the old one, effectively only keeping the values of our last run (last-write-wins). I'm not sure if a HistogramVec with a unique label per test run is going to be a fine workaround; less performant than a simple histogram nevertheless.

Reply (Member Author): There's also a community implementation of pushgateway that seems to be solving this issue (haven't tested it):
https://github.com/zapier/prom-aggregation-gateway

func init() {
// Register the metric with Prometheus's default registry.
prometheus.MustRegister(testDuration)
}

type testCase struct {
name string
fn func(t *testing.T, ctx context.Context, cfg *Config)
@@ -48,6 +67,9 @@ func TestPerformance(t *testing.T) {
continue
}

// Record the start time of the test case.
startTime := time.Now()

success := t.Run(tc.name, func(tt *testing.T) {
ctxt, cancel := context.WithTimeout(
ctxt, cfg.TestTimeout,
@@ -59,6 +81,33 @@
if !success {
t.Fatalf("test case %v failed", tc.name)
}

// Calculate the test duration and push metrics if the test case succeeded.
if cfg.PrometheusGateway.Enabled {
duration := time.Since(startTime).Seconds()

timeTag := fmt.Sprintf("%d", time.Now().Unix())

label := tc.name + timeTag

// Update the metric with the test duration.
testDuration.WithLabelValues(label).Set(duration)

t.Logf("Pushing testDuration %v with label %v to gateway", duration, label)

// Create a new pusher to push the metrics.
pusher := push.New(cfg.PrometheusGateway.PushURL, "load_test").
Collector(testDuration)

// Push the metrics to Prometheus PushGateway.
if err := pusher.Add(); err != nil {
t.Logf("Could not push metrics to Prometheus PushGateway: %v",
err)
} else {
t.Logf("Metrics pushed for test case '%s': duration = %v seconds",
tc.name, duration)
}
}
}
}

49 changes: 43 additions & 6 deletions itest/loadtest/loadtest-sample.conf
@@ -1,19 +1,56 @@
# Network the nodes are connected to
network=regtest

# The name of the test case to run. Example "send" or "mint"
test-case="mint"

# Batch size for mint test
mint-test-batch-size=5

# Number of send operations to perform for send test
send-test-num-sends=5

# Number of assets to send in each send operation for send test
send-test-num-assets=1

# Timeout for the entire test suite
test-suite-timeout=120m

# Timeout for each test
test-timeout=10m

[bitcoin]
bitcoin.host="localhost"
bitcoin.port=18443
bitcoin.user=lightning
bitcoin.password=lightning


[alice]
alice.tapd.name=alice
alice.tapd.host="localhost"
alice.tapd.port=10029
alice.tapd.tlspath=path-to-alice/.tapd/tls.cert
alice.tapd.macpath=path-to-alice/.tapd/data/regtest/admin.macaroon
alice.tapd.port=XXX
alice.tapd.tlspath=/path/to/tls.cert
alice.tapd.macpath=/path/to/admin.macaroon
alice.lnd.name=alice_lnd
alice.lnd.host="localhost"
alice.lnd.port=XXX
alice.lnd.tlspath=/path/to/tls.cert
alice.lnd.macpath=/path/to/admin.macaroon

[bob]
bob.tapd.name=bob
bob.tapd.host="localhost"
bob.tapd.port=10032
bob.tapd.tlspath=path-to-bob/.tapd/tls.cert
bob.tapd.macpath=path-to-bob/.tapd/data/regtest/admin.macaroon
bob.tapd.port=XXX
bob.tapd.tlspath=/path/to/tls.cert
bob.tapd.macpath=/path/to/admin.macaroon
bob.lnd.name=bob_lnd
bob.lnd.host="localhost"
bob.lnd.port=XXX
bob.lnd.tlspath=/path/to/tls.cert
bob.lnd.macpath=/path/to/admin.macaroon

[prometheus-gateway]
prometheus-gateway.enabled=true
prometheus-gateway.host=prometheus-gateway-host
prometheus-gateway.port=9091