Loadtesting metrics, updated #737
@@ -0,0 +1,33 @@
## Description

This directory (`itest/loadtest`) includes all files and data related to running
the loadtesting suite for the Taproot Assets daemon (`tapd`). These tests use the
existing itest framework to run against real, externally running daemons.

The configuration file must be named `loadtest.conf` and must be placed in
the working directory for the loadtest executable to detect it. A
sample configuration can be found in `loadtest-sample.conf`, which includes all
the fields that are required for the tests to run successfully. This includes
connection credentials for the tapd and lnd nodes, as well as a bitcoind backend.

For further tracking and metrics, a Prometheus Pushgateway is configured and used
by the loadtests to submit any desired data in-flight.

## Building

To create the loadtest executable, run `make build-loadtest`. This creates
a `loadtest` binary in your working directory, which you can run provided
that a correct `loadtest.conf` is in the same directory.

The executable reads the relevant fields of `loadtest.conf` and runs the
defined test case with the respective config.

Example: To run a mint loadtest which mints batches of `450` assets, define
`test-case="mint"` and `mint-test-batch-size=450` in `loadtest.conf`.

## Using dev-resources docker setup

You can use any kind of externally running daemon, as long as it's reachable. The
easiest way to spin up some nodes from scratch for the purpose of the loadtests
is to run the `dev-resources/docker-regtest` setup and use `alice`,
`alice-tapd`, `bob`, `bob-tapd` and the single `bitcoind` instance.

@@ -4,11 +4,30 @@ package loadtest

import (
	"context"
	"fmt"
	"testing"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/push"
	"github.com/stretchr/testify/require"
)

var (
	testDuration = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{

Review comment: We should make this a histogram metric. Then we'll be able to do percentile plots, heat maps, etc.

Reply: So I looked into this and a histogram won't persist over multiple runs (it will be overwritten across runs). There are some tricks we can try to get the gateway to persist, but they're really not worth the time and diff. Given the frequency at which we run the loadtests, we can produce a histogram from the GaugeVec with PromQL directly in Grafana. This is not as performant as directly using a histogram, but will give us the same insights on percentiles etc.

Reply: Why wouldn't it persist? Isn't it the same as any other metric? I've never run into this restriction myself; is it just for push metrics? We have the histogram metrics for proof size in the other PR.

Reply: Are you referring to this? IIUC that means that if the pushgateway is restarted before it's scraped, the metrics won't persist, but IIUC we have the system always running.

Reply: Yeah, in our setup the pushgateway should always be alive. I was referring to the part where the client side restarts (by design) and we push a fresh instance of the histogram. See here. By pushing a fresh histogram to the pushgateway we're overwriting the old one, effectively only keeping the values of our last run (last-write-wins). I'm not sure if a HistogramVec, with a unique label per test run, is going to be a fine workaround; it would be less performant than a simple histogram nevertheless.

Reply: There's also a community implementation of pushgateway that seems to be solving this issue (haven't tested it).

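For intuition on what a percentile view over the per-run gauge values would compute, here is a minimal, standard-library-only sketch. It is not PromQL and not part of this PR: the `quantile` helper and the sample durations are hypothetical, and linear interpolation between the closest ranks is just one common quantile definition.

```go
package main

import (
	"fmt"
	"sort"
)

// quantile returns the q-quantile (0 <= q <= 1) of samples, using linear
// interpolation between the two closest ranks. The second return value is
// false if samples is empty or q is out of range.
// Hypothetical helper, for illustration only.
func quantile(q float64, samples []float64) (float64, bool) {
	if len(samples) == 0 || q < 0 || q > 1 {
		return 0, false
	}

	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)

	rank := q * float64(len(sorted)-1)
	lower := int(rank)
	if lower == len(sorted)-1 {
		return sorted[lower], true
	}

	frac := rank - float64(lower)
	return sorted[lower] + frac*(sorted[lower+1]-sorted[lower]), true
}

func main() {
	// Made-up durations (in seconds) from five runs of one test case,
	// standing in for five test_duration_seconds gauge values.
	durations := []float64{41.0, 38.5, 44.0, 39.5, 60.0}

	p50, _ := quantile(0.5, durations)
	p90, _ := quantile(0.9, durations)
	fmt.Printf("p50=%.2fs p90=%.2fs\n", p50, p90)
}
```

With one gauge sample per run, a dashboard query would be doing roughly this aggregation across runs, which is why the gauge approach can still yield percentile plots.
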
			Name: "test_duration_seconds",
			Help: "Duration of the test execution, in seconds",
		},
		[]string{"test_name"},
	)
)

func init() {
	// Register the metric with Prometheus's default registry.
	prometheus.MustRegister(testDuration)
}

type testCase struct {
	name string
	fn   func(t *testing.T, ctx context.Context, cfg *Config)

@@ -48,6 +67,9 @@ func TestPerformance(t *testing.T) {
			continue
		}

		// Record the start time of the test case.
		startTime := time.Now()

		success := t.Run(tc.name, func(tt *testing.T) {
			ctxt, cancel := context.WithTimeout(
				ctxt, cfg.TestTimeout,

@@ -59,6 +81,33 @@ func TestPerformance(t *testing.T) {
		if !success {
			t.Fatalf("test case %v failed", tc.name)
		}

		// Calculate the test duration and push metrics if the test case succeeded.
		if cfg.PrometheusGateway.Enabled {
			duration := time.Since(startTime).Seconds()

			timeTag := fmt.Sprintf("%d", time.Now().Unix())

			label := tc.name + timeTag

			// Update the metric with the test duration.
			testDuration.WithLabelValues(label).Set(duration)

			t.Logf("Pushing testDuration %v with label %v to gateway", duration, label)

			// Create a new pusher to push the metrics.
			pusher := push.New(cfg.PrometheusGateway.PushURL, "load_test").
				Collector(testDuration)

			// Push the metrics to Prometheus PushGateway.
			if err := pusher.Add(); err != nil {
				t.Logf("Could not push metrics to Prometheus PushGateway: %v",
					err)
			} else {
				t.Logf("Metrics pushed for test case '%s': duration = %v seconds",
					tc.name, duration)
			}
		}
	}
}
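
The per-run label in the hunk above is built by joining the test name and the Unix timestamp with no separator, so a value like `mint1700000000` cannot be split back unambiguously if a test name ever ends in a digit. A minimal sketch of one alternative, assuming an underscore separator and two hypothetical helpers, `runLabel` and `splitRunLabel` (neither is part of this PR):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// runLabel joins a test name and a Unix timestamp (seconds) with an
// underscore, e.g. "mint_1700000000". Hypothetical helper.
func runLabel(testName string, unixSecs int64) string {
	return testName + "_" + strconv.FormatInt(unixSecs, 10)
}

// splitRunLabel reverses runLabel by splitting on the last underscore, so
// test names that themselves contain underscores still round-trip.
func splitRunLabel(label string) (string, int64, error) {
	idx := strings.LastIndex(label, "_")
	if idx <= 0 {
		return "", 0, fmt.Errorf("label %q has no name/timestamp separator", label)
	}

	secs, err := strconv.ParseInt(label[idx+1:], 10, 64)
	if err != nil {
		return "", 0, fmt.Errorf("label %q has a non-numeric timestamp: %w", label, err)
	}

	return label[:idx], secs, nil
}

func main() {
	// In the loadtest this timestamp would come from time.Now().Unix();
	// a fixed value keeps the example deterministic.
	label := runLabel("mint", 1700000000)
	fmt.Println(label) // prints "mint_1700000000"

	name, secs, err := splitRunLabel(label)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name, secs) // prints "mint 1700000000"
}
```
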

@@ -1,19 +1,56 @@
# Network the nodes are connected to
network=regtest

# The name of the test case to run. Example "send" or "mint"
test-case="mint"

# Batch size for mint test
mint-test-batch-size=5

# Number of send operations to perform for send test
send-test-num-sends=5

# Number of assets to send in each send operation for send test
send-test-num-assets=1

# Timeout for the entire test suite
test-suite-timeout=120m

# Timeout for each test
test-timeout=10m

[bitcoin]
bitcoin.host="localhost"
bitcoin.port=18443
bitcoin.user=lightning
bitcoin.password=lightning


[alice]
alice.tapd.name=alice
alice.tapd.host="localhost"
alice.tapd.port=10029
alice.tapd.tlspath=path-to-alice/.tapd/tls.cert
alice.tapd.macpath=path-to-alice/.tapd/data/regtest/admin.macaroon
alice.tapd.port=XXX
alice.tapd.tlspath=/path/to/tls.cert
alice.tapd.macpath=/path/to/admin.macaroon
alice.lnd.name=alice_lnd
alice.lnd.host="localhost"
alice.lnd.port=XXX
alice.lnd.tlspath=/path/to/tls.cert
alice.lnd.macpath=/path/to/admin.macaroon

[bob]
bob.tapd.name=bob
bob.tapd.host="localhost"
bob.tapd.port=10032
bob.tapd.tlspath=path-to-bob/.tapd/tls.cert
bob.tapd.macpath=path-to-bob/.tapd/data/regtest/admin.macaroon
bob.tapd.port=XXX
bob.tapd.tlspath=/path/to/tls.cert
bob.tapd.macpath=/path/to/admin.macaroon
bob.lnd.name=bob_lnd
bob.lnd.host="localhost"
bob.lnd.port=XXX
bob.lnd.tlspath=/path/to/tls.cert
bob.lnd.macpath=/path/to/admin.macaroon

[prometheus-gateway]
prometheus-gateway.enabled=true
prometheus-gateway.host=prometheus-gateway-host
prometheus-gateway.port=9091

Review comment: Perhaps this can be part of the config creation? Otherwise, something meant to validate is actually mutating the underlying config.
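
One way to read this suggestion is to derive the push URL when the config is constructed, and keep validation read-only. The sketch below is a guess at that shape, not the PR's code: the `PrometheusGatewayConfig` type, its constructor, the `Validate` method, and the `http://` scheme are all assumptions made for illustration. Only the `host`, `port`, and `enabled` options and the `PushURL` field name appear in the diff.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
)

// PrometheusGatewayConfig mirrors the [prometheus-gateway] options.
// Hypothetical type; the real struct in the PR may differ.
type PrometheusGatewayConfig struct {
	Enabled bool
	Host    string
	Port    int

	// PushURL is derived from Host and Port at construction time.
	PushURL string
}

// NewPrometheusGatewayConfig builds the config and derives PushURL up
// front, so later validation never has to write to the struct.
// The "http://" scheme is an assumption.
func NewPrometheusGatewayConfig(enabled bool, host string,
	port int) PrometheusGatewayConfig {

	cfg := PrometheusGatewayConfig{Enabled: enabled, Host: host, Port: port}
	if enabled {
		cfg.PushURL = "http://" + net.JoinHostPort(host, strconv.Itoa(port))
	}

	return cfg
}

// Validate only inspects the config. The value receiver makes it
// impossible for this method to mutate the caller's copy.
func (c PrometheusGatewayConfig) Validate() error {
	if !c.Enabled {
		return nil
	}
	if c.Host == "" {
		return errors.New("gateway hostname may not be empty")
	}
	if c.Port <= 0 || c.Port > 65535 {
		return fmt.Errorf("invalid gateway port %d", c.Port)
	}

	return nil
}

func main() {
	cfg := NewPrometheusGatewayConfig(true, "prometheus-gateway-host", 9091)
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println(cfg.PushURL) // prints "http://prometheus-gateway-host:9091"
}
```
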