Create benchmark scripts for demo-rollup running in native mode (#458)

* bench stuff
* temp stash
* revert main changes, check in benches
* cleanup
* basic Makefile
* cleanup, readme, makefile
* Add docs for env vars Makefile
* fix rpc test (missing height added for benchmark in TestBlob)
* cleanup and changes
* add new file benches/test_helper.rs
* add back missing criterion dependency
* make lint fix
* revert test_data json files to pass tests
* add temporary directory instead of cleanup/reuse for rollup state and ledger
* lint fix
* remove unused function impls for bench da service
* fixing post-merge linting

---------

Co-authored-by: dubbelosix <[email protected]>
Sovereign-Labs · Jul 5, 2023 · 1fff0a3
1 parent f76a52a
commit 1fff0a3
Showing 12 changed files with 1,109 additions and 2 deletions.
diff --git a/examples/demo-rollup/Cargo.toml b/examples/demo-rollup/Cargo.toml
@@ -39,11 +39,22 @@ sov-modules-api = { path = "../../module-system/sov-modules-api", features = ["n
 sov-state = { path = "../../module-system/sov-state", features = ["native"] }
 const-rollup-config = { path = "../const-rollup-config" }
 
-
 [dev-dependencies]
 sha2 = { workspace = true }
 reqwest = "0.11"
 tendermint = "0.32"
 tempfile = { workspace = true }
 proptest = { workspace = true }
+clap = { workspace = true }
 sov-rollup-interface = { path = "../../rollup-interface", features = ["fuzzing"] }
+prometheus = "0.11.0"
+prettytable-rs = "^0.10"
+criterion = "0.5.1"
+
+[[bench]]
+name = "rollup_bench"
+harness = false
+
+[[bench]]
+name = "rollup_coarse_measure"
+harness = false
diff --git a/examples/demo-rollup/benches/Makefile b/examples/demo-rollup/benches/Makefile
@@ -0,0 +1,34 @@
+# Default values for num blocks and transactions per block
+BLOCKS ?= 100
+TXNS_PER_BLOCK ?= 10000
+
+export BLOCKS
+export TXNS_PER_BLOCK
+
+criterion:
+	@echo "Running criterion bench with $(TXNS_PER_BLOCK) transactions per block"
+	@echo "Method: Criterion"
+	@echo "Output: Criterion"
+	@cd .. && cargo bench --bench rollup_bench
+
+basic:
+	@echo "Running basic benchmark with $(BLOCKS) blocks and $(TXNS_PER_BLOCK) transactions per block"
+	@echo "Method: Coarse Timers"
+	@echo "Output: Standard"
+	@cd .. && cargo bench --bench rollup_coarse_measure
+
+prometheus:
+	@echo "Running basic benchmark with $(BLOCKS) blocks and $(TXNS_PER_BLOCK) transactions per block"
+	@echo "Method: Coarse Timers"
+	@echo "Output: Prometheus"
+	@cd .. && PROMETHEUS=1 cargo bench --bench rollup_coarse_measure
+
+flamegraph:
+	@echo "Running basic benchmark with $(BLOCKS) blocks and $(TXNS_PER_BLOCK) transactions per block"
+	@echo "Method: Coarse Timers"
+	@echo "Output: Flamegraph"
+	@echo "WARNING: Flamegraph requires sudo. The Makefile does cleanup, but there is an unforeseen risk of files being owned by root after the script is done. The Makefile also does full cleanup so subsequent builds with default user will be from scratch."
+	@read -p "Proceed (y/n): " REPLY; if [ "$$REPLY" = "y" ]; then \
+		cd .. && sudo BLOCKS=$(BLOCKS) TXNS_PER_BLOCK=$(TXNS_PER_BLOCK) cargo flamegraph -o benches/flamegraph.svg --bench rollup_coarse_measure && sudo rm -rf benches/demo_data ; \
+		sudo rm -rf ../../target ; \
+	fi
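
The Makefile above exports `BLOCKS` and `TXNS_PER_BLOCK` so that the benchmark binaries can pick them up. As a minimal sketch of how a binary might read them, assuming it falls back to a default when a variable is unset or malformed: the names `parse_or`, `env_or`, `BenchParams` and `bench_params` are hypothetical and not taken from this commit, and the defaults simply mirror the Makefile (the scripts themselves may use different built-in defaults).

```rust
use std::env;
use std::str::FromStr;

/// Parse a raw value, falling back to `default` when it is absent or malformed.
/// (Hypothetical helper, not part of the commit.)
fn parse_or<T: FromStr>(raw: Option<&str>, default: T) -> T {
    raw.and_then(|s| s.trim().parse().ok()).unwrap_or(default)
}

/// Read an environment variable and parse it, falling back to `default`.
fn env_or<T: FromStr>(key: &str, default: T) -> T {
    parse_or(env::var(key).ok().as_deref(), default)
}

/// Benchmark size, using the two variable names exported by the Makefile.
#[derive(Debug, PartialEq)]
struct BenchParams {
    blocks: u64,
    txns_per_block: u64,
}

/// Assumption: defaults mirror the Makefile (100 blocks, 10,000 txns per block).
fn bench_params() -> BenchParams {
    BenchParams {
        blocks: env_or("BLOCKS", 100),
        txns_per_block: env_or("TXNS_PER_BLOCK", 10_000),
    }
}
```

Under these assumptions, running `BLOCKS=50 make basic` would make `bench_params()` report 50 blocks and the default 10,000 transactions per block.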
diff --git a/examples/demo-rollup/benches/README.md b/examples/demo-rollup/benches/README.md
@@ -0,0 +1,64 @@
+# Native Benchmarks
+Native benchmarks measure the performance of the rollup SDK in native mode, which does not involve proving.
+## Methodology
+* We use the Bank module's Transfer call as the main transaction for this benchmark, so what we measure is the number of value transfers that can be processed per second.
+* We do not connect to the DA layer, since it would become the bottleneck. Instead, we pre-populate 10 blocks (configurable via env var BLOCKS), each holding 1 blob of 10,000 transactions (configurable via env var TXNS_PER_BLOCK).
+* The first block only contains a "CreateToken" transaction. Subsequent blocks contain "Transfer" transactions.
+* All token transfers are initiated from the created token's mint address.
+
+We use two scripts for benchmarking:
+* **rollup_bench.rs**: This uses the Rust Criterion benchmarking framework.
+  * One issue with this is that most benching frameworks are focused on micro-benchmarks for pure functions.
+  * To get a true estimate of TPS we need to write to disk, and this side effect is a problem for the bench framework when it executes the same writes repeatedly.
+  * Bench frameworks (criterion, glassbench) take the routine to iterate as an argument, and we cannot control the number of iterations directly. The framework chooses the sampling and the number of iterations.
+  * Giving the entire rollup loop (for all the blocks) to criterion would require cleaning up the data or using a new data destination for each iteration.
+  * To get around the above problems, we pre-generate a "large" number of blocks and set the measurement time bound for criterion to 20 seconds. Instead of having a loop from block_0 to block_n, we let criterion choose how many blocks to process.
+  * The output of the framework is the mean time for processing a single block (containing the configured number of transactions).
+```
+Benchmarking rollup main loop
+Benchmarking rollup main loop: Warming up for 3.0000 s
+Benchmarking rollup main loop: Collecting 10 samples in estimated 24.220 s (20 iterations)
+Benchmarking rollup main loop: Analyzing
+rollup main loop        time:   [2.5035 s 2.7001 s 2.9122 s]
+Found 1 outliers among 10 measurements (10.00%)
+  1 (10.00%) high mild
+```
+* **rollup_coarse_measure.rs**
+  * This script uses coarse-grained timers (with std::time) to measure the time taken to process all the pre-generated blocks.
+  * We can control the number of blocks and transactions per block with environment variables.
+  * There are timers around the main loop for a total measurement, as well as timers around key functions:
+    * begin_slot
+    * apply_blob
+    * end_slot
+  * The script uses the Rust library prettytable-rs to format the output in a readable way.
+  * Optionally, the script can also generate Prometheus metrics (histograms), so they can be aggregated by other tools.
+```
++--------------------+--------------------+
+| Blocks             | 100                |
++--------------------+--------------------+
+| Txns per Block     | 10000              |
++--------------------+--------------------+
+| Total              | 292.819598958s     |
++--------------------+--------------------+
+| Begin slot         | 39.414µs           |
++--------------------+--------------------+
+| End slot           | 243.091403746s     |
++--------------------+--------------------+
+| Apply Blob         | 46.639351922s      |
++--------------------+--------------------+
+| Txns per sec (TPS) | 3424.6575342465753 |
++--------------------+--------------------+
+```
+
+# Makefile
+A Makefile wraps the common benchmark invocations so that they do not have to be run manually.
+
+The Makefile is located in the demo-rollup/benches folder and supports the following commands:
+* **make criterion** - runs the criterion benchmark using rollup_bench.rs
+* **make basic** - runs the coarse-grained timers (reporting the TPS) using rollup_coarse_measure.rs
+* **make prometheus** - runs rollup_coarse_measure.rs, but instead of aggregating std::time measurements directly and printing them in a table, it outputs JSON containing the histogram metrics populated by the script
+* **make flamegraph** - runs `cargo flamegraph`. On macOS this requires `sudo` permissions. The target performs some cleanup and, to err on the side of caution, deletes the `sovereign/target` folder, since new artifacts can end up owned by root
+
+The Makefile supports setting the number of blocks and transactions per block using the BLOCKS and TXNS_PER_BLOCK env vars. Defaults are 100 blocks and 10,000 transactions per block when using the Makefile.
+
+![Flamegraph](flamegraph_sample.svg)
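
The coarse-timer approach described in the README (one timer around the main loop plus accumulating timers around `begin_slot`, `apply_blob` and `end_slot`) can be sketched with `std::time` alone. This is an illustration under assumptions, not the code in `rollup_coarse_measure.rs`: the `SlotProcessor` trait, `PhaseTimes`, `timed`, `run_blocks` and both TPS helpers are hypothetical names. The two TPS helpers are included because the sample table's 3424.6575342465753 equals 1,000,000 / 292, which suggests (an inference from the numbers, not something the README states) that the total was truncated to whole seconds; dividing by the full 292.819598958 s gives roughly 3415 instead.

```rust
use std::time::{Duration, Instant};

/// Accumulated wall-clock time per phase, mirroring the rows of the sample table.
#[derive(Debug, Default)]
struct PhaseTimes {
    begin_slot: Duration,
    apply_blob: Duration,
    end_slot: Duration,
    total: Duration,
}

/// Run `f`, add its elapsed time to `acc`, and return its result.
fn timed<R>(acc: &mut Duration, f: impl FnOnce() -> R) -> R {
    let start = Instant::now();
    let out = f();
    *acc += start.elapsed();
    out
}

/// Hypothetical stand-in for the rollup's three per-block phases.
trait SlotProcessor {
    fn begin_slot(&mut self);
    fn apply_blob(&mut self, blob: &[u8]);
    fn end_slot(&mut self);
}

/// Process every pre-generated blob once, timing each phase and the whole loop.
fn run_blocks<P: SlotProcessor>(p: &mut P, blobs: &[Vec<u8>]) -> PhaseTimes {
    let mut t = PhaseTimes::default();
    let loop_start = Instant::now();
    for blob in blobs {
        timed(&mut t.begin_slot, || p.begin_slot());
        timed(&mut t.apply_blob, || p.apply_blob(blob));
        timed(&mut t.end_slot, || p.end_slot());
    }
    t.total = loop_start.elapsed();
    t
}

/// Transactions per second using the full fractional duration.
fn tps(blocks: u64, txns_per_block: u64, total: Duration) -> f64 {
    (blocks * txns_per_block) as f64 / total.as_secs_f64()
}

/// Same, but truncating the total to whole seconds first.
fn tps_whole_seconds(blocks: u64, txns_per_block: u64, total: Duration) -> f64 {
    (blocks * txns_per_block) as f64 / total.as_secs() as f64
}
```

For short runs the difference between the two TPS helpers grows quickly, because truncating a total of a few seconds discards a large fraction of it, so the fractional form is likely the safer choice when comparing runs.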