Seraphis Performance Results #91
Comments
How do those Seraphis "merge", "concise" and "squashed" variants differ? An ELI5 would be something for crypto-challenged people like me :)
Basically, 'concise' is the plain one; 'merge' is slightly more efficient, but you have to sign all inputs at the same time (the tx author must own all funds spent by the tx, unlike the other variants where multiple people can fund a tx); and 'squashed' allows simpler membership proofs at the cost of needing a range proof for each input. Squashed can also use the merged composition proof; I just separated 'merge' into its own tx type for comparisons.
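To make the structural difference concrete, here is a loose sketch (hypothetical types and counts, inferred only from the description above and not taken from the actual mock-up code) of roughly how many proof elements each variant would carry for a given number of inputs and outputs:

```cpp
// Illustrative sketch only: proof-element counts per variant, as described above.
#include <cstddef>
#include <iostream>

struct ProofCounts
{
    std::size_t membership_proofs;   // one per input in all variants
    std::size_t composition_proofs;  // input ownership/key-image proofs
    std::size_t range_proofs;        // amount commitments proven in range
};

// 'concise': the plain variant; range proofs only on outputs
ProofCounts concise(std::size_t num_in, std::size_t num_out)
{
    return {num_in, num_in, num_out};
}

// 'merge': one merged composition proof over all inputs (single signer required)
ProofCounts merge(std::size_t num_in, std::size_t num_out)
{
    return {num_in, 1, num_out};
}

// 'squashed': simpler membership proofs, but each input also needs a range proof
ProofCounts squashed(std::size_t num_in, std::size_t num_out)
{
    return {num_in, num_in, num_in + num_out};
}

int main()
{
    const ProofCounts c = squashed(2, 2);
    std::cout << "squashed 2-in/2-out range proofs: " << c.range_proofs << '\n';  // 4
}
```

The point is just that 'merge' collapses the per-input composition proofs into one (hence single-signer), while 'squashed' trades extra range proofs for simpler membership proofs.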
Are there any wallets that allow for this via GUI? This seems to be just a theoretical benefit. (Multiple people funding a transaction vs multiple transactions just seems to save on fees and make the history a bit cleaner?)
It is not possible with the current protocol. One example of the technique's use is the BCH crowdfunding system.
I think Seraphis-Merge for collaborative funding sounds so interesting.
Seraphis-Merge prevents collaborative funding (which Seraphis-Concise/Seraphis-Squashed can do). It allows a slightly smaller tx (96 bytes fewer per tx input).
@boogerlad @garth-xmr Collaborative funding is very real on the BCH blockchain. Over 9,000 BCH has been contributed to 85 projects through their Flipstarter system. At this point Flipstarter is the main funding mechanism for BCH development. You could almost say that it saved BCH from a devtax (the devtax advocates forked off anyway and their coin is now called eCash). BCH's AnyoneCanPay special transaction type allows this permissionless, noncustodial, and self-hosted funding mechanism. For now, Monero has mostly relied on the CCS funding system, which is good but is also permissioned, centralized, and custodial. @emergent-reasons from BCH may be able to explain more. @plowsof recently sought and received funding through a Flipstarter campaign.
Old introduction article to Flipstarter. It's fundamentally the same thing as @mikehearn's Lighthouse. It's an entirely self-hosted, open source funding solution that uses trustless assurance contracts (all or nothing) where pledgers' money stays fully within their control until the moment that the funding transaction pulls together all the pledged inputs. I don't want to derail the discussion, so please feel free to contact me on the internet under variations of "emergent_reasons".
One design concern to keep in mind: Seraphis-Squashed requires range proofs for inputs. This may or may not entail limits on input counts (i.e. the 16-output limit was imposed when Bulletproofs were introduced). Maybe someone can comment on why Bulletproofs led to a 16-output limit. For example, there could be a DoS vector where a max-in/out transaction causes batch verification to slow down significantly (i.e. greatly increases the average verification cost across a batch).

UPDATE: I did some testing and found that per-proof verification improves with batching even if you combine many small-aggregate proofs (e.g. aggregates of 2 proofs) with a few large-aggregate proofs (e.g. aggregates of 128 proofs). Basically, batching does not open a DoS vector. However, it is still necessary to impose a limit on the number of tx inputs, since BP+ has a config limit on the number of proofs you can aggregate. One reasonable limit might be 112 inputs and 16 outputs (for a per-tx maximum of 128 range proofs, which is a power of 2).

UPDATE2: The issue with large aggregations is that if a large proof aggregation (e.g. a tx with many inputs) has no other large proofs to batch-verify with, then the per-range-proof verification cost of that large aggregation will be significantly higher than if it could be batched (~3-5x higher). This is the basic reason I investigated range proof splitting (so rare large aggregate proofs can be split into smaller proofs that benefit more from batching).
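For intuition on the aggregation-limit point, here is a small standalone sketch (the 128 cap is an assumed config value for illustration, not a quote from any BP+ implementation) showing how a range proof aggregate gets padded to the next power of two, and why a 112-input/16-output cap lines up neatly with a power-of-2 aggregate in the squashed variant:

```cpp
// Sketch: Bulletproofs(+) aggregates are verified at a size padded up to the next
// power of two, so non-power-of-2 aggregates carry padding overhead.
#include <cassert>
#include <cstddef>
#include <iostream>

constexpr std::size_t BPP_MAX_AGGREGATION = 128;  // assumed config limit

std::size_t padded_aggregate_size(std::size_t num_range_proofs)
{
    assert(num_range_proofs > 0 && num_range_proofs <= BPP_MAX_AGGREGATION);
    std::size_t padded = 1;
    while (padded < num_range_proofs)
        padded <<= 1;  // round up to the next power of two
    return padded;
}

int main()
{
    // a 112-in/16-out squashed tx: 128 range proofs -> padded size 128 (no waste)
    std::cout << padded_aggregate_size(112 + 16) << '\n';  // 128
    // a 17-proof aggregate pads up to 32, so almost half the work is padding
    std::cout << padded_aggregate_size(17) << '\n';        // 32
}
```

Splitting a rare large aggregate into several smaller ones (the 'BP+ splitting' idea) would then let those pieces batch-verify alongside the common small aggregates instead of being verified nearly alone.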
One question that comes to mind is the impact of GPU verification on verification cost per tx, especially with large batch sizes (25 txs, 100 txs, etc.). If a performance improvement of 50x or more over a single-core CPU can be achieved, this would be a material improvement that could enable, for example, a 256 ring size in Seraphis. Here is an example of performance improvements of 28.4x over an 8-core CPU and 8.4x over a 32-core CPU: https://www.dataversity.net/what-are-gpus-and-why-do-data-scientists-love-them/
While GPUs might significantly improve verification times, I have a couple concerns.
Well, that would be a non-starter. Can this code be optimized in assembler, and can any extensions be leveraged?
@UkoeHB Could you clarify this comment?:
Is this assuming use of a single CPU core or multiple cores? And if multiple cores, how many?
@Rucknium Yes, that estimate is based on single-threaded verification.
Results of timing tests on experiments 1-4 on my medium-end-ish Core i7 1.8 GHz + 32GB RAM, based on commit a63e39c6604d4ac493acee0f1c923dbefd50cca3. To my eye, the relative efficiency gains seem fairly close.

Experiment 1: reference set size (no batching)
Fixed parameters: 2-input/2-output, no BP+ splitting, no tx verification batching. Note that the verification plot is logarithmic in the y-axis.

Experiment 2: reference set size (25 tx per batch)
Fixed parameters: 2-input/2-output, no BP+ splitting, 25 tx per batch (normalized to cost/tx). Note that the verification plot is logarithmic in the y-axis.

Experiment 3: reference set size decomposition
Fixed parameters: 2-input/2-output, no BP+ splitting, no tx verification batching.

Experiment 4: inputs
Fixed parameters: 2-output, reference set decomposition 2^8, no BP+ splitting, no tx verification batching.
These results are normalized to 1 input for each protocol type.
Here are updated test results as of a newer commit. There were two optimizations to grootle proofs.

I also added an experiment for 16-in/16-out batching (Experiment 5 below). Note that I removed the BP+-splitting experiments since the prior results suggested it wasn't a worthwhile approach.

Results

Experiment 1: reference set size (no batching)
Fixed parameters: 2-input/2-output, no tx verification batching. Note that the verification plot is logarithmic in the y-axis.

Experiment 2: reference set size (25 tx per batch)
Fixed parameters: 2-input/2-output, 25 tx per batch (normalized to cost/tx). Note that the verification plot is logarithmic in the y-axis.

Experiment 3: reference set size decomposition
Fixed parameters: 2-input/2-output, no tx verification batching.

Experiment 4: inputs
Fixed parameters: 2-output, reference set decomposition 2^7, no tx verification batching.
These results are normalized to 1 input for each protocol type.

Experiment 5: 16 in/out batching
Fixed parameters: 16 inputs, 16 outputs, decomp 2^7

Discussion
UPDATE: See this comment for most current results.
Seraphis Performance Results
Below I display and discuss performance results from several transaction protocol mock-ups (CLSAG, Triptych, Seraphis-Concise, Seraphis-Merge, Seraphis-Squashed), collected during one test run. The purpose of this report is to inform engineering/design decisions around a potential real-world implementation of Seraphis.
Preliminaries
Test Context
The test was run single-threaded on a `zenith2alpha` motherboard with an AMD Ryzen Threadripper 3970X 32-Core processor and 256GB RAM. It was run on my Seraphis perf test branch at commit `e0620b4f71faa20e69afcac6206a4180102f251d`, and started at `2021-11-09 : 18:02:12 UTC` (according to the machine's clock). The test command was `./build/Linux/seraphis_perf/release/tests/performance_tests/performance_tests --filter=\*mock_tx\* --stats --loop-multiplier=10 --timings-database=/home/user/seraphis_perf/test_results/perf_3.txt > /home/user/seraphis_perf/test_results/stdout_perftest_3.txt`.

Terminology
`ref_set_size = n^m`, where `n` is the 'decomposition base'. This detail is relevant to Grootle membership proofs (used in Triptych, Seraphis, Lelantus-Spark), which require `n` and `m` to be integers.
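As a quick illustration of that constraint (a throwaway helper, not part of the perf-test code), the candidate sizes discussed later all satisfy it, e.g. 64 = 2^6, 81 = 3^4, 128 = 2^7:

```cpp
// Checks whether ref_set_size can be written exactly as n^m for integer m >= 1.
#include <cstddef>
#include <initializer_list>
#include <iostream>

bool is_valid_ref_set_size(std::size_t ref_set_size, std::size_t n, std::size_t &m_out)
{
    if (n < 2 || ref_set_size < n)
        return false;
    std::size_t m = 0;
    std::size_t product = 1;
    while (product < ref_set_size)
    {
        product *= n;
        ++m;
    }
    m_out = m;
    return product == ref_set_size;  // true only if ref_set_size is exactly n^m
}

int main()
{
    std::size_t m;
    for (std::size_t size : {64, 81, 128})
        for (std::size_t n : {2, 3})
            if (is_valid_ref_set_size(size, n, m))
                std::cout << size << " = " << n << "^" << m << '\n';
}
```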
Test Questions
There were a number of questions I wanted to answer with this test, for example: what is the best reference set decomposition base (`n`)?

Results
Experiment 1: reference set size (no batching)
Fixed parameters: 2-input/2-output, no BP+ splitting, no tx verification batching. Note that the verification plot is logarithmic in the y-axis.
Experiment 2: reference set size (25 tx per batch)
Fixed parameters: 2-input/2-output, no BP+ splitting, 25 tx per batch (normalized to cost/tx). Note that the verification plot is logarithmic in the y-axis.
Experiment 3: reference set size decomposition
Fixed parameters: 2-input/2-output, no BP+ splitting, no tx verification batching.
Experiment 4: inputs
Fixed parameters: 2-output, reference set decomposition 2^8, no BP+ splitting, no tx verification batching.
These results are normalized to 1 input for each protocol type.
Experiment 5: BP+ splitting (no batching)
Fixed parameters: 2-input, reference set decomposition 2^8, 1 tx per batch (normalized to cost/tx).
Experiment 6: BP+ splitting (25 tx per batch)
Fixed parameters: 2-input, reference set decomposition 2^8, 25 tx per batch (normalized to cost/tx).
Discussion
My key take-aways: the best decomposition bases are `n = 2` or `n = 3`. I believe `2^6 = 64`, `3^4 = 81`, and `2^7 = 128` are the best candidates for a reference set size using one of the new variants, taking CLSAG with `ref_set_size = 16` as a baseline for comparison (`16` is likely to be the reference set size after Monero's next hardfork).