Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add bulletproof rewind profiling #3618

Conversation

hansieodendaal
Copy link
Contributor

@hansieodendaal hansieodendaal commented Nov 25, 2021

Description

The following changes were made:

  • Added bulletproof rewind profiling measurements.
  • Improved wallet recovery resiliency with improved connection logic to the selected base node(s).
  • Improved recovering of seed words usability.
  • Improved some struct member names to be less confusing.
  • Improved recovery UTXO streaming performance by tuning the chunking parameter.

Results
The following tests were run using standard Tor connectivity on a laptop with an Intel(R) Core(TM) i7-7820HQ CPU and basic Samsung NVMe storage device.

Number of outputs 52654 52284 52753 52762 52762
Chunk size 10 100 250 125 125
Stream form base node time (ms) 197678 108757 236872 96274 115798
- per output (ms) 3.754 2.080 4.490 1.825 2.195
- normalized 0.836 0.463 1 0.406 0.489
Total processing time (ms) 77157 72653 69005 67170 68563
- per output (ms) 1.465 1.39 1.308 1.273 1.299
- normalized 1 0.949 0.893 0.869 0.887
Bulletproofs rewind time (ms) 10877 11040 10665 10854 10702
- per output (ms) 0.207 0.211 0.202 0.206 0.203
- normalized 0.981 1 0.957 0.976 0.962

Discussion

  • As can be seen, Bulletproof rewind performance was consistently measured at ~0.21ms per output.
  • Total processing time, which includes all database processing and Bulletproof rewinding, averaged at about 1.35ms per output.
  • Optimum streaming performance was found to be ~2ms per output.
  • Output streaming performance was found to be closely related to the chunk size that are requested from the base node. Smaller (10) and larger (250) quantities were the worst performers, while a quantity in the range of 100 to 125 was approximately twice as fast. For this PR a final chunk size value of 125 was chosen.

Motivation and Context

  • Further code improvements can be based on informed decisions using the profiling information as basis.

How Has This Been Tested?

  • Unit testing.
  • Cucumber testing.
  • System-level testing doing wallet recoveries.

Copy link
Contributor

@philipr-za philipr-za left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most these changes look fine, a couple comments though.

@@ -105,7 +105,8 @@ pub async fn wallet_recovery(wallet: &WalletSqlite, base_node_config: &PeerConfi

let mut recovery_task = UtxoScannerService::<WalletSqliteDatabase>::builder()
.with_peers(peer_public_keys)
.with_retry_limit(3)
// Do not make this a small number as wallet recovery needs to be resilient
.with_retry_limit(100)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We made this smaller on purpose, if you have gone through your whole peer list 3 times something larger is wrong that trying 97 more times is not going to solve.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to connect to a single peer in the list when the base node connection was not responsive posed to be a problem. I think it is much better for the user to stop the recovery process eventually than to restart it every time and hope for a quick connection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's a small enough issue that I don't mind it going in but I still think that trying a 100 times when you have tried 3 times across all your peers and that many failed attempts tells you there is something wrong with the peers or your setup and you need to take action.

So I don't mind this going in BUT we use this in the Daily tests for that exact purpose, to tell us there is a problem with recovery. So if you want this to be 100 you need to cater for it in the way the Daily recovery test is launched so that it only has 3 retries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added configurable retry limit with default value 3.

- Added bulletproof rewind profiling measurements.
- Improved wallet recovery resiliency with improved connection logic
  to the selected base node(s).
- Improved recovering of seed words useability.
- Inmproved some struct member names to be less confusing.
- Improved revovery UTXO streaming performance by tuning the chunking
  parameter.
@hansieodendaal hansieodendaal force-pushed the ho_bulletproof_rewind_profiling branch from 64b7bf1 to 754f1e7 Compare November 26, 2021 08:13
@aviator-app aviator-app bot added mq-failed and removed mq-failed labels Nov 26, 2021
@aviator-app aviator-app bot merged commit 5790a9d into tari-project:development Nov 26, 2021
@hansieodendaal hansieodendaal deleted the ho_bulletproof_rewind_profiling branch November 27, 2021 04:41
sdbondi added a commit to sdbondi/tari that referenced this pull request Dec 6, 2021
* development: (29 commits)
  fix(pruned mode)!: prune inputs, allow horizon sync resume and other fixes (tari-project#3521)
  feat!: sending one-sided transactions in wallet_ffi (tari-project#3634)
  fix: use json 5 for tor identity (regression) (tari-project#3624)
  test: add operation_id to log messages (tari-project#3633)
  fix!: multiple monerod addresses in tari merge mining proxy (tari-project#3628)
  fix: get-peer command works with public key again (tari-project#3636)
  fix!: separate peer seeds to common.network (tari-project#3635)
  test: removed stress test log target (tari-project#3631)
  feat: removed transaction validation redundant events (tari-project#3630)
  feat: improve wallet responsiveness (tari-project#3625)
  feat: add bulletproof rewind profiling (tari-project#3618)
  fix!: console wallet grpc_console_wallet_addresss config (tari-project#3619)
  test: increase timeout in cucumber (tari-project#3621)
  chore: change status line (tari-project#3610)
  feat!: add tcp bypass settings for tor in wallet_ffi (tari-project#3615)
  feat: only trigger UTXO scanning when a new block event is received (tari-project#3620)
  feat: implement dht pooled db connection (tari-project#3596)
  feat: add page for detailed mempool in explorer (tari-project#3613)
  chore: add pub key in the dailes notify (tari-project#3612)
  feat: display network for console wallet (tari-project#3611)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants