tpu-client: Speed up performance by awaiting all futures at once #32945
Conversation
```rust
// Send them all at once
let results = join_all(futures).await;
```
Can we transpose all those timeout wrappers by just timeout-wrapping the JoinAll future here, or do we want the granularity?
We want the granularity unfortunately.
We're mimicking the old behavior by waiting 10ms between each send, and a timeout around the JoinAll would encompass that waiting too. So, for example, if we set a global timeout of 5 seconds and want to send 800 transactions, the last 300 will never be sent, since they'll time out just waiting their turn.
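A minimal sketch of that trade-off, assuming the 10ms pacing and 5-second timeout from the example above (`send` is a placeholder, not the actual tpu-client API):

```rust
use std::time::Duration;

use futures_util::future::join_all;
use tokio::time::{sleep, timeout};

async fn send_all(wire_transactions: Vec<Vec<u8>>) {
    const PACING: Duration = Duration::from_millis(10);
    const SEND_TIMEOUT: Duration = Duration::from_secs(5);

    let futures = wire_transactions
        .into_iter()
        .enumerate()
        .map(|(i, tx)| async move {
            // Each future's total elapsed time includes its turn in line:
            // at 10ms pacing, the 800th future waits ~8s before it even sends.
            sleep(PACING * i as u32).await;
            // So the timeout wraps only the send itself, not the wait.
            timeout(SEND_TIMEOUT, send(tx)).await
        });

    // A single 5s timeout around this JoinAll would instead cancel every
    // transaction still waiting its turn when the deadline hits.
    let _results = join_all(futures).await;
}

async fn send(_tx: Vec<u8>) { /* placeholder for the actual send */ }
```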
Codecov Report
```
@@             Coverage Diff             @@
##           master   #32945       +/-   ##
===========================================
- Coverage    82.0%    81.9%      -0.1%
===========================================
  Files         784      784
  Lines      212534   212597        +63
===========================================
- Hits       174370   174328        -42
- Misses      38164    38269       +105
```
```rust
// Send them all at once
let results = join_all(futures).await;
```
Since we send all the transactions at the same time, won't it create spiky traffic?
We don't actually send them all at once! We wait the proper amount of time before each round of sending, just like before. The gain is that we await all of the futures at once, so if one leader takes a long time on a send, it doesn't hold up the rest of the transactions.
... but I can see why that's confusing given the comment; I'll update it.
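For contrast, a rough sketch of the old sequential behavior being mimicked (again with a placeholder `send`); the pacing on the wire is the same, only the awaiting changed:

```rust
use std::time::Duration;

use tokio::time::sleep;

// Old behavior: send one transaction, wait 10ms, send the next. A slow
// send to one leader holds up every transaction behind it in the loop.
async fn send_sequentially(wire_transactions: Vec<Vec<u8>>) {
    const PACING: Duration = Duration::from_millis(10);
    for tx in wire_transactions {
        send(tx).await;
        sleep(PACING).await;
    }
}

async fn send(_tx: Vec<u8>) { /* placeholder for the actual send */ }
```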
I've tested this new implementation and the results look slower than before. I thought maybe it was a bad time for testnet, so I tried the same with the previous PR, which showed a time of ~26s. I used the following procedure to deploy:

```bash
#!/usr/bin/bash
SPL_DIR="../solana-program-library"
SO_FILE="${SPL_DIR}/target/deploy/spl_governance.so"
PROGRAM_KEYPAIR="test-program-id.json"
CLUSTER_ARG="-ut"
SOLANA="target/release/solana"

# Check if the .so file is there
if [ ! -f "${SO_FILE}" ]; then
    echo "${SO_FILE} was not found, build it"
    cd "${SPL_DIR}/governance"
    cargo build-sbf
    cd -
    # our file is ./target/deploy/spl_governance.so
fi

${SOLANA} ${CLUSTER_ARG} balance
du -sh "${SO_FILE}"
${SOLANA} ${CLUSTER_ARG} --version

if [ ! -f "${PROGRAM_KEYPAIR}" ]; then
    echo "${PROGRAM_KEYPAIR} was not found, generate it"
    solana-keygen new --no-bip39-passphrase -o "${PROGRAM_KEYPAIR}"
fi

time ${SOLANA} ${CLUSTER_ARG} program deploy --program-id "${PROGRAM_KEYPAIR}" "${SO_FILE}"
```
Ah, that's because program deploy uses the new method now on master, which still has slow sends. I wanted to address that side separately, since backporting this will fix 1.16.
Right, if I modify …
* tpu-client: Await all futures at once
* Add timeout when sending to not waste time on down nodes
* Update comment to make it clearer that we're not spiking

(cherry picked from commit b42249f)

tpu-client: Speed up performance by awaiting all futures at once (backport of #32945) (#32972)

* tpu-client: Speed up performance by awaiting all futures at once (#32945)
  * tpu-client: Await all futures at once
  * Add timeout when sending to not waste time on down nodes
  * Update comment to make it clearer that we're not spiking
  (cherry picked from commit b42249f)
* Add additional trait bounds required in 1.16

Co-authored-by: Jon Cinque <[email protected]>
Problem
This is a less invasive version of #32844, which aims to solve the regression of slow deployments in 1.16.
Summary of Changes
5672758: Rather than detaching tasks and risking overloading the runtime, the solution here is to create all of the sending futures right from the start and await them all together. The performance here is roughly the same as #32844: 2.5-3.5 seconds for 100 self-transfers.
cb76257: On top of that, I investigated the error found at #32844 (comment) -- if we can't connect to a node for some reason, there are lengthy timeouts and retries. This makes sense for validators trying to maintain a persistent connection to the network, but users just want to fire off their messages. So as a simple fix, we time out the future, which solves the issue.
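A hedged sketch of that fix, with `connect_and_send` standing in for the real connect-and-send path:

```rust
use std::time::Duration;

use tokio::time::timeout;

// Bound the whole connect-and-send future so an unreachable node costs at
// most SEND_TIMEOUT, instead of riding out lengthy connection retries.
async fn send_with_timeout(tx: Vec<u8>) -> Result<(), &'static str> {
    const SEND_TIMEOUT: Duration = Duration::from_secs(5);
    match timeout(SEND_TIMEOUT, connect_and_send(tx)).await {
        Ok(result) => result,
        // Fire-and-forget for users: give up on down nodes quickly rather
        // than maintaining a persistent connection like a validator would.
        Err(_elapsed) => Err("send timed out"),
    }
}

async fn connect_and_send(_tx: Vec<u8>) -> Result<(), &'static str> {
    Ok(()) // stand-in for the real QUIC connect + send
}
```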
Once this lands, we can look into doing something similar with the work done in #32638.
Fixes #32595