Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adds keep-alive-interval to repair QUIC transport config #33866

Merged
merged 1 commit into from
Nov 8, 2023

Conversation

behzadnouri
Copy link
Contributor

Problem

QUIC connections timeout due to infrequent repair requests.

Summary of Changes

Added keep_alive_interval to transport config.

@codecov
Copy link

codecov bot commented Oct 25, 2023

Codecov Report

Merging #33866 (fe866f3) into master (eba1b2d) will increase coverage by 0.0%.
Report is 1 commits behind head on master.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##           master   #33866   +/-   ##
=======================================
  Coverage    81.9%    81.9%           
=======================================
  Files         811      811           
  Lines      219356   219359    +3     
=======================================
+ Hits       179732   179735    +3     
  Misses      39624    39624           

const MAX_CONCURRENT_BIDI_STREAMS: VarInt = VarInt::from_u32(512);
const MAX_IDLE_TIMEOUT: Duration = Duration::from_secs(10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see definitions for these in sdk/src/quic.rs. Can you add comments about why new definitions are needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ones in sdk are used for tpu, which has very different properties than repair. In particular tpu packets are only forwarded to the upcoming leaders which we know in advance who they are, it is a uni-directional traffic, and has a very different load and latency tolerance. Repair is bi-directional request-response, peers are randomly sampled, etc.

I pretty much rather not to reuse the same parameters at this stage until this code is stable and repair over quic protocol is fully rolled out over mainnet. Until then reusing same parameters can introduce regressions on one protocol while trying to optimize another protocol.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just add some code comments on the reasoning will help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@behzadnouri behzadnouri requested a review from jbiseda October 26, 2023 14:36
let mut config = TransportConfig::default();
config
.datagram_receive_buffer_size(None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment this to disable datagrams?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Contributor

@lijunwangs lijunwangs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@behzadnouri behzadnouri merged commit 3ac2507 into solana-labs:master Nov 8, 2023
17 checks passed
@behzadnouri behzadnouri deleted the repair-quic-config branch November 8, 2023 20:09
@behzadnouri behzadnouri added the v1.17 PRs that should be backported to v1.17 label Nov 8, 2023
mergify bot pushed a commit that referenced this pull request Nov 8, 2023
QUIC connections may timeout due to infrequent repair requests. The commit
configures keep_alive_interval and max_idle_timeout to avoid timeouts.

(cherry picked from commit 3ac2507)
mergify bot added a commit that referenced this pull request Nov 9, 2023
…port of #33866) (#33992)

adds keep-alive-interval to repair QUIC transport config (#33866)

QUIC connections may timeout due to infrequent repair requests. The commit
configures keep_alive_interval and max_idle_timeout to avoid timeouts.

(cherry picked from commit 3ac2507)

Co-authored-by: behzad nouri <[email protected]>
anwayde pushed a commit to firedancer-io/solana that referenced this pull request Nov 16, 2023
…port of solana-labs#33866) (solana-labs#33992)

adds keep-alive-interval to repair QUIC transport config (solana-labs#33866)

QUIC connections may timeout due to infrequent repair requests. The commit
configures keep_alive_interval and max_idle_timeout to avoid timeouts.

(cherry picked from commit 3ac2507)

Co-authored-by: behzad nouri <[email protected]>
anwayde pushed a commit to firedancer-io/solana that referenced this pull request Nov 16, 2023
…port of solana-labs#33866) (solana-labs#33992)

adds keep-alive-interval to repair QUIC transport config (solana-labs#33866)

QUIC connections may timeout due to infrequent repair requests. The commit
configures keep_alive_interval and max_idle_timeout to avoid timeouts.

(cherry picked from commit 3ac2507)

Co-authored-by: behzad nouri <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v1.17 PRs that should be backported to v1.17
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants