bigtable: add timeout to token refresh #28728
After about 2 weeks, we have had just 1 node disconnect from bigtable. This seems to be less common than before applying this change, but it is still not great.
@brandon-j-roberts, thanks for the update. I've been trying to get further testing of this done on foundation RPC nodes to make sure our test cases are seeing significant loads. So far, it seems like the disconnects are still happening, although if we can determine they are happening less often, it could still be worth the merge.
Foundation nodes running with this patch have gone 10 days without a bigtable disconnect, so there does seem to be a notable improvement. I will kick off CI.
@brandon-j-roberts, I'd like to see if a rebase fixes the docs CI failure. Could you give me push perms? Or rebase on master, please?
I just rebased, I believe. Also, I am doing some more investigation on why our node disconnected. I'll let you know if I find anything interesting. @CriesofCarrots
Going to merge! Feeling Pollyanna-y today, so I'm going to have this close #20336
Co-authored-by: Kirill Fomichev <[email protected]> (cherry picked from commit 5598570)
bigtable: add timeout to token refresh (#28728) Co-authored-by: Kirill Fomichev <[email protected]> (cherry picked from commit 5598570) Co-authored-by: Brandon Roberts <[email protected]>
Co-authored-by: Kirill Fomichev <[email protected]>
Problem
Solana archival nodes have been disconnecting from bigtable. This causes headaches for node providers, since we need to restart a node once it disconnects in order to regain access to bigtable. Each restart also means 30-90 minutes of downtime for the node's users. (This can happen multiple times a day.)
Summary of Changes
Add a timeout to the auth token refresh for bigtable.
These changes were part of an older PR that was closed due to inactivity. I cherry-picked the changes from the patch developed there.
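For readers unfamiliar with the idea, here is a minimal sketch of putting a timeout around a token refresh. It is not the code from this patch: `Token`, `RefreshError`, and `refresh_with_timeout` are hypothetical names. It also uses only the standard library (a worker thread plus `recv_timeout`), whereas an async client would more likely use its runtime's timeout facility. The point is only that a stalled refresh returns an error after a bounded wait instead of blocking indefinitely.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for an auth token; not the type used in the patch.
#[derive(Debug, Clone, PartialEq)]
struct Token(String);

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum RefreshError {
    /// The refresh did not finish within the allowed time.
    TimedOut,
    /// The refresh finished but did not produce a token.
    Failed,
}

/// Runs `fetch` on a worker thread and waits at most `limit` for its result.
/// `fetch` returns `None` to signal that the refresh itself failed.
fn refresh_with_timeout<F>(fetch: F, limit: Duration) -> Result<Token, RefreshError>
where
    F: FnOnce() -> Option<Token> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller already timed out, the receiver is gone; ignore the send error.
        let _ = tx.send(fetch());
    });
    match rx.recv_timeout(limit) {
        Ok(Some(token)) => Ok(token),
        Ok(None) => Err(RefreshError::Failed),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(RefreshError::TimedOut),
        // The worker ended (for example, panicked) without sending a result.
        Err(mpsc::RecvTimeoutError::Disconnected) => Err(RefreshError::Failed),
    }
}
```

With this shape, a caller can treat `RefreshError::TimedOut` like any other refresh failure and try again on the next refresh cycle, rather than being left waiting on a request that never completes.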
Testing
Our team built the solana client with this change and deployed it to our ~10 archival nodes. They have not experienced any disconnects from bigtable in the ~1 week of testing so far, and have been able to access blocks all the way back to block 0 in bigtable throughout. (Before the patch was applied, these nodes would already have disconnected multiple times.)
Closes #20336