Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bigtable: add timeout to token refresh #28728

Merged

Conversation

brandon-j-roberts
Copy link
Contributor

@brandon-j-roberts brandon-j-roberts commented Nov 2, 2022

Problem

Solana archival nodes have been disconnecting from big table. This causes headache for node providers, since we need to restart these nodes once they disconnect. This issue also causes 30-90 minutes of downtime for the node user, since we need to restart the node to regain access to bigtable. ( This can happen multiple times a day )

Summary of Changes

Add a timeout causing auth token for bigtable to refresh.

These changes were part of an older PR that was closed due to inactivity. I cherry picked the changes from the developed patch.

Testing

Our team has built the solana client and applied this change to our ~10 archival nodes. They haven't experienced any disconnects from bigtable in the ~1 week of testing we have undergone. These nodes have been able to access all the way to block 0 in bigtable for ~1 week without any issues. ( These would have disconnected multiple times already before the patch was applied )

Closes #20336

@mergify mergify bot added the community Community contribution label Nov 2, 2022
@mergify mergify bot requested a review from a team November 2, 2022 20:17
@brandon-j-roberts
Copy link
Contributor Author

After about 2 weeks, we just have had 1 node disconnect from bigtable. This seems to be less common than before applying this change, but still not great.
We are getting a lot of timeouts on the token refresh it seems.
WARN solana_storage_bigtable::access_token] Token refresh timeout

@CriesofCarrots
Copy link
Contributor

CriesofCarrots commented Nov 11, 2022

@brandon-j-roberts , thanks for the update. I've been trying to get further testing on this done on foundation RPC nodes to make sure out test cases are seeing significant loads. So far, it seems like the disconnects are still happening, although if we can determine they are happening less often, it could still be worth the merge.
Please keep me posted if/as you are able to get more data.

@CriesofCarrots
Copy link
Contributor

Foundation nodes running with this patch have gone 10 days without a bigtable disconnect, so does seem like there's notable improvement. I will kick off CI

@CriesofCarrots CriesofCarrots added the CI Pull Request is ready to enter CI label Nov 14, 2022
@solana-grimes solana-grimes removed the CI Pull Request is ready to enter CI label Nov 14, 2022
@CriesofCarrots CriesofCarrots added CI Pull Request is ready to enter CI v1.13 labels Nov 14, 2022
@solana-grimes solana-grimes removed the CI Pull Request is ready to enter CI label Nov 14, 2022
@CriesofCarrots
Copy link
Contributor

@brandon-j-roberts , I'd like to see if a rebase fixes the docs CI failure. Could you give me push perms? Or rebase on master, please?

@brandon-j-roberts
Copy link
Contributor Author

I just rebased, I believe. Also, I am doing some more investigation on why our node disconnected. I'll let you know if I find anything interesting. @CriesofCarrots

@CriesofCarrots CriesofCarrots added the CI Pull Request is ready to enter CI label Nov 14, 2022
@solana-grimes solana-grimes removed the CI Pull Request is ready to enter CI label Nov 14, 2022
@CriesofCarrots
Copy link
Contributor

Going to merge! Feeling Pollyanna-y today, so I'm going to have this close #20336
@brandon-j-roberts, let's start a new issue if you find out more about your node's disconnection, or still see it happening occasionally.

@CriesofCarrots CriesofCarrots merged commit 5598570 into solana-labs:master Nov 14, 2022
mergify bot pushed a commit that referenced this pull request Nov 14, 2022
Co-authored-by: Kirill Fomichev <[email protected]>
(cherry picked from commit 5598570)
mergify bot pushed a commit that referenced this pull request Nov 14, 2022
Co-authored-by: Kirill Fomichev <[email protected]>
(cherry picked from commit 5598570)
CriesofCarrots pushed a commit that referenced this pull request Nov 15, 2022
Co-authored-by: Kirill Fomichev <[email protected]>
(cherry picked from commit 5598570)
mergify bot added a commit that referenced this pull request Nov 15, 2022
bigtable: add timeout to token refresh (#28728)

Co-authored-by: Kirill Fomichev <[email protected]>
(cherry picked from commit 5598570)

Co-authored-by: Brandon Roberts <[email protected]>
mergify bot added a commit that referenced this pull request Nov 15, 2022
bigtable: add timeout to token refresh (#28728)

Co-authored-by: Kirill Fomichev <[email protected]>
(cherry picked from commit 5598570)

Co-authored-by: Brandon Roberts <[email protected]>
xiangzhu70 pushed a commit to xiangzhu70/solana that referenced this pull request Nov 16, 2022
gnapoli23 pushed a commit to gnapoli23/solana that referenced this pull request Dec 16, 2022
nickfrosty pushed a commit to nickfrosty/solana that referenced this pull request Jan 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Community contribution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bigtable timeouts leads to permanent disconnection
4 participants