Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

bigtable: fix AccessToken issues #34213

Merged
merged 5 commits into from
Nov 27, 2023

Conversation

fanatid
Copy link
Contributor

@fanatid fanatid commented Nov 23, 2023

Problem

The problem is widely known that sometimes nodes lose BigTable connection (Access Token expired actually).

Previously described in different issues and pull requests, like #20336 #26217 #28728 #29692

The code itself looks completely OK, we test if the token is expired and if so set the flag to disable updates from other calls, have a timeout, and correctly handle an error, in the end we set the flag back to 'non-update status'. Everything is correct, except for one thing.... we miss that function is async and if Future would dropped during update we never finish the token update and never set the flag back to 'non-update status'. As a result, we will lose connection to BigTable. If Mutex were used instead AtomicFlag we would never had that problem because MutexGuard would be dropped anyway 😁 use AtomicFlag is ok too, but you need to be careful 😉

We currently test this patch on Triton nodes but everything looks great so far.

Summary of Changes

The obvious fix was to use Mutex instead AtomicFlag but instead I prefer to change try_refresh to 'non-async' function and use tokio::spawn for the token refresh. This is slightly more complicated but I think it's OK.

@mergify mergify bot added community Community contribution need:merge-assist labels Nov 23, 2023
@mergify mergify bot requested a review from a team November 23, 2023 03:36
@CriesofCarrots CriesofCarrots added the CI Pull Request is ready to enter CI label Nov 27, 2023
@solana-grimes solana-grimes removed the CI Pull Request is ready to enter CI label Nov 27, 2023
Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tracking this down! I made a few comments, as I think it could have been an easier review.

storage-bigtable/src/access_token.rs Outdated Show resolved Hide resolved
storage-bigtable/src/access_token.rs Outdated Show resolved Hide resolved
storage-bigtable/src/access_token.rs Outdated Show resolved Hide resolved
storage-bigtable/src/access_token.rs Outdated Show resolved Hide resolved
storage-bigtable/src/access_token.rs Outdated Show resolved Hide resolved
storage-bigtable/src/access_token.rs Outdated Show resolved Hide resolved
Copy link

codecov bot commented Nov 27, 2023

Codecov Report

Merging #34213 (c56cb61) into master (e832765) will decrease coverage by 0.1%.
Report is 7 commits behind head on master.
The diff coverage is 13.3%.

Additional details and impacted files
@@            Coverage Diff            @@
##           master   #34213     +/-   ##
=========================================
- Coverage    81.9%    81.9%   -0.1%     
=========================================
  Files         819      819             
  Lines      219350   219369     +19     
=========================================
- Hits       179698   179679     -19     
- Misses      39652    39690     +38     

@CriesofCarrots CriesofCarrots added the CI Pull Request is ready to enter CI label Nov 27, 2023
@solana-grimes solana-grimes removed the CI Pull Request is ready to enter CI label Nov 27, 2023
Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, much simpler! One last nit, otherwise r+ fmt and clippy being happy.

storage-bigtable/src/access_token.rs Show resolved Hide resolved
Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@CriesofCarrots CriesofCarrots added the v1.17 PRs that should be backported to v1.17 label Nov 27, 2023
Copy link
Contributor

mergify bot commented Nov 27, 2023

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

@CriesofCarrots
Copy link
Contributor

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

Thanks, mergify! This is a bug fix, and very limited in scope.

@CriesofCarrots CriesofCarrots merged commit 873bef9 into solana-labs:master Nov 27, 2023
18 checks passed
mergify bot pushed a commit that referenced this pull request Nov 27, 2023
* bigtable: fix AccessToken issue

* remove inner

* less changes

* fmt + drop lock

(cherry picked from commit 873bef9)
@fanatid fanatid deleted the fix-bt-access-token branch November 28, 2023 00:06
CriesofCarrots pushed a commit that referenced this pull request Nov 28, 2023
bigtable: fix AccessToken issues (#34213)

* bigtable: fix AccessToken issue

* remove inner

* less changes

* fmt + drop lock

(cherry picked from commit 873bef9)

Co-authored-by: Kirill Fomichev <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
community Community contribution need:merge-assist v1.17 PRs that should be backported to v1.17
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants