bigtable: fix AccessToken issues #34213
Thanks for tracking this down! I made a few comments, as I think it could have been an easier review.
Codecov Report
Additional details and impacted files
@@            Coverage Diff            @@
##           master   #34213     +/-   ##
=========================================
- Coverage    81.9%    81.9%   -0.1%
=========================================
  Files         819      819
  Lines      219350   219369    +19
=========================================
- Hits       179698   179679    -19
- Misses      39652    39690    +38
Thank you, much simpler! One last nit, otherwise r+ fmt and clippy being happy.
Thanks!
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc. that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case-by-case basis.
Thanks, mergify! This is a bug fix, and very limited in scope.
bigtable: fix AccessToken issues (#34213)
* bigtable: fix AccessToken issue
* remove inner
* less changes
* fmt + drop lock
(cherry picked from commit 873bef9)
Co-authored-by: Kirill Fomichev <[email protected]>
Problem
It is a widely known problem that nodes sometimes lose their BigTable connection (in fact, the Access Token has expired).
It was previously described in several issues and pull requests, such as #20336, #26217, #28728 and #29692.
The code itself looks completely OK: we test whether the token is expired and, if so, set a flag to disable updates from other calls, apply a timeout, handle errors correctly, and at the end set the flag back to 'non-update status'. Everything is correct except for one thing: we miss that the function is `async`, and if the `Future` is dropped during the update, we never finish the token update and never set the flag back to 'non-update status'. As a result, we lose the connection to BigTable. If a `Mutex` were used instead of `AtomicFlag`, we would never have had that problem, because the `MutexGuard` would be dropped anyway 😁 Using `AtomicFlag` is OK too, but you need to be careful 😉
We are currently testing this patch on Triton nodes, and everything looks great so far.
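The failure mode described above can be reproduced without any runtime. The following is a minimal sketch, not the actual BigTable code: `NeverReady`, `refresh_with_flag`, `refresh_with_guard`, `ResetOnDrop` and the helper functions are hypothetical names, and `AtomicBool` from the standard library stands in for the flag. It polls a future once, drops it mid-`await`, and compares a manually reset flag with one reset by a drop guard (the property a `MutexGuard` would provide).

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in for a slow token request: it never completes.
struct NeverReady;

impl Future for NeverReady {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

/// Waker that does nothing; enough to poll a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Buggy pattern (hypothetical): the flag is reset only if the future runs to the end.
async fn refresh_with_flag(refresh_active: &AtomicBool) {
    refresh_active.store(true, Ordering::SeqCst);
    NeverReady.await; // if the future is dropped here ...
    refresh_active.store(false, Ordering::SeqCst); // ... this line never runs
}

/// Guard that clears the flag when dropped, similar in spirit to a `MutexGuard`.
struct ResetOnDrop<'a>(&'a AtomicBool);

impl Drop for ResetOnDrop<'_> {
    fn drop(&mut self) {
        self.0.store(false, Ordering::SeqCst);
    }
}

/// Same logic, but the reset is tied to a value that is dropped with the future.
async fn refresh_with_guard(refresh_active: &AtomicBool) {
    refresh_active.store(true, Ordering::SeqCst);
    let _guard = ResetOnDrop(refresh_active);
    NeverReady.await;
}

/// Polls the future once and then drops it, simulating a cancelled caller.
fn poll_once_then_drop<F: Future>(fut: F) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let _ = fut.as_mut().poll(&mut cx);
    // `fut` is dropped here, in the middle of its `.await`.
}

/// Returns the flag value after cancelling the manually reset version.
fn flag_left_set_with_manual_reset() -> bool {
    let flag = AtomicBool::new(false);
    poll_once_then_drop(refresh_with_flag(&flag));
    flag.load(Ordering::SeqCst)
}

/// Returns the flag value after cancelling the guard-based version.
fn flag_left_set_with_guard() -> bool {
    let flag = AtomicBool::new(false);
    poll_once_then_drop(refresh_with_guard(&flag));
    flag.load(Ordering::SeqCst)
}
```

In this sketch `flag_left_set_with_manual_reset()` reports that the flag is still set after cancellation, so every later call would believe a refresh is in progress, while `flag_left_set_with_guard()` reports that it was cleared.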
Summary of Changes
The obvious fix was to use `Mutex` instead of `AtomicFlag`, but I prefer to change `try_refresh` to a non-async function and use `tokio::spawn` for the token refresh. This is slightly more complicated, but I think it's OK.
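The shape of that change can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch: the `AccessToken` struct, its fields, and the placeholder token value are hypothetical, and `std::thread::spawn` is used as a dependency-free stand-in for `tokio::spawn`. The point it shows is that a non-async `try_refresh` hands the whole refresh to a detached job that owns the flag reset, so a cancelled caller can no longer interrupt it halfway.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, RwLock};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical token holder; field names are illustrative only.
struct AccessToken {
    token: Arc<RwLock<String>>,
    refresh_active: Arc<AtomicBool>,
}

impl AccessToken {
    fn new(initial: &str) -> Self {
        Self {
            token: Arc::new(RwLock::new(initial.to_string())),
            refresh_active: Arc::new(AtomicBool::new(false)),
        }
    }

    /// Non-async: there is no `.await` in here at which a caller could be cancelled.
    /// Returns a handle to the background job only if this call started one.
    fn try_refresh(&self) -> Option<JoinHandle<()>> {
        // Claim the flag atomically so that only one refresh runs at a time.
        if self
            .refresh_active
            .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
            .is_err()
        {
            return None;
        }
        let token = Arc::clone(&self.token);
        let refresh_active = Arc::clone(&self.refresh_active);
        // Assumption: in async code this would be `tokio::spawn(async move { ... })`.
        Some(thread::spawn(move || {
            // Stand-in for the real token request.
            thread::sleep(Duration::from_millis(10));
            *token.write().unwrap() = "new-token".to_string();
            // The spawned job owns the reset, so it runs regardless of the caller.
            refresh_active.store(false, Ordering::SeqCst);
        }))
    }

    /// Returns a copy of the current token.
    fn current(&self) -> String {
        self.token.read().unwrap().clone()
    }
}
```

A caller would invoke `try_refresh()` whenever the token looks expired and simply ignore the returned handle; the handle is returned here only so the sketch can be observed to finish. The trade-off compared with the `Mutex` approach is that the caller does not wait for the new token and keeps using the old one until the background job completes.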