Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure we don't miss the resolution of an access request #9193

Merged
merged 5 commits into from
Dec 10, 2021

Conversation

espadolini
Copy link
Contributor

@espadolini espadolini commented Dec 2, 2021

Right now tsh fires up a watcher to wait for the access request to be resolved and then creates the request; however it doesn't wait for the watcher to be ready, and so it's possible to miss the resolution of the request (be it approval, denial or removal) and we get a hang.

This makes it so that we wait for the watcher to be ready before proceeding with the request creation. In addition, we create the watcher on the root cluster rather than whatever cluster we're connected to.

Closes #9003 and #9244.

@github-actions github-actions bot requested review from r0mant and russjones December 2, 2021 11:54
@espadolini
Copy link
Contributor Author

How should I go about adding a test for this? #9003 has some instructions on how to replicate it, but it could also be tested by artificially increasing the time it takes for the watcher to be ready in the code.

@espadolini espadolini force-pushed the espadolini/tsh-watcher-race branch from fdb7501 to a6fdc22 Compare December 2, 2021 12:23
@russjones
Copy link
Contributor

@fspmarshall Any suggestions for @espadolini on how to write tests for this?

@espadolini espadolini force-pushed the espadolini/tsh-watcher-race branch from a6fdc22 to c9acdcb Compare December 6, 2021 11:22
@espadolini
Copy link
Contributor Author

espadolini commented Dec 6, 2021

After some minor refactoring, the watcher now waits for access request updates on the root cluster, rather than whatever cluster we're currently connected to.

tool/tsh/tsh_test.go Outdated Show resolved Hide resolved
@espadolini
Copy link
Contributor Author

I added a test for the "access request while logged into a leaf cluster" scenario, it fails on master and passes after the changes in this PR. (for #9244)

I'm still not sure how to make a specific test for the spurious hangs of #9003; the test I just added does go through the codepath, so if we get a regression and the bug resurfaces we might eventually hit it in our CI runs anyway.

@espadolini espadolini marked this pull request as ready for review December 7, 2021 14:25
@espadolini espadolini force-pushed the espadolini/tsh-watcher-race branch 2 times, most recently from cc729ff to 851d062 Compare December 9, 2021 08:54
tool/tsh/tsh_test.go Outdated Show resolved Hide resolved
@espadolini espadolini force-pushed the espadolini/tsh-watcher-race branch 2 times, most recently from f456904 to 4915ab9 Compare December 9, 2021 14:58
@fspmarshall
Copy link
Contributor

I'm still not sure how to make a specific test for the spurious hangs

@espadolini Since this update removes the cause of the spurious hangs, I don't think there is a reasonable way to try to replicate them in a test. As you said, we cover the codepath, so if a new spurious hang is introduced, hopefully we'll catch it.

@espadolini espadolini force-pushed the espadolini/tsh-watcher-race branch from 4915ab9 to 9b1eb0a Compare December 9, 2021 18:53
@russjones
Copy link
Contributor

@espadolini Please backport this to branch/v8 and branch/v7. You can create backport PRs now, but don't merge until next week.

@espadolini espadolini force-pushed the espadolini/tsh-watcher-race branch from dd033de to 4ce2e07 Compare December 9, 2021 19:59
@espadolini espadolini enabled auto-merge (squash) December 9, 2021 20:10
@espadolini espadolini force-pushed the espadolini/tsh-watcher-race branch from 4ce2e07 to 928b3c0 Compare December 9, 2021 20:19
@espadolini espadolini force-pushed the espadolini/tsh-watcher-race branch from 928b3c0 to acc84f0 Compare December 10, 2021 07:50
@espadolini espadolini merged commit c3dee23 into master Dec 10, 2021
@espadolini espadolini deleted the espadolini/tsh-watcher-race branch December 10, 2021 08:09
espadolini added a commit that referenced this pull request Dec 10, 2021
This makes it so that tsh will watch for access request resolution on the
correct (root) cluster, and it will not create access requests before the event
watcher is ready.

Fixes #9003 and #9244.

Includes v7 backport fixups.
espadolini added a commit that referenced this pull request Dec 10, 2021
This makes it so that tsh will watch for access request resolution on the
correct (root) cluster, and it will not create access requests before the event
watcher is ready.

Fixes #9003 and #9244.

Includes v8 backport fixups.
espadolini added a commit that referenced this pull request Dec 14, 2021
This makes it so that tsh will watch for access request resolution on the
correct (root) cluster, and it will not create access requests before the event
watcher is ready.

Fixes #9003 and #9244.

Includes v8 backport fixups.
espadolini added a commit that referenced this pull request Dec 14, 2021
This makes it so that tsh will watch for access request resolution on the
correct (root) cluster, and it will not create access requests before the event
watcher is ready.

Fixes #9003 and #9244.

Includes v7 backport fixups.
espadolini added a commit that referenced this pull request Dec 14, 2021
This makes it so that tsh will watch for access request resolution on the
correct (root) cluster, and it will not create access requests before the event
watcher is ready.

Fixes #9003 and #9244.

Includes v8 backport fixups.
espadolini added a commit that referenced this pull request Dec 16, 2021
This makes it so that tsh will watch for access request resolution on the
correct (root) cluster, and it will not create access requests before the event
watcher is ready.

Fixes #9003 and #9244.

Includes v7 backport fixups.
espadolini added a commit that referenced this pull request Dec 16, 2021
This makes it so that tsh will watch for access request resolution on the
correct (root) cluster, and it will not create access requests before the event
watcher is ready.

Fixes #9003 and #9244.

Includes v7 backport fixups.
@webvictim webvictim mentioned this pull request Mar 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants