Receiving 401 response during atlantis apply when using GitHub App authentication method #2285
Comments
An update: we are in fact only seeing this on 0.19.2 and 0.19.3. Our original downgrade back to 0.17.5 didn't take effect at first, so we thought we were still seeing it with the older version.
Because the error is about authentication, we decided to try updating back to 0.19.3 while switching the authentication method from GitHub App to user+token. Previously the error would occur after a few hours (regardless of activity level), but since changing the authentication we have not received any 401 responses when applying.
We've now been running Atlantis 0.19.3 since June 9 using user+token with no errors. We'd very much like to use the GitHub App route again, but it seems fairly clear there's some sort of issue there.
I'm regularly seeing this: I have to restart the Atlantis pod in k8s before it will work again. I would very much like to see this resolved. Could this be related to a credential that isn't being refreshed properly?
I have also faced this issue and I started investigating it. I think the bug comes from the GitHub App implementation.
Source: GitHub App documentation.

The problem is that Atlantis fetches the token once when the server starts and again whenever it makes a new clone of a GitHub repository. Since a new clone is made every time a new PR is opened, the token is refreshed at that point. If Atlantis stays "idle" for more than one hour, no refresh happens and the token it holds expires, which is when this bug appears.

I am not familiar with the codebase, but these are the calls I found and on which I based this conclusion:

My suggestion for this fix would be the following:

I'd ask someone who knows the codebase to confirm my findings; if everything is correct, I will open a PR to solve this issue.
@jamengual, could you please take a look at my response?
We see the same issue with v0.19.9-pre.20220822, but surprisingly only when Atlantis is doing
@lilincmu, this only happens when using a GitHub App.
@valentindeaconu, we are working on this.
This is indeed happening only when using GitHub App auth, which has a 1-hour token lifetime (https://docs.github.com/en/developers/apps/building-github-apps/authenticating-with-github-apps#authenticating-as-an-installation), because the token for the GraphQL calls is only created once, during client initialization (
The underlying ghinstallation library can refresh tokens that are near expiration, but I don't believe we have enough information once the client is created to do that: we need the credentials and the GraphQL URL to make those calls. I thought about a couple of ways to handle this. The cleanest to me seems to be removing the initialization of the GraphQL client from the GithubClient initialization flow and creating it on the fly as we make GraphQL queries, to ensure we always have an up-to-date token. I'll put up a PR with an example setup for this shortly.
Overview of the Issue
Beginning May 20, we started receiving the error message below when atlantis apply is run. This is the same error as in #2187, but we are not using gh-team-allowlist and are getting the error with 0.17.5 (which does not have that feature), so I opened a separate issue. In that issue and in #2090, I've seen notes about checking the GitHub API rate limit, but I'm not sure that's possible with the OAuth-installed application, since I don't have a token to call the rate limit APIs (REST or GraphQL; from the error message this appears to be GraphQL) as that application user.
We originally noticed this behavior when we upgraded from 0.17.5 to 0.19.2 and thought it was a bug, so we upgraded to 0.19.3 and then downgraded back to 0.17.5, but all 3 versions have been exhibiting the same behavior. Recreating the Atlantis container immediately resolves the problem (my guess is because it gets a new token from the OAuth flow). That is obviously not a great workflow, but it is our workaround for the moment.
Reproduction Steps
While we see it regularly, I don't know how to provide steps that someone else could use to reproduce the behavior.
Environment details
Atlantis versions: 0.17.5, 0.19.2, 0.19.3
Additional Context
We're running Atlantis as a GitHub app and in a container, which is how we have run it for the past ~1.5 years. Our cadence of updates to the repo that Atlantis is watching has not increased (if anything, it has slowed some).