With defined gh-team-allowlist Atlantis randomly stops working with 401 Unauthorized body #2187

Closed
komljen opened this issue Apr 6, 2022 · 19 comments
Labels
bug Something isn't working waiting-on-response Waiting for a response from the user

Comments

@komljen

komljen commented Apr 6, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

With defined gh-team-allowlist, Atlantis randomly stops working with the following error when running the plan:

{"level":"error","ts":"2022-04-05T15:33:04.300Z","caller":"events/command_runner.go:219","msg":"Unable to check user permissions: non-200 OK status code: 401 Unauthorized body: \"{\\\"message\\\":\\\"Bad credentials\\\",\\\"documentation_url\\\":\\\"https://docs.github.com/graphql\\\"}\"","json":{},"stacktrace":"github.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand\n\tgithub.com/runatlantis/atlantis/server/events/command_runner.go:219"}

A restart of the pod fixes it, but it breaks again after a few hours.
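For triage, the HTTP status and GitHub's response body can be pulled out of a log entry like the one above. This is a minimal sketch, assuming the entry is a single JSON object and that the text after `body:` is a Go-quoted string that also parses as a JSON string. `parse_permission_error` is a hypothetical helper name, not part of Atlantis.

```python
import json
import re

# Shortened copy of the log entry from the report (stacktrace field omitted).
LOG_LINE = r'''{"level":"error","ts":"2022-04-05T15:33:04.300Z","caller":"events/command_runner.go:219","msg":"Unable to check user permissions: non-200 OK status code: 401 Unauthorized body: \"{\\\"message\\\":\\\"Bad credentials\\\",\\\"documentation_url\\\":\\\"https://docs.github.com/graphql\\\"}\"","json":{}}'''

# Captures the status code, the reason phrase, and the quoted response body.
_PATTERN = re.compile(r'status code: (\d{3}) (.+?) body: (".*")$')


def parse_permission_error(line):
    """Return status, reason, and GitHub's message, or None if the line does not match."""
    entry = json.loads(line)
    match = _PATTERN.search(entry["msg"])
    if match is None:
        return None
    status, reason, quoted_body = match.groups()
    body_text = json.loads(quoted_body)  # undo the outer quoting
    body = json.loads(body_text)         # decode GitHub's JSON body
    return {"status": int(status), "reason": reason, "message": body.get("message")}
```

Called on `LOG_LINE`, the helper returns the status `401`, the reason `Unauthorized`, and GitHub's message `Bad credentials`.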

Atlantis version: v0.19.2
Config:

disable-apply-all: true
enable-diff-markdown-format: true
enable-regexp-cmd: true
gh-app-id: <ID>
gh-app-key-file: /atlantis/gh-app-key-file.pem
gh-app-slug: atlantis-faire
gh-org: Faire
gh-team-allowlist: "*:plan,*:unlock,backend-platform:*,data-infra:*"
gh-webhook-secret: <SECRET>
hide-prev-plan-comments: true
write-git-creds: true
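For readers unfamiliar with the `gh-team-allowlist` value above: it is a comma-separated list of `team:command` pairs in which `*` acts as a wildcard. The sketch below only illustrates that reading. It is not Atlantis's implementation, the function names are hypothetical, and case-insensitive matching is an assumption.

```python
def parse_allowlist(value):
    """Split 'team:command,team:command' into lowercase (team, command) pairs."""
    rules = []
    for pair in value.split(","):
        team, _, command = pair.strip().partition(":")
        rules.append((team.lower(), command.lower()))
    return rules


def is_allowed(rules, user_teams, command):
    """Return True if any rule matches one of the user's teams and the command."""
    teams = {t.lower() for t in user_teams}
    command = command.lower()
    for team, allowed in rules:
        team_ok = team == "*" or team in teams
        command_ok = allowed == "*" or allowed == command
        if team_ok and command_ok:
            return True
    return False


RULES = parse_allowlist("*:plan,*:unlock,backend-platform:*,data-infra:*")
```

Under this reading, anyone may run `plan` and `unlock`, while `apply` requires membership in `backend-platform` or `data-infra`.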

I also tried v0.19.1, but it failed with the following error:

"Unable to check user permissions: struct field for \"__schema\" doesn't exist in any of 1 places to unmarshal

However, that is expected, according to the release notes for the latest version.

@jamengual
Contributor

The struct issue you are reporting was fixed in #2128.

@komljen
Author

komljen commented May 15, 2022

I'm not reporting that issue in this one. This is non-200 OK status code: 401 Unauthorized body with v0.19.2.
I just mentioned that I tried v0.19.1 as well and got the issue that is already fixed, but that is ok and expected.

@jamengual
Contributor

understood

@jamengual jamengual changed the title With defined gh-team-allowlist Atlantis randomly stops working With defined gh-team-allowlist Atlantis randomly stops working with 401 Unauthorized body May 15, 2022
@raymondchen625
Contributor

@komljen I found this article. This might be caused by rate limiting, with GitHub returning a misleading error message. Could you check your rate limit when it happens again?
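One way to run that check is GitHub's REST endpoint `GET /rate_limit`, which reports the remaining quota per resource, including `graphql`. A minimal sketch follows, assuming a valid token is at hand (for a GitHub App that would be an installation access token). The helper names are hypothetical.

```python
import json
import urllib.request


def fetch_rate_limit(token):
    """Call GET /rate_limit with the given token and return the decoded JSON."""
    request = urllib.request.Request(
        "https://api.github.com/rate_limit",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(payload, resource="graphql"):
    """Pick the limit, remaining count, and reset time for one resource."""
    bucket = payload["resources"][resource]
    return {key: bucket[key] for key in ("limit", "remaining", "reset")}
```

Calling `summarize(fetch_rate_limit(token))` right after a failure would show whether the GraphQL quota was exhausted at that moment. If the token itself is rejected, this call fails with a 401 as well, which is a useful signal in its own right.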

@komljen
Author

komljen commented May 21, 2022

Interesting, will check that and report on the findings.

@komljen
Author

komljen commented Jun 13, 2022

This is an interesting finding: #2285 (comment)
So, it works with token auth but doesn't with GH App.

@cjbehm

cjbehm commented Jun 14, 2022

@komljen Yeah, we've now been able to run for multiple days with 0.19.3 and the user+token authentication instead of GH App. With the GH App authentication, we could only go a few hours at most.

With 0.17.5, the GH App route worked perfectly fine.
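One difference between the two authentication modes is worth keeping in mind: GitHub App installation access tokens expire after one hour, while a user token is not subject to that one-hour lifetime. Whether that is the root cause here is not established in this thread. The sketch below only illustrates the general pattern of refreshing a cached token before it expires. `fetch_token` is a hypothetical callable standing in for the real token exchange, and the five-minute margin is an arbitrary choice.

```python
import time


class RefreshingToken:
    """Cache a short-lived token and fetch a new one shortly before it expires."""

    def __init__(self, fetch_token, lifetime_seconds=3600, margin_seconds=300,
                 clock=time.monotonic):
        self._fetch_token = fetch_token
        self._lifetime = lifetime_seconds
        self._margin = margin_seconds
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        # Refresh on first use, or once we are inside the safety margin.
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime
        return self._token
```

A client that calls `get()` before every request never sends a token older than the configured lifetime minus the margin. A client that reads the token once at startup would start failing about an hour later.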

@jamengual
Contributor

jamengual commented Jun 14, 2022 via email

@cjbehm

cjbehm commented Jun 14, 2022

@jamengual I'm not using the GH team allowlist feature; I was just confirming @komljen's comment, so I can't test that out. (Also, the GH team allowlist feature was added in 0.18 and moved to GraphQL in 0.18.3.)

I do think that #2285 and this issue could be the same root cause, but I created that issue specifically because our errors arose without using any new features; just as a pure version upgrade.

@jamengual
Contributor

jamengual commented Jun 14, 2022 via email

@cjbehm

cjbehm commented Jun 14, 2022

Could Atlantis request and log rate limit info in its query per https://docs.github.com/en/graphql/overview/resource-limitations ?

It's hard to imagine throttling as the source when our problem in #2285 disappeared by switching to token auth instead of GH App, but GitHub's API response on its own is nearly useless.

I'm starting to believe that API call throttling is what is causing this, and the error message does not help much. I'm hoping the GitHub API will become more descriptive about the real issue behind it and will expose metrics around API calls.
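GitHub's GraphQL API exposes a `rateLimit` object (`limit`, `cost`, `remaining`, `resetAt`) that can be requested alongside any query, which is what the suggestion above amounts to. Here is a sketch of attaching it and reading it back, with hypothetical helper names and a made-up sample response. It is not how Atlantis builds its queries.

```python
import json

RATE_LIMIT_FIELD = "rateLimit { limit cost remaining resetAt }"


def with_rate_limit(selection):
    """Wrap selection-set text in a query that also asks for rateLimit."""
    return "query { " + selection + " " + RATE_LIMIT_FIELD + " }"


def build_payload(selection):
    """Build the JSON body for a POST to the GraphQL endpoint."""
    return json.dumps({"query": with_rate_limit(selection)})


def rate_limit_log_line(response):
    """Format the rateLimit data for logging, if the response carries any."""
    info = (response.get("data") or {}).get("rateLimit")
    if info is None:
        return "rate limit info unavailable"
    return ("graphql rate limit: {remaining}/{limit} remaining, "
            "cost {cost}, resets at {resetAt}").format(**info)
```

A rejected request such as the 401 in this issue carries no `data`, so the helper reports that the information is unavailable. Logging the values on successful calls would still show whether the quota was close to exhaustion before the failures began.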

@jamengual
Contributor

jamengual commented Jun 14, 2022 via email

@jamengual
Contributor

Is this still happening in v0.19.8?

@jamengual jamengual added the waiting-on-response Waiting for a response from the user label Aug 26, 2022
@komljen
Author

komljen commented Sep 8, 2022

> is this still happening in v0.19.8?

I haven't tried that version yet; I'll wait for PR #2479, which seems like a proper fix for this issue.

@jamengual
Contributor

jamengual commented Sep 8, 2022 via email

@jullianow

+1

@jamengual
Contributor

jamengual commented Oct 3, 2022 via email

@komljen
Author

komljen commented Oct 3, 2022

Yes, I forgot to update here, but I have had no issues with the latest version.

@primeroz

primeroz commented Oct 5, 2022

> this has been already fixed, test the new version

We are still hitting it with the latest release, 0.19.8.

Edit: I see that a4a49bf is in 0.19.9 🤞
