
Octokit.AuthorizationException in dotneteng-status #4405

Open
3 tasks
riarenas opened this issue Nov 5, 2024 · 11 comments

Comments

@riarenas
Member

riarenas commented Nov 5, 2024

The build monitor didn't open an issue for the failed build https://dev.azure.com/dnceng/internal/_build/results?buildId=2575453&view=results. Investigate why.

Release Note Category

  • Feature changes/additions
  • Bug fixes
  • Internal Infrastructure Improvements

Release Note Description

@riarenas
Member Author

riarenas commented Nov 5, 2024

The service is consistently failing to authenticate against GitHub. I had to update a private key in the helixprodservice manifest, which seems related, but I haven't pinned down the problem yet and will keep looking.

@riarenas riarenas changed the title Investigate build monitor not opening issues for failed daily builds Octokit.AuthorizationException in dotnetengstatus Nov 6, 2024
@riarenas riarenas changed the title Octokit.AuthorizationException in dotnetengstatus Octokit.AuthorizationException in dotneteng-status Nov 6, 2024
@riarenas
Member Author

riarenas commented Nov 8, 2024

I wasn't able to make much progress here. The only secret we've had to cycle recently is in https://dnceng.visualstudio.com/internal/_git/dotnet-helix-service?path=/.vault-config/helixprodservice.yaml&version=GBmain&line=170&lineEnd=177&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents. Its definition specifies that only the private key should be cycled, so that's what I've been doing.

It's possible that the secret was cycled successfully and this is unrelated, but that secret seems to be the one used for this particular endpoint, so there might still be something there.
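Some context on why rotating a private key could affect authentication: a GitHub App signs a short-lived JWT (RS256) with its private key, then exchanges that JWT for an installation token via `POST /app/installations/{installation_id}/access_tokens`. The sketch below builds only the JWT claim set, following GitHub's documented limits (backdate `iat` by about 60 seconds, keep `exp` no more than 10 minutes out). The helper name is hypothetical, and signing is left to a JWT library. It is also an assumption that dotneteng-status authenticates in exactly this way.

```python
import time
from typing import Optional

# GitHub rejects app JWTs whose exp is more than 10 minutes in the future.
MAX_JWT_LIFETIME_SECONDS = 10 * 60
# GitHub recommends backdating iat by 60 seconds to tolerate clock drift.
CLOCK_DRIFT_SECONDS = 60


def github_app_jwt_claims(issuer: str, now: Optional[float] = None) -> dict:
    """Build the claim set a GitHub App signs (RS256) with its private key.

    `issuer` is the GitHub App's client ID or app ID. Hypothetical helper:
    signing is omitted, and the signed JWT would then be exchanged for an
    installation token with POST /app/installations/{installation_id}/access_tokens.
    """
    current = int(time.time() if now is None else now)
    return {
        "iat": current - CLOCK_DRIFT_SECONDS,
        # Stay a minute under the 10-minute maximum so drift cannot push exp past it.
        "exp": current + MAX_JWT_LIFETIME_SECONDS - CLOCK_DRIFT_SECONDS,
        "iss": issuer,
    }


print(github_app_jwt_claims("my-app-client-id"))  # hypothetical issuer value
```

Because the claims are signed with the private key, a key that was rotated in the vault but not picked up by the running service (or vice versa) would make GitHub reject the JWT before any installation token is issued.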

@riarenas riarenas removed their assignment Nov 8, 2024
@MilenaHristova MilenaHristova self-assigned this Nov 11, 2024
@premun
Member

premun commented Nov 11, 2024

It doesn't always repro, though.

@MilenaHristova
Member

MilenaHristova commented Nov 11, 2024

I think I found a clue:
  • If we also summarize by operation_Name (query), most of the failed requests are for the GitHubHook/IssuesHook operation.
  • Looking at which GitHub issues are mentioned in the traces, all the failed requests are processing issues in dnceng (query), while the successful requests are for other repos (query).

The failed requests also have the trace "No cached installation token found for dotnet", which comes from InMemoryCacheInstallationLookup.
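The comparison above comes down to grouping requests by one field (the operation, or the repo being processed) and counting failures against successes. Here is a minimal sketch over records that have already been exported. The field names (`operation_Name`, `repo`, `success`) and the two sample records are hypothetical placeholders for whatever the linked queries return. They are not real telemetry.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def summarize_by(records: Iterable[Mapping], key: str) -> Dict[str, Dict[str, int]]:
    """Count failed vs. successful requests per distinct value of `key`."""
    failed: Counter = Counter()
    succeeded: Counter = Counter()
    for record in records:
        bucket = succeeded if record["success"] else failed
        bucket[record[key]] += 1
    return {"failed": dict(failed), "succeeded": dict(succeeded)}


# Hypothetical records shaped like exported request telemetry.
sample = [
    {"operation_Name": "GitHubHook/IssuesHook", "repo": "dotnet/dnceng", "success": False},
    {"operation_Name": "GitHubHook/IssuesHook", "repo": "dotnet/example-repo", "success": True},
]
print(summarize_by(sample, "operation_Name"))
print(summarize_by(sample, "repo"))
```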

@MilenaHristova
Member

What I don't understand is why no cached installation token is found for the dnceng repo when one is found for other repos in the dotnet org, given that the lookup is by org.
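The puzzle is easier to see with a minimal cache keyed only by org. With this keying, the first request for any repo in `dotnet` populates the entry, and every later request hits it, whichever repo it is for, until the token expires (installation tokens are valid for one hour). The class and its `fetch` callback are hypothetical and are not the actual InMemoryCacheInstallationLookup implementation.

```python
import time
from typing import Callable, Dict, Tuple

# Returns (token, expires_at_epoch_seconds) for an org. Hypothetical callback;
# a real service would request a new installation token from GitHub here.
FetchToken = Callable[[str], Tuple[str, float]]


class OrgInstallationTokenCache:
    """Hypothetical in-memory installation-token cache keyed by org only."""

    def __init__(self, fetch: FetchToken, clock: Callable[[], float] = time.time):
        self._fetch = fetch
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get_token(self, repo_full_name: str) -> str:
        # Key on the owner ("dotnet" for "dotnet/dnceng"); the repo name is ignored.
        org = repo_full_name.split("/", 1)[0].lower()
        entry = self._entries.get(org)
        if entry is None or entry[1] <= self._clock():
            # Cache miss or expired entry: the path that would log
            # "No cached installation token found for <org>".
            entry = self._fetch(org)
            self._entries[org] = entry
        return entry[0]
```

In this sketch, a miss depends only on whether the org entry exists and is unexpired, never on which repo in the org the request is for.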

@MilenaHristova
Member

@dkurepa recently updated Maestro to use Octokit.Webhooks.AspNetCore (dotnet/arcade-services#4111).
@riarenas, would you consider making the same change in the dnceng repo? We think it will help.

@riarenas
Member Author

Whatever fixes the issue sounds good to me.

@riarenas
Member Author

Could you summarize how the move to the new library would help with these exceptions?

@riarenas
Member Author

riarenas commented Nov 13, 2024

#4299 (comment) was posted today from the staging environment, which makes me question why a library update would be needed to solve this.

Are we sure it's not just a production secret that isn't working properly? (I'll try to carve out some time tomorrow to check the flows again.)

@MilenaHristova
Member

This is the issue opened in arcade-services noting that the current library is deprecated: dotnet/arcade-services#3527.
I'd need more time to understand the change in authentication; it looks like Maestro uses a different secret instead of the installation and private key.
The library handles payload parsing and integration with webhook events. One of our theories is that the service might be receiving an incorrect payload.
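Before parsing, a webhook library verifies that each delivery really came from GitHub. GitHub computes an HMAC-SHA256 of the raw request body with the webhook secret and sends it as `sha256=<hex digest>` in the `X-Hub-Signature-256` header. Below is a minimal sketch of that check. The function name is hypothetical, and it is an assumption that the "different secret" Maestro uses is this webhook secret.

```python
import hashlib
import hmac


def is_valid_github_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    if not signature_header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # Constant-time comparison so the check does not leak timing information.
    return hmac.compare_digest(expected, signature_header)
```

The digest must be computed over the bytes exactly as received. Re-serializing the parsed JSON can change them and make a valid delivery look tampered with.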

@riarenas riarenas assigned riarenas and unassigned MilenaHristova Nov 14, 2024
@riarenas
Member Author

I used secret manager to cycle https://github.com/dotnet/dnceng/blob/main/.vault-config/shared/dotneteng-status-secrets.yaml#L1 and restarted the service.

I verified that alerting works again in #4449 and that issue mirroring from Azure DevOps to GitHub works in #4450.

I have a theory on what happened and I will add more details after I check that out tomorrow.

Projects
None yet
Development

No branches or pull requests

3 participants