
Intermittent fault login_as_launchpad #216

Closed
JvHd-vw opened this issue Nov 4, 2021 · 5 comments · Fixed by #223
Labels
bug (Something isn't working)

Comments

@JvHd-vw
Contributor

JvHd-vw commented Nov 4, 2021

I have encountered an intermittent fault when running the rover on GitHub Actions with a service principal.
The error I get in the workflow is:

Getting launchpad coordinates from subscription: ***
 - keyvault_name: null
ERROR: AKV10032: Invalid issuer. Expected one of https://sts.windows.net/72f988bf-86f1-41af-91ab-2d7cd011db47/, https://sts.windows.net/f8cdef31-a31e-4b4a-93e4-5f571e91255a/, https://sts.windows.net/e2d54eb5-3869-4f70-8578-dee5fc7331f4/, https://sts.windows.net/33e01921-4d64-4f8c-a055-5bdaffd5e33d/, https://sts.windows.net/975f013f-7f24-47e8-a7d3-abc4752bf346/, found https://sts.windows.net/***/.
Error on or near line 354: Not authorized to manage landingzones. User must be member of the security group to access the launchpad and deploy a landing zone; exiting with status 102

This error happens after the workflow has already successfully deployed a level0 landingzone and is trying to deploy a higher-level landingzone. After some debugging I found out that it has to do with the fact that our workflow uses a matrix and hence has separate jobs for each landingzone. If a follow-up job's requests are routed to the same region as level0, e.g. WESTUS2, the workflow succeeds, but when they are routed to a different region, e.g. EASTUS2, the keyvault used in login_as_launchpad is not present in the output of the az keyvault list command.

I have created a Support Request (ID: 2110150050000485) with Azure to investigate/mitigate this issue.
For now, Azure Support suggested replacing az keyvault list with az graph query, as the latter should not have this issue.

I created a pull request to implement this.
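
For illustration, a rough sketch of what such a swap could look like; the tfstate tag filter and the exact queries below are only my illustration, not necessarily what the rover or the PR ends up using:

# Original approach: ARM list call, which intermittently misses the launchpad keyvault.
# (tag name/value here are illustrative)
keyvault=$(az keyvault list \
  --query "[?tags.tfstate=='level0'].name | [0]" -o tsv)

# Suggested approach: Azure Resource Graph (needs the 'resource-graph' extension).
# The 'data' wrapper in the output assumes a recent version of the extension.
keyvault=$(az graph query \
  -q "Resources | where type =~ 'microsoft.keyvault/vaults' | where tags['tfstate'] == 'level0' | project name" \
  --query "data[0].name" -o tsv)

echo "launchpad keyvault: ${keyvault}"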

@brk3
Contributor

brk3 commented Nov 8, 2021

If a follow-up job's requests are routed to the same region as level0, e.g. WESTUS2, the workflow succeeds, but when they are routed to a different region, e.g. EASTUS2, the keyvault used in login_as_launchpad is not present in the output of the az keyvault list command.

Hi, just curious as to why this would happen. My understanding is that az cli requests are not region-specific?

@JvHd-vw
Contributor Author

JvHd-vw commented Nov 8, 2021

The request is not, but depending on where the response comes from, it did or did not have the keyvault in it. I ran az keyvault list --debug and received a different 'Response content' depending on the x-ms-routing-request-id. So depending on the region in the request-id I received different results.
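
In case it helps anyone reproduce it, something along these lines is roughly how the difference shows up; the exact debug output format varies between az versions, so the grep is only illustrative (jq is used here just to print the vault names):

# Run the same list call several times; compare which region served the response
# (x-ms-routing-request-id is logged in the --debug output on stderr) with the
# vault names that actually came back.
for i in 1 2 3 4 5; do
  az keyvault list -o json --debug > "vaults_$i.json" 2> "debug_$i.log"
  grep -i "x-ms-routing-request-id" "debug_$i.log" | head -n 1
  jq -r '.[].name' "vaults_$i.json" | sort
done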

@brk3
Contributor

brk3 commented Nov 8, 2021

That sounds like really odd behavior; I would expect that if this happened consistently, a lot of people would be having problems. Are you behind some form of proxy that may be caching responses? Why does it only happen when using a service principal?

The PR looks good to me, I just feel like there's more to this puzzle; perhaps maintainers from MS can share more insight.

@JvHd-vw
Contributor Author

JvHd-vw commented Nov 8, 2021

That sounds like really odd behavior; I would expect that if this happened consistently, a lot of people would be having problems. Are you behind some form of proxy that may be caching responses? Why does it only happen when using a service principal?

The PR looks good to me, I just feel like there's more to this puzzle; perhaps maintainers from MS can share more insight.

I experienced it when running on GitHub Actions, but also locally.

@arnaudlh arnaudlh added the bug Something isn't working label Nov 9, 2021
@arnaudlh
Member

arnaudlh commented Nov 9, 2021

Hi @brk3 @JvHd-vw, working on a repro now, will keep you guys posted.

@arnaudlh arnaudlh linked a pull request Dec 6, 2021 that will close this issue