add ASO install #3450

Merged
merged 1 commit into from
May 8, 2023

@nojnhuh nojnhuh commented Apr 18, 2023

What type of PR is this?
/kind feature

What this PR does / why we need it: This PR adds the kustomize config necessary to install ASO. These changes deliberately omit integration with tilt or clusterctl init so ASO doesn't get installed before it can be used. This other branch on my fork includes the necessary foo to wire up the install if you'd like to try it yourself: https://github.com/nojnhuh/cluster-api-provider-azure/tree/aso-wired

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #3520

Special notes for your reviewer:

  • cherry-pick candidate

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 18, 2023
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 18, 2023

codecov-commenter commented Apr 18, 2023

Codecov Report

Patch coverage has no change and project coverage change: +0.01 🎉

Comparison is base (b084e15) 52.51% compared to head (97e7f92) 52.52%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3450      +/-   ##
==========================================
+ Coverage   52.51%   52.52%   +0.01%     
==========================================
  Files         182      182              
  Lines       18175    18175              
==========================================
+ Hits         9545     9547       +2     
+ Misses       8090     8088       -2     
  Partials      540      540              

see 1 file with indirect coverage changes

nojnhuh commented Apr 19, 2023

/retest

  name: aso-controller-settings
type: Opaque
data:
  AZURE_SUBSCRIPTION_ID: ${AZURE_SUBSCRIPTION_ID_B64:=""}

Contributor

is this temporary until we can use workload identity?

Contributor Author

I was thinking auth for ASO would essentially mirror the AzureClusterIdentity for the Cluster, which could also be Service Principal. Or would it make sense to always configure ASO to use workload identity? I still don't have a clear idea of exactly how workload identity works so I couldn't quite figure out how to get that set up based on the docs: https://azure.github.io/azure-service-operator/guide/authentication/#azure-workload-identity

I was also thinking having a default set of credentials here would make it easy for users to get started using ASO directly even if CAPZ can work around relying on the credentials here.

Contributor

@CecileRobertMichon CecileRobertMichon left a comment

lgtm

Contributor

@Jont828 Jont828 left a comment

A couple small things but overall looks very solid!

type: Opaque
data:
  AZURE_SUBSCRIPTION_ID: ${AZURE_SUBSCRIPTION_ID_B64:=""}
  AZURE_TENANT_ID: ${AZURE_TENANT_ID_B64:=""}

Contributor

Just curious, is there a reason we want to add the :="" part to the variable references?

Contributor Author

I admit this was a shameless copy-paste from here, and I didn't really consider whether it's necessary or not:

data:
  subscription-id: ${AZURE_SUBSCRIPTION_ID_B64:=""}

It looks like ASO will get stuck in a crash loop either way when the vars aren't defined, but with :="" it doesn't crash until an ASO resource is created, whereas without :="" it'll crash as soon as it starts up even when no ASO resources exist.

Contributor

For the CAPZ manager credentials, the idea is that you can optionally define global credentials, but it's also possible to define per-Cluster credentials via AzureClusterIdentity. We therefore don't want the credentials to be required when installing CAPZ, since they aren't actually required until you create your first cluster. The :="" suffix defaults the value to ""; without it, clusterctl init would fail when the environment variables are not set.

Contributor Author

So I think with :="" is what we want then. Eventually, a set of ASO credentials will be generated for each individual cluster from an AzureClusterIdentity if one exists, so those users can still choose not to define the global credentials and ASO will work fine with CAPZ until the user tries to create an ASO resource themselves without specifying credentials and relying on the defaults.
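
The effect of the :="" default discussed in this thread can be modeled with a small Python sketch. This is not how clusterctl performs substitution; `render` and `MissingVariableError` are hypothetical names, and the helper only understands the `${VAR}` and `${VAR:=default}` forms. It assumes that an empty value is treated like an unset one and that the default text is emitted verbatim, so the rendered line reads `AZURE_SUBSCRIPTION_ID: ""`, which YAML parses as an empty string.

```python
import re

# Matches ${NAME} and ${NAME:=default}. A simplified model for illustration,
# not the parser that clusterctl actually uses.
_VAR = re.compile(
    r'\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::=(?P<default>[^}]*))?\}'
)


class MissingVariableError(KeyError):
    """Raised when a variable has no value and the template gives no default."""


def render(template: str, env: dict) -> str:
    """Substitute variables in template using env (hypothetical helper)."""

    def substitute(match):
        name = match.group("name")
        default = match.group("default")
        value = env.get(name)
        if value:  # assumption: an empty value counts as unset
            return value
        if default is not None:
            return default  # emitted verbatim, quotes included
        raise MissingVariableError(name)

    return _VAR.sub(substitute, template)


line = 'AZURE_SUBSCRIPTION_ID: ${AZURE_SUBSCRIPTION_ID_B64:=""}'
print(render(line, {}))  # AZURE_SUBSCRIPTION_ID: ""
print(render(line, {"AZURE_SUBSCRIPTION_ID_B64": "c3ViLWlk"}))
```

With the default present, rendering succeeds even when nothing is set, which mirrors the point above that credentials should not be required at install time. Dropping the default (`${AZURE_SUBSCRIPTION_ID_B64}`) makes this sketch raise `MissingVariableError` instead.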

- credentials.yaml

patches:
- patch: |- # default kustomization includes a namespace already

Contributor

Nit: do we want a line break at the end? Not sure if it's better to use |- or | in this case.

Contributor Author

Looks like that doesn't affect the output at all.

nojnhuh commented Apr 21, 2023

I just noticed that CAPI Tilt assumes the root of the kustomize config is config/default (and can't be configured), whereas this change wants it to be config/release. I haven't tried it yet, but I'm pretty sure that means spinning up CAPZ with CAPI Tilt won't deploy ASO. Should I modify the current default to be something like capz and then change release -> default?

Originally I didn't integrate ASO directly with config/default because some of the config applying to all resources (like namePrefix) would have messed up some of the ASO resources. I can try that again though to see if I can get that to work.

nojnhuh commented Apr 21, 2023

I managed to get this working from the existing config/default, but it feels like kind of a mess: nojnhuh@39550967

Thoughts on this approach vs. renaming the current config/default?

CecileRobertMichon commented Apr 21, 2023

> CAPI Tilt assumes the root of the kustomize config is config/default (and can't be configured)

Have you considered making it configurable as an alternative?

another idea: what if we renamed default to something else, and config/release becomes config/default?

nojnhuh commented Apr 21, 2023

> Have you considered making it configurable as an alternative?

Kind of, didn't take the time yet to figure out exactly how that might work though.

> another idea: what if we renamed default to something else, and config/release becomes config/default?

Yeah I think this will be the easiest option. I'll update this PR next week with that change.

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 24, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 24, 2023

nojnhuh commented Apr 25, 2023

/retest

nojnhuh commented Apr 26, 2023

/hold

I think I broke something here :/

Error from server (InternalError): error when creating "jon/cluster.yaml": Internal error occurred: failed calling webhook "default.azuremanagedcontrolplanes.infrastructure.cluster.x-k8s.io": failed to call webhook: Post "https://capz-webhook-service.capz-system.svc:443/mutate-infrastructure-cluster-x-k8s-io-v1beta1-azuremanagedcontrolplane?timeout=10s": x509: certificate is valid for azureserviceoperator-webhook-service.capz-system.svc, azureserviceoperator-webhook-service.capz-system.svc.cluster.local, not capz-webhook-service.capz-system.svc

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 26, 2023

nojnhuh commented Apr 28, 2023

> /hold
>
> I think I broke something here :/
>
> Error from server (InternalError): error when creating "jon/cluster.yaml": Internal error occurred: failed calling webhook "default.azuremanagedcontrolplanes.infrastructure.cluster.x-k8s.io": failed to call webhook: Post "https://capz-webhook-service.capz-system.svc:443/mutate-infrastructure-cluster-x-k8s-io-v1beta1-azuremanagedcontrolplane?timeout=10s": x509: certificate is valid for azureserviceoperator-webhook-service.capz-system.svc, azureserviceoperator-webhook-service.capz-system.svc.cluster.local, not capz-webhook-service.capz-system.svc

/hold cancel

Just pushed a fix for this. capz-webhook-service's selector matched only on the cluster.x-k8s.io/provider: "infrastructure-azure" label, which the ASO controller manager also carries, so some webhook requests for CAPZ resources were being directed to ASO instead of CAPZ.
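
To illustrate why a selector on the provider label alone picks up both controllers, here is a minimal Python model of equality-based label selection (a Service selects a pod when every key/value pair in its selector appears in the pod's labels). Only the cluster.x-k8s.io/provider label comes from this thread; the `control-plane` label, its values, and the pod names are hypothetical and are not claimed to match the actual fix.

```python
def selects(selector: dict, pod_labels: dict) -> bool:
    """Equality-based selection: every selector pair must be in the pod's labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Label taken from the thread above.
PROVIDER = {"cluster.x-k8s.io/provider": "infrastructure-azure"}

# Hypothetical pods: the extra "control-plane" label is an assumption made
# purely to give the two controllers something that tells them apart.
pods = {
    "capz-controller-manager": {
        **PROVIDER,
        "control-plane": "capz-controller-manager",
    },
    "azureserviceoperator-controller-manager": {
        **PROVIDER,
        "control-plane": "aso-controller-manager",
    },
}


def endpoints(selector: dict) -> list:
    """Return the names of the pods that a Service with this selector would target."""
    return sorted(name for name, labels in pods.items() if selects(selector, labels))


broad = PROVIDER
narrow = {**PROVIDER, "control-plane": "capz-controller-manager"}

print(endpoints(broad))   # both controllers match
print(endpoints(narrow))  # ['capz-controller-manager']
```

With the broad selector, requests sent to capz-webhook-service can land on the ASO pod, whose certificate is issued for azureserviceoperator-webhook-service; that is consistent with the x509 error quoted above. Adding any label that only the CAPZ pod carries narrows the endpoints back to CAPZ.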

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 28, 2023

nojnhuh commented Apr 28, 2023

/retest

nojnhuh commented May 1, 2023

@Jont828 @CecileRobertMichon This is ready for another look whenever you have time.

@jackfrancis
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 8, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 7043312555ed10d21bb4a4edf06e9cecab235dd3

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 8, 2023
@k8s-ci-robot k8s-ci-robot merged commit 5fafd1a into kubernetes-sigs:main May 8, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.10 milestone May 8, 2023
@nojnhuh nojnhuh deleted the aso-install branch May 8, 2023 23:10
@nojnhuh nojnhuh mentioned this pull request Sep 6, 2023
4 tasks
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Integrate ASO install with clusterctl init, tilt up
6 participants