
Enhancement Proposal: Break out cloud resource reconciliation #416

Closed
devigned opened this issue Feb 26, 2020 · 22 comments
Assignees
Labels
kind/proposal Issues or PRs related to proposals. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. priority/backlog Higher priority than priority/awaiting-more-evidence. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Milestone

Comments

@devigned
Contributor

⚠️ Cluster API Azure maintainers can ask to turn an issue-proposal into a CAEP when necessary. This is to be expected for large changes that impact multiple components, breaking changes, or new large features.

Goals

  1. Separate concerns for cloud resource reconciliation from transformation of Cluster API custom resource specs to cloud resource specs
  2. Create a parallel reconciliation path in CAPZ that uses an Azure cloud resource controller / reconciler with Azure CRDs for the cloud resources required by Cluster-API resources, rather than direct calls to Azure from within CAPZ.

Non-Goals/Future Work

  1. Future work: possibly rely only on Azure resource CRDs for reconciliation, rather than maintaining parallel implementations.

User Story

As a CAPZ developer, I would like to interact only with the API server and Custom Resource Definitions for Azure types, rather than dealing with the impedance of calling the Azure APIs directly. This would let me focus on the core value of CAPZ: transforming Cluster-API specs into Azure resource specs.

Detailed Description

CAPZ currently has 2 major concerns.

  1. Transforming Cluster-API types to Azure types
  2. Imperatively interacting with the Azure APIs to manipulate / create / delete the Azure resources

I think it would help speed up development in CAPZ if these two concerns were broken out into separate controllers / reconcilers.

  1. CAPZ: responsible for translating from Cluster-API CRDs to Azure CRDs
  2. YTBNC (yet to be named controller): responsible for defining Azure CRDs and reconciling them with Azure

With this enhancement, CAPZ could be verified to produce the correct Azure resource specifications for a given input without taking a dependency on the Azure API. The vast majority of this verification could be done against the API server, checking whether a given input (a Cluster-API model) gets translated into the proper Azure resource specifications. At the same time, the concern of reconciling Azure resources could be focused on and tested within its own component.
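
To make that concrete, most of this verification could be expressed as ordinary controller tests against a fake client. Below is a minimal sketch, assuming a recent controller-runtime fake client; the AzureClusterReconciler, newTestAzureCluster, infrav1, and resources names are illustrative assumptions, not the actual CAPZ code:

    import (
        "context"
        "testing"

        "k8s.io/apimachinery/pkg/runtime"
        "k8s.io/apimachinery/pkg/types"
        ctrl "sigs.k8s.io/controller-runtime"
        "sigs.k8s.io/controller-runtime/pkg/client/fake"
    )

    // Sketch: assert that reconciling a Cluster-API input produces the expected Azure
    // custom resource, with no dependency on the Azure API at all.
    func TestReconcileCreatesResourceGroup(t *testing.T) {
        scheme := runtime.NewScheme()
        _ = infrav1.AddToScheme(scheme)   // CAPZ types (assumed registered here)
        _ = resources.AddToScheme(scheme) // Azure resource CRD types (assumed)

        cluster := newTestAzureCluster("my-cluster", "westus2") // hypothetical test fixture
        c := fake.NewClientBuilder().WithScheme(scheme).WithObjects(cluster).Build()

        r := &AzureClusterReconciler{Client: c} // hypothetical reconciler wiring
        nn := types.NamespacedName{Namespace: "default", Name: "my-cluster"}
        if _, err := r.Reconcile(context.Background(), ctrl.Request{NamespacedName: nn}); err != nil {
            t.Fatal(err)
        }

        // The only assertion needed: did the right ResourceGroup spec land in the API server?
        var rg resources.ResourceGroup
        if err := c.Get(context.Background(), nn, &rg); err != nil {
            t.Fatalf("expected a ResourceGroup custom resource: %v", err)
        }
        if rg.Spec.Location != "westus2" {
            t.Errorf("expected location westus2, got %q", rg.Spec.Location)
        }
    }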

YTBNC (yet to be named controller)

Should have the following characteristics:

  • Define versioned CRDs which describe Azure resources across Azure API versions (a rough sketch follows this list)
  • Be able to reconcile Azure resources and provide status changes / conditions
  • Resources should be able to be applied in any order and converge -- goal-seeking behavior
    • All of the ordering logic is currently built into CAPZ and is somewhat brittle
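
As a rough illustration of those characteristics, a versioned, condition-reporting CRD type for a resource group might look something like the following. This is a hypothetical sketch; the field and condition names are assumptions, not a defined API for the new controller:

    import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

    // ResourceGroup is a hypothetical versioned CRD describing an Azure resource group.
    // +kubebuilder:object:root=true
    // +kubebuilder:subresource:status
    type ResourceGroup struct {
        metav1.TypeMeta   `json:",inline"`
        metav1.ObjectMeta `json:"metadata,omitempty"`

        Spec   ResourceGroupSpec   `json:"spec,omitempty"`
        Status ResourceGroupStatus `json:"status,omitempty"`
    }

    type ResourceGroupSpec struct {
        // Location is the Azure region for the resource group.
        Location string `json:"location"`
        // AzureAPIVersion pins the Azure API version this spec maps to.
        AzureAPIVersion string `json:"azureAPIVersion,omitempty"`
        // Tags are applied to the Azure resource group.
        Tags map[string]string `json:"tags,omitempty"`
    }

    type ResourceGroupStatus struct {
        // Conditions (e.g. a Ready condition) surface reconciliation progress, so resources
        // can be applied in any order and dependents simply wait for convergence.
        Conditions []metav1.Condition `json:"conditions,omitempty"`
        // ID is the fully qualified Azure resource ID once provisioned.
        ID string `json:"id,omitempty"`
    }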

Example of a ResourceGroup

Currently

    group := resources.Group{
        Location: to.StringPtr(s.Scope.Location()),
        Tags: converters.TagsToMap(infrav1.Build(infrav1.BuildParams{
            ClusterName: s.Scope.Name(),
            Lifecycle:   infrav1.ResourceLifecycleOwned,
            Name:        to.StringPtr(s.Scope.ResourceGroup()),
            Role:        to.StringPtr(infrav1.CommonRoleTagValue),
            Additional:  s.Scope.AdditionalTags(),
        })),
    }

    // Wait on Azure to be done :: **synchronous**
    _, err := s.Client.CreateOrUpdate(ctx, s.Scope.ResourceGroup(), group)

Azure Custom Resource

    rg := resources.ResourceGroup{
        ObjectMeta: v1.ObjectMeta{
            Namespace: nn.Namespace,
            Name:      nn.Name,
        },
        Spec: resources.ResourceGroupSpec{
            Location: s.Scope.Location(),
            Tags: converters.TagsToMap(infrav1.Build(infrav1.BuildParams{
                ClusterName: s.Scope.Name(),
                Lifecycle:   infrav1.ResourceLifecycleOwned,
                Name:        to.StringPtr(s.Scope.ResourceGroup()),
                Role:        to.StringPtr(infrav1.CommonRoleTagValue),
                Additional:  s.Scope.AdditionalTags(),
            })),
        },
    }

    // PUT to the API server to create the resource :: **asynchronous**
    // Just proceed and check back for the condition
    err := s.Scope.Client.Create(ctx, &rg)
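
That "check back" step can then be a non-blocking read of the resource's status on a later reconcile. A minimal sketch, assuming a hypothetical Ready condition and apimachinery's condition helpers (k8s.io/apimachinery/pkg/api/meta), plus sigs.k8s.io/controller-runtime and time for the requeue:

    // On a later reconcile: re-read the ResourceGroup and requeue until it converges,
    // instead of blocking on the Azure API.
    var rg resources.ResourceGroup
    if err := s.Scope.Client.Get(ctx, nn, &rg); err != nil {
        return ctrl.Result{}, err
    }
    if !meta.IsStatusConditionTrue(rg.Status.Conditions, "Ready") { // "Ready" is an assumed condition type
        return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
    }
    // Converged; safe to proceed with resources that depend on the resource group.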

Contract changes [optional]
None.

Data model changes [optional]
The CAPZ CRDs may want to keep a reference to the underlying Azure custom resources which they own.
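
For example, a status field could carry object references to the Azure custom resources a CAPZ object created. A hypothetical sketch (the field name is illustrative, not a proposed schema):

    // Sketch: track owned Azure custom resources from a CAPZ CRD status.
    // corev1 is k8s.io/api/core/v1.
    type AzureClusterStatus struct {
        // ...existing status fields elided...

        // OwnedAzureResources references the Azure custom resources created for this cluster
        // (e.g. the ResourceGroup above), so deletion and status aggregation can follow them.
        OwnedAzureResources []corev1.ObjectReference `json:"ownedAzureResources,omitempty"`
    }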

/kind proposal

@k8s-ci-robot
Contributor

@devigned: The label(s) kind/proposal cannot be applied, because the repository doesn't have them


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@detiber
Member

detiber commented Feb 26, 2020

You'll need to PR github.com/kubernetes/test-infra to add the kind/proposal label for this repo, similar to the config we have in place for cluster-api here: https://github.com/kubernetes/test-infra/blob/2d574ef6978af8e03d0ac08503ad6af03d1830bc/label_sync/labels.yaml#L1282-L1289

@detiber
Member

detiber commented Feb 26, 2020

It might be valuable for this YTBNC to be a separate project similar to: https://github.com/GoogleCloudPlatform/k8s-config-connector

@devigned
Contributor Author

@detiber, precisely what we have been thinking about.

One worry is governance of such a project. Having CAPZ take a core / critical dependency on a project outside of K8s doesn’t feel right. As a prototype / optional thing, I think it would be ok, but not as a requirement. What do you think? Am I being too cautious?

@detiber
Member

detiber commented Feb 26, 2020

One worry is governance of such a project. Having CAPZ take a core / critical dependency on a project outside of K8s doesn’t feel right. As a prototype / optional thing, I think it would be ok, but not as a requirement. What do you think? Am I being too cautious?

I think it largely depends... I don't think the dependency is really much greater than the one we already have on the various cloud provider SDKs.

That said, I do have concerns with Google's k8s-config-connector in particular:

  • Not open source and there are no published distribution guidelines/guarantees that would make inclusion in a cluster-api release or downstream user/product consumption safe/allowable
  • No way to affect potential bugs found, other than filing opaque issues

Of course it would be great if the project was underneath the k8s or cncf umbrella to help afford some assurances that we could resolve the above issues using the governance mechanisms in place, but I don't think that is a strict requirement to solving the problem.

@CecileRobertMichon
Contributor

I don't think the dependency is really much greater than the one we already have on the various cloud provider SDKs.

I agree with that, and +1 on making it a requirement that the component be open source before we take the dependency.

cc @justaugustus @awesomenix @nader-ziada @juan-lee @rbitia

@devigned
Contributor Author

/kind proposal

@k8s-ci-robot k8s-ci-robot added the kind/proposal Issues or PRs related to proposals. label Feb 26, 2020
@richardcase
Member

@devigned - I'd love to see an open source equivalent to Google's config connector but for Azure. This is something that people would use outside of CAPZ, especially with GitOps. Most of our customers ask for a way to treat their cloud infra declaratively, like their k8s resources.

AWS had something similar to config connector: https://github.com/amazon-archives/aws-service-operator, which I believe is being resurrected: https://github.com/aws/aws-service-operator-k8s

@devigned
Contributor Author

We are actively looking into this. Will update with more information as organizationally possible.

@devigned
Contributor Author

/assign

@nprokopic
Contributor

One worry is governance of such a project. Having CAPZ take a core / critical dependency on a project outside of K8s doesn’t feel right. As a prototype / optional thing, I think it would be ok, but not as a requirement. What do you think? Am I being too cautious?

I would agree that it would be preferable for a project like this to stay inside of k8s. One thing off the top of my head is that it could be easier to participate in and contribute to a project where you can join the k8s/CAPI/CAPZ Slack and community meetings.

@nprokopic
Contributor

Hi @devigned, if I understood correctly, you mentioned at the last CAPZ meeting that there will be some news about this. Is it just me failing to find it, or is there nothing out there yet?

@devigned
Contributor Author

Hi @nprokopic, I was hoping to drop a link on Friday, but was slightly delayed. The prototype we are working on is over here: https://github.com/Azure/k8s-infra.

Warnings... the project is nascent. We are experimenting with some different ideas, breaking some stuff and might end up rolling these ideas into another project.

@nprokopic
Contributor

Awesome, thank you! It looks quite interesting, I definitely like the approach.

@nader-ziada nader-ziada added this to the v0.4.9 milestone Sep 3, 2020
@CecileRobertMichon CecileRobertMichon modified the milestones: v0.4.9, v0.4.10 Oct 1, 2020
@CecileRobertMichon CecileRobertMichon modified the milestones: v0.4.10, next Nov 12, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 10, 2021
@CecileRobertMichon
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 10, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 11, 2021
@nader-ziada
Contributor

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 12, 2021
@dtzar
Contributor

dtzar commented Nov 30, 2022

Update on this issue:
YTBNC (yet to be named controller) = https://github.com/Azure/azure-service-operator/tree/main/v2

@dtzar
Contributor

dtzar commented Jan 30, 2023

Proposal for using this approach with CAPZ:
#3113

@CecileRobertMichon
Contributor

/lifecycle active
/unassign @devigned
/assign @nojnhuh

@k8s-ci-robot k8s-ci-robot removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jan 31, 2023
@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Jan 31, 2023
@dtzar dtzar added priority/backlog Higher priority than priority/awaiting-more-evidence. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 5, 2023
@dtzar
Contributor

dtzar commented May 6, 2023

This is moving forward; the proposal is merged 🙌, so closing this out. The larger epic for implementation work is here: #3402

@dtzar dtzar closed this as completed May 6, 2023
@dtzar dtzar modified the milestones: next, v1.10 May 11, 2023