Resource group should get updated if tags are added #1696

Closed
CecileRobertMichon opened this issue Sep 17, 2021 · 34 comments · Fixed by #1721
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@CecileRobertMichon
Contributor

/kind bug

[Before submitting an issue, have you checked the Troubleshooting Guide?]

Currently, we only create the resource group if it doesn't already exist:

// rg already exists, nothing to update.

However, we should consider adding any additional tags that were mutated and updating the group, if and only if the group is managed by capz (see the sketch after the list below):

  • if the group already exists and is BYO, we should NOT add the owner tag to it (or any other tags)
  • if the group already exists and is managed, we should update tags if and only if some are missing
  • if the group already exists and no tags are missing (desired tags exist) we should do nothing
  • if the group doesn't exist, we should create it
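A minimal sketch of what those rules could look like in GroupSpec.Parameters (hasMissingTags and desiredTags are hypothetical helpers, not existing functions; this is an illustration, not the final implementation):

```go
// Sketch only: reconcile rules for GroupSpec.Parameters.
// hasMissingTags and desiredTags are hypothetical helpers.
func (s *GroupSpec) Parameters(existing interface{}) (interface{}, error) {
	if existing != nil {
		existingGroup, ok := existing.(resources.Group)
		if !ok {
			return nil, errors.Errorf("%T is not a resources.Group", existing)
		}
		tags := converters.MapToTags(existingGroup.Tags)
		if !tags.HasOwned(s.ClusterName) {
			// BYO resource group: never add the owner tag (or any other tags).
			return nil, nil
		}
		if !hasMissingTags(existingGroup.Tags, s.desiredTags()) {
			// managed and all desired tags already exist: nothing to do.
			return nil, nil
		}
		// managed with missing tags: fall through and return the full
		// desired group so the update adds them.
	}
	return resources.Group{
		Location: to.StringPtr(s.Location),
		Tags:     converters.TagsToMap(s.desiredTags()),
	}, nil
}
```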

Someone working on this issue might want to take a look at how network security groups check existing rules and only update them if some are missing, and how they use an etag to make sure there is no race with other controllers also updating the same NSG:

// We append the existing NSG etag to the header to ensure we only apply the updates if the NSG has not been modified.
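For context, the general optimistic-concurrency pattern that comment describes looks roughly like this (a generic sketch using a raw HTTP request and an If-Match header; resourceURL, body, existingEtag and httpClient are placeholders, and this is not the actual CAPZ/NSG code):

```go
// Generic etag sketch: read the resource, remember its etag, and send it back
// as If-Match so the service rejects the write if the resource changed in between.
req, err := http.NewRequest(http.MethodPut, resourceURL, body)
if err != nil {
	return err
}
req.Header.Set("If-Match", existingEtag) // server returns 412 if the etag no longer matches
if _, err := httpClient.Do(req); err != nil {
	return err
}
```

(As discussed further down in this thread, resource groups don't expose an etag, so this pattern ends up not applying here.)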

Environment:

  • cluster-api-provider-azure version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Sep 17, 2021
@karuppiah7890
Contributor

I'm assuming no one is working on this as of now
/assign

@CecileRobertMichon
Contributor Author

@karuppiah7890 go for it, let me know if you have any questions

@karuppiah7890
Contributor

karuppiah7890 commented Sep 21, 2021

@CecileRobertMichon Are there tests for the client.go files? For example, are there tests for the code in azure/services/groups/client.go, which is relevant to this issue? I'm looking at where to add a test - which file, at what level (which function / method), and what kind of test (unit test etc.) to add. I noticed that CreateOrUpdateAsync in azure/services/groups/client.go is what creates or updates resource groups -

func (ac *azureClient) CreateOrUpdateAsync(ctx context.Context, spec azure.ResourceSpecGetter) (azureautorest.FutureAPI, error) {

and based on the comment from here - #1667 (comment) - and this issue's comments, it looks like the code needs changes here -

func (s *GroupSpec) Parameters(existing interface{}) (interface{}, error) {
	if existing != nil {
		// rg already exists, nothing to update.
		return nil, nil
	}

I was thinking I could look at some existing tests and write mine along the same lines to keep things consistent, but I couldn't find any. I was thinking of adding a unit test that mocks the Azure client, or an integration test, since an E2E test seems too heavy for this situation - or it could be added as an extra check in one of the existing E2E update tests. How are the client.go files currently tested?

@karuppiah7890
Contributor

karuppiah7890 commented Sep 21, 2021

For now, I could add a unit test for the update of additional tags (if any) in

func (s *GroupSpec) Parameters(existing interface{}) (interface{}, error) {
	if existing != nil {
		// rg already exists, nothing to update.
		return nil, nil
	}

The test could reside in spec_test.go. Parameters seems like the smallest unit that can be tested compared to the other levels / functions / methods. I'll also implement the feature to fix the failing test that I write. Then I'll think about how to test the etag part, add that test, and implement it too. I'll check out the CAPZ managed vs CAPZ non-managed part next.
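A rough shape for that test in spec_test.go could be something like the following (gomega-style, with illustrative values; fakeGroup is a hypothetical helper that builds a resources.Group with the given tags, and the exact expectations depend on the final Parameters behavior):

```go
// Sketch of a unit test for GroupSpec.Parameters; values and the fakeGroup
// helper are illustrative only.
func TestGroupSpecParameters(t *testing.T) {
	g := NewWithT(t)

	spec := &GroupSpec{
		Name:           "test-group",
		Location:       "eastus",
		ClusterName:    "test-cluster",
		AdditionalTags: map[string]string{"environment": "test"},
	}

	// No existing group: Parameters should return parameters to create it.
	params, err := spec.Parameters(nil)
	g.Expect(err).NotTo(HaveOccurred())
	g.Expect(params).NotTo(BeNil())

	// Existing managed group that is missing the additional tag: Parameters
	// should return updated parameters instead of nil.
	existing := fakeGroup(map[string]*string{
		"sigs.k8s.io_cluster-api-provider-azure_cluster_test-cluster": to.StringPtr("owned"),
	})
	params, err = spec.Parameters(existing)
	g.Expect(err).NotTo(HaveOccurred())
	g.Expect(params).NotTo(BeNil())
}
```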

@karuppiah7890
Contributor

Also, I noticed the test in azure/services/groups/groups_test.go, but that mocks away CreateOrUpdateAsync, so I'm not writing a test in that file.

@karuppiah7890
Contributor

I have a question regarding updates - should the update only allow changing the additional tags of resource groups, at least for this issue? Should we ensure we don't update the other data? I see that the Azure Go SDK's CreateOrUpdate call goes through CreateOrUpdatePreparer, which uses autorest.AsPut(), i.e. the PUT method - which I'm assuming replaces the data rather than patching it with a PATCH. I can also see the Azure Go SDK mention that the resource group location cannot be changed once created in https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/services/resources/mgmt/2019-05-01/resources?utm_source=gopls#Group

// Location - The location of the resource group. It cannot be changed after the resource group has been created. It must be one of the supported Azure locations.

So we can only change tags, and we need to keep the location as-is from the existing resource group:

return resources.Group{
	Location: to.StringPtr(s.Location),
	Tags: converters.TagsToMap(infrav1.Build(infrav1.BuildParams{
		ClusterName: s.ClusterName,
		Lifecycle:   infrav1.ResourceLifecycleOwned,
		Name:        to.StringPtr(s.Name),
		Role:        to.StringPtr(infrav1.CommonRole),
		Additional:  s.AdditionalTags,
	})),
}, nil

If we are not changing the location, do we need to pass it because of the PUT call?

@karuppiah7890
Contributor

Okay, looks like the location field is actually required in the PUT call, according to https://docs.microsoft.com/en-us/rest/api/resources/resource-groups/create-or-update#request-body
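So when building update parameters for an existing group, we could just echo back the location we read from Azure, something like this sketch (updatedTags is a placeholder for whatever merged tag map we compute):

```go
// Sketch: keep the existing, immutable location when building update parameters.
existingGroup, ok := existing.(resources.Group)
if !ok {
	return nil, errors.Errorf("%T is not a resources.Group", existing)
}
return resources.Group{
	Location: existingGroup.Location, // cannot change after creation, so reuse it
	Tags:     updatedTags,            // placeholder: the merged/desired tag map
}, nil
```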

@karuppiah7890
Contributor

karuppiah7890 commented Sep 21, 2021

For CAPZ managed vs CAPZ non-managed, we figure this out through the tags, right? So we will check whether the tags here

Tags: converters.TagsToMap(infrav1.Build(infrav1.BuildParams{
	ClusterName: s.ClusterName,
	Lifecycle:   infrav1.ResourceLifecycleOwned,
	Name:        to.StringPtr(s.Name),
	Role:        to.StringPtr(infrav1.CommonRole),
	Additional:  s.AdditionalTags,
})),

(excluding the additional tags) are present, and if they are, the group is CAPZ managed, right? Also, we need to build them with infrav1.Build to get the exact tags with prefixes etc.:
// Build builds tags including the cluster tag and returns them in map form.
func Build(params BuildParams) Tags {
	tags := make(Tags)
	for k, v := range params.Additional {
		tags[k] = v
	}
	tags[ClusterTagKey(params.ClusterName)] = string(params.Lifecycle)
	if params.Role != nil {
		tags[NameAzureClusterAPIRole] = *params.Role
	}
	if params.Name != nil {
		tags["Name"] = *params.Name
	}
	return tags
}
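For example, for a cluster named test-cluster this would produce roughly the following (the exact tag keys come from constants in the infrav1 package, so treat these as approximate):

```go
// Illustrative only: approximately what Build returns for "test-cluster".
tags := infrav1.Build(infrav1.BuildParams{
	ClusterName: "test-cluster",
	Lifecycle:   infrav1.ResourceLifecycleOwned,
	Name:        to.StringPtr("test-cluster"),
	Role:        to.StringPtr(infrav1.CommonRole),
})
// tags ≈ map[string]string{
//	"sigs.k8s.io_cluster-api-provider-azure_cluster_test-cluster": "owned",
//	"sigs.k8s.io_cluster-api-provider-azure_role":                 "common",
//	"Name":                                                        "test-cluster",
// }
```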

@karuppiah7890
Contributor

karuppiah7890 commented Sep 21, 2021

I guess for checking if the group is managed or not we could just use

// IsGroupManaged returns true if the resource group has an owned tag with the cluster name as value,
// meaning that the resource group's lifecycle is managed.
func (s *Service) IsGroupManaged(ctx context.Context) (bool, error) {
	ctx, span := tele.Tracer().Start(ctx, "groups.Service.IsGroupManaged")
	defer span.End()

	groupSpec := s.Scope.GroupSpec()
	group, err := s.client.Get(ctx, groupSpec.ResourceName())
	if err != nil {
		return false, err
	}
	tags := converters.MapToTags(group.Tags)
	return tags.HasOwned(s.Scope.ClusterName()), nil
}
but this is in azure/services/groups/groups.go. Maybe we could use similar code in azure/services/groups/client.go to avoid calling spec.Parameters here
} else {
	// rg already exists
	params, err = spec.Parameters(existingRG)
	if err != nil {
		return nil, err
	}
}
if it's not managed by CAPZ. I think that's the best place to put the check; the next best place (or maybe not the best place) would be inside the spec.Parameters code itself, in
// Parameters returns the parameters for the group.
func (s *GroupSpec) Parameters(existing interface{}) (interface{}, error) {
	if existing != nil {
		// rg already exists, nothing to update.
		return nil, nil
	}
but that might be a lot of code in spec.Parameters

Later we will have to see if we want to remove the duplicate code that checks whether a group is managed or not, and decide where to place the common code and call it accordingly.

@karuppiah7890
Contributor

karuppiah7890 commented Sep 21, 2021

Question: Let's say a cluster is created and managed using CAPZ, and an additional tag is added to the resource group from outside (by the user or something) without using CAPZ. Now, when we manage the resource group, do we manage that extra tag? Do we remove it and only ensure that the AzureCluster's additionalTags are present? To generalize the question - do we only handle additions of tags based on the additional tags in the AzureCluster, or do we handle deletions too? For example, if a user adds a tag to additionalTags of the AzureCluster and then removes it from the additionalTags field later. I guess this question is really: is the additionalTags field fully managed if the management tag with the owned value is present? And if so, is it fully managed in a declarative manner, which would include additions and all modifications - deletions and updates?

karuppiah7890 added a commit to karuppiah7890/cluster-api-provider-azure that referenced this issue Sep 21, 2021
@karuppiah7890
Contributor

About checking whether the resource group is managed or not: looking at the code, it looks like we want to move a lot of the logic into the Parameters method - since as of now, even for an existing resource group, a nil value is returned from Parameters and handled at the client level in client.go, instead of simply handling existing resource groups as a no-op. So I'm going to put all the logic in the Parameters method, using utility functions in https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/47caf96183f7d3bd1e4dda5944a06dae10318bc7/azure/services/groups/spec.go, and add a function like IsGroupManaged that takes the existing group as input to do this logic -

tags := converters.MapToTags(group.Tags)
return tags.HasOwned(s.Scope.ClusterName()), nil
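Something along these lines (sketch; the function name and its placement in spec.go are just my working assumption):

```go
// Sketch: decide whether an already-fetched resource group is managed by CAPZ
// by checking for the "owned" cluster tag, without another GET call.
func (s *GroupSpec) isGroupManaged(existingGroup resources.Group) bool {
	tags := converters.MapToTags(existingGroup.Tags)
	return tags.HasOwned(s.ClusterName)
}
```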

@karuppiah7890
Contributor

cc @CecileRobertMichon ^

@karuppiah7890
Contributor

Following up on the question about managed resource groups and declarative behavior: let's say a resource group only has the tag sigs.k8s.io_cluster-api-provider-azure_cluster_test-cluster: owned and no Name or role tags (maybe the user removed them). Then, the next time additionalTags is updated from an empty map to some tags, do we add just the additionalTags, or do we also add back tags like the Name tag and the role tag?

@karuppiah7890
Contributor

I created a draft / WIP PR so that one can see the changes live as I'm working on it. #1721 cc @CecileRobertMichon

@karuppiah7890
Contributor

About the ETag check - looks like the Azure API does not provide an ETag for resource groups:

https://docs.microsoft.com/en-us/rest/api/resources/resource-groups/get - GET
https://docs.microsoft.com/en-us/rest/api/resources/resource-groups/check-existence - HEAD

But yeah, other APIs like Network Security Group have ETag in the response

https://docs.microsoft.com/en-us/rest/api/virtualnetwork/network-security-groups/get#networksecuritygroup

This is also reflected in the Azure Go SDK where resources.Group returned by the Azure API client library doesn't have any ETag field in it, unlike Network Security Group. References -

https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/services/resources/mgmt/2019-05-01/resources?utm_source=gopls#Group

https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/services/network/mgmt/2021-02-01/network?utm_source=gopls#SecurityGroup

@karuppiah7890
Contributor

karuppiah7890 commented Sep 22, 2021

Any thoughts on how we can go about avoiding race conditions for resource group updates?

@devigned
Contributor

One possible option is to use https://docs.microsoft.com/en-us/rest/api/resources/tags/update-at-scope where the scope is the resource group and the PATCH operation would only contain the tags we would like to mutate.
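For reference, that API is a PATCH on the tags sub-resource of the scope; a rough sketch of the request shape (raw HTTP rather than the SDK, with the api-version and tag values only as examples):

```go
// Sketch of the Tags - Update At Scope request: PATCH with a "Merge" operation
// so only the listed tags are touched. scope would be something like
// "/subscriptions/<sub-id>/resourceGroups/<rg-name>".
body, _ := json.Marshal(map[string]interface{}{
	"operation": "Merge",
	"properties": map[string]interface{}{
		"tags": map[string]string{"Key1": "Value1", "Key2": "Value2"},
	},
})
url := fmt.Sprintf("https://management.azure.com%s/providers/Microsoft.Resources/tags/default?api-version=2021-04-01", scope)
req, err := http.NewRequest(http.MethodPatch, url, bytes.NewReader(body))
if err != nil {
	return err
}
req.Header.Set("Content-Type", "application/json")
// ...attach an Authorization header / authorizer and send the request
```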

@karuppiah7890
Contributor

karuppiah7890 commented Sep 22, 2021

Interesting! Thanks for sharing, @devigned! I'll check it out in detail and also look at the corresponding Azure Go SDK API for using it. One tricky thing is that this would mean we detect resource group tag updates separately, use separate API calls for them, and deviate from the path of using the resource group Create Or Update API that is currently used as part of reconciliation. Are we okay with that?

@devigned
Contributor

One tricky thing is that this would mean we detect resource group updates separately, use separate API calls for them, and deviate from the path of using the resource group Create Or Update API that is currently used as part of reconciliation. Are we okay with that?

It will introduce more complexity, but I don't see another way around the possibility of wiping out tags applied between GET and PUT. I think the trade off of slightly more complexity in update is better than the harm of possible data loss.

@karuppiah7890
Contributor

karuppiah7890 commented Sep 22, 2021

Complexity - cool 👍 And I'm still trying to grok the problem and the solution we are aiming for. This is what I understand from the race condition problem -

Entity A does a GET on the resource group, does some processing (checking the diff etc. and deciding whether to update), finds out that it needs to update tags, and then does a PUT.

In the meanwhile - after entity A's GET but before its PUT - another entity B updates the resource group tags. So entity A is working with stale data (a stale read problem) while it's doing its PUT.

I have a few questions here. Is the above a correct explanation of the problem? If yes, what are the possible actors that entity A and entity B represent - two different instances of the CAPZ controller, or something like the CAPZ controller plus an external entity, say a user updating the resource group tags through the Azure Web UI or Azure CLI?

possibility of wiping out tags applied between GET and PUT.

I'm assuming you want to avoid this, correct? (Based on "I think the trade off of slightly more complexity in update is better than the harm of possible data loss".)

I also want to ask what the strategy for handling tags is - i.e. the declarative behavior of resource group tags - what kind of behavior are we looking for? I had asked a similar question here - #1696 (comment), #1696 (comment)

@devigned
Contributor

devigned commented Sep 22, 2021

If A and B GET at the same time they see the same RG. If A then updates the RG and we now have RG' stored, then B may be working with stale data.

Let's say that B only cares about making sure that RG exists, and that tags {Key1: Value1, Key2: Value2} exist on RG, then rather than doing a PUT on RG, we do a PATCH with merge on the tags. Even if B has stale data, the PATCH of the tags will only mutate the data that B owns.

When updating a resource group, the operator should not call PUT on the resource group. I don't think there is anything else we can change. If the resource group exists, CAPZ should only call PATCH on the tags for the resource group, which will scope the data mutation to only the data we own, thus eliminating the possibility of overwriting tags.

In the end, the stale read will not affect the end result we are seeking.

@karuppiah7890
Contributor

Yeah, I was going to ask about what data we can actually change on resource groups. Clearly location is constant and name is the identifier, so we are left with tags. So create uses the Create Or Update API, and tag updates can use the Tags API.

And about updating tags - are we saying that, in case the resource group is CAPZ managed, we want to do a PATCH and only manage the tags that CAPZ owns, and not meddle with any other tags that external entities add or update that are not part of the AzureCluster additional tags? Also, is there any situation where we delete tags? Since it's a PATCH, I'm guessing not.

@karuppiah7890
Contributor

karuppiah7890 commented Sep 22, 2021

Sorry about the list of questions. I'm basically trying to understand "the end result we are seeking" from #1696 (comment).

@karuppiah7890
Contributor

karuppiah7890 commented Sep 22, 2021

This is what I understand: if different entities are managing the resource group's tags, then each of them has its own set of tags to manage, which most probably won't collide. For this reason, just using PUT won't work, as it would completely replace the existing set of tags whenever an entity does an update; so instead we do a PATCH for the updates.

I'm only curious about what these entities are, if not CAPZ controllers. Also, if the other entities change the CAPZ-managed tags, then CAPZ would change them back to the values it has, as part of reconciliation, right?

@devigned
Contributor

I'm only curious about what these entities are if not CAPZ controllers.

Could be just about anything. For example, tags are used as a facet in billing, which allows folks to set tags and break up billing per tag. These tags are often managed through outside systems or manually. There are lots of uses outside of ours.

Also, if the other entities change the CAPZ managed tags, then CAPZ would change it back to the value that it has as part of reconciliation, right?

Yes. This is the intended behavior.

@karuppiah7890
Contributor

Cool ! 👍

@CecileRobertMichon
Contributor Author

If you do go with the update-at-scope tag service, could you please add a comment in

// rg already exists, nothing to update.
that specifies that tags are updated separately?

@karuppiah7890
Contributor

What about deletion of tags? Do we handle that too? I see that the existing code here seems to handle it

https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/c878bb9/azure/services/tags/tags.go#L105-L106

and I guess we expect that to be the case for resource groups too?
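(The idea there, roughly, is to diff the desired tags against the last-applied set so removals can be detected; a sketch with made-up variable names:)

```go
// Sketch: find tags that were in the last-applied set but are no longer desired,
// so they can be deleted from Azure as well. Variable names are illustrative.
deleted := map[string]string{}
for key, value := range lastAppliedTags {
	if _, stillWanted := desiredTags[key]; !stillWanted {
		deleted[key] = value
	}
}
```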

@karuppiah7890
Contributor

If you do go with the update-at-scope tag service, could you please add a comment in

// rg already exists, nothing to update.

that specifies that tags are updated separately?

Sure, @CecileRobertMichon! I'll try to follow the pattern of the code you referred to here - #1696 (comment) - and use the tags service from https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/azure/services/tags/tags.go

@CecileRobertMichon
Contributor Author

What about deletion of tags? Do we handle that too? I see that the existing code here seems to handle it

yes, if the tags are managed, I think we want the tag list to == our default tags + additional tags from spec

the only difference here will be to only do this if the RG is managed (for VMs there is no concept of managed/unmanaged so we update tags no matter what), but you can use tags to determine this (the owned tag specifically).

@karuppiah7890
Contributor

Okay! 👍

@karuppiah7890
Contributor

@CecileRobertMichon @devigned Since we want to manage the resource group's tags with updates and deletes too, I'm going to use the existing code in https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/azure/services/tags/tags.go, which uses the Tags Create Or Update At Scope API, rather than the Tags Update At Scope API with the Merge operation that was discussed here - #1696 (comment).

if _, err := s.client.CreateOrUpdateAtScope(ctx, tagsSpec.Scope, resources.TagsResource{Properties: &resources.Tags{Tags: tags}}); err != nil {

The Tags Create Or Update At Scope API helps with creating, updating and deleting tags, and the tags service manages only the tags it owns by storing state in a last-applied annotation:

annotation, err := s.Scope.AnnotationJSON(tagsSpec.Annotation)

if err = s.Scope.UpdateAnnotationJSON(tagsSpec.Annotation, newAnnotation); err != nil {

For machines, the annotation name is this -

Annotation: infrav1.VMTagsLastAppliedAnnotation,

// VMTagsLastAppliedAnnotation is the key for the machine object annotation
// which tracks the AdditionalTags in the Machine Provider Config.
// See https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
// for annotation formatting rules.
VMTagsLastAppliedAnnotation = "sigs.k8s.io/cluster-api-provider-azure-last-applied-tags-vm"

I hope it's okay to go ahead and use the existing tags service code, with an annotation on the AzureCluster custom resource to track the resource group tags' last-applied state, using an annotation name like sigs.k8s.io/cluster-api-provider-azure-last-applied-tags-resource-group or sigs.k8s.io/cluster-api-provider-azure-last-applied-tags-rg.
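Concretely, I imagine this would look roughly like adding a TagsSpec for the resource group scope, mirroring the VM pattern (sketch; the scope string format, the scope helper calls and the annotation key are placeholders, not final names):

```go
// Sketch: a TagsSpec for the cluster's resource group, following the VM pattern.
// The scope string and annotation key below are placeholders.
TagsSpec{
	Scope:      fmt.Sprintf("/subscriptions/%s/resourceGroups/%s", s.SubscriptionID(), s.ResourceGroup()),
	Tags:       s.AdditionalTags(),
	Annotation: "sigs.k8s.io/cluster-api-provider-azure-last-applied-tags-rg",
}
```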

@karuppiah7890
Contributor

I was also wondering how to do the testing for this. I did some manual testing, but for an automated test I wasn't sure where to put it. Currently I notice I might have to do too much mocking to get even the existing tests to work, because of the way I have implemented it. Could someone take a look at the code in #1721 and let me know how it looks - the placement of logic etc.? Then I can also start looking at how to make the code easily testable. Currently it seems a bit hard to test the part that checks whether resource group tags are updated, as it requires more mocks (mocking the groups scope, mocking the Azure client, etc.).
