Adds retry-on-conflict during updates #725

Merged
merged 3 commits into kubernetes-sigs:master from retry-on-conflict on Apr 19, 2019

Conversation

@chuckha (Contributor) commented Apr 18, 2019

Signed-off-by: Chuck Ha [email protected]

What this PR does / why we need it:
This fixes the errors we were seeing with Update conflicts.

The new logic tries the update; on a conflict it fetches the latest version of the object, copies the most recent resource version onto the local copy, and then retries the update.
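
In sketch form, using the client-go retry helper this PR adopts (m stands for the actuator's machine scope, as in the diff below; this is illustrative, not the exact PR code):

```go
import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/util/retry"
)

// Illustrative retry loop: RetryOnConflict re-runs the closure with backoff
// for as long as the closure returns a conflict error.
err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
	_, updateErr := m.MachineClient.Update(m.Machine)
	if updateErr == nil {
		return nil
	}
	// On failure, fetch the newest copy and carry its resource version
	// onto the local object so the next attempt can succeed.
	newestMachine, err2 := m.MachineClient.Get(m.Machine.Name, metav1.GetOptions{})
	if err2 != nil {
		return err2
	}
	m.Machine.ResourceVersion = newestMachine.ResourceVersion
	return updateErr // a conflict error here triggers another attempt
})
```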

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #717

Special notes for your reviewer:

Release note:

NONE

@chuckha chuckha requested review from detiber and vincepri April 18, 2019 01:30
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 18, 2019
@k8s-ci-robot (Contributor):
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chuckha

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 18, 2019
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 18, 2019
m.Machine.ResourceVersion = newestMachine.ResourceVersion
return err
}
m.V(5).Info("latest machine", "machine", latest)
Contributor:
Any specific reason for V(5) vs. V(6) between lines 137 and 150?

Contributor Author (chuckha):
V(5) is a bit more useful, but not useful enough for our usual debug level of 4 unless you need lots of detail on one specific thing; V(6) is extra information beyond that.
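
A hypothetical logr snippet showing that convention in practice (level numbers follow the explanation above; a message is emitted only when the controller runs with -v at or above its level):

```go
log.V(4).Info("Reconciling machine")               // usual debug level
log.V(5).Info("latest machine", "machine", latest) // detail on one specific object
log.V(6).Info("machine status before update",      // extra, very verbose state dumps
	"machine-status", m.Machine.Status)
```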

s.Cluster.ResourceVersion = newestCluster.ResourceVersion
return err
}
s.V(5).Info("latest cluster status", "cluster-status", latest.Status)
Contributor:
Same V(5) vs. V(6) question.

Contributor Author (chuckha):
Same answer as above.

if err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
s.V(2).Info("updating cluster", "cluster-name", s.Cluster.Name, "cluster-resource-version", s.Cluster.ResourceVersion)
s.Cluster.Spec.ProviderSpec.Value = ext
s.V(6).Info("cluster status before update", "cluster-status", s.Cluster.Status)
Contributor:
For my info: how do I make these logs show up? Is it the -v=# command-line flag?

Member:
That is correct.

Contributor Author (chuckha):
In the generated provider-components.yaml, find the cluster-api-provider-aws controller and add -v=# to its argument list.
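
A hypothetical fragment of that manifest (the container name is a placeholder; only the -v argument matters):

```yaml
containers:
- name: cluster-api-provider-aws-controller
  args:
  - -v=5   # emit V(5) messages and below
```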

if err != nil {
m.Error(err, "failed to update machine")
m.Error(err, "failed to encode machine spec", "machine-name", m.Machine.Name)
Member:
Is the machine name needed here with the new logging context?
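
For context, a hypothetical sketch of the "logging context" being referred to: with logr, a scope's logger can be built once with WithValues, so individual calls need not repeat the machine name.

```go
// Built once when the machine scope is created:
logger := baseLogger.WithValues("machine-name", m.Machine.Name)

// Individual calls then include machine-name automatically:
logger.Error(err, "failed to update machine")
```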

if err != nil {
m.Error(err, "failed to store machine provider status")
m.Error(err, "failed to encode machine status", "machine-name", m.Machine.Name)
Member:
Is the machine name needed here with the new logging context?

}

if err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
m.V(2).Info("updating machine", "machine-name", m.Machine.Name, "machine-resource-version", m.Machine.ResourceVersion, "node-ref", m.Machine.Status.NodeRef)
Member:
Is the machine name needed here with the new logging context?

m.V(6).Info("machine status before update", "machine-status", m.Machine.Status)
latest, err := m.MachineClient.Update(m.Machine)
if err != nil {
m.Error(err, "error updating machine", "machine-name", m.Machine.Name)
Member:
Is the machine name needed here with the new logging context?

m.Error(err, "error updating machine", "machine-name", m.Machine.Name)
newestMachine, err2 := m.MachineClient.Get(m.Machine.Name, metav1.GetOptions{})
if err2 != nil {
m.Error(err2, "failed to fetch latest Machine", "machine-name", m.Machine.Name)
Member:
Is the machine name needed here with the new logging context?

}

if err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
s.V(2).Info("updating cluster", "cluster-name", s.Cluster.Name, "cluster-resource-version", s.Cluster.ResourceVersion)
Member:
Is the cluster name needed here with the new logging context?

s.V(6).Info("cluster status before update", "cluster-status", s.Cluster.Status)
latest, err := s.ClusterClient.Update(s.Cluster)
if err != nil {
s.Error(err, "error updating cluster", "cluster-name", s.Cluster.Name)
Member:
Is the cluster name needed here with the new logging context?

s.Error(err, "error updating cluster", "cluster-name", s.Cluster.Name)
newestCluster, err2 := s.ClusterClient.Get(s.Cluster.Name, metav1.GetOptions{})
if err2 != nil {
s.Error(err2, "failed to fetch latest cluster", "cluster-name", s.Cluster.Name)
Member:
Is the cluster name needed here with the new logging context?

}
s.V(2).Info("Successfully updated cluster", "cluster-name", s.Cluster.Name)
Member:
Is the cluster name needed here with the new logging context?

if err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
m.V(2).Info("updating machine", "machine-name", m.Machine.Name, "machine-resource-version", m.Machine.ResourceVersion, "node-ref", m.Machine.Status.NodeRef)
m.Machine.Spec.ProviderSpec.Value = ext
m.V(6).Info("machine status before update", "machine-status", m.Machine.Status)
Member:
Log lines should always start with a capital letter

return err
}
m.V(5).Info("latest machine", "machine", latest)
// The machine may have status (nodeRef) that the latest doesn't yet have.
Member:
It should be noted that when we set the status, all the fields will be set; some of them might roll back changes or timestamps.

pkg/cloud/aws/actuators/scope.go (resolved)
@chuckha chuckha force-pushed the retry-on-conflict branch from f8e0157 to a38bc46 on April 18, 2019 15:54
m.Error(err2, "failed to fetch latest Machine")
return err2
}
// Error if anything but the machine resource version changes
Member:
This comment seems a bit misleading; for a while I was trying to figure out how it was testing whether anything other than the resourceVersion had changed.

Contributor Author (chuckha), Apr 19, 2019:
How about something like

// Update the resource version and retry the update?

Contributor Author (chuckha):
or

// Update only the resource version and try again. If something else in the resource has changed, this will error.

// have, however some timestamps may be rolled back a bit with this copy.
m.Machine.Status.DeepCopyInto(&latest.Status)
latest.Status.ProviderStatus = status
_, err = m.MachineClient.UpdateStatus(latest)
Member:
Should there be a retry around UpdateStatus() here as well?

Contributor Author (chuckha):
UpdateStatus doesn't need a retry. If UpdateStatus fails, we'll run the Update again, get the new resource version, and try UpdateStatus again.

I think the biggest issue we could see is if multiple things were updating this resource regularly; then we might have to consider breaking this Update & UpdateStatus from one operation into two, with individual retry blocks.
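
A hypothetical sketch of that split, giving UpdateStatus its own retry block (names follow the diff; this is not code from the PR):

```go
if err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
	// Re-fetch so status is always applied on top of the newest resource version.
	latest, getErr := m.MachineClient.Get(m.Machine.Name, metav1.GetOptions{})
	if getErr != nil {
		return getErr
	}
	m.Machine.Status.DeepCopyInto(&latest.Status)
	latest.Status.ProviderStatus = status
	_, updateErr := m.MachineClient.UpdateStatus(latest)
	return updateErr // RetryOnConflict retries only on conflict errors
}); err != nil {
	return err
}
```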

s.Error(err2, "failed to fetch latest cluster")
return err2
}
// Error if anything but the cluster resource version changes
Member:
Same as above, this comment seems a bit misleading.

s.V(5).Info("Latest cluster status", "cluster-status", latest.Status)
s.Cluster.Status.DeepCopyInto(&latest.Status)
latest.Status.ProviderStatus = status
_, err = s.ClusterClient.UpdateStatus(latest)
Member:
Should we add a retry around updating the status as well?

Signed-off-by: Chuck Ha <[email protected]>
@detiber (Member) commented Apr 19, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 19, 2019
@k8s-ci-robot k8s-ci-robot merged commit 91f1a31 into kubernetes-sigs:master Apr 19, 2019
@k8s-ci-robot (Contributor):
@chuckha: The following test failed; say /retest to rerun it:

Test name: pull-cluster-api-provider-aws-bazel-integration
Commit: 0e2907e
Rerun command: /test pull-cluster-api-provider-aws-bazel-integration

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

detiber pushed a commit to detiber/cluster-api-provider-aws that referenced this pull request May 2, 2019
* Adds retry-on-conflict during updates

Signed-off-by: Chuck Ha <[email protected]>

* adds note about status update caveat

Signed-off-by: Chuck Ha <[email protected]>

* clarify errors/comments

Signed-off-by: Chuck Ha <[email protected]>
k8s-ci-robot pushed a commit that referenced this pull request May 2, 2019
* Update the releasing docs (#689)

* Add error reason to output if fail to checkout an account from boskos (#698)

* Temporary workaround a data issue in boskos service (#699)

* Update checkout_account.py to not reuse connections (#700)

* Fix checkout_account.py (#702)

* Make hack/checkin_account.py executable (#703)

* Fix: all traffic ingress rule triggers fatal nil dereference (#697)

* fix: respect all traffic security group rules (and others)

For anything besides tcp, udp, icmp, and icmpv6 there is no applicable
notion of "port range." AWS omits FromPort and ToPort in its responses,
causing a fatal nil dereference when attempting to read any security
groups with e.g. an "all traffic" rule.

* fix: omit description when empty string

* fix: handle more security groups without crashing

This commit cleans up and clarifies a few of the less obvious components
of the previous work.

* fix: handle more security groups without crashing

Address linter failures.

* fix: handle more security groups without crashing

Usage needs to match declaration. Computers are sticklers about that
sort of thing.

* fix: handle more security groups without crashing

Add clarifying comment to serializer function.

* Fixes a bug and adds tests for kubeadm defaults (#707)

The pointers were not working as expected so the API is changing
to be more functional and leverage kubernetes' DeepCopy function.

* Update listed v1.14 AMIs to v1.14.1 (#708)

* Update listed v1.14 AMIs to v1.14.1

* Update README with list of published AMIs/Kubernetes versions

* GZIP user-data (#710)

Signed-off-by: Vince Prignano <[email protected]>

* Make sure Calico can talk IP-in-IP (#701)

* Make sure Calico can talk IP-in-IP

* Add IP in IP protocol to the control plane security group

* Add IPv4 protocol definition and make sure it's handled properly.

* Make port ranges AWS compliant and security groups more restrictive.

* Fix security groups

* Adds tests to kubeadm defaults (#709)

Attempt at documenting the assumptions made in the kubeadm
defaults code.

Signed-off-by: Chuck Ha <[email protected]>

* Logging (#713)

* Adds logr as dependency

Signed-off-by: Chuck Ha <[email protected]>

* Use logr in the cluster actuator

This only creates the logger. Does not yet swap out actual klog calls.

Signed-off-by: Chuck Ha <[email protected]>

* update bazel

Signed-off-by: Chuck Ha <[email protected]>

* update

Signed-off-by: Chuck Ha <[email protected]>

* Switch dep to use release-0.1 branch instead of version (#715)

* Adds logr as dependency (#714)

Adds context for logs and removes excessive logging

Signed-off-by: Chuck Ha <[email protected]>

* Ensure `make manifests` generates machines file for HA control plane too. (#720)

* Add HA machines template

* Introduce HA machines file in `make manifests` target

* Add clusterawsadm as make dependency to manifests make target. (#721)

Ensures manifests are generated from the current state of the source.
Assuming $GOPATH/bin is in the $PATH

* Update to Go 1.12 (#719)

Signed-off-by: Vince Prignano <[email protected]>

* Add ability to override Organization ID for image lookups (#723)

* Add ability to override Organization ID for image lookups

* Update pkg/cloud/aws/services/ec2/ami.go

Co-Authored-By: detiber <[email protected]>

* Add updated generated crd

* feat: support customizing root device size (#718)

* feat: support customizing root device size

* chore: re-generate CRDs

* fix: update formatting

* chore: add comment describing Service.sdkToInstance

* chore: make service.SDKToInstance public

* Rename BUILD -> BUILD.bazel for consistency (#724)

find . -type file -name BUILD -not -path "./vendor/*" | xargs -n1 -I{} -- git mv {} {}.bazel

Preferred build name changed in 3788fb1
Fixes #722

* Adds retry-on-conflict during updates (#725)

* Adds retry-on-conflict during updates

Signed-off-by: Chuck Ha <[email protected]>

* adds note about status update caveat

Signed-off-by: Chuck Ha <[email protected]>

* clarify errors/comments

Signed-off-by: Chuck Ha <[email protected]>

* Add the HA machines configuration to bazel (#733)

Signed-off-by: Chuck Ha <[email protected]>

* Ensure bazel is the correct version (#731)

Signed-off-by: Chuck Ha <[email protected]>

* Update OWNERS_ALIASES and SECURITY_CONTACTS (#712)

* Fix the prow jobs (#735)

Signed-off-by: Chuck Ha <[email protected]>

* Fix markdown formatting (#736)

* extract fmt from release tool (#738)

Signed-off-by: Chuck Ha <[email protected]>

* Use DEFAULT_REGION as the default and REGION as the supplied (#739)

Signed-off-by: Chuck Ha <[email protected]>

* e2e testing improvement (#743)

* Bump kind version
* Remove docker load in favor of kind load for e2e cluster

Signed-off-by: Chuck Ha <[email protected]>

* fix: Don't try to update root size when it's unset (#726)

* fix: Don't try to update root size when it's unset

This commit looks for empty RootDeviceSize in the spec and ignores it.
Otherwise, none of our control plane machines were updating with this
error:

```
E0418 23:07:48.250925       1 controller.go:214] Error updating machine "ns/controlplane-2": found attempt to change immutable state for machine "controlplane-2": ["Root volume size cannot be mutated from 8 to 0"]
```

* fix: updates without specifying a root volume size

Add unit test.

* fix: updates without specifying a root volume size

Fix gofmt.

* Scope nodeRef to workload cluster (#744)

Signed-off-by: Vince Prignano <[email protected]>

* Fix NPE on delete bastion host (#746)

Signed-off-by: Vince Prignano <[email protected]>

* Documentation for creating a new cluster on a different AWS account  (#728)

* Initial draft of documentation for Cluster creation using cross account role assumption

* Update roleassumption.md

Complete the document.

* cleanup the documentation for roleassumption

* Resolved the comments: role assumption documentation.

* Fix minor issues - roleassumption.md

* resolve more comments to roleassumption.md

* Resolve more comments - roleassumption.md

* include machines-ha.yaml.template in release artifacts (#741)

* Update AWS sdk, improve log in machine actuator delete (#747)

Signed-off-by: Vince Prignano <[email protected]>

* Fixes the infinite reconcile loop (#748)

* Uses patch for updating the cluster and machine specs
  - patch does not cause a re-reconcile in the capi controller
* Uses update for updating the cluster and machine status
  - update for status is ok since it does not update any of the metadata
    no re-reconcile is necessary for the capi controller

Signed-off-by: Chuck Ha <[email protected]>

* Update Gopkg.lock and cleanup Makefile (#751)

* Update cluster-api release-0.1 vendor (#750)

Signed-off-by: Vince Prignano <[email protected]>

* Reduce the number of re-reconciles (#752)

Signed-off-by: Chuck Ha <[email protected]>
richardchen-db pushed a commit to databricks/cluster-api-provider-aws-1 that referenced this pull request Jan 14, 2023
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failing to retry reconcile properly
5 participants