Skip to content
This repository has been archived by the owner on Sep 24, 2021. It is now read-only.

Cluster reconciliation should reconcile exited ELB #53

Merged

Conversation

ashish-amarnath
Copy link
Contributor

  • get only running ELB containers
  • remove exited ELB containers with name conflict

Signed-off-by: Ashish Amarnath [email protected]

What this PR does / why we need it:
Cluster reconciliation should reconcile deleted ELBs

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #52

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Release note:

NONE

@k8s-ci-robot k8s-ci-robot requested review from detiber and vincepri June 30, 2019 06:53
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 30, 2019
@ashish-amarnath ashish-amarnath force-pushed the reconcile-deleted-elb branch from 9fb70dd to 486939c Compare June 30, 2019 06:59
Copy link
Contributor

@chuckha chuckha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ashish-amarnath What do you think about cleaning up the load balancer on cluster delete so that we don't have to clean up exited containers on create?

I'm curious how you're ending up with exited containers in the first place -- what sort of workflow are you using?

@ashish-amarnath
Copy link
Contributor Author

@chuckha I like the cleaning up on cluster delete. But I think we need to keep the container prune logic in SetupLoadbalancer
About how I am ending up w/ exited containers... I am being nasty and killing them.
Opened an issue #52 w/ repro

Copy link
Contributor

@amy amy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is out of scope for this PR but do we have any tests that can verify this change?

@ashish-amarnath
Copy link
Contributor Author

@amy Yes, we should have a good testing story for capd as a whole. This scenario, in particular, is kinda hard to test via automated tests. The repro steps for this is pretty much me doing bad things on purpose.

@chuckha
Copy link
Contributor

chuckha commented Jul 1, 2019

interesting...

i'd really like to find a balance here between readability and robustness.

CAPD doesn't have to be robust as it will never be a production system and should explicitly be designed to not be a robust system and more to be a general example of cluster-api assumptions.

So I'm thinking about what is the equivalent use case in cluster-api-provider-aws -- if we stop a node I don't think even CAPA is smart enough to clean up the old stopped node.

Because of that I think this logic should move into to the cluster delete and the SetupLoadBalancer should assume the environment is clean. Does that seem reasonable to you?

@ashish-amarnath
Copy link
Contributor Author

@chuckha drawing comparison w/ capa.. this is equivalent to deleting the ELB fronting the API server. At which point, IIRC, capa controller will reconcile the deleted ELB. No?

@chuckha
Copy link
Contributor

chuckha commented Jul 1, 2019

That's a good comparison, certainly more accurate than mine. There seems to be no equivalent state between aws and docker with regards to killing the container. A stopped container == what in AWS world? I would suggest that instead of cleaning up before creating that we instead add --rm to the load balancer set up. EDIT: actually after my second paragraph i don't even agree with this, i'd suggest using docker rm -f instead of docker kill to simulate a failure.

The only use case this particular code is covering is adding support for the docker kill command when really the only delete operation we support is docker rm (-f). Do you see a use case for supporting docker kill?

@ashish-amarnath
Copy link
Contributor Author

ashish-amarnath commented Jul 1, 2019

A stopped container == what in AWS world?

Depends on which container you stop. Stopping the nginx (loadbalancer) container == killing the ELB. Stopping the kind node container == deleting the EC2 instance. There is also a scenario where you stop the EC2 instance. In that case, capa controller will recreate it. At least based on https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/master/pkg/cloud/aws/services/ec2/instances.go#L107-L111
that is what I would expect.

@chuckha
Copy link
Contributor

chuckha commented Jul 1, 2019

ok, I think I see the use case now. The outlined use case for this is simulating a stopped load balancer. This situation doesn't exist in AWS since you cannot stop and ELB, but it does exist in docker. Do I have it right?

@ashish-amarnath
Copy link
Contributor Author

The outlined use case for this is simulating a stopped load balancer.

This is correct.

This situation doesn't exist in AWS since you cannot stop and ELB

In AWS you can delete the ELB from the AWS console and the capa cluster actuator will recreate it.

Copy link
Contributor

@chuckha chuckha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

Thank you for clarifying the use case regarding this situation, much appreciated

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ashish-amarnath, chuckha

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 1, 2019
@chuckha
Copy link
Contributor

chuckha commented Jul 2, 2019

/lgtm

on amy's behalf

@k8s-ci-robot k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 2, 2019
@detiber
Copy link
Contributor

detiber commented Jul 2, 2019

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 2, 2019
Copy link
Contributor

@detiber detiber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, lgtm, but we should be using logging now that #27 has merged.

@@ -40,9 +44,11 @@ func getRole(machine *clusterv1.Machine) string {
}

func getExternalLoadBalancerNode(clusterName string) (*nodes.Node, error) {
fmt.Printf("Getting external load balancer node for cluster %q\n", clusterName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fmt.Printfs should be replaced with log statements now that #27 has merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#27 added loggers into the clusterActuator and machineActuators. This function is not in either of those scopes. So, we'll need to use klog or something like that.
Looks like the build also failed because of undefined: fmt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the interim step is to pass a log through to all of these functions and rely on the logger interface. we should not use klog anywhere.

Copy link
Contributor

@neolit123 neolit123 Jul 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not use klog anywhere.

+1
always pass a logger interface and not the logging implementation directly, whether it's fmt or klog.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed!

* get only running ELB containers
* remove exited ELB containers with name conflict

Signed-off-by: Ashish Amarnath <[email protected]>
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 3, 2019
@ashish-amarnath ashish-amarnath force-pushed the reconcile-deleted-elb branch from e43c100 to 3aa185e Compare July 3, 2019 06:30
@k8s-ci-robot k8s-ci-robot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 3, 2019
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 3, 2019
@chuckha
Copy link
Contributor

chuckha commented Jul 3, 2019

/lgtm

@detiber you ok removing the hold?

@k8s-ci-robot k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 3, 2019
@detiber
Copy link
Contributor

detiber commented Jul 3, 2019

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 3, 2019
@k8s-ci-robot k8s-ci-robot merged commit 1064568 into kubernetes-retired:master Jul 3, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cluster actuator should reconcile deleted ELB nodes
6 participants