
🌱 [capd] Ensure Loadbalancer IP is not empty #4398

Merged

Conversation

Contributor

@ashish-amarnath ashish-amarnath commented Mar 29, 2021

Signed-off-by: Ashish Amarnath [email protected]

What this PR does / why we need it:

DockerCluster reconciliation tries to look up the IP of the load balancer container. In the process, it looks at all containers in the system by running docker ps -a, filtered by the name of the cluster. This also picks up containers that have been stopped, and stopped containers have no IP addresses associated with them. This results in an error in the DockerCluster controller like:

[manager] E0329 14:51:58.051663      10 controller.go:302] controller-runtime/manager/controller/dockercluster "msg"="Reconciler error" "error"="DockerCluster.infrastructure.cluster.x-k8s.io \"my-cluster\" is invalid: spec.controlPlaneEndpoint.host: Required value" "name"="my-cluster" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="DockerCluster" 

This change adds a filter to the docker ps -a command so that only running containers are picked up, and also returns an error if the LB has no IP addresses associated with it.
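A minimal sketch of the two pieces described above. This is not the actual CAPD code: the helper names and the shell-out to docker are illustrative, and the running-only filter was later dropped from this PR (see the discussion below).

// A minimal sketch (not the actual CAPD code): list only running containers
// for a cluster and fail when the load balancer has no address at all.
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// listRunningClusterContainers shells out to `docker ps`, filtering by the
// cluster name and by status=running so stopped containers are skipped.
// (The running-only filter was later dropped from this PR.)
func listRunningClusterContainers(clusterName string) ([]string, error) {
	out, err := exec.Command(
		"docker", "ps",
		"--filter", "name="+clusterName,
		"--filter", "status=running",
		"--format", "{{.Names}}",
	).Output()
	if err != nil {
		return nil, err
	}
	return strings.Fields(string(out)), nil
}

// requireLBAddress mirrors the new error check: a load balancer with neither
// an IPv4 nor an IPv6 address is reported as an error instead of silently
// producing an empty spec.controlPlaneEndpoint.host.
func requireLBAddress(ipv4, ipv6 string) error {
	if ipv4 == "" && ipv6 == "" {
		return fmt.Errorf("load balancer has no IP addresses associated with it")
	}
	return nil
}

func main() {
	names, err := listRunningClusterContainers("my-cluster")
	fmt.Println(names, err)
	fmt.Println(requireLBAddress("", "")) // error: no addresses
}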

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #4396

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 29, 2021
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Mar 29, 2021
@ashish-amarnath ashish-amarnath changed the title 🐛 [WIP] [capd] Filter running LB containers for DockerCluster 🐛 [capd] Filter running LB containers for DockerCluster Mar 29, 2021
@ashish-amarnath
Contributor Author

Not sure how to kick the Netlify checks.

@@ -83,6 +83,9 @@ func (n *Node) IP(ctx context.Context) (ipv4 string, ipv6 string, err error) {
	if len(ips) != 2 {
		return "", "", errors.Errorf("container addresses should have 2 values, got %d values", len(ips))
	}
	if ips[0] == "" && ips[1] == "" {
Contributor
This change seems unrelated to the LB container change to me. Am I missing something obvious?

Member

@sbueringer sbueringer Mar 30, 2021

In case a change here was intended: do we want to return an error if

  • both are empty, or
  • one of them is empty?

Contributor Author

@ashish-amarnath ashish-amarnath Mar 30, 2021

@MarcelMue and @elmiko This change is related to the LB container.
During DockerCluster reconciliation:

  1. In the reconcileNormal method we try to look up the IP of the previously provisioned LB by calling LoadBalancer.IP.
  2. LoadBalancer.IP in turn calls the Node.IP function via s.container.IP() (see the sketch below).
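A simplified sketch of that call chain. This is not the actual CAPD source: only the names reconcileNormal, LoadBalancer.IP, and Node.IP come from the thread; everything else, including placing the empty-address check in LoadBalancer.IP as suggested later in this thread, is illustrative.

// Simplified sketch of the call chain described above. Only the names
// reconcileNormal, LoadBalancer.IP and Node.IP come from the thread;
// everything else is illustrative.
package main

import (
	"context"
	"errors"
	"fmt"
)

// Node stands in for the load balancer container.
type Node struct{ ipv4, ipv6 string }

// IP returns the container's IPv4 and IPv6 addresses (ips[0] and ips[1]
// in the diff above).
func (n *Node) IP(ctx context.Context) (string, string, error) {
	return n.ipv4, n.ipv6, nil
}

// LoadBalancer wraps the container, mirroring s.container.IP() in step 2.
type LoadBalancer struct{ container *Node }

// IP delegates to the underlying container and rejects a load balancer
// that has neither address, which is the check discussed in this thread.
func (lb *LoadBalancer) IP(ctx context.Context) (string, error) {
	ipv4, ipv6, err := lb.container.IP(ctx)
	if err != nil {
		return "", err
	}
	if ipv4 == "" && ipv6 == "" {
		return "", errors.New("load balancer container has no IP address")
	}
	if ipv4 != "" {
		return ipv4, nil
	}
	return ipv6, nil
}

// reconcileNormal is the entry point from step 1: it looks up the LB address
// that would be written into spec.controlPlaneEndpoint.host.
func reconcileNormal(ctx context.Context, lb *LoadBalancer) (string, error) {
	return lb.IP(ctx)
}

func main() {
	// A stopped container shows up with empty addresses, reproducing the bug.
	host, err := reconcileNormal(context.Background(), &LoadBalancer{container: &Node{}})
	fmt.Println(host, err)
}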

Contributor Author

@sbueringer the IPs returned here are the IPv4 (ips[0]) and IPv6 (ips[1]) addresses. Until we decide that only one of those is what we will support, returning an error if both are empty seems reasonable. WDYT?

Contributor Author

On thinking about this further, returning this error from Node.IP() may not be the correct thing. IMO, this check should be performed in the LoadBalancer.IP function.

Member

@ashish-amarnath Sounds okay to me. I don't have the necessary context to know if we expect both of them to be set or just one of them.

@ashish-amarnath
Contributor Author

/assign @fabriziopandini

@ashish-amarnath ashish-amarnath force-pushed the lb-status-filter branch 2 times, most recently from 224fd1d to 29ee27c on March 30, 2021 16:20
Contributor

@MarcelMue MarcelMue left a comment

/lgtm

test/infrastructure/docker/docker/loadbalancer.go (outdated review comment, resolved)
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 30, 2021
@fabriziopandini
Member

fabriziopandini commented Mar 30, 2021

@ashish-amarnath changes lgtm to me but I have two concerns:

  • What is the use case that led to having a stopped load balancer container? This should never happen.
  • What is the expected behaviour in case a stopped load balancer container exists? I understand that with this PR we are not considering it as an existing load balancer, but if I'm not wrong, by ignoring it we end up trying to create a new one, and this will fail because a container with the same name already exists...

@ashish-amarnath
Contributor Author

@fabriziopandini
I am not entirely sure how I ended up with a stopped LB container for my cluster. But the error message I saw in the controller when this happened wasn't indicative of the reason. Specifically, this is the error I observed:

[manager] E0329 14:51:58.051663      10 controller.go:302] controller-runtime/manager/controller/dockercluster "msg"="Reconciler error" "error"="DockerCluster.infrastructure.cluster.x-k8s.io \"my-cluster\" is invalid: spec.controlPlaneEndpoint.host: Required value" "name"="my-cluster" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="DockerCluster" 

Your concerns are valid. First, I agree that this should not happen commonly. But when it does, at the very least the error needs to be more meaningful than indicating that there was something wrong with a spec that was applied and accepted by validation. Second, if this were a real provider, then in the case of the LB being stopped, killed, or deleted, the expectation would be that a new one would be spun up in its place as part of reconciling the infrastructure that makes up the cluster. That said, I also understand that this is not a real provider for production.
About reconciliation of the LB failing because a container with the same name exists, I think that should be remediated too. This can be done by removing stopped containers with the same name (a rough sketch follows below). Happy to address that in this PR if you agree 🙂
WDYT?
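A rough sketch of the remediation proposed above. This approach was ultimately not adopted (see the follow-up comments); the helper name and the shell-out to docker are illustrative, not CAPD code.

// Rough sketch of the remediation proposed above (not adopted in the end):
// remove exited containers that share the LB name before re-creating the
// load balancer. The helper name is illustrative.
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// removeStoppedContainersNamed deletes exited containers whose name matches
// the given load balancer container name.
func removeStoppedContainersNamed(name string) error {
	out, err := exec.Command(
		"docker", "ps", "-a",
		"--filter", "name="+name,
		"--filter", "status=exited",
		"--format", "{{.ID}}",
	).Output()
	if err != nil {
		return err
	}
	for _, id := range strings.Fields(string(out)) {
		if err := exec.Command("docker", "rm", id).Run(); err != nil {
			return fmt.Errorf("removing stopped container %s: %w", id, err)
		}
	}
	return nil
}

func main() {
	fmt.Println(removeStoppedContainersNamed("my-cluster-lb"))
}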

@fabriziopandini
Member

fabriziopandini commented Mar 31, 2021

TBH, I'm a little bit concerned that automatic remediation of stopped containers would hide the root cause of the problem and, in the end, introduce more instability to the system due to the load balancer going away and being recreated in an unpredictable way (also, in a real production system you can't really trust a load balancer that suddenly goes away for no reason).

However, I agree we should report a better error message, but this should be done without ignoring stopped containers.

@sbueringer
Member

sbueringer commented Mar 31, 2021

I agree with @fabriziopandini. Let's first try to improve error reporting/logging. Once we know the root cause, we can decide if automatic remediation is the right way to resolve it.

I opened PR #4414 to gather more data in CI. So if we hit the issue there, it should be easier to find out what leads to this problem.

@ashish-amarnath
Contributor Author

@fabriziopandini Considering that this is not a real provider but one meant to catch problems, I agree that this change would be papering over real issues. I will remove the filtering change and keep the error check, which should give us the meaningful error message we are looking for.

@ashish-amarnath ashish-amarnath changed the title 🐛 [capd] Filter running LB containers for DockerCluster 🐛 [capd] Ensure Loadbalancer IP is not empty Mar 31, 2021
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 31, 2021
@ashish-amarnath ashish-amarnath force-pushed the lb-status-filter branch 3 times, most recently from 282bf6f to 2dfdc99 on March 31, 2021 15:00
@sbueringer
Member

/lgtm

@ashish-amarnath ashish-amarnath changed the title 🐛 [capd] Ensure Loadbalancer IP is not empty 🌱 [capd] Ensure Loadbalancer IP is not empty Mar 31, 2021
@ashish-amarnath
Contributor Author

Realized we are not categorizing this as a bug.

@fabriziopandini
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 31, 2021
@MarcelMue
Contributor

/lgtm

@fabriziopandini
Member

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 2, 2021
@sbueringer
Member

sbueringer commented Apr 2, 2021

/test pull-cluster-api-test-main

EDIT: faster :)

@fabriziopandini
Member

/retest

@k8s-ci-robot k8s-ci-robot merged commit 00b2032 into kubernetes-sigs:master Apr 2, 2021
@k8s-ci-robot k8s-ci-robot added this to the v0.4 milestone Apr 2, 2021
@ashish-amarnath ashish-amarnath deleted the lb-status-filter branch April 2, 2021 23:41
Labels
approved  Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes  Indicates the PR's author has signed the CNCF CLA.
lgtm  "Looks good to me", indicates that a PR is ready to be merged.
size/XS  Denotes a PR that changes 0-9 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

[capd] Reconciliation of DockerClusters fails when there is a stopped ha-proxy container.
5 participants