-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🌱 [capd] Ensure Loadbalancer IP is not empty #4398
🌱 [capd] Ensure Loadbalancer IP is not empty #4398
Conversation
4a510c3
to
069d4b9
Compare
not sure how to kick the netlify checks |
@@ -83,6 +83,9 @@ func (n *Node) IP(ctx context.Context) (ipv4 string, ipv6 string, err error) { | |||
if len(ips) != 2 { | |||
return "", "", errors.Errorf("container addresses should have 2 values, got %d values", len(ips)) | |||
} | |||
if ips[0] == "" && ips[1] == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change seems unrelated to the LB container change to me. Am I missing something obvious?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case a change here was intended. Do we want to return an error if:
- both are empty
- one of them is empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcelMue and @elmiko This change is related to the LB container.
During DockerCluster reconciliation:
- In the reconcileNormal method we try to look up the IP of the LB of the previously provisioned LB by calling the LoadBalancer.IP
- LoadBalancer.IP in turn calls the Node.IP function by calling s.container.IP()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbueringer the IPs returned here are IPv4 (ips[0]) and IPv6(ips[1] addresses. Till we decide that only one of those is what we will support, returning an error if both are empty seems reasonable. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On thinking about this further, returning this error from the Node.IP()
may not be the correct thing. IMO, this check should be performed in the LoadBalancer.IP
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ashish-amarnath Sounds okay to me. I don't have the necessary context to know if we expect both of them to be set or just one of them.
/assign @fabriziopandini |
224fd1d
to
29ee27c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
@ashish-amarnath changes lgtm to me but I have two concerns:
|
@fabriziopandini
Your concerns are valid. First, I agree that this should not happen commonly. But in case this happens, at the very least the needs to be more meaningful than indicating that there was something wrong with the spec that was applied and accepted by the validation. Second, if this were a real provider, then in the case of the LB being stopped, killed, or deleted, the expectation would be that a new one will be spun up in its place as part of reconciling the infrastructure that makes up the cluster. That said, I also understand that this is not a real provider for production. |
TBH, I'm a little bit concerned that automatic remediation of stopped containers would hide the root cause of the problem and in the end introduce more instability to the system due to the load balancer going away and being recreated in an unpredictable way (also in a real production system, you can't really trust a load balancer that suddently goes away without reasons). However I agree we should report a better error message, but this should be done without ignoring stopped containers. |
I agree with @fabriziopandini . Let's try to first improve error reporting / logging. If we then know the root cause, we can decide if an automatic remediation is the right way to resolve it. I opened PR #4414 to gather more data in ci. So if we hit the issue there, it should be easier to find out what leads to this problem. |
@fabriziopandini Considering that this is not a real provider but one to catch problems, I agree that this change will be papering over real issues. I will remove the filtering change and keep the error check which could give us the meaningful error message that we are looking for. |
29ee27c
to
9b8139a
Compare
282bf6f
to
2dfdc99
Compare
/lgtm |
Signed-off-by: Ashish Amarnath <[email protected]>
2dfdc99
to
5c068db
Compare
Realized we are not categorizing this as a bug |
/lgtm |
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: fabriziopandini The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test pull-cluster-api-test-main EDIT: faster :) |
/retest |
Signed-off-by: Ashish Amarnath [email protected]
What this PR does / why we need it:
DockerCluster reconciliation tries to lookup IP of the loadbalancer container. In the process, it looks at all containers in the system by running
docker ps -a
filtering using the name of the cluster. This picks up even those containers that were stopped. The stopped containers will not have any IP addresses associated with them. This results in an error in the DockerCluster controller likeThis change adds a filter to the
docker ps -a
command to pick up only those containers that are running, and also returns an error if the LB has no IP addresses associated with it.Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #4396