Flagger constantly updates service port causing LB flapping #513

Closed · jtolsma opened this issue Mar 19, 2020 · 12 comments · Fixed by #514
Labels: kind/bug (Something isn't working)

@jtolsma commented Mar 19, 2020

I have been playing with Flagger and I noticed that when I roll out a canary (basic kubernetes provider), Flagger constantly updates the service, causing a change in port. Combine this with the ALB ingress controller and target groups, and craziness happens with timeouts and constantly shifting target groups. I tested this with the recently released rc2 and the previous 3 versions, and they all show the same behavior with both a regular LB and an ALB. What am I missing?

Steps to reproduce:
Create the hello world deployment/service from https://kubernetes.io/docs/tutorials/stateless-application/expose-external-ip-address/, adding the app label of course so that Flagger can pick it up. (A rough sketch of what I deployed is below.)
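
Roughly what I applied — a sketch based on the tutorial manifests with the names changed to hello-world and an app label added (the exact image and replica count may differ from what you use):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world
  labels:
    app: hello-world
spec:
  replicas: 5
  selector:
    matchLabels:
      app: hello-world
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      containers:
      - name: hello-world
        image: gcr.io/google-samples/node-hello:1.0  # image from the tutorial
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: hello-world
spec:
  type: LoadBalancer
  selector:
    app: hello-world
  ports:
  - name: http
    port: 8080
    targetPort: 8080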

Initial deploy:
kubectl get service,deployment hello-world
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/hello-world LoadBalancer 100.67.12.162 a9836931169f811ea884d1262646600e-400633381.us-east-1.elb.amazonaws.com 8080:31517/TCP 10m

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.extensions/hello-world 5/5 5 5 23m

Deploy a simple canary:
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: hello-world
  namespace: jtolsma
spec:
  analysis:
    interval: 15s
    iterations: 1
    threshold: 1
    webhooks: []
  progressDeadlineSeconds: 300
  provider: kubernetes
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello-world
  service:
    name: hello-world
    port: 8080
    portDiscovery: true
    portName: http
    targetPort: 8080

Watch the Flagger logs:
{"level":"info","ts":"2020-03-19T15:58:56.133Z","caller":"flagger/main.go:110","msg":"Starting flagger version 1.0.0-rc.2 revision 52d9951 mesh provider istio"}
{"level":"info","ts":"2020-03-19T15:58:56.165Z","caller":"flagger/main.go:381","msg":"Connected to Kubernetes API v1.13.11"}
{"level":"info","ts":"2020-03-19T15:58:56.165Z","caller":"flagger/main.go:237","msg":"Waiting for canary informer cache to sync"}
{"level":"info","ts":"2020-03-19T15:58:56.265Z","caller":"flagger/main.go:244","msg":"Waiting for metric template informer cache to sync"}
{"level":"info","ts":"2020-03-19T15:58:56.365Z","caller":"flagger/main.go:251","msg":"Waiting for alert provider informer cache to sync"}
{"level":"info","ts":"2020-03-19T15:58:56.465Z","caller":"flagger/main.go:151","msg":"Watching namespace jtolsma"}
{"level":"info","ts":"2020-03-19T15:58:56.542Z","caller":"flagger/main.go:161","msg":"Connected to metrics server http://prometheus.istio-system:9090"}
{"level":"info","ts":"2020-03-19T15:58:56.542Z","caller":"flagger/main.go:326","msg":"Notifications enabled for https://hooks.slack.com/servic"}
{"level":"debug","ts":"2020-03-19T15:58:56.542Z","caller":"controller/controller.go:76","msg":"Creating event broadcaster"}
{"level":"info","ts":"2020-03-19T15:58:56.542Z","caller":"server/server.go:29","msg":"Starting HTTP server on port 8080"}
{"level":"info","ts":"2020-03-19T15:58:56.543Z","caller":"controller/controller.go:150","msg":"Starting operator"}
{"level":"info","ts":"2020-03-19T15:58:56.543Z","caller":"controller/controller.go:159","msg":"Started operator workers"}
{"level":"info","ts":"2020-03-19T16:00:34.812Z","caller":"controller/controller.go:236","msg":"Synced jtolsma/hello-world"}
{"level":"info","ts":"2020-03-19T16:00:36.585Z","caller":"router/kubernetes_default.go:137","msg":"Service hello-world-canary.jtolsma created","canary":"hello-world.jtolsma"}
{"level":"info","ts":"2020-03-19T16:00:36.608Z","caller":"router/kubernetes_default.go:137","msg":"Service hello-world-primary.jtolsma created","canary":"hello-world.jtolsma"}
{"level":"info","ts":"2020-03-19T16:00:36.640Z","caller":"canary/deployment_controller.go:281","msg":"Deployment hello-world-primary.jtolsma created","canary":"hello-world.jtolsma"}
{"level":"info","ts":"2020-03-19T16:00:36.645Z","caller":"controller/events.go:27","msg":"IsPrimaryReady failed: primary daemonset hello-world-primary.jtolsma not ready: waiting for rollout to finish: observed deployment generation less then desired generation","canary":"hello-world.jtolsma"}
{"level":"debug","ts":"2020-03-19T16:00:36.646Z","logger":"event-broadcaster","caller":"record/event.go:281","msg":"Event(v1.ObjectReference{Kind:"Canary", Namespace:"jtolsma", Name:"hello-world", UID:"c4817bad-69fa-11ea-884d-1262646600e6", APIVersion:"flagger.app/v1beta1", ResourceVersion:"37092574", FieldPath:""}): type: 'Warning' reason: 'Synced' IsPrimaryReady failed: primary daemonset hello-world-primary.jtolsma not ready: waiting for rollout to finish: observed deployment generation less then desired generation"}
{"level":"info","ts":"2020-03-19T16:00:51.654Z","caller":"canary/deployment_controller.go:45","msg":"Scaling down Deployment hello-world.jtolsma","canary":"hello-world.jtolsma"}
{"level":"info","ts":"2020-03-19T16:00:51.700Z","caller":"router/kubernetes_default.go:159","msg":"Service hello-world updated","canary":"hello-world.jtolsma"}
{"level":"info","ts":"2020-03-19T16:00:51.732Z","caller":"controller/events.go:15","msg":"Initialization done! hello-world.jtolsma","canary":"hello-world.jtolsma"}
{"level":"debug","ts":"2020-03-19T16:00:51.732Z","logger":"event-broadcaster","caller":"record/event.go:281","msg":"Event(v1.ObjectReference{Kind:"Canary", Namespace:"jtolsma", Name:"hello-world", UID:"c4817bad-69fa-11ea-884d-1262646600e6", APIVersion:"flagger.app/v1beta1", ResourceVersion:"37092574", FieldPath:""}): type: 'Normal' reason: 'Synced' Initialization done! hello-world.jtolsma"}
{"level":"info","ts":"2020-03-19T16:01:06.669Z","caller":"router/kubernetes_default.go:159","msg":"Service hello-world updated","canary":"hello-world.jtolsma"}
{"level":"info","ts":"2020-03-19T16:01:21.607Z","caller":"router/kubernetes_default.go:159","msg":"Service hello-world updated","canary":"hello-world.jtolsma"}
{"level":"info","ts":"2020-03-19T16:01:36.609Z","caller":"router/kubernetes_default.go:159","msg":"Service hello-world updated","canary":"hello-world.jtolsma"}
{"level":"info","ts":"2020-03-19T16:01:51.678Z","caller":"router/kubernetes_default.go:159","msg":"Service hello-world updated","canary":"hello-world.jtolsma"}
{"level":"info","ts":"2020-03-19T16:02:06.640Z","caller":"router/kubernetes_default.go:159","msg":"Service hello-world updated","canary":"hello-world.jtolsma"}
{"level":"info","ts":"2020-03-19T16:02:21.606Z","caller":"router/kubernetes_default.go:159","msg":"Service hello-world updated","canary":"hello-world.jtolsma"}

Every time a "Service hello-world updated" message is printed, the service's node port changes, causing an LB update.
while true ; do date ; kubectl get service hello-world ; sleep 15 ; done
Thu Mar 19 12:07:54 EDT 2020
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hello-world LoadBalancer 100.67.12.162 a9836931169f811ea884d1262646600e-400633381.us-east-1.elb.amazonaws.com 8080:31262/TCP 22m
Thu Mar 19 12:08:09 EDT 2020
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hello-world LoadBalancer 100.67.12.162 a9836931169f811ea884d1262646600e-400633381.us-east-1.elb.amazonaws.com 8080:30813/TCP 23m
Thu Mar 19 12:08:24 EDT 2020
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hello-world LoadBalancer 100.67.12.162 a9836931169f811ea884d1262646600e-400633381.us-east-1.elb.amazonaws.com 8080:30972/TCP 23m

@stefanprodan (Member) commented:

The Kubernetes service should be generated by Flagger and no other controller should mutate it. Flagger detects changes in the service definition and tries to reconcile it; that's why you see all those logs.

@stefanprodan (Member) commented:

This is how it could work with ALB:

  • create an ingress definition that points to hello-world:8080 (a rough sketch follows below)
  • apply only the deployment and canary object (Flagger will generate the service for you)
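
A rough sketch of such an ingress, assuming the aws-alb-ingress-controller and the extensions/v1beta1 Ingress API (the annotation names and API version depend on your controller and cluster versions):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: hello-world
  namespace: jtolsma
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  rules:
  - http:
      paths:
      - path: /*
        backend:
          serviceName: hello-world
          servicePort: 8080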

@jtolsma (Author) commented Mar 19, 2020

No other controllers are changing the service; I have been watching it and only Flagger is changing the service to reflect a port change. Are you saying it's because there was already a service beforehand, and that I should only do a deployment and let Flagger create the service? But I cannot create a canary object without the service definition.

@stefanprodan (Member) commented:

But I cannot create a canary object without the service definition.

What I'm saying is the opposite. Do not create a Kubernetes service on your own; let Flagger create it based on the canary.spec.service definition. Be aware that Flagger generates ClusterIP services, not LBs. You need an ingress controller like ALB and an ingress definition that points to the ClusterIP service created by Flagger to expose the app outside the cluster.

@jtolsma (Author) commented Mar 19, 2020

Hmmm, I see that. So, we heavily use type: LoadBalancer today; is there no way to continue to use that with Flagger? It seems to work fine after setup, other than the nodePort: 31192 constantly changing. We have seen that before when something tries to apply a service with no nodePort defined and Kubernetes just keeps allocating new ones that haven't been used.

@stefanprodan (Member) commented:

What happens if you disable auto discovery, as in:

service:
  name: hello-world
  port: 8080

Does Flagger update the service every time or not?

@jtolsma (Author) commented Mar 19, 2020

Yes. I did a capture between two updates and you can see the only thing changing is the node port. For some reason Flagger thinks something needs to change, but it doesn't actually change anything except the node port.

diff test1.out test2.out
10c10
<   resourceVersion: "37133251"
---
>   resourceVersion: "37133295"
18c18
<     nodePort: 32506
---
>     nodePort: 30185

@jtolsma (Author) commented Mar 19, 2020

Looking in the code, I am guessing this is why?
portsDiff := cmp.Diff(svcSpec.Ports, svc.Spec.Ports, cmpopts.SortSlices(sortPorts))
selectorsDiff := cmp.Diff(svcSpec.Selector, svc.Spec.Selector)

if portsDiff != "" || selectorsDiff != "" {

Because services of type NodePort and LoadBalancer always have a nodePort set, the portsDiff will never match. And then when Flagger applies the update, the existing nodePort is lost because the ports are overwritten by:
svcClone.Spec.Ports = svcSpec.Ports

It would be a great feature to not override the nodePort every time; it would allow Flagger to be dropped easily into the type LoadBalancer and NodePort applications that exist today.
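
Something like the following is what I have in mind. This is just a minimal sketch of the idea (a hypothetical helper, not Flagger's actual code): before diffing and applying the desired spec, copy the node ports the API server already allocated on the live service onto the desired ports, so the comparison sees no spurious diff and the update doesn't clobber the allocation.

package main

import (
	corev1 "k8s.io/api/core/v1"
)

// preserveNodePorts copies node ports already allocated on the live service
// onto the desired port list, matching ports by name (or by port number when
// the name is empty). Hypothetical helper for illustration only.
func preserveNodePorts(desired, current []corev1.ServicePort) []corev1.ServicePort {
	out := make([]corev1.ServicePort, len(desired))
	copy(out, desired)
	for i := range out {
		for _, cp := range current {
			sameName := out[i].Name != "" && out[i].Name == cp.Name
			samePort := out[i].Port == cp.Port
			if sameName || samePort {
				// Keep the node port Kubernetes already assigned.
				out[i].NodePort = cp.NodePort
				break
			}
		}
	}
	return out
}

func main() {} // placeholder so the sketch compiles as a standalone file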

@stefanprodan (Member) commented:

@jtolsma can you please try stefanprodan/flagger:nodeport-fix and see if the node ports are preserved?
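
For testing, assuming Flagger runs as a Deployment named flagger in the istio-system namespace with a container named flagger (adjust to your install), you can swap the image with something like:

kubectl -n istio-system set image deployment/flagger flagger=stefanprodan/flagger:nodeport-fix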

@stefanprodan added the kind/bug label on Mar 20, 2020
@jtolsma (Author) commented Mar 20, 2020

@stefanprodan I tested a few different scenarios and it looks like Flagger is no longer changing the node port, and I am not getting the messages about constantly wanting to update the service. Looks good! Does this need to be applied to other providers besides kubernetes? I will be looking to do Istio in the future and we will still have some type LoadBalancer services there.

@stefanprodan (Member) commented Mar 20, 2020

Does this need to be applied to other providers besides kubernetes?

No. Since all providers use the underlying Kubernetes service code, fixing it there fixes it for all providers.

@jtolsma (Author) commented Mar 20, 2020

Ah! Great! Thanks for looking at this.
