Update documentation to call out --update-status-on-shutdown for external DNS #1877

Closed
jordanjennings opened this issue Jan 5, 2018 · 3 comments
Labels
area/docs, help wanted, lifecycle/rotten

Comments

@jordanjennings

Summary

I think it would be a great idea to add a callout on the main README.md for users of external DNS: they probably want to set --update-status-on-shutdown=false, or they may experience DNS downtime.
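
For illustration, a minimal sketch of what such a callout might show, assuming a typical controller Deployment (the container name, image tag, and existing args are placeholders; only the added flag matters):

```yaml
# Hypothetical excerpt of an nginx ingress controller Deployment.
# Names and existing args are assumptions taken from a typical manifest.
spec:
  template:
    spec:
      containers:
        - name: nginx-ingress-controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0
          args:
            - /nginx-ingress-controller
            - --configmap=$(POD_NAMESPACE)/nginx-configuration
            # keep the ingress status populated on shutdown so external DNS
            # does not delete the corresponding records
            - --update-status-on-shutdown=false
```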

Full details

Is this a BUG REPORT or FEATURE REQUEST?:
Documentation request

NGINX Ingress controller version:
0.9.0

What happened:
DNS records were deleted by external DNS during a cluster rolling update, because the nginx ingress controller cleared the ingress status fields on shutdown. This caused unexpected downtime while DNS re-propagated: the nginx ingress controller had to come back online and re-update the ingress status before external DNS recreated the DNS records.

What you expected to happen:
No DNS changes when the nginx ingress controller is evicted or redeployed. From reading the code I can see there's a flag for this that isn't very well called out, --update-status-on-shutdown, and I now see the original issue that requested that flag: #881.

How to reproduce it (as minimally and precisely as possible):
Run a single instance of the nginx ingress controller along with external DNS set to watch ingresses, then delete the nginx ingress controller pod. While the pod is shutting down it deletes the ingress status, so external DNS does a DELETE on the DNS records; once the new nginx ingress pod comes up and becomes leader, it re-applies the ingress status and external DNS does a CREATE for the DNS record.
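
For reference, external DNS would be pointed at ingresses roughly like this for the reproduction above; the provider and policy values are placeholders, but --source=ingress is what makes it react to the status change:

```yaml
# Hypothetical external-dns container args; provider/policy are placeholders.
args:
  - --source=ingress   # watch Ingress status for target addresses
  - --provider=aws     # placeholder; any supported provider
  - --policy=sync      # sync permits deletes, which is how the records get removed
```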

Anything else we need to know:
Even when running more than one nginx ingress controller, the issue came up from time to time when the leader was evicted. One way I can see this happening is if both nginx ingress controllers are scheduled on the same node, that node gets rolled, and the non-leader nginx ingress shuts down more quickly than the leader. That said, I applied pod anti-affinity and still sometimes saw issues when the leader was evicted even though another nginx ingress controller was running. The logic in status.go for determining whether more than one controller is running doesn't seem bulletproof. I haven't been able to pinpoint the exact issue there, and once I found the --update-status-on-shutdown flag I stopped investigating.
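
For context, the pod anti-affinity mentioned above might look roughly like the sketch below; the app label is an assumption and would need to match whatever labels the controller pods actually carry:

```yaml
# Sketch of pod anti-affinity to keep controller replicas on separate nodes;
# the "app: ingress-nginx" selector is assumed, not taken from the original report.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: ingress-nginx
        topologyKey: kubernetes.io/hostname
```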

@aledbf added the area/docs and help wanted labels on Jan 5, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Apr 5, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on May 5, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
