Update documentation to call out --update-status-on-shutdown for external DNS #1877

Closed
jordanjennings opened this issue Jan 5, 2018 · 3 comments
Labels
area/docs, help wanted, lifecycle/rotten

Comments

@jordanjennings

Summary

I think it would be a great idea to add a callout on the main README.md for users of external DNS: they probably want to set --update-status-on-shutdown=false, or they may experience DNS downtime.
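
For illustration, a minimal sketch of what such a callout might show, assuming a typical controller Deployment (the container name, image tag, and existing args are placeholders; only the added flag matters):

```yaml
# Hypothetical excerpt of an nginx ingress controller Deployment.
# Names and existing args are assumptions taken from a typical manifest.
spec:
  template:
    spec:
      containers:
        - name: nginx-ingress-controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0
          args:
            - /nginx-ingress-controller
            - --configmap=$(POD_NAMESPACE)/nginx-configuration
            # keep the ingress status populated on shutdown so external DNS
            # does not delete the corresponding records
            - --update-status-on-shutdown=false
```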

Full details

Is this a BUG REPORT or FEATURE REQUEST?:
Documentation request

NGINX Ingress controller version:
0.9.0

What happened:
DNS records were deleted by external DNS during a cluster rolling update, because the nginx ingress controller cleared the ingress status fields on shutdown. This caused unexpected downtime while DNS re-propagated: the nginx ingress controller had to come back online and re-update the ingress status before external DNS recreated the DNS records.

What you expected to happen:
No DNS changes when the nginx ingress controller is evicted or redeployed. From reading the code I can see there's a flag for this that isn't very well called out, --update-status-on-shutdown, and I now see the original issue that requested that flag: #881.

How to reproduce it (as minimally and precisely as possible):
Run a single instance of the nginx ingress controller along with external DNS set to watch ingresses, then delete the nginx ingress controller pod. While the pod is shutting down it deletes the ingress status, so external DNS does a DELETE on the DNS records; once the new nginx ingress pod comes up and becomes leader, it re-applies the ingress status and external DNS does a CREATE for the DNS record.
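
For reference, external DNS would be pointed at ingresses roughly like this for the reproduction above; the provider and policy values are placeholders, but --source=ingress is what makes it react to the status change:

```yaml
# Hypothetical external-dns container args; provider/policy are placeholders.
args:
  - --source=ingress   # watch Ingress status for target addresses
  - --provider=aws     # placeholder; any supported provider
  - --policy=sync      # sync permits deletes, which is how the records get removed
```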

Anything else we need to know:
Even when running more than one nginx ingress controller, the issue came up from time to time when the leader was evicted. One way I can see this happening is if both nginx ingress controllers are scheduled on the same node, that node gets rolled, and the non-leader nginx ingress shuts down more quickly than the leader. That said, I applied pod anti-affinity and still sometimes saw issues when the leader was evicted even though another nginx ingress controller was running. The logic in status.go for determining whether more than one controller is running doesn't seem bulletproof. I haven't been able to pinpoint the exact issue there, and once I found the --update-status-on-shutdown flag I stopped investigating.
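
For context, the pod anti-affinity mentioned above might look roughly like the sketch below; the app label is an assumption and would need to match whatever labels the controller pods actually carry:

```yaml
# Sketch of pod anti-affinity to keep controller replicas on separate nodes;
# the "app: ingress-nginx" selector is assumed, not taken from the original report.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: ingress-nginx
        topologyKey: kubernetes.io/hostname
```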

@aledbf added the area/docs and help wanted labels on Jan 5, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Apr 5, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on May 5, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
