Update documentation to call out --update-status-on-shutdown
for external DNS
#1877
Labels
area/docs
help wanted
lifecycle/rotten
Summary
I think it would be a great idea to have a callout for users of external DNS right on the main README.md that they probably want to set `--update-status-on-shutdown=false`, or they might experience DNS downtime.
Full details
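For context, a minimal sketch of where such a callout could point users. The flag itself is real; the Deployment name, namespace, and image tag below are illustrative assumptions:

```yaml
# Hypothetical Deployment excerpt -- resource names and image tag are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-ingress-controller
spec:
  template:
    spec:
      containers:
        - name: nginx-ingress-controller
          image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0
          args:
            - /nginx-ingress-controller
            # Keep the ingress status populated on shutdown so that
            # external DNS does not delete the DNS records it created.
            - --update-status-on-shutdown=false
```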
Is this a BUG REPORT or FEATURE REQUEST?:
Documentation request
NGINX Ingress controller version:
0.9.0
What happened:
DNS records were deleted by external DNS during a cluster rolling update, because the nginx ingress controller cleared out the ingress status fields on shutdown. This caused unexpected downtime: after the nginx ingress controller came back online and re-updated the ingress status, external DNS recreated the DNS records, and traffic was down while DNS re-propagated.
What you expected to happen:
No DNS changes when the nginx ingress controller is evicted or redeployed. From reading the code I can see that there's a flag for this, `--update-status-on-shutdown`, that isn't very well called out, and I now see the original issue that requested that flag: #881.
How to reproduce it (as minimally and precisely as possible):
Run one instance of nginx ingress controller along with external DNS set to watch ingress, then delete the nginx ingress controller pod. While the pod is shutting down it deletes the ingress status, then external DNS does a DELETE on the DNS records, then once the new nginx ingress pod comes up and becomes leader, it re-applies the ingress status, and external DNS then does a CREATE for the DNS record.
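The steps above could be sketched roughly as follows. This assumes a live cluster with external DNS watching ingresses; the pod, namespace, and ingress names are hypothetical:

```shell
# Hypothetical reproduction sketch -- names are illustrative.
# Delete the running controller pod:
kubectl delete pod -n ingress-nginx <nginx-ingress-controller-pod>

# While it shuts down, the ingress status is cleared:
kubectl get ingress my-app -o jsonpath='{.status.loadBalancer.ingress}'

# external DNS observes the empty status and DELETEs the DNS record.
# Once the new controller pod becomes leader, it re-applies the status,
# and external DNS then CREATEs the record again.
```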
Anything else we need to know:
Even when running more than one nginx ingress controller, the issue came up from time to time when the leader was evicted. One way I can see this happening is if both nginx ingress controllers are scheduled on the same node and that node gets rolled, and the non-leader nginx ingress shuts down more quickly than the leader. That said, I applied pod anti-affinity and was still seeing issues sometimes when the leader was evicted, even with another nginx ingress controller running. It seems like the logic in status.go for determining whether more than one controller is running isn't bulletproof. I haven't been able to pinpoint the exact issue there, and once I found the `--update-status-on-shutdown` flag I stopped investigating.