
Add Support for an ExternalDNS Operator #1730

Closed
danehans opened this issue Aug 18, 2020 · 21 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@danehans

danehans commented Aug 18, 2020

What would you like to be added:
I propose adding an operator to manage ExternalDNS.

Why is this needed:
Currently, ExternalDNS is managed through a manual process (docs, manifests, etc.) or external tooling (e.g., Helm). An operator would simplify the user experience by providing declarative management of ExternalDNS. Several operators (e.g., addons, seccomp) reside in the kubernetes-sigs org, and many others (e.g., etcd-operator, prometheus-operator) exist in the same org as the applications they manage. The externaldns-operator and ExternalDNS would both benefit from residing in the same kubernetes-sigs org.

@danehans danehans added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 18, 2020
@danehans
Author

Here are a few use cases that an operator can address:

  1. As a cluster admin, I need the ability to install ExternalDNS in my Kubernetes cluster to manage Route53 DNS records. An operator can a) verify that the necessary IAM policies/roles exist and create them if needed, b) verify that the hosted zone(s) exist and create the zone(s) if needed, and c) deploy ExternalDNS (RBAC, deployment, etc.). The reverse can also be done to uninstall ExternalDNS. (A sketch of what such a declarative API could look like follows this list.)
  2. As a cluster admin, I need the ability to enforce an "approved" ExternalDNS configuration. An operator implements the Kubernetes controller design pattern to ensure the current state matches the desired state, and it can surface status conditions when the two states differ.
  3. As a cluster admin, I need to ensure that ExternalDNS is working properly. An operator can programmatically create a test environment (e.g., client/server pods, a DNS record, curl from the client to the server FQDN) on a periodic basis to validate e2e functionality.
  4. As a cluster admin, I need the ability to perform zero-downtime upgrades of ExternalDNS. An operator can implement best practices to ensure successful zero-downtime ExternalDNS upgrades in production environments.
  5. As a cluster admin, I need to reduce the potential for breaking changes. An operator can a) expose an API to reduce configuration complexity and b) be programmed not to implement a breaking change (e.g., changing an arg).
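
For illustration only, here is a minimal sketch of the kind of custom resource such an operator could reconcile. The API group, kind, and every field below are hypothetical and do not exist in ExternalDNS or any operator today:

```yaml
# Hypothetical custom resource for an externaldns-operator.
# The apiVersion, kind, and all fields are illustrative only.
apiVersion: externaldns.example.io/v1alpha1
kind: ExternalDNS
metadata:
  name: default
spec:
  provider: aws              # operator would verify/create the IAM policy and roles (use case 1a)
  zones:
    - example.org            # hosted zone(s) to manage; created if missing (use case 1b)
  sources:
    - service
    - ingress
  version: v0.7.3            # operator would handle zero-downtime upgrades (use case 4)
```

The operator would render and apply the RBAC, Deployment, and provider credentials for ExternalDNS from this single object, and surface status conditions when the observed state drifts from it (use case 2).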

@danehans
Author

The addon operator KEP and the KubeCon video provide additional background on the motivation to support an ExternalDNS operator.

@szuecs
Contributor

szuecs commented Aug 18, 2020

@danehans in which repository should the „install operator“ live?
To me this sounds like the Puppet module forge in Kubernetes, and I don't believe in „you don't have to understand what you run“.
For usability, I agree that adding/removing RBAC and cloud IAM roles can be an enhancement, but then who will configure the roles of the „install operator“?

@Raffo
Contributor

Raffo commented Aug 19, 2020

As I briefly mentioned on the external-dns slack, I disagree on the need for an ExternalDNS operator. Let me explain why.

ExternalDNS was designed from the very beginning to be simple and for upgrades to never be a concern. Since DNS is very clearly eventually consistent, any setup can tolerate brief outages of the ExternalDNS pod without problems. Also, we essentially store no state other than what is in Kubernetes already, which makes rollouts of new versions a non-issue.

I understand that rolling out different versions could bring challenges, like having to deal with possibly incompatible flags, but I think:

  1. we have done a decent job so far of making sure we don't deprecate widely used flags.
  2. the operator would have to deal with this anyway.

And here comes the question... who maintains and configures the operator then? I believe an operator would essentially just move the problem to another tool instead of solving the whole configuration problem.

I've watched the video presentation from KubeCon and read the KEP, but I think we are facing a different problem. ExternalDNS is not strictly bundled with a version of Kubernetes, and the latest versions are compatible with at least the last 3 releases of Kubernetes, which is what all cloud providers support. It's even more backward compatible than that, but this is what we make sure we can guarantee.

Moreover, I would add that, more often than operators, the problem that really needs to be solved is config management and its versioning. When ExternalDNS was started, we added it to our clusters using git as the source of truth. AFAIK it is still maintained like this and it still works well (@szuecs can contradict me if I'm wrong).
Config management being something that every company needs to solve (for Terraform, CloudFormation, Puppet, or pretty much anything else that deals with configuration), I think it implicitly solves the problems of rolling out ExternalDNS.

Last but not least, there is already a Helm chart available for this project, and we recently added kustomize support, which can help anyone get started.

Those ☝️ are the reasons why I think we don't need an operator for this project. I think it would introduce complexity rather than simplify things.
That said, I am not opposed to someone writing that code (just not in this repo, for maintainability reasons): it could prove me completely wrong, it could turn out to be useful in some cases, it might serve some companies' interests. And this is why open source is great: we can have different opinions and different implementations, and learn from them.

I hope this clarifies 😃

@danehans
Author

For usability, I agree that adding/removing RBAC and cloud IAM roles can be an enhancement, but then who will configure the roles of the „install operator“?

@szuecs yes, an admin will need to kubectl apply -f /operator/config, so the RBAC setup is not much different between the two. However, the operator can still provide value with the other pieces of use case 1.
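
For a rough sense of scale, the one-time bootstrap that such an /operator/config could contain might be as small as a ServiceAccount plus a ClusterRoleBinding; the names and referenced ClusterRole below are hypothetical placeholders:

```yaml
# Hypothetical bootstrap manifests for an externaldns-operator; all names are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: externaldns-operator
  namespace: externaldns-operator
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: externaldns-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: externaldns-operator   # a ClusterRole granting rights to manage ExternalDNS RBAC, Deployments, etc.
subjects:
  - kind: ServiceAccount
    name: externaldns-operator
    namespace: externaldns-operator
```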

@Raffo thanks for the detailed explanation of your views. The management space will continue to have different tools (Helm, kustomize, operators, etc.) that compete with or complement one another, each with its own pros and cons. I don't see how ExternalDNS by itself addresses each of the use cases I describe above. The other tools that you reference may be able to support some of them, but I think the space will keep room for multiple approaches. I'll start the project in my repo and provide an update to the community when it achieves the above use cases.

@szuecs
Contributor

szuecs commented Aug 20, 2020

@Raffo yes for us it just works like that.

I generally completely agree with @Raffo.

@danehans the operator could provide additional value if you split responsibility within a cluster across different external-dns instances, perhaps targeting different providers. I don't know what other users have, but from Slack, many people try to configure AWS cross-account setups or use a 3rd-party provider. So I do see some cases that could fit into your operator and would provide additional value.

@danehans
Author

split responsibility within a cluster across different external-dns instances

@szuecs would an example be separate edns instances for managing public and private zones?

@szuecs
Contributor

szuecs commented Aug 22, 2020

Yes, for example, but you could also bind a zone to a namespace, IIRC.
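
For reference, this kind of split is already expressible with existing ExternalDNS flags; below is a sketch of the container args for two instances, one per zone type, where the domains, owner IDs, and namespace are placeholders. An operator could own the lifecycle of several such instances from one API, which is where the additional value described above would come from.

```yaml
# Container args for two ExternalDNS instances splitting responsibility by zone type.
# Domains, owner IDs, and the namespace are placeholders.
# Instance 1: public hosted zone only
args:
  - --source=service
  - --source=ingress
  - --provider=aws
  - --aws-zone-type=public
  - --domain-filter=example.org
  - --txt-owner-id=external-dns-public
---
# Instance 2: private hosted zone, limited to a single namespace
args:
  - --source=service
  - --provider=aws
  - --aws-zone-type=private
  - --domain-filter=internal.example.org
  - --namespace=team-a
  - --txt-owner-id=external-dns-private
```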

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 20, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 20, 2020
@raelga
Member

raelga commented Jan 7, 2021

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 7, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 7, 2021
@raelga
Member

raelga commented Apr 8, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 8, 2021
@sgreene570
Contributor

The OpenShift Network Edge team is proceeding with a preliminary ExternalDNS operator design outlined via an OpenShift enhancement, for anyone who may be interested. We hope to prove that an ExternalDNS operator would be worthwhile, and would ultimately like to gain some community buy-in.

@sgreene570
Contributor

Also, it's worth mentioning #1961, which describes how ExternalDNS could be turned into an "operator" in itself.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 23, 2021
@raelga
Member

raelga commented Sep 23, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 23, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 22, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 21, 2022
@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 21, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
