Handling host aliases and agent certificates in HA setups #5108

Open
SimonHoenscheid opened this issue Mar 30, 2017 · 6 comments
Labels
area/distributed (Distributed monitoring: master, satellites, clients) · enhancement (New feature or request) · needs-sponsoring (Not low on priority but also not scheduled soon without any incentive)

Comments

@SimonHoenscheid

Environment:

In HA setups there is often a virtual IP address associated with a DNS name. The address can point to different hosts. Often the address is moved by a cluster framework, sometimes by hand.

The easy way would be to have a dummy host pointing to the virtual IP and monitoring the service on the agent endpoint. The problems with this solution are:

  • There is no agent endpoint for the dummy.
  • There is no zone for the dummy.
  • The real endpoints serve a different certificate, so the connection will break.

It would be great to have parameters inside the host object that tell Icinga 2 which real hosts are possible endpoints of this dummy. If one of their certificates is presented, Icinga 2 would allow the connection and the service would be monitored on the agent endpoints.
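For illustration, a minimal sketch of the dummy-host workaround plus the kind of attribute this request asks for; all names and addresses are made up, and the possible_endpoints custom variable at the end is purely hypothetical (it has no effect in Icinga 2 today):

```
// Real agent endpoints/zones, as in any distributed setup
object Endpoint "node1.example.com" {
  host = "192.0.2.11"
}
object Zone "node1.example.com" {
  endpoints = [ "node1.example.com" ]
  parent = "master"
}
// ... the same again for node2.example.com ...

// Dummy host pointing at the virtual IP / DNS name
object Host "service-vip.example.com" {
  check_command = "hostalive"
  address = "192.0.2.100"   // virtual IP, moves between node1 and node2

  // Hypothetical attribute illustrating this request: the real hosts whose
  // certificates should be accepted when connecting via the virtual address
  vars.possible_endpoints = [ "node1.example.com", "node2.example.com" ]
}
```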

@dnsmichi dnsmichi added the area/distributed Distributed monitoring (master, satellites, clients) label Mar 30, 2017
@dgoetz
Contributor

dgoetz commented May 11, 2017

I have thought for a while about how this could be solved from a configuration perspective; perhaps one of these ideas can actually be implemented.

A clustered service could have multiple command endpoints by making command_endpoint an array in this case. Icinga 2 should then either run the check on every host and report the best result, or run it on one connected host and use that result. It also has to be solved how to handle checks that can only give a valid result on one specific node.
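Purely as an illustration of that first idea, a hypothetical sketch; an array value for command_endpoint is not valid in any current Icinga 2 release, and the endpoint names and the cluster custom variable are made up:

```
// NOT valid today: command_endpoint currently accepts a single endpoint name
apply Service "shared-disk" {
  check_command = "disk"
  vars.disk_partitions = [ "/shared" ]
  command_endpoint = [ "node1.example.com", "node2.example.com" ]   // proposed array form
  assign where host.vars.cluster == "db"   // made-up custom variable on the cluster hosts
}
```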

A cluster host could be an additional endpoint with its cluster address (or multiple addresses, with the host attribute becoming an array), but this would require the agent to also have an alternative name in its certificate and to respond to this name. The agent would also have to accept both endpoints and zones in its configuration. This could perhaps also solve other issues where a host has multiple names.
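A sketch of that idea with today's object types, assuming the agents really present the cluster name as a subject alternative name in their certificates and accept this extra endpoint/zone in their local configuration (all names and addresses are made up):

```
// Extra endpoint/zone for the cluster name; its address follows the active node
object Endpoint "db-cluster.example.com" {
  host = "192.0.2.100"   // virtual IP
}
object Zone "db-cluster.example.com" {
  endpoints = [ "db-cluster.example.com" ]
  parent = "master"
}

object Host "db-cluster.example.com" {
  check_command = "hostalive"
  address = "192.0.2.100"
}

apply Service "shared-disk" {
  check_command = "disk"
  vars.disk_partitions = [ "/shared" ]
  command_endpoint = "db-cluster.example.com"   // whichever node holds the VIP answers
  assign where host.name == "db-cluster.example.com"
}
```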

All cluster endpoints could form an additional zone, which could then be used as the zone attribute instead of the command_endpoint attribute. This would have the same problem as command_endpoint as an array: which check result is to be used.
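With today's syntax such a zone could look like the sketch below (names made up); the open question of which member's result counts remains:

```
object Zone "db-cluster" {
  endpoints = [ "node1.example.com", "node2.example.com" ]
  parent = "master"
}

object Host "db-cluster.example.com" {
  check_command = "hostalive"
  address = "192.0.2.100"
  zone = "db-cluster"   // checks are scheduled inside this zone instead of via command_endpoint
}
```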

An internal "clustered" check could perhaps also solve this: it would execute checks on all cluster members and allow configuring whether the best result, the worst, minimum to OK or something else should be used.
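Purely hypothetical, just to show what configuring such an internal check might look like; none of the names below exist in Icinga 2:

```
// Hypothetical aggregation check - not implemented
apply Service "shared-disk" {
  check_command = "clustered"                                        // hypothetical built-in
  vars.clustered_endpoints = [ "node1.example.com", "node2.example.com" ]
  vars.clustered_check = "disk"                                      // the check to run on each member
  vars.clustered_strategy = "best"                                   // best / worst / minimum-to-ok / ...
  assign where host.vars.cluster == "db"
}
```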

I have no idea which one could be implemented in the easiest way and with the least amount of changes. But for wishlisting, I would prefer the internal check, or agents having an alternative name in the certificate and responding to it. I hope my thoughts are at least somewhat helpful for you.

@pdeneu

pdeneu commented Nov 10, 2017

Hey Dirk, thank you for your thoughts.
The idea with the alternate certificate name is fine, but it has the problem that cluster resources can be added or removed quite fast. In my opinion it is not practical to always regenerate the certificates with all the "new" alternate names.

What about adding a completely new object, maybe called "VirtualHost"?
We would define a VirtualHost like a normal Host, but there would be an attribute like "nodes" (or similar) where we define the cluster nodes. The VirtualHost would represent the "cluster resource", and the nodes attribute would be an array of existing Endpoints where this VirtualHost can be. On top of this, Icinga would always have to execute the check against all nodes and take the result from the one where the communication succeeded. Essential for this would be a mechanism which checks the VirtualHost IP or name on that endpoint/node.
This is just a blueprint, but maybe a second idea to add to your thoughts.
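A hypothetical sketch of such an object; this type does not exist in Icinga 2 and the names are made up:

```
// Hypothetical object type illustrating the proposal
object VirtualHost "db-vip.example.com" {
  check_command = "hostalive"
  address = "192.0.2.100"                                 // the cluster resource address
  nodes = [ "node1.example.com", "node2.example.com" ]    // existing Endpoints this resource can run on
}
```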

Regards

@julianbrost
Contributor

I'm having a hard time understanding what this request is actually asking for or what problem it is trying to solve. So if someone could provide a concrete example, I'd be happy.

So when monitoring some HA-clustered service, I'd use the following approach right now:

  • For each node providing the service, perform independent checks that do a sort of self-test of that node, to check that everything is fine and the node is capable of taking over. If this goes into a problem state, that's probably something to look at since you have lost redundancy, but it's not super critical yet.
  • Have one host object representing the clustered service, i.e. its virtual IP address, with checks verifying that the service is operating fine on that address. This check is performed externally, for example directly on a satellite. If this goes into a problem state, it probably needs immediate action. (A config sketch follows below.)
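A rough sketch of that approach with today's syntax; host, zone and plugin names are made up, and "cluster_selftest" stands in for whatever cluster-specific plugin is actually used:

```
// Per-node self-test, executed on the agent itself
apply Service "cluster-selftest" {
  check_command = "cluster_selftest"    // placeholder for a cluster-specific plugin
  command_endpoint = host.name          // assumes endpoint name == host name
  assign where host.vars.cluster == "db"
}

// One host object for the clustered service, checked externally from a satellite
object Host "db-vip.example.com" {
  check_command = "hostalive"
  address = "192.0.2.100"               // virtual IP
  zone = "satellite"                    // made-up satellite zone name
}

apply Service "db-tcp" {
  check_command = "tcp"
  vars.tcp_port = 5432
  assign where host.name == "db-vip.example.com"
}
```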

There's also the suggestion of combining multiple checks into one result, so maybe this is also asking for something like check_multi, but nicely integrated so that you don't have to fiddle around with building a command line for it, and also allowing the individual checks to be executed on different Icinga 2 nodes.

@dgoetz
Contributor

dgoetz commented Oct 21, 2021

Your examples are what is quite easy to achieve, but they do not cover everything that could be relevant, as there are cluster setups where a specific service has to be checked locally on a specific agent.

To come up with an example that is very generic, so it could also be reproduced without any real cluster solution: the cluster consists of two nodes, cluster services run on an additional name, and one cluster service is a filesystem that can only be checked locally. In this case, running the check on both nodes will always result in one good and one bad result, so running it only on the cluster name would be preferred as it gives you proper results, but this cannot be addressed with the agent at the moment.
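To make that concrete (names made up): with the current agent the shared-filesystem check has to target the individual nodes, and whichever check lands on the passive node is always critical:

```
// Runs on each cluster node's host object - the passive node is always CRITICAL,
// because /shared is only mounted on the active node
apply Service "shared-fs" {
  check_command = "disk"
  vars.disk_partitions = [ "/shared" ]
  command_endpoint = host.name             // node1.example.com / node2.example.com
  assign where host.vars.cluster == "db"   // made-up custom variable on the node hosts
}
```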

@julianbrost
Contributor

Okay, so you'd basically want to have something like command_endpoint = $currently_active_node and some of the suggestions are to solve this by also letting the Icinga agent listen on the virtual address?

@dgoetz
Contributor

dgoetz commented Nov 23, 2021
