Handling host aliases and agent certificates in HA setups #5108

Open
SimonHoenscheid opened this issue Mar 30, 2017 · 6 comments
Labels
area/distributed (Distributed monitoring: master, satellites, clients) · enhancement (New feature or request) · needs-sponsoring (Not low on priority but also not scheduled soon without any incentive)

Comments

@SimonHoenscheid

Environment:

In HA setups there is often a virtual IP address associated with a DNS name. The address can point to different hosts. Often the address is moved by a cluster framework, sometimes by hand.

The easy way would be to have a dummy host pointing to the virtual IP and monitoring the service on the agent endpoint. The problems with this solution are:

  • There is no agent endpoint for the dummy.
  • There is no zone for the dummy.
  • The real endpoints serve a different certificate, so the connection will break.

It would be great to have parameters inside the host object that tell Icinga 2 which real hosts are possible endpoints of this dummy. If one of their certificates is presented, Icinga 2 would allow the connection and the service would be monitored on the agent endpoints.
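For illustration, a minimal sketch of the dummy-host workaround plus the kind of attribute this request asks for; all names and addresses are made up, and the possible_endpoints custom variable at the end is purely hypothetical (it has no effect in Icinga 2 today):

```
// Real agent endpoints/zones, as in any distributed setup
object Endpoint "node1.example.com" {
  host = "192.0.2.11"
}
object Zone "node1.example.com" {
  endpoints = [ "node1.example.com" ]
  parent = "master"
}
// ... the same again for node2.example.com ...

// Dummy host pointing at the virtual IP / DNS name
object Host "service-vip.example.com" {
  check_command = "hostalive"
  address = "192.0.2.100"   // virtual IP, moves between node1 and node2

  // Hypothetical attribute illustrating this request: the real hosts whose
  // certificates should be accepted when connecting via the virtual address
  vars.possible_endpoints = [ "node1.example.com", "node2.example.com" ]
}
```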

@dnsmichi dnsmichi added the area/distributed Distributed monitoring (master, satellites, clients) label Mar 30, 2017
@dgoetz
Contributor

dgoetz commented May 11, 2017

I have thought for a while about how this could be solved from a configuration perspective; perhaps one of these ideas can actually be implemented.

A clustered service could have multiple command endpoints by making command_endpoint an array in this case. Icinga 2 should then either run the check on every host and report the best result, or run it on one connected host and use that result. It also has to be solved how to handle checks that can only give a valid result on one specific node.
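Purely as an illustration of that first idea, a hypothetical sketch; an array value for command_endpoint is not valid in any current Icinga 2 release, and the endpoint names and the cluster custom variable are made up:

```
// NOT valid today: command_endpoint currently accepts a single endpoint name
apply Service "shared-disk" {
  check_command = "disk"
  vars.disk_partitions = [ "/shared" ]
  command_endpoint = [ "node1.example.com", "node2.example.com" ]   // proposed array form
  assign where host.vars.cluster == "db"   // made-up custom variable on the cluster hosts
}
```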

A cluster host could be an additional endpoint with its cluster address (or multiple addresses, with the host attribute becoming an array), but this would require the agent to also have an alternative name in its certificate and to respond to this name. The agent would also have to accept both endpoints and zones in its configuration. This could perhaps also solve other issues where a host has multiple names.
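A sketch of that idea with today's object types, assuming the agents really present the cluster name as a subject alternative name in their certificates and accept this extra endpoint/zone in their local configuration (all names and addresses are made up):

```
// Extra endpoint/zone for the cluster name; its address follows the active node
object Endpoint "db-cluster.example.com" {
  host = "192.0.2.100"   // virtual IP
}
object Zone "db-cluster.example.com" {
  endpoints = [ "db-cluster.example.com" ]
  parent = "master"
}

object Host "db-cluster.example.com" {
  check_command = "hostalive"
  address = "192.0.2.100"
}

apply Service "shared-disk" {
  check_command = "disk"
  vars.disk_partitions = [ "/shared" ]
  command_endpoint = "db-cluster.example.com"   // whichever node holds the VIP answers
  assign where host.name == "db-cluster.example.com"
}
```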

All cluster endpoints could form an additional zone, which could then be used as the zone attribute instead of the command_endpoint attribute. This would have the same problem as command_endpoint as an array: which check result is to be used.
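With today's syntax such a zone could look like the sketch below (names made up); the open question of which member's result counts remains:

```
object Zone "db-cluster" {
  endpoints = [ "node1.example.com", "node2.example.com" ]
  parent = "master"
}

object Host "db-cluster.example.com" {
  check_command = "hostalive"
  address = "192.0.2.100"
  zone = "db-cluster"   // checks are scheduled inside this zone instead of via command_endpoint
}
```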

An internal "clustered" check could perhaps also solve this: it would execute checks on all cluster members and allow configuring whether the best result, the worst, minimum to OK or something else should be used.
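Purely hypothetical, just to show what configuring such an internal check might look like; none of the names below exist in Icinga 2:

```
// Hypothetical aggregation check - not implemented
apply Service "shared-disk" {
  check_command = "clustered"                                        // hypothetical built-in
  vars.clustered_endpoints = [ "node1.example.com", "node2.example.com" ]
  vars.clustered_check = "disk"                                      // the check to run on each member
  vars.clustered_strategy = "best"                                   // best / worst / minimum-to-ok / ...
  assign where host.vars.cluster == "db"
}
```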

I have no idea which one could be implemented in the easiest way and with the least amount of changes. But for wishlisting, I would prefer the internal check, or agents having an alternative name in the certificate and responding to it. I hope my thoughts are at least somewhat helpful for you.

@pdeneu

pdeneu commented Nov 10, 2017

Hey Dirk, thank you for your thoughts.
The idea with the alternate certificate name is fine, but it has the problem that cluster resources can be added or removed quite fast. In my opinion it is not practical to always regenerate the certificates with all the "new" alternate names.

What about adding a completely new object, maybe called "VirtualHost"?
We would define a VirtualHost like a normal Host, but there would be an attribute like "nodes" (or similar) where we define the cluster nodes. The VirtualHost would represent the "cluster resource", and the nodes attribute would be an array of existing Endpoints where this VirtualHost can be. On top of this, Icinga would always have to execute the check against all nodes and take the result from the one where the communication succeeded. Essential for this would be a mechanism which checks the VirtualHost IP or name on that endpoint/node.
This is just a blueprint, but maybe a second idea to add to your thoughts.
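A hypothetical sketch of such an object; this type does not exist in Icinga 2 and the names are made up:

```
// Hypothetical object type illustrating the proposal
object VirtualHost "db-vip.example.com" {
  check_command = "hostalive"
  address = "192.0.2.100"                                 // the cluster resource address
  nodes = [ "node1.example.com", "node2.example.com" ]    // existing Endpoints this resource can run on
}
```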

Regards

@julianbrost
Contributor

I'm having a hard time understanding what this request is actually asking for or what problem it is trying to solve. So if someone could provide a concrete example, I'd be happy.

So when monitoring some HA-clustered service, I'd use the following approach right now:

  • For each node providing the service, perform independent checks that do a sort of self-test of that node, to check that everything is fine and the node is capable of taking over. If this goes into a problem state, that's probably something to look at since you have lost redundancy, but it's not super critical yet.
  • Have one host object representing the clustered service, i.e. its virtual IP address, with checks verifying that the service is operating fine on that address. This check is performed externally, for example directly on a satellite. If this goes into a problem state, it probably needs immediate action. (A config sketch follows below.)
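A rough sketch of that approach with today's syntax; host, zone and plugin names are made up, and "cluster_selftest" stands in for whatever cluster-specific plugin is actually used:

```
// Per-node self-test, executed on the agent itself
apply Service "cluster-selftest" {
  check_command = "cluster_selftest"    // placeholder for a cluster-specific plugin
  command_endpoint = host.name          // assumes endpoint name == host name
  assign where host.vars.cluster == "db"
}

// One host object for the clustered service, checked externally from a satellite
object Host "db-vip.example.com" {
  check_command = "hostalive"
  address = "192.0.2.100"               // virtual IP
  zone = "satellite"                    // made-up satellite zone name
}

apply Service "db-tcp" {
  check_command = "tcp"
  vars.tcp_port = 5432
  assign where host.name == "db-vip.example.com"
}
```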

There's also the suggestion of combining multiple checks into one result, so maybe this is also asking for something like check_multi, but nicely integrated so that you don't have to fiddle around with building a command line for it, and also allowing the individual checks to be executed on different Icinga 2 nodes.

@dgoetz
Contributor

dgoetz commented Oct 21, 2021

Your examples are what is quite easy to achieve, but they do not cover everything that could be relevant, as there are cluster setups where a specific service has to be checked locally on a specific agent.

To come up with an example that is very generic, so it could also be reproduced without any real cluster solution: the cluster consists of two nodes, cluster services run on an additional name, and one cluster service is a filesystem that can only be checked locally. In this case, running the check on both nodes will always result in one good and one bad result, so running it only on the cluster name would be preferred as it gives you proper results, but this cannot be addressed with the agent at the moment.
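To make that concrete (names made up): with the current agent the shared-filesystem check has to target the individual nodes, and whichever check lands on the passive node is always critical:

```
// Runs on each cluster node's host object - the passive node is always CRITICAL,
// because /shared is only mounted on the active node
apply Service "shared-fs" {
  check_command = "disk"
  vars.disk_partitions = [ "/shared" ]
  command_endpoint = host.name             // node1.example.com / node2.example.com
  assign where host.vars.cluster == "db"   // made-up custom variable on the node hosts
}
```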

@julianbrost
Contributor

Okay, so you'd basically want to have something like command_endpoint = $currently_active_node and some of the suggestions are to solve this by also letting the Icinga agent listen on the virtual address?

@dgoetz
Contributor

dgoetz commented Nov 23, 2021
