Allow to give more than one address for an Endpoint #5976
Since I was asked in private messages, I want to add some more information to clarify what this is all about. Say your Icinga 2 masters are multihomed hosts: they have a NIC in the production LAN to connect to satellites and agents, one in an "admin LAN" used by ops people to connect to the nodes, and one in a "cluster LAN" used for cluster communication. Say further the firewall people are a bit hung over and cut the connection between your data centers for the production LAN. There you go: you have a split-brain scenario where both masters try to check your hosts and write their findings into IDO and a grapher, which will lead to a messed-up history. What I want is to be able to add all IP addresses from all three networks to the endpoints, so that when the main link is broken, Icinga 2 will communicate over the cluster network to avoid split brain. Icinga 2 will work like nothing happened, but can alert you that one of the three configured connections went down.
Basically, what I want is:
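Something along these lines (a hypothetical sketch of the proposed syntax; today `host` only accepts a single string, and the addresses are made up):

```
object Endpoint "master2.example.com" {
  // hypothetical: host as an array, one address per attached network
  // (production LAN, admin LAN, cluster LAN)
  host = [ "192.0.2.10", "10.1.0.10", "172.16.0.10" ]
  port = "5665"
}
```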
So, when one link goes down, the nodes can still communicate over the other.
Just to be certain, both IPs point to the same Icinga 2 instance? I'm worried about the fallout of one endpoint being two different installations, which would be awkward to keep in sync.
Hm, we basically should have the same behavior for dual-stack v4/v6 (only then based on DNS).
This will be really hard to debug. I would rather use DNS and round-robin the returned addresses based on routing hops.
@Crunsher: Yes, both point to the same Icinga 2 instance. I'd use the certificate's DN for verifying that we connected to the same instance on all connections, maybe some more information just to be really sure. This could be a use for the

@dnsmichi: I'd really not want to use DNS round robin for this, because I don't have any control over which connection to use. A detail that would help with this issue could be a way to mark one connection as primary and the others as fallback. Use the first one in the array? Or a separate option. Besides, I can't imagine users creating DNS entries that map to hosts in both a production and a cluster network.
I don't think that replacing the current string attribute with an array of strings is a possible migration route either (even if this gets implemented somehow). IP addresses have another problem: if they're renumbered, who checks the monitoring system and updates them? In my opinion, you'll end up with many retries against unreachable addresses, and your cluster doesn't really work then. Previous issues had been created where we had multiple connections opened per endpoint/client, and we removed that to make the socket IO as performant as possible. This feature request would slow that down and make it complicated again.
Then we could have another option for endpoints. Say:
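A hypothetical sketch of such an option (`fallback_hosts` does not exist in Icinga 2; the name and addresses only illustrate the idea):

```
object Endpoint "master2.example.com" {
  host = "192.0.2.10"                              // primary address (production LAN)
  fallback_hosts = [ "10.1.0.10", "172.16.0.10" ]  // hypothetical: tried only if host is unreachable
  port = "5665"
}
```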
If you don't want to rely on IP addresses, just use hostnames.
Besides this very issue: I tend to tell users to use IP addresses for Endpoints and Hosts. While it might be more common to change IPs than to have DNS fail completely, changing IPs is always planned, while a DNS outage might hit you unexpectedly, and a major problem like that calls for monitoring to keep an eye on what is still working.
I still think this overly complicates everything and solves a niche problem that other tools handle better. A clear 'no' from my side.
The more I think about this, the more problems I see with this idea :/ What, for example, about recovery: once we've switched to a working fallback, how do we know when to change back? And how would you know your main master failed, unless it's monitored by the fallbacks (and the fallbacks monitor each other), which would then lead to a complicated zone configuration on the nodes. I think this is a much bigger feature with harsher implications for our code and users, a rabbit hole so to speak.
Please consider:
Implementation is relatively simple: try to connect to each address until a connection can be established. This is similar to what dual-stack usually does. Config should be straightforward:
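For example (a sketch under the dual-stack analogy above; the name is made up):

```
object Endpoint "master2" {
  // a name that resolves to several A/AAAA records; each resolved
  // address would be tried in turn until one connection succeeds
  host = "master2.example.com"
  port = "5665"
}
```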
IMAO an admin can use DNS or an HAProxy LB. @N-o-X What do you think?
@Al2Klimov Yes, although I like the idea of having such a feature, I don't think this is currently worth our time. We can still keep it on our wish list though.
... for the case that #host isn't reachable. refs #5976
@widhalmt Oh, you DO have control. Recently I've set up an authoritative NS. Have I already mentioned iptables can also do LB?
Does it try all addresses (i.e. all A and AAAA RRs directly or via CNAME) or does it try only the first one? I have seen many applications which take the first address they find and if that isn't working they fail... |
It tries all of them, admittedly even "too fanatically":
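The "try every resolved address until one works" behavior discussed above can be sketched in Python (a simplified illustration of the approach, not Icinga 2's actual C++ implementation):

```python
import socket

def connect_any(host, port, timeout=5.0):
    """Try every address returned by DNS (all A/AAAA records) until
    one accepts the TCP connection; raise the last error if none do."""
    last_err = None
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(addr)
            return sock  # first working address wins
        except OSError as err:
            last_err = err
            sock.close()
    raise last_err or OSError("no addresses found for %r" % host)
```

The caller only sees one connected socket; which of the resolved addresses it points at depends on resolution order and reachability.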
@julianbrost Shall we just put a full stop to this discussion and close this issue + PR? DNS can provide the LB, and IMAO the LB should shuffle/rotate the addresses in its responses.
Hi,
While I know split brain is not as critical for Icinga 2 instances as it is for other highly available software, there are scenarios where you definitely don't want two nodes writing different status information into your IDO database.
Many large enterprises and some well-designed smaller networks have multiple networks connecting their servers, e.g. an administrative network, a backup network, a direct link between nodes, and so on.
It would be nice to make use of such extra connections to more precisely determine the status of other endpoints.
Cheers,
Thomas