Problem with zones after upgrading to 2.11 #7519

Closed
ctrlaltca opened this issue Sep 20, 2019 · 6 comments
Labels
area/distributed Distributed monitoring (master, satellites, clients)

Comments

@ctrlaltca
Contributor

ctrlaltca commented Sep 20, 2019

Describe the bug

Hi, I have some problems after upgrading a 2.10 installation to 2.11.
I have one master, one satellite, and a few agents; the configuration follows the top-down approach.

MASTER
The master defines the satellite endpoint and zone in the main zones.conf:

object Endpoint "srv06.foobar.local" { }
object Zone "srv06.foobar.local" {
    endpoints = [ "srv06.foobar.local" ]
    parent = "master"
}

The directory zones.d/srv06.foobar.local/ contains a hosts.conf file with the definition of the host srv06.foobar.local:

object Host "srv06.foobar.local" {
....

The same directory zones.d/srv06.foobar.local/ also contains a childs.conf file with the definitions of the agents' endpoints and zones:

object Endpoint "srv01.foobar.local" { }
object Zone "srv01.foobar.local" {
    endpoints = [ "srv01.foobar.local" ]
    parent = "srv06.foobar.local"
}

object Endpoint "srv03.foobar.local" { }
object Zone "srv03.foobar.local" {
    endpoints = [ "srv03.foobar.local" ]
    parent = "srv06.foobar.local"
}
...

Each agent then has its own directory, e.g. zones.d/srv01.foobar.local/, containing a hosts.conf file with the host definition:

object Host "srv01.foobar.local" {
...

After upgrading to Icinga 2.11, the configuration check emits some warnings:

[2019-09-20 08:42:16 +0200] information/cli: Icinga application loader (version: 2.11.0-1)
[2019-09-20 08:42:16 +0200] information/cli: Loading configuration file(s).
[2019-09-20 08:42:16 +0200] warning/config: Ignoring directory '/etc/icinga2/zones.d/srv01.foobar.local' for unknown zone 'srv01.foobar.local'.
[2019-09-20 08:42:16 +0200] warning/config: Ignoring directory '/etc/icinga2/zones.d/srv03.foobar.local' for unknown zone 'srv03.foobar.local'.

As the warning implies, the directories containing the agent host definitions are skipped, and I noticed the agents failed the pre-reload config check.

I found this note in the "Upgrading to v2.11 - Config Sync":

Zone directories which are not configured in zones.conf, are not included anymore on secondary master/satellites/clients.

I then tried to move the endpoint and zone definitions on the master from zones.d/srv06.foobar.local/childs.conf to the main zones.conf.
The master no longer complains (no warnings about ignored directories), but these zones are not synced to the satellite anymore, as I can see in the satellite log:

[2019-09-20 08:44:55 +0200] warning/ApiListener: Ignoring config update for unknown zone 'srv01.foobar.local'.
[2019-09-20 08:44:55 +0200] warning/ApiListener: Ignoring config update for unknown zone 'srv03.foobar.local'.

Where should I put these zones in order for the master to correctly sync them to the satellite?

  • Version used (icinga2 --version): 2.11.0-1
  • Operating System and version: Amazon Linux AMI 2018.03
  • Enabled features (icinga2 feature list): api checker command ido-mysql mainlog notification
  • Icinga Web 2 version and modules (System - About): 2.7.1, monitoring 2.7.1
  • Config validation (icinga2 daemon -C): see above
  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.
@dnsmichi
Contributor

dnsmichi commented Sep 20, 2019

Hi,

thanks for all the details. I'm trying to follow, so let me sum up what I understand:

  • You're syncing the agent Endpoint/Zone objects to the satellite from zones.d/satellitezonename
  • Each agent has its own dedicated zones.d directory with specific local configuration. The agents therefore act as local check schedulers, not as command_endpoint agents.
  • The master and satellite do not find the agent zone, and therefore refuse to include it.

It is a chicken-and-egg problem. With the cluster config sync stages, one improvement was also made: instead of including every directory automatically, only those directories whose Zone objects have been configured are included. Even this implementation required us to guess from config items (not objects) in advance, but tests have proven it reliable.

#6716
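
To illustrate with the names from this issue (a sketch of the mechanism, not a verified reproduction): the zones.d directories are matched against the Zone objects the compiler already knows from zones.conf, so only the satellite's directory gets picked up:

# /etc/icinga2/zones.conf on the master (layout taken from the report above)
object Endpoint "srv06.foobar.local" { }
object Zone "srv06.foobar.local" {
    endpoints = [ "srv06.foobar.local" ]
    parent = "master"
}
# -> zones.d/srv06.foobar.local/ is included, because its Zone object exists here
# -> zones.d/srv01.foobar.local/ is ignored, because its Zone object only exists
#    inside zones.d/srv06.foobar.local/childs.conf, which is not read at this stage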

The problem you have: since the agent Zone object hides inside another zone, the config compiler only reads the satellite zone and all of its content. At this stage it doesn't know about the config items/objects in there, so your agent Zone objects are excluded. They will never be synced.

Configuring and syncing Zone objects via the zones.d directory inside the cluster config sync was never supported nor intended in its design. As said, it's a chicken-and-egg problem. If someone says "fix this", I frankly and honestly have no idea how.

The easiest fix is to move the agent Zone objects out of zones.d into the zones.conf of both the masters and the satellites. You can also do something like mkdir -p /etc/icinga2/agent.zones.d && echo 'include_recursive "agent.zones.d"' >> /etc/icinga2/icinga2.conf and put your agent config there.

Disclaimer: That's only needed for agents which have their own zone for the cluster config sync. command_endpoint agents don't require this step.
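
As a rough sketch (reusing the agent definitions from this report; adjust names to your environment), the entries moved into /etc/icinga2/zones.conf on both the master and the satellite could look like this:

object Endpoint "srv01.foobar.local" { }
object Zone "srv01.foobar.local" {
    endpoints = [ "srv01.foobar.local" ]
    parent = "srv06.foobar.local"
}

object Endpoint "srv03.foobar.local" { }
object Zone "srv03.foobar.local" {
    endpoints = [ "srv03.foobar.local" ]
    parent = "srv06.foobar.local"
}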

Cheers,
Michael

@dnsmichi dnsmichi added needs feedback We'll only proceed once we hear from you again area/distributed Distributed monitoring (master, satellites, clients) labels Sep 20, 2019
@ctrlaltca
Contributor Author

I suspected that my setup was somewhat non-standard and hackish, but until now I had followed the golden rule "if-it-works-don't-touch-it" :)
I duplicated the agent Zone objects, adding them to both the master's and the satellite's zones.conf; after zapping the old contents of api/zones/ and restarting the service, everything started working again.
In the future I'll investigate using command_endpoint, but that will probably take some time and testing to create the specific apply rules for the agents.
Thanks a lot for your help!
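
(For reference, the "zapping the old contents of api/zones/" step amounted to roughly the following; the paths assume the default Linux package layout, so double-check them before deleting anything:)

service icinga2 stop                       # or: systemctl stop icinga2
rm -rf /var/lib/icinga2/api/zones/*        # previously synced zone config
rm -rf /var/lib/icinga2/api/zones-stage/*  # 2.11 staging directory, if present
service icinga2 start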

@latuannetnam

I also have the same problem with Icinga 2 2.11. We use the Director to configure the Zone and Endpoint objects for the master and satellite cluster and don't know how to fix this with the Director.
So my only solution for now is downgrading to Icinga 2 2.10 as below:
yum downgrade icinga2-2.10.5-1.el7.icinga.x86_64 icinga2-bin-2.10.5-1.el7.icinga.x86_64 icinga2-common-2.10.5-1.el7.icinga.x86_64 icinga2-ido-mysql-2.10.5-1.el7.icinga.x86_64 -y

@dnsmichi
Contributor

@ctrlaltca

I expected things to break with this change; unfortunately the "other" issue had more importance, since left-over zones and the like made troubleshooting tremendously hard.

If someone comes up with a better solution or algorithm, feel free to share. My design thoughts are illustrated in detail in #6716, which should help explain the root cause and motivation.

Thanks for your understanding; this helps with the inevitably "bad" feedback after pushing a release.

@latuannetnam

Unfortunately, you are using an unsupported scenario. While it sounds "easy" to use the Director infrastructure tab and not care about zones.conf, this brings you into the chicken-and-egg problem again.

Here's some collected info on the matter: https://community.icinga.com/t/icinga-2-11-released/2255/2

Please continue on the community forums.

@dnsmichi dnsmichi removed the needs feedback We'll only proceed once we hear from you again label Sep 20, 2019
@dnsmichi
Contributor

@ctrlaltca We're discussing how we can avoid such shortcomings; we just were not aware that one could build it this way. With 2.11, we tried to be stricter, in line with future features we plan to add.

One thing is definitely making the Zone/Endpoint object handling easier. I'm not sure how yet, but rest assured we will be working on this in the future.

Meanwhile, we will be discussing next week whether we can "fix the fix", but this is somewhat of a Zone inception, a really tough one. For now, I'd unfortunately suggest rolling back to 2.10.x, unless you have already adjusted and fixed it and need the more stable cluster itself.

Have a nice weekend,
Michael
