Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Objects are removed from IDO during reload with config change #7125

Closed
widhalmt opened this issue Apr 17, 2019 · 20 comments · Fixed by #7157
Closed

Objects are removed from IDO during reload with config change #7125

widhalmt opened this issue Apr 17, 2019 · 20 comments · Fixed by #7157
Assignees
Labels
area/db-ido Database output bug Something isn't working ref/IP ref/NC
Milestone

Comments

@widhalmt
Copy link
Member

We have an issue where some minutes after a reload with new configuration is triggered, all objects (at least hosts and services) are removed from the IDO and Icinga Web 2 shows an empty list. Some minutes later all reappear at once.

This happens in a rather large setup.

Sorry for being sparse on information, I just wanted to open the ticket so we all can start putting information in it, so expect the detail data coming up.

All I can say now is that it happens on RHEL 7 with Icinga 2 2.10.4 and happened with other 2.10 versions as well.

@widhalmt
Copy link
Member Author

ref/NC/603111

@widhalmt
Copy link
Member Author

ref/IC/13763

@dnsmichi dnsmichi added area/db-ido Database output needs feedback We'll only proceed once we hear from you again labels Apr 17, 2019
@fbs
Copy link

fbs commented Apr 18, 2019

I think we're seeing the same sort of issue. In our case only hosts disappear for a while the services seem to stay.

  • r2.10.4-1 on an up-to-date CentOS 7
  • 1 zone, master
  • 2 "master nodes" in the zone
  • Config is placed in "zones.d/master/{services,hosts}.conf" on n01.master and synced to the other

We have 1895 hosts and 40533 services. Every hour puppet refreshes the inventory and
reloads Icinga. In this case one host with 15 services was removed, resulting in:

mysqlbinlog mariadb-bin.038922 |  awk '/190418/ { s = $1 FS $2 } /UPDATE icinga_objects SET/ { $NF=""; print s, $0; s="" }'  | uniq -c
   2104 #190418 12:02:57 #Q> UPDATE icinga_objects SET is_active = 0 WHERE object_id =
     15 #190418 12:03:32 #Q> UPDATE icinga_objects SET is_active = 0 WHERE object_id =
   2103 #190418 12:03:32 #Q> UPDATE icinga_objects SET is_active = 1 WHERE object_id =

And a similar 2 hours earlier

   2072 #190418 10:02:46 #Q> UPDATE icinga_objects SET is_active = 0 WHERE object_id =
     14 #190418 10:03:17 #Q> UPDATE icinga_objects SET is_active = 0 WHERE object_id =
   2071 #190418 10:03:17 #Q> UPDATE icinga_objects SET is_active = 1 WHERE object_id =

All these objects appear to be hosts, not services. So all hosts disappeared from the db for ~ 30 seconds.

If helpful I could look into collecting debug logs.

@dnsmichi
Copy link
Contributor

Hm. Can I see how the database was created?

show create database icinga;

show create table icinga_objects;
show create table icinga_hosts;
show create table icinga_services;

Which MySQL/MariaDB version is used here?

In order to trigger such a difference, the names between the config objects and the database objects need to be different. This cannot be the case, really, and may tests only show that one host object as added/removed, depending on the case.

I suspect a problem with collations or strings here, something where Icinga thinks that everything has changed, and as such, first disabling all the host objects ...

https://github.com/Icinga/icinga2/blob/master/lib/db_ido_mysql/idomysqlconnection.cpp#L425

... and then finding out that all host objects need to be added again.

https://github.com/Icinga/icinga2/blob/master/lib/db_ido_mysql/idomysqlconnection.cpp#L460

One idea while reading the code: if dbobj->GetObject() fails, the owning reference between from a ConfigObject is missing for this database object.

This is done beforehand in

DbObject::Ptr dbobj = dbtype->GetOrCreateObjectByName(row->Get("name1"), row->Get("name2"));

https://github.com/Icinga/icinga2/blob/master/lib/db_ido/dbtype.cpp#L71

This takes name1 and name2 into account, name2 is empty for host objects, set for services.

On the other hand, it may be the cause that dbobj->SetObject(ConfigObject::GetObject(objType, objName)); does not return a valid config object yet at this stage.

Meaning to say, the reconnect timer started when the config object activation is still in progress, and there's no active host objects yet. Coming from 13a8fa2 we introduced a load dependent order and c54e042 sets that. According to the code, the activation is just fine - 0 is checkable objects, 100 for features. That way the IDO starts after all normal config objects are loaded.

And obviously I would see that fail in my tests, which brings me back to the database collation thing, and string comparisons.

Cheers,
Michael

@bunghi
Copy link

bunghi commented Apr 18, 2019

I think we're seeing the same sort of issue. In our case only hosts disappear for a while the services seem to stay.

  • r2.10.4-1 on an up-to-date CentOS 7
  • 1 zone, master
  • 2 "master nodes" in the zone
  • Config is placed in "zones.d/master/{services,hosts}.conf" on n01.master and synced to the other

We have 1895 hosts and 40533 services. Every hour puppet refreshes the inventory and
reloads Icinga. In this case one host with 15 services was removed, resulting in:

mysqlbinlog mariadb-bin.038922 |  awk '/190418/ { s = $1 FS $2 } /UPDATE icinga_objects SET/ { $NF=""; print s, $0; s="" }'  | uniq -c
   2104 #190418 12:02:57 #Q> UPDATE icinga_objects SET is_active = 0 WHERE object_id =
     15 #190418 12:03:32 #Q> UPDATE icinga_objects SET is_active = 0 WHERE object_id =
   2103 #190418 12:03:32 #Q> UPDATE icinga_objects SET is_active = 1 WHERE object_id =

And a similar 2 hours earlier

   2072 #190418 10:02:46 #Q> UPDATE icinga_objects SET is_active = 0 WHERE object_id =
     14 #190418 10:03:17 #Q> UPDATE icinga_objects SET is_active = 0 WHERE object_id =
   2071 #190418 10:03:17 #Q> UPDATE icinga_objects SET is_active = 1 WHERE object_id =

All these objects appear to be hosts, not services. So all hosts disappeared from the db for ~ 30 seconds.

If helpful I could look into collecting debug logs.

I'm facing the same problem..

@dnsmichi
Copy link
Contributor

Please answer my questions from #7125 (comment) as well.

@bunghi
Copy link

bunghi commented Apr 18, 2019

I'm using version: 10.1.37-MariaDB-0+deb9u1 Debian 9.6

Not sure which is the database to query:

MariaDB [(none)]> show databases;
+---------------------+
| Database            |
+---------------------+
| #mysql50#lost+found |
| director            |
| icinga2             |
| icingaweb2          |
| information_schema  |
| mysql               |
| performance_schema  |
| phpmyadmin          |
| snmptt              |
+---------------------+

@dnsmichi
Copy link
Contributor

icinga2.

@bunghi
Copy link

bunghi commented Apr 18, 2019

MariaDB [(none)]> show create database icinga2;
+----------+------------------------------------------------------------------+
| Database | Create Database                                                  |
+----------+------------------------------------------------------------------+
| icinga2  | CREATE DATABASE `icinga2` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+----------+------------------------------------------------------------------+
MariaDB [(none)]> use icinga2;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [icinga2]> show create table icinga_objects;
+----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table          | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
+----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| icinga_objects | CREATE TABLE `icinga_objects` (
  `object_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `instance_id` bigint(20) unsigned DEFAULT '0',
  `objecttype_id` bigint(20) unsigned DEFAULT '0',
  `name1` varchar(128) CHARACTER SET latin1 COLLATE latin1_general_cs DEFAULT '',
  `name2` varchar(128) CHARACTER SET latin1 COLLATE latin1_general_cs DEFAULT NULL,
  `is_active` smallint(6) DEFAULT '0',
  PRIMARY KEY (`object_id`),
  KEY `objecttype_id` (`objecttype_id`,`name1`,`name2`),
  KEY `objects_objtype_id_idx` (`objecttype_id`),
  KEY `objects_name1_idx` (`name1`),
  KEY `objects_name2_idx` (`name2`),
  KEY `objects_inst_id_idx` (`instance_id`),
  KEY `sla_idx_obj` (`objecttype_id`,`is_active`,`name1`)
) ENGINE=InnoDB AUTO_INCREMENT=55648 DEFAULT CHARSET=utf8 COMMENT='Current and historical objects of all kinds' |
+----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

MariaDB [icinga2]> show create table icinga_hosts;
+--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table        | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
+--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| icinga_hosts | CREATE TABLE `icinga_hosts` (
  `host_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `instance_id` bigint(20) unsigned DEFAULT '0',
  `config_type` smallint(6) DEFAULT '0',
  `host_object_id` bigint(20) unsigned DEFAULT '0',
  `alias` varchar(255) CHARACTER SET latin1 DEFAULT '',
  `display_name` varchar(255) CHARACTER SET latin1 COLLATE latin1_general_cs DEFAULT '',
  `address` varchar(128) CHARACTER SET latin1 DEFAULT '',
  `address6` varchar(128) CHARACTER SET latin1 DEFAULT '',
  `check_command_object_id` bigint(20) unsigned DEFAULT '0',
  `check_command_args` text CHARACTER SET latin1,
  `eventhandler_command_object_id` bigint(20) unsigned DEFAULT '0',
  `eventhandler_command_args` text CHARACTER SET latin1,
  `notification_timeperiod_object_id` bigint(20) unsigned DEFAULT '0',
  `check_timeperiod_object_id` bigint(20) unsigned DEFAULT '0',
  `failure_prediction_options` varchar(128) CHARACTER SET latin1 DEFAULT '',
  `check_interval` double DEFAULT '0',
  `retry_interval` double DEFAULT '0',
  `max_check_attempts` smallint(6) DEFAULT '0',
  `first_notification_delay` double DEFAULT '0',
  `notification_interval` double DEFAULT '0',
  `notify_on_down` smallint(6) DEFAULT '0',
  `notify_on_unreachable` smallint(6) DEFAULT '0',
  `notify_on_recovery` smallint(6) DEFAULT '0',
  `notify_on_flapping` smallint(6) DEFAULT '0',
  `notify_on_downtime` smallint(6) DEFAULT '0',
  `stalk_on_up` smallint(6) DEFAULT '0',
  `stalk_on_down` smallint(6) DEFAULT '0',
  `stalk_on_unreachable` smallint(6) DEFAULT '0',
  `flap_detection_enabled` smallint(6) DEFAULT '0',
  `flap_detection_on_up` smallint(6) DEFAULT '0',
  `flap_detection_on_down` smallint(6) DEFAULT '0',
  `flap_detection_on_unreachable` smallint(6) DEFAULT '0',
  `low_flap_threshold` double DEFAULT '0',
  `high_flap_threshold` double DEFAULT '0',
  `process_performance_data` smallint(6) DEFAULT '0',
  `freshness_checks_enabled` smallint(6) DEFAULT '0',
  `freshness_threshold` int(11) DEFAULT '0',
  `passive_checks_enabled` smallint(6) DEFAULT '0',
  `event_handler_enabled` smallint(6) DEFAULT '0',
  `active_checks_enabled` smallint(6) DEFAULT '0',
  `retain_status_information` smallint(6) DEFAULT '0',
  `retain_nonstatus_information` smallint(6) DEFAULT '0',
  `notifications_enabled` smallint(6) DEFAULT '0',
  `obsess_over_host` smallint(6) DEFAULT '0',
  `failure_prediction_enabled` smallint(6) DEFAULT '0',
  `notes` text CHARACTER SET latin1,
  `notes_url` text CHARACTER SET latin1,
  `action_url` text CHARACTER SET latin1,
  `icon_image` text CHARACTER SET latin1,
  `icon_image_alt` text CHARACTER SET latin1,
  `vrml_image` text CHARACTER SET latin1,
  `statusmap_image` text CHARACTER SET latin1,
  `have_2d_coords` smallint(6) DEFAULT '0',
  `x_2d` smallint(6) DEFAULT '0',
  `y_2d` smallint(6) DEFAULT '0',
  `have_3d_coords` smallint(6) DEFAULT '0',
  `x_3d` double DEFAULT '0',
  `y_3d` double DEFAULT '0',
  `z_3d` double DEFAULT '0',
  `config_hash` varchar(64) DEFAULT NULL,
  PRIMARY KEY (`host_id`),
  UNIQUE KEY `instance_id` (`instance_id`,`config_type`,`host_object_id`),
  KEY `host_object_id` (`host_object_id`),
  KEY `hosts_i_id_idx` (`instance_id`),
  KEY `hosts_host_object_id_idx` (`host_object_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2446 DEFAULT CHARSET=utf8 COMMENT='Host definitions' |
+--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

MariaDB [icinga2]> show create table icinga_services;
+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table           | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| icinga_services | CREATE TABLE `icinga_services` (
  `service_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `instance_id` bigint(20) unsigned DEFAULT '0',
  `config_type` smallint(6) DEFAULT '0',
  `host_object_id` bigint(20) unsigned DEFAULT '0',
  `service_object_id` bigint(20) unsigned DEFAULT '0',
  `display_name` varchar(255) CHARACTER SET latin1 COLLATE latin1_general_cs DEFAULT '',
  `check_command_object_id` bigint(20) unsigned DEFAULT '0',
  `check_command_args` text CHARACTER SET latin1,
  `eventhandler_command_object_id` bigint(20) unsigned DEFAULT '0',
  `eventhandler_command_args` text CHARACTER SET latin1,
  `notification_timeperiod_object_id` bigint(20) unsigned DEFAULT '0',
  `check_timeperiod_object_id` bigint(20) unsigned DEFAULT '0',
  `failure_prediction_options` varchar(64) CHARACTER SET latin1 DEFAULT '',
  `check_interval` double DEFAULT '0',
  `retry_interval` double DEFAULT '0',
  `max_check_attempts` smallint(6) DEFAULT '0',
  `first_notification_delay` double DEFAULT '0',
  `notification_interval` double DEFAULT '0',
  `notify_on_warning` smallint(6) DEFAULT '0',
  `notify_on_unknown` smallint(6) DEFAULT '0',
  `notify_on_critical` smallint(6) DEFAULT '0',
  `notify_on_recovery` smallint(6) DEFAULT '0',
  `notify_on_flapping` smallint(6) DEFAULT '0',
  `notify_on_downtime` smallint(6) DEFAULT '0',
  `stalk_on_ok` smallint(6) DEFAULT '0',
  `stalk_on_warning` smallint(6) DEFAULT '0',
  `stalk_on_unknown` smallint(6) DEFAULT '0',
  `stalk_on_critical` smallint(6) DEFAULT '0',
  `is_volatile` smallint(6) DEFAULT '0',
  `flap_detection_enabled` smallint(6) DEFAULT '0',
  `flap_detection_on_ok` smallint(6) DEFAULT '0',
  `flap_detection_on_warning` smallint(6) DEFAULT '0',
  `flap_detection_on_unknown` smallint(6) DEFAULT '0',
  `flap_detection_on_critical` smallint(6) DEFAULT '0',
  `low_flap_threshold` double DEFAULT '0',
  `high_flap_threshold` double DEFAULT '0',
  `process_performance_data` smallint(6) DEFAULT '0',
  `freshness_checks_enabled` smallint(6) DEFAULT '0',
  `freshness_threshold` int(11) DEFAULT '0',
  `passive_checks_enabled` smallint(6) DEFAULT '0',
  `event_handler_enabled` smallint(6) DEFAULT '0',
  `active_checks_enabled` smallint(6) DEFAULT '0',
  `retain_status_information` smallint(6) DEFAULT '0',
  `retain_nonstatus_information` smallint(6) DEFAULT '0',
  `notifications_enabled` smallint(6) DEFAULT '0',
  `obsess_over_service` smallint(6) DEFAULT '0',
  `failure_prediction_enabled` smallint(6) DEFAULT '0',
  `notes` text CHARACTER SET latin1,
  `notes_url` text CHARACTER SET latin1,
  `action_url` text CHARACTER SET latin1,
  `icon_image` text CHARACTER SET latin1,
  `icon_image_alt` text CHARACTER SET latin1,
  `config_hash` varchar(64) DEFAULT NULL,
  PRIMARY KEY (`service_id`),
  UNIQUE KEY `instance_id` (`instance_id`,`config_type`,`service_object_id`),
  KEY `service_object_id` (`service_object_id`),
  KEY `services_i_id_idx` (`instance_id`),
  KEY `services_host_object_id_idx` (`host_object_id`),
  KEY `services_combined_object_idx` (`service_object_id`,`host_object_id`)
) ENGINE=InnoDB AUTO_INCREMENT=52338 DEFAULT CHARSET=utf8 COMMENT='Service definitions' |
+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

MariaDB [icinga2]>


@fbs
Copy link

fbs commented Apr 18, 2019

Full schema dump (mysqldump --no-data icinga)

schema.sql.txt

Package versions:

perl-DBD-MySQL-4.023-6.el7.x86_64
MariaDB-backup-10.2.23-1.el7.centos.x86_64
nagios-plugins-mysql-2.2.1-16.20180725git3429dad.el7.x86_64
glibc-headers-2.17-260.el7_6.3.x86_64
glibc-common-2.17-260.el7_6.3.x86_64
mysql-connector-python-1.1.6-1.el7.noarch
collectd-mysql-5.8.1-1.el7.x86_64
galera-25.3.26-1.rhel7.el7.centos.x86_64
MariaDB-compat-10.2.23-1.el7.centos.x86_64
MariaDB-server-10.2.23-1.el7.centos.x86_64
MariaDB-client-10.2.23-1.el7.centos.x86_64
MariaDB-common-10.2.23-1.el7.centos.x86_64
glibc-devel-2.17-260.el7_6.3.x86_64
glibc-2.17-260.el7_6.3.x86_64
icinga2-ido-mysql-2.10.4-1.el7.icinga.x86_64

Everything is done with the supplied puppet modules and icinga packages.

@dnsmichi
Copy link
Contributor

My database uses utf8mb4 on my Macbook which - according to the Internet - is the correct way of using UTF8 with MySQL. In the Vagrant boxes, this still runs with latin1.

https://medium.com/@adamhooper/in-mysql-never-use-utf8-use-utf8mb4-11761243e434

This may have an influence here, next to the MySQL/MariaDB version of course. I can see that both of you use utf8 as character set.

Maybe we can figure it out by querying for wrong column length.

select * from icinga_objects where length(name1) = char_length(name1);
select * from icinga_objects where length(name2) = char_length(name2);

all multi-byte values within UTF8 should not be shown, can can instead be queries with the > comparison operator.

The above query will also show plain strings, and might be longer than expected.

Look for the ? telling you about a problematic character being inserted rendering it to crap.

I could imagine that especially host object names contain special characters, while services and others do not.

Coming to a Director deployment (seen that in a customer ticket), where import sources generate the hostname, ... what if real utf8 is inserted into crap utf8? This will always lead to non-matching names in Icinga vs IDO DB objects.

More tests

utf8 vs utf8mb4

I've tried a simple test case with 5.5.60 MariaDB (CentOS) and 10.3.11-MariaDB on macOS.

mysql

create database test;
use test;
create table objects ( name varchar(256) ) ;

MariaDB [test]> insert into objects (name) values ('foo𝌆bar');
ERROR 1366 (22007): Incorrect string value: '\xF0\x9D\x8C\x86ba...' for column 'name' at row 1

MariaDB [test]> set names utf8;
Query OK, 0 rows affected (0.001 sec)

MariaDB [test]> insert into objects (name) values ('foo𝌆bar');
ERROR 1366 (22007): Incorrect string value: '\xF0\x9D\x8C\x86ba...' for column 'name' at row 1

MariaDB [test]> set names utf8mb4;
Query OK, 0 rows affected (0.000 sec)

MariaDB [test]> insert into objects (name) values ('foo𝌆bar');
Query OK, 1 row affected (0.001 sec)

MariaDB [test]> select * from objects;
+------------+
| name       |
+------------+
| foo𝌆bar     |
+------------+
1 row in set (0.000 sec)

MariaDB [test]> select * from objects where length(name) = char_length(name);
Empty set (0.001 sec)

MariaDB [test]> select * from objects where length(name) > char_length(name);
+------------+
| name       |
+------------+
| foo𝌆bar     |
+------------+
1 row in set (0.000 sec)

I would consider that MySQL/MariaDB throws an error on insert, if these problems happen, but I can only see that with 10.3 MariaDB. In 5.5.60 this generates a warning, but the resulting values are broken.

MariaDB [test]> select * from objects where length(name) = char_length(name);
+------------+
| name       |
+------------+
| fooßbar    |
| fooöbar    |
| foo????bar |
| foo?bar    |
| foo?bar    |
| foo????bar |
+------------+
6 rows in set (0.00 sec)

I'm not sure whether the strict mode plays a role here, https://www.linode.com/community/questions/17070/how-can-i-disable-mysql-strict-mode

utf8

MariaDB [test]> create database test2 character set utf8
    -> ;
Query OK, 1 row affected (0.001 sec)

MariaDB [test]> use test2
Database changed
MariaDB [test2]> create table objects ( name varchar(256) ) default charset=utf8 ;
Query OK, 0 rows affected (0.016 sec)

MariaDB [test2]> insert into objects (name) values ('foo𝌆bar');
ERROR 1366 (22007): Incorrect string value: '\xF0\x9D\x8C\x86ba...' for column 'name' at row 1
MariaDB [test2]> SHOW VARIABLES LIKE 'sql_mode';
+---------------+-------------------------------------------------------------------------------------------+
| Variable_name | Value                                                                                     |
+---------------+-------------------------------------------------------------------------------------------+
| sql_mode      | STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION |
+---------------+-------------------------------------------------------------------------------------------+
1 row in set (0.007 sec)

utf8mb4

MariaDB [test2]> create database test3 character set utf8mb4;
Query OK, 1 row affected (0.003 sec)

MariaDB [test2]> use test3
Database changed
MariaDB [test3]> create table objects ( name varchar(256) ) default charset=utf8mb4 ;
Query OK, 0 rows affected (0.010 sec)

MariaDB [test3]> insert into objects (name) values ('foo𝌆bar');
Query OK, 1 row affected (0.001 sec)

MariaDB [test3]> select * from objects where length(name) > char_length(name);
+------------+
| name       |
+------------+
| foo𝌆bar     |
+------------+
1 row in set (0.000 sec)

MariaDB [test3]> select * from objects where length(name) = char_length(name);
Empty set (0.000 sec)

@fbs
Copy link

fbs commented Apr 18, 2019

Mysql seems fine:

> select count(*) from icinga_objects;
|    80611 |

> select count(*) from icinga_objects where length(name1) = char_length(name1);
|    80611 |

> select count(*) from icinga_objects where length(name2) = char_length(name2);
|    78173 |

> select count(*) from icinga_objects where length(name2) != char_length(name2) and objecttype_id != 1;
|        0 |
> select * from icinga_objects where length(name1) > char_length(name1);
Empty set (0.04 sec)

> select * from icinga_objects where length(name2) > char_length(name2);
Empty set (0.05 sec)

We only have ascii hostnames and the way we generate config from puppetdb (custom) should ensure (i'll check) ascii as well.

Config seems to be fully ascii:

$ sudo iconv -f ascii -t utf-8 /etc/icinga2/zones.d/master/hosts.conf -o x
609eb232fccdbc932e7dd9a2d2c76d8dc453e81c  /etc/icinga2/zones.d/master/hosts.conf
$ sha1sum
609eb232fccdbc932e7dd9a2d2c76d8dc453e81c  /etc/icinga2/zones.d/master/hosts.conf
609eb232fccdbc932e7dd9a2d2c76d8dc453e81c  x
$ sudo file /etc/icinga2/zones.d/master/hosts.conf
/etc/icinga2/zones.d/master/hosts.conf: ASCII text
$ sudo file /etc/icinga2/zones.d/master/services.conf
/etc/icinga2/zones.d/master/services.conf: ASCII text

I've just noticed the same happens in our ACCP setup, I will try mb4 there. I've checked the debug log but I'm not sure what to look for. As far as I can tell DB updates are batched, making
correlation to earlier events a bit difficult

The log lines seem to arrive a while later than the db updates:

100x same
[2019-04-18 16:34:01 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_objects SET is_active = 0 WHERE object_id = 143545
[2019-04-18 16:34:01 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_objects SET is_active = 1 WHERE object_id = 6
100x same
$ mysqlbinlog mariadb-bin.007313 |  awk '/190418/ { s = $1 FS $2 } /UPDATE icinga_objects SET/ { $NF=""; print s, $0; s="" }'  | uniq -c
    231 #190418 16:33:58 #Q> UPDATE icinga_objects SET is_active = 0 WHERE object_id =
    129 #190418 16:34:01 #Q> UPDATE icinga_objects SET is_active = 0 WHERE object_id =
    231 #190418 16:34:01 #Q> UPDATE icinga_objects SET is_active = 1 WHERE object_id =

@Al2Klimov
Copy link
Member

If helpful I could look into collecting debug logs.

Yes, @fbs, this would be extremely helpful.

All of you, please collect Icinga debug logs from start to re-appearing of the hosts and upload them here:

https://nextcloud.icinga.com/index.php/s/Q988968f5H5Kc6H

@dnsmichi
Copy link
Contributor

This is an upload-only URL, no-one except Icinga devs working on this issue will have access to the files.

@Al2Klimov
Copy link
Member

And please upload also the output of icinga2 daemon -C, preferably together with the logs in one ZIP/TAR/...

@dnsmichi
Copy link
Contributor

Assigning this to 2.11 as a working task. No guarantee though - if we cannot find it, it won't block the release.

@dnsmichi dnsmichi added this to the 2.11.0 milestone Apr 23, 2019
@Elias481
Copy link
Contributor

Elias481 commented Apr 25, 2019

Just taking a quick look into this (as I assumed this is expected behaviour as I saw the same as far as I remember).
If this could be a matter of charset... Looking into above show creates... And for reference in our db's show create...:
Most of the varchar/text columns are not utilizing utf8 anyway - they are ecplicitly created with latin1 charset - so it shouldn't matter whether utf8mb3 or utf8mb4 is used as default character set as the name1 and name2 fields are latin1. (On our database - even worse - the default charset is also latin1.)

Having UTF8MB4 instead of UTF8MB3 becomes relevant only for characters outside of BMP and that only if the fields where they are inserted to/read from are not an 8bit charset that is completely covered within BMP.
(But better says: using UTF8MB3 instead of UTF8MB4 should only be configured for fields that should be indexed and are to be defined longer then 191 chars [more specifically if charcount*4 bytes would cause a desired index to become bigger than the MySQL 755 bytes limit and charcount*3 bytes wouldn't] - or for the unlikely case that indeed UTF8MB3 (UTF-8 BMP only) is needed/a constraint outside of the DB so that database can be utilized to ensure this constraint).

And yes for MySQL <5.7 and MariaDB < 10.2.4 it's sadly a must to ensure strict mode to get an error instead of silent ignores for violation of length constraints etc.. (And for more recent versions it could also be a must for applications to ensure these new proper defaults are configured differently by mistake...)
It's better to get an exception that must be handled instead of data that is not persistet exactly as requested and required by schema.
(Or as I would say: it's better to not use MySQL where avoidable :-))

Anyway, regarding the problem...
Without diving to deep - and without knowing it exactly, I could mind this one is not a valid construct to handle NULL fields in MySQL:

for (field = m_Mysql->fetch_field(result.get()), i = 0; field; field = m_Mysql->fetch_field(result.get()), i++)
dict->Set(field->name, String(row[i], row[i] + lengths[i]));

Because this probably translates to String(InputIterator begin, InputIterator end) -> std::string(InputIterator begin, InputIterator end) - "if the range specified by [first,last) is not valid, it causes undefined behavior." and the range is not valid (its a nullpointer to a nullpointer+0 so at least the contract that "end" is reachable from "begin" by incrementing "begin" finite times is not given, even if the definition could count a null pointer dereferencable as it is incrementable).
So if it is undefined it's not definied whether resulting string is an empty string and as a result "something" could be appended to to name1 when searching for the object - that means at least a trailing exclamation mark and a lookup fail and a failing getObject...
So I would first check this. (But I have no test setup and no time now to validate this somehow, so it's just an idea...)

@dnsmichi
Copy link
Contributor

dnsmichi commented Apr 25, 2019

Charsets were a route where changes over different versions could have gotten into.

The better chance are to analyse why GetOrCreateObjectByName() doesn't work in this regard. This method ensures that all active objects have a config object pointer set, within the loop and with ConfigObject::GetObject()

https://github.com/Icinga/icinga2/blob/master/lib/db_ido_mysql/idomysqlconnection.cpp#L431
https://github.com/Icinga/icinga2/blob/master/lib/db_ido/dbtype.cpp#L103

The later loop which deactivates all objects, has a conditional continue which avoids any db object which has a config object pointer. If that pointer is NULL, the update is_active=0 query happens.

That's likely a race condition, and I am not yet sure what changed or how one can reproduce this. Seemingly it only affects host names, but not check commands where name2 also is a null value in MySQL.

It could also be the case that the IdoMysqlConnection object gets activated, and the reconnect timer starts "too soon", leaving the host objects not activated. The code speaks a different language with the defined activation order in a serial fashion.

Likely this did not happen before as it was in a random order, and hosts where always activated before idomysqlconnections (alphanumeric sort in memory). With the new activation order and some sorting beforehand, this likely has an influence here.

A critical section would be to implement a sleep into the host's start method, and see whether the IdoMysqlConnection objects gets started and the reconnect timer starts to run.

Just some thoughts on the matter - I am not actively working on it now.

@dnsmichi dnsmichi assigned dnsmichi and unassigned Al2Klimov May 3, 2019
@dnsmichi
Copy link
Contributor

dnsmichi commented May 3, 2019

More Analysis

I've received debug logs from a partner customer which unveils that the is_active=0 events are fired on reload shutdown, and not during the initial startup routine as described originally.

[2019-04-09 10:52:24 +0200] debug/IdoMysqlConnection: Query: COMMIT
[2019-04-09 10:52:24 +0200] debug/IdoMysqlConnection: Query: BEGIN
[2019-04-09 10:52:25 +0200] information/Application: Reload requested, letting new process take over.
[2019-04-09 10:52:25 +0200] information/ApiListener: 'api' stopped.
[2019-04-09 10:52:25 +0200] information/CheckerComponent: 'checker' stopped.
[2019-04-09 10:52:25 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_objects SET is_active = 0 WHERE object_id = 4
[2019-04-09 10:52:25 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_objects SET is_active = 0 WHERE object_id = 33353
[2019-04-09 10:52:25 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_objects SET is_active = 0 WHERE object_id = 139
[2019-04-09 10:52:25 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_objects SET is_active = 0 WHERE object_id = 505
[2019-04-09 10:52:25 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_objects SET is_active = 0 WHERE object_id = 142
[2019-04-09 10:52:25 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_objects SET is_active = 0 WHERE object_id = 115
[2019-04-09 10:52:25 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_objects SET is_active = 0 WHERE object_id = 433
[2019-04-09 10:52:25 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_objects SET is_active = 0 WHERE object_id = 101211
[2019-04-09 10:52:25 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_objects SET is_active = 0 WHERE object_id = 562
[2019-04-09 10:52:25 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_objects SET is_active = 0 WHERE object_id = 196
[2019-04-09 10:52:25 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_objects SET is_active = 0 WHERE object_id = 109
[2019-04-09 10:52:25 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_objects SET is_active = 0 WHERE object_id = 209706
[2019-04-09 10:52:25 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_objects SET is_active = 0 WHERE object_id = 201772
....

[2019-04-09 10:52:25 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_objects SET is_active = 0 WHERE object_id = 436
[2019-04-09 10:52:25 +0200] debug/IdoMysqlConnection: Query: COMMIT
[2019-04-09 10:52:25 +0200] debug/IdoMysqlConnection: Query: BEGIN
[2019-04-09 10:52:35 +0200] information/FileLogger: 'debug-file' started.
[2019-04-09 10:52:35 +0200] notice/WorkQueue: Spawning WorkQueue threads for 'ApiListener, RelayQueue'
...
[2019-04-09 10:52:40 +0200] information/ApiListener: 'api' started.
...
[2019-04-09 10:52:41 +0200] information/NotificationComponent: 'notification' started.
[2019-04-09 10:52:41 +0200] information/CheckerComponent: 'checker' started.
[2019-04-09 10:52:41 +0200] information/DbConnection: 'ido-mysql' started.

Code Analysis

DbConnection::UpdateObject() will be called for any runtime event, not only during the Reconnect() handling. This was changed with 2.4 where runtime objects could be created any time.

It has a safety hook though - whenever the application is in shutdown mode, it will not work on such updates in order to prevent false object updates.

That's fine and always has been. Application::IsShuttingDown() gets the value from m_ShuttingDown which is set during a TERM signal. Now, what happens on a Restart or Reload?

Actually, there's a behavioural change here. Previously a reload would also just wait for an application shutdown In Application::RunEventLoop(). This was changed with the Systemd watchdog for reloads in #5996 and uses a new handler called SigUsr2Handler().

Wait, there was a problem with this handler already where reloads didn't write the state file in #6691. And if one looks closer, the ConfigObject::StopObjects() is duplicated into this scope.

But what about m_ShuttingDown? Is this really set to true when a reload is triggered?

Obviously not, since this happens before the IcingaApplication is really shutdown, and is controlled via the global variable l_Reloading.

By exposing this variable and using it next to IsShuttingDown, one can mitigate the behaviour (and likely fix it).

Tests

Heavy config

My current "many.conf".

const countHosts = 10000;
const countServices = 10;
const countNotifications = 1

for (i in range(countHosts)) {
  object Host "many-test-" + i {
    check_command = "random"
    //check_command = "sleep"
  }
}

for (j in range(countServices)) {
  apply Service "many-test-" + j {
    check_command = "random"
    //check_command = "sleep"
    check_interval = 10s
    retry_interval = 10s
    assign where match("many*", host.name)
  }
}

for (k in range(countNotifications)) {
  apply Notification "many-test-" + k to Service {
    command = "mail-service-notification"
    users = [ "many" ]
    assign where match("many*", host.name)
    interval = 0s
  }
}

object User "many" { email = "michi@localhost" }

A long startup time

[2019-05-03 13:31:48 +0200] information/cli: Icinga application loader (version: v2.10.4-1-g6849aaad2; debug)
[2019-05-03 13:31:48 +0200] information/cli: Loading configuration file(s).
[2019-05-03 13:31:49 +0200] information/ConfigItem: Committing config item(s).
[2019-05-03 13:31:49 +0200] information/ApiListener: My API identity: mbpmif.int.netways.de
[2019-05-03 13:31:59 +0200] information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 0, rate: 17.2/s (1032/min 1032/5min 1032/15min);
[2019-05-03 13:31:59 +0200] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2019-05-03 13:31:59 +0200] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 217 CheckCommands.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 2 ApiUsers.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 10007 Hosts.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 2 Endpoints.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 3 HostGroups.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 100013 Notifications.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 3 NotificationCommands.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 1 UserGroup.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 3 TimePeriods.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 1 ScheduledDowntime.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 100012 Services.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 3 ServiceGroups.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 4 Zones.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 3 Users.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2019-05-03 13:32:23 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2019-05-03 13:32:23 +0200] information/ScriptGlobal: Dumping variables to file '/usr/local/icinga/icinga2/var/cache/icinga2/icinga2.vars'
[2019-05-03 13:32:23 +0200] information/ConfigObject: Restoring program state from file '/usr/local/icinga/icinga2/var/lib/icinga2/icinga2.state'
[2019-05-03 13:32:23 +0200] information/ConfigObject: Restored 21 objects. Loaded 210259 new objects without state.
[2019-05-03 13:32:24 +0200] information/ConfigItem: Triggering Start signal for config items
[2019-05-03 13:32:24 +0200] information/FileLogger: 'main-log' started.
[2019-05-03 13:32:33 +0200] information/WorkQueue: #8 (DaemonCommand::Run) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2019-05-03 13:32:38 +0200] information/ApiListener: 'api' started.
[2019-05-03 13:32:38 +0200] information/ApiListener: Started new listener on '[::]:5665'
[2019-05-03 13:32:38 +0200] information/NotificationComponent: 'notification' started.
[2019-05-03 13:32:38 +0200] information/CheckerComponent: 'checker' started.
[2019-05-03 13:32:39 +0200] information/ConfigItem: Activated all objects.

This is an important measurement for the reload then.

[2019-05-03 13:33:27 +0200] information/Application: Got reload command: Starting new instance.
[2019-05-03 13:35:35 +0200] information/Application: Reload requested, letting new process take over.

The nasty things

Well, not obviously in the log, I'm using a custom patch here after having analysed the problem from the code parts.

[2019-05-03 13:57:05 +0200] information/Application: Reload requested, letting new process take over.
[2019-05-03 13:57:05 +0200] information/ApiListener: 'api' stopped.
[2019-05-03 13:57:05 +0200] warning/DbConnection: Updating object 'api' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'api' of type 'ApiListener'

[2019-05-03 13:57:05 +0200] warning/DbConnection: Updating object 'root' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'root' of type 'ApiUser'

[2019-05-03 13:57:05 +0200] warning/DbConnection: Updating object 'blah' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'blah' of type 'ApiUser'

[2019-05-03 13:57:05 +0200] warning/DbConnection: Updating object 'ido' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'ido' of type 'CheckCommand'

[2019-05-03 13:57:05 +0200] warning/DbConnection: Updating object 'passive' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'passive' of type 'CheckCommand'

[2019-05-03 13:57:05 +0200] warning/DbConnection: Updating object 'rbl' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'rbl' of type 'CheckCommand'

[2019-05-03 13:57:05 +0200] warning/DbConnection: Updating object 'nscp-local-uptime' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'nscp-local-uptime' of type 'CheckCommand'

[2019-05-03 13:57:05 +0200] warning/DbConnection: Updating object 'random' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'random' of type 'CheckCommand'

This goes on. CheckCommand objects are not evaluated against is_active in the database in Icinga Web 2 so this isn't seen by default.

Look how shiny the hosts are deactivated though following the alpha-numeric order.


[2019-05-03 13:57:05 +0200] information/CheckerComponent: 'checker' stopped.
[2019-05-03 13:57:05 +0200] warning/DbConnection: Updating object 'checker' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'checker' of type 'CheckerComponent'

[2019-05-03 13:57:05 +0200] warning/DbConnection: Updating object 'mbpmif.int.netways.de!load!be13dd28-f8f6-4e59-9a78-5196c334006c' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'mbpmif.int.netways.de!load!be13dd28-f8f6-4e59-9a78-5196c334006c' of type 'Downtime'

[2019-05-03 13:57:05 +0200] warning/DbConnection: Updating object 'mbpmif.int.netways.de!load!e6790c05-cc89-482d-9434-177953207767' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'mbpmif.int.netways.de!load!e6790c05-cc89-482d-9434-177953207767' of type 'Downtime'

[2019-05-03 13:57:05 +0200] warning/DbConnection: Updating object 'mbpmif.int.netways.de' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'mbpmif.int.netways.de' of type 'Endpoint'

[2019-05-03 13:57:05 +0200] warning/DbConnection: Updating object 'winmif' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'winmif' of type 'Endpoint'

[2019-05-03 13:57:05 +0200] warning/DbConnection: Updating object 'main-log' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'main-log' of type 'FileLogger'

[2019-05-03 13:57:05 +0200] warning/DbConnection: Updating object 'many-test-693' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'many-test-693' of type 'Host'

[2019-05-03 13:57:05 +0200] warning/DbConnection: Updating object 'many-test-814' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'many-test-814' of type 'Host'

This goes on and on. HostGroups are inactive as well, this might be taken into account for permission filters in Icinga Web 2.

[2019-05-03 13:57:06 +0200] warning/DbConnection: Updating object 'category_openshift' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'category_openshift' of type 'HostGroup'

[2019-05-03 13:57:06 +0200] warning/DbConnection: Updating object 'windows-servers' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'windows-servers' of type 'HostGroup'

Well, and then there's the IcingaApplication. It is not about IdoMysqlConnection or reconnect handling here, we are in a fucking shutdown state.

Well, and IcingaApplication calls Stop() which sets m_ShuttingDown to true.

[2019-05-03 13:57:06 +0200] warning/DbConnection: Updating object 'app' 		 active '0' shutting down '1' restarting '1'.
Context:
	(0) Deactivating object 'app' of type 'IcingaApplication'

If this object type would exist in IDO MySQL it would deactivated. Obviously it isn't.

Going further, IdoMysqlConnection gets deactivated too.

[2019-05-03 13:57:06 +0200] information/IdoMysqlConnection: 'ido-mysql' paused.
[2019-05-03 13:57:06 +0200] information/DbConnection: Pausing IDO connection: ido-mysql

It has some queries in the queue, and the connection is dropped after that.

Conclusion

So, who can now explain why only hosts are affected?

Right. The only visible objects in alphanumeric order which are visible in Icinga Web 2 by is_active=0. The hostgroups are not that visible but likely a problem.

In Git Master, the problem is more obvious with the changes from #6970 stopping the IcingaApplication at the latest point possible. Good that this issue got some customer priority.

My analysis patch already includes safety hooks to not fire the query just to log the state (won't be enabled for production!).

Patch

Well. Too short for my nerves with the IDO code.

@dnsmichi
Copy link
Contributor

dnsmichi commented May 3, 2019

Another test round while cleaning the patch.

2.10

[2019-05-03 15:07:08 +0200] warning/DbConnection: Updating object 'many-test-460' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'many-test-460' of type 'Host'

[2019-05-03 15:07:08 +0200] warning/DbConnection: Updating object 'many-test-314' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'many-test-314' of type 'Host'

[2019-05-03 15:07:08 +0200] warning/DbConnection: Updating object 'many-test-151' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'many-test-151' of type 'Host'

[2019-05-03 15:07:08 +0200] warning/DbConnection: Updating object 'category_openshift' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'category_openshift' of type 'HostGroup'

[2019-05-03 15:07:08 +0200] warning/DbConnection: Updating object 'windows-servers' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'windows-servers' of type 'HostGroup'

[2019-05-03 15:07:08 +0200] warning/DbConnection: Updating object 'linux-servers' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'linux-servers' of type 'HostGroup'

[2019-05-03 15:07:08 +0200] warning/DbConnection: Updating object 'app' 		 active '0' shutting down '1' restarting '1'.
Context:
	(0) Deactivating object 'app' of type 'IcingaApplication'

[2019-05-03 15:07:08 +0200] information/IdoMysqlConnection: 'ido-mysql' paused.
[2019-05-03 15:07:08 +0200] information/DbConnection: Pausing IDO connection: ido-mysql

2.11

[2019-05-03 15:18:45 +0200] information/IdoMysqlConnection: 'ido-mysql' paused.
[2019-05-03 15:18:45 +0200] information/DbConnection: 'ido-mysql' stopped.
[2019-05-03 15:18:45 +0200] warning/DbConnection: Updating object 'ido-mysql' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'ido-mysql' of type 'IdoMysqlConnection'

[2019-05-03 15:18:45 +0200] information/ApiListener: 'api' stopped.
[2019-05-03 15:18:45 +0200] warning/DbConnection: Updating object 'api' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'api' of type 'ApiListener'

[2019-05-03 15:18:45 +0200] warning/DbConnection: Updating object 'mbpmif.int.netways.de!load!backup-downtime' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'mbpmif.int.netways.de!load!backup-downtime' of type 'ScheduledDowntime'

[2019-05-03 15:18:45 +0200] warning/DbConnection: Updating object 'global-templates' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'global-templates' of type 'Zone'

[2019-05-03 15:18:45 +0200] warning/DbConnection: Updating object 'master' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'master' of type 'Zone'

[2019-05-03 15:18:45 +0200] warning/DbConnection: Updating object 'winmif' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'winmif' of type 'Zone'

[2019-05-03 15:18:45 +0200] warning/DbConnection: Updating object 'director-global' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'director-global' of type 'Zone'

[2019-05-03 15:18:45 +0200] warning/DbConnection: Updating object 'winmif' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'winmif' of type 'Endpoint'

[2019-05-03 15:18:45 +0200] warning/DbConnection: Updating object 'mbpmif.int.netways.de' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'mbpmif.int.netways.de' of type 'Endpoint'

[2019-05-03 15:18:45 +0200] warning/DbConnection: Updating object 'icingaadmins' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'icingaadmins' of type 'UserGroup'

[2019-05-03 15:18:46 +0200] warning/DbConnection: Updating object 'many-test-48!many-test-8' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'many-test-48!many-test-8' of type 'Service'

[2019-05-03 15:18:46 +0200] warning/DbConnection: Updating object '24x7' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object '24x7' of type 'TimePeriod'

[2019-05-03 15:18:46 +0200] warning/DbConnection: Updating object 'never' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'never' of type 'TimePeriod'

[2019-05-03 15:18:46 +0200] warning/DbConnection: Updating object 'mbpmif.int.netways.de!load!7ce4bec1-ed3c-43ba-889c-79fdcf90c5a6' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'mbpmif.int.netways.de!load!7ce4bec1-ed3c-43ba-889c-79fdcf90c5a6' of type 'Downtime'

[2019-05-03 15:18:46 +0200] warning/DbConnection: Updating object 'mbpmif.int.netways.de!load!be13dd28-f8f6-4e59-9a78-5196c334006c' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'mbpmif.int.netways.de!load!be13dd28-f8f6-4e59-9a78-5196c334006c' of type 'Downtime'

[2019-05-03 15:18:46 +0200] warning/DbConnection: Updating object 'mbpmif.int.netways.de!load!e614388c-e5d3-46ea-a803-41a884ab5360' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'mbpmif.int.netways.de!load!e614388c-e5d3-46ea-a803-41a884ab5360' of type 'Downtime'

[2019-05-03 15:18:46 +0200] warning/DbConnection: Updating object 'mbpmif.int.netways.de!load!e6790c05-cc89-482d-9434-177953207767' 		 active '0' shutting down '0' restarting '1'.
Context:
	(0) Deactivating object 'mbpmif.int.netways.de!load!e6790c05-cc89-482d-9434-177953207767' of type 'Downtime'

[2019-05-03 15:18:46 +0200] warning/DbConnection: Updating object 'app' 		 active '0' shutting down '1' restarting '1'.
Context:
	(0) Deactivating object 'app' of type 'IcingaApplication'

[2019-05-03 15:18:46 +0200] warning/DbConnection: Updating object 'main-log' 		 active '0' shutting down '1' restarting '1'.
Context:
	(0) Deactivating object 'main-log' of type 'FileLogger'

2.11 has the more immanent problem with the changed deactivation order, relying on the reload signal even more.

dnsmichi pushed a commit that referenced this issue May 3, 2019
This follows the same principle as with the shutdown handler,
and was introduced with the changed reload handling with 2.9.
Previously IsShuttingDown() was sufficient which got set at one
location.

SigUsr2 as handler introduced a new location where m_ShuttingDown
is not necessarily set yet. Since this handler gets called when
l_Restarting is enabled, we'll use this flag to avoid config update
events resulting in object deactivation (object->IsActive() always
returns false).

refs #5996
refs #6691
refs #6970

fixes #7125
@dnsmichi dnsmichi added bug Something isn't working and removed needs feedback We'll only proceed once we hear from you again labels May 3, 2019
@dnsmichi dnsmichi unpinned this issue May 6, 2019
@dnsmichi dnsmichi modified the milestones: 2.11.0, 2.10.5 May 16, 2019
dnsmichi pushed a commit that referenced this issue May 16, 2019
This follows the same principle as with the shutdown handler,
and was introduced with the changed reload handling with 2.9.
Previously IsShuttingDown() was sufficient which got set at one
location.

SigUsr2 as handler introduced a new location where m_ShuttingDown
is not necessarily set yet. Since this handler gets called when
l_Restarting is enabled, we'll use this flag to avoid config update
events resulting in object deactivation (object->IsActive() always
returns false).

refs #5996
refs #6691
refs #6970

fixes #7125

(cherry picked from commit 78e24c5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/db-ido Database output bug Something isn't working ref/IP ref/NC
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants