Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve DB IDO HA failover behaviour #7062

Merged
merged 4 commits into from
Apr 1, 2019
Merged

Conversation

dnsmichi
Copy link
Contributor

[2019-03-29 16:13:53 +0100] information/IdoMysqlConnection: Last update by endpoint 'master1' was 8.33246s ago (< failover timeout of 30s). Retrying.

[2019-03-29 16:14:23 +0100] information/IdoMysqlConnection: Last update by endpoint 'master1' was 38.3288s ago. Taking over 'ido-mysql' in HA zone 'master'.
  • Add more logging for reconnect and disconnect handling
  • Add 'last_failover' attribute to IDO*Connection objects

refs #6970

@dnsmichi dnsmichi added enhancement New feature or request area/distributed Distributed monitoring (master, satellites, clients) area/db-ido Database output labels Mar 29, 2019
@dnsmichi dnsmichi added this to the 2.11.0 milestone Mar 29, 2019
Michael Friedrich added 2 commits April 1, 2019 08:50
- Decrease Object Authority updates to 10s (was 30s)
- Decrease failover timeout to 30s (was 60s)
- Decrease cold startup (after (re)start) with no OA updates to 30s (was 60s)
- Immediately connect on Resume()
- Fix query priority which got broken with #6970
- Add more logging when a failover is in progress

```
[2019-03-29 16:13:53 +0100] information/IdoMysqlConnection: Last update by endpoint 'master1' was 8.33246s ago (< failover timeout of 30s). Retrying.

[2019-03-29 16:14:23 +0100] information/IdoMysqlConnection: Last update by endpoint 'master1' was 38.3288s ago. Taking over 'ido-mysql' in HA zone 'master'.
```

- Add more logging for reconnect and disconnect handling
- Add 'last_failover' attribute to IDO*Connection objects

refs #6970
@dnsmichi dnsmichi force-pushed the bugfix/ido-reconnect-failover branch from 515c8a7 to 4a26a48 Compare April 1, 2019 06:51
@dnsmichi
Copy link
Contributor Author

dnsmichi commented Apr 1, 2019

Had this running on my Macbook over the weekend by accident, works like a charm.

master2:

[2019-04-01 08:33:17 +0200] information/DbConnection: 'ido-mysql' stopped.
[2019-04-01 08:33:17 +0200] information/ApiListener: 'api' stopped.
[2019-04-01 08:33:17 +0200] information/ConfigObject: Dumping program state to file 'var-b/lib/icinga2/icinga2.state'
[2019-04-01 08:33:17 +0200] information/IcingaApplication: Icinga has shut down.

master1:

[2019-04-01 08:33:41 +0200] information/IdoMysqlConnection: 'ido-mysql' resumed.
[2019-04-01 08:33:41 +0200] notice/IdoMysqlConnection: Reconnect: 'ido-mysql' is now connected to database 'icinga'.
[2019-04-01 08:33:41 +0200] debug/IdoMysqlConnection: Query: SELECT @@global.max_allowed_packet AS max_allowed_packet
[2019-04-01 08:33:41 +0200] debug/IdoMysqlConnection: Query: SELECT version FROM icinga_dbversion WHERE name='idoutils'
[2019-04-01 08:33:41 +0200] debug/IdoMysqlConnection: Query: SELECT instance_id FROM icinga_instances WHERE instance_name = 'default'
[2019-04-01 08:33:41 +0200] debug/IdoMysqlConnection: Query: SELECT UNIX_TIMESTAMP(status_update_time) AS status_update_time, endpoint_name FROM icinga_programstatus WHERE instance_id = 1
[2019-04-01 08:33:41 +0200] information/IdoMysqlConnection: Last update by endpoint 'master2' was 30.3371s ago. Taking over 'ido-mysql' in HA zone 'master'.
[2019-04-01 08:33:41 +0200] notice/IdoMysqlConnection: Enabling IDO connection in HA zone.
[2019-04-01 08:33:41 +0200] information/IdoMysqlConnection: MySQL IDO instance id: 1 (schema version: '1.14.3')
[2019-04-01 08:33:41 +0200] debug/IdoMysqlConnection: Query: SET SESSION TIME_ZONE='+00:00'
[2019-04-01 08:33:41 +0200] debug/IdoMysqlConnection: Query: SET SESSION SQL_MODE='NO_AUTO_VALUE_ON_ZERO'
[2019-04-01 08:33:41 +0200] debug/IdoMysqlConnection: Query: BEGIN
[2019-04-01 08:33:41 +0200] notice/DbConnection: Updating programstatus table.
[2019-04-01 08:33:41 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_programstatus SET active_host_checks_enabled = '1',  active_service_checks_enabled = '1',  daemon_mode = '1',  endpoint_name = 'master1',  event_handlers_enabled = '1',  flap_detection_enabled = '1',  instance_id = 1,  is_currently_running = '1',  last_command_check = FROM_UNIXTIME(1554100421),  notifications_enabled = '1',  passive_host_checks_enabled = '1',  passive_service_checks_enabled = '1',  process_id = '54071',  process_performance_data = '1',  program_start_time = FROM_UNIXTIME(1554100391),  program_version = 'v2.10.4-399-gb75a36b4a',  status_update_time = FROM_UNIXTIME(1554100421) WHERE instance_id = 1

Start master2 again.

[2019-04-01 08:35:25 +0200] notice/IdoMysqlConnection: Reconnect: 'ido-mysql' is now connected to database 'icinga'.
[2019-04-01 08:35:25 +0200] debug/IdoMysqlConnection: Query: SELECT @@global.max_allowed_packet AS max_allowed_packet
[2019-04-01 08:35:25 +0200] debug/IdoMysqlConnection: Query: SELECT version FROM icinga_dbversion WHERE name='idoutils'
[2019-04-01 08:35:25 +0200] debug/IdoMysqlConnection: Query: SELECT instance_id FROM icinga_instances WHERE instance_name = 'default'
[2019-04-01 08:35:25 +0200] debug/IdoMysqlConnection: Query: SELECT UNIX_TIMESTAMP(status_update_time) AS status_update_time, endpoint_name FROM icinga_programstatus WHERE instance_id = 1
[2019-04-01 08:35:25 +0200] information/IdoMysqlConnection: Last update by endpoint 'master1' was 4.81982s ago (< failover timeout of 30s). Retrying.

[2019-04-01 08:35:45 +0200] information/IdoMysqlConnection: Last update by endpoint 'master1' was 24.8078s ago (< failover timeout of 30s). Retrying.


[2019-04-01 08:35:55 +0200] information/IdoMysqlConnection: Last update by endpoint 'master1' was 34.8053s ago. Taking over 'ido-mysql' in HA zone 'master'.
[2019-04-01 08:35:55 +0200] notice/IdoMysqlConnection: Enabling IDO connection in HA zone.
[2019-04-01 08:35:55 +0200] information/IdoMysqlConnection: MySQL IDO instance id: 1 (schema version: '1.14.3')
[2019-04-01 08:35:55 +0200] debug/IdoMysqlConnection: Query: SET SESSION TIME_ZONE='+00:00'
[2019-04-01 08:35:55 +0200] debug/IdoMysqlConnection: Query: SET SESSION SQL_MODE='NO_AUTO_VALUE_ON_ZERO'
[2019-04-01 08:35:55 +0200] debug/IdoMysqlConnection: Query: BEGIN
[2019-04-01 08:35:55 +0200] notice/DbConnection: Updating programstatus table.
[2019-04-01 08:35:55 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_programstatus SET active_host_checks_enabled = '1',  active_service_checks_enabled = '1',  daemon_mode = '1',  endpoint_name = 'master2',  event_handlers_enabled = '1',  flap_detection_enabled = '1',  instance_id = 1,  is_currently_running = '1',  last_command_check = FROM_UNIXTIME(1554100555),  notifications_enabled = '1',  passive_host_checks_enabled = '1',  passive_service_checks_enabled = '1',  process_id = '54149',  process_performance_data = '1',  program_start_time = FROM_UNIXTIME(1554100525),  program_version = 'v2.10.4-399-gb75a36b4a',  status_update_time = FROM_UNIXTIME(1554100555) WHERE instance_id = 1

master1 stops

[2019-04-01 08:35:25 +0200] information/ApiListener: Finished syncing runtime objects to endpoint 'master2'.
[2019-04-01 08:35:25 +0200] information/ApiListener: Finished sending runtime config updates for endpoint 'master2' in zone 'master'.
[2019-04-01 08:35:25 +0200] information/ApiListener: Sending replay log for endpoint 'master2' in zone 'master'.
[2019-04-01 08:35:25 +0200] notice/ApiListener: Replayed 0 messages.
[2019-04-01 08:35:25 +0200] notice/ApiListener: Replaying log: var-a/lib/icinga2/api/log/1554100526
[2019-04-01 08:35:25 +0200] notice/ApiListener: Replayed 0 messages.
[2019-04-01 08:35:25 +0200] notice/ApiListener: Updating object authority for objects at endpoint 'master1'.
[2019-04-01 08:35:25 +0200] debug/IdoMysqlConnection: Attempting to pause 'ido-mysql'.
[2019-04-01 08:35:25 +0200] information/DbConnection: Pausing IDO connection: ido-mysql
[2019-04-01 08:35:25 +0200] debug/IdoMysqlConnection: Query: UPDATE icinga_endpointstatus SET is_connected = '1',  status_update_time = FROM_UNIXTIME(1554100525) WHERE endpoint_object_id = 27867 AND instance_id = 1
[2019-04-01 08:35:25 +0200] debug/IdoMysqlConnection: Query: COMMIT
[2019-04-01 08:35:25 +0200] information/IdoMysqlConnection: Disconnected from 'ido-mysql' database 'icinga'.
[2019-04-01 08:35:25 +0200] information/IdoMysqlConnection: 'ido-mysql' paused.

Stop master 2 again.

[2019-04-01 08:38:22 +0200] information/Application: Shutting down...
[2019-04-01 08:38:22 +0200] debug/IdoMysqlConnection: Attempting to pause 'ido-mysql'.
[2019-04-01 08:38:22 +0200] information/DbConnection: Pausing IDO connection: ido-mysql
[2019-04-01 08:38:22 +0200] debug/IdoMysqlConnection: Query: COMMIT
[2019-04-01 08:38:22 +0200] information/IdoMysqlConnection: Disconnected from 'ido-mysql' database 'icinga'.
[2019-04-01 08:38:22 +0200] information/IdoMysqlConnection: 'ido-mysql' paused.
[2019-04-01 08:38:22 +0200] information/DbConnection: 'ido-mysql' stopped.

master1 takes over.

[2019-04-01 08:38:31 +0200] notice/ApiListener: Updating object authority for objects at endpoint 'master1'.

[2019-04-01 08:38:31 +0200] information/IdoMysqlConnection: 'ido-mysql' resumed.
[2019-04-01 08:38:31 +0200] information/DbConnection: Resuming IDO connection: ido-mysql
[2019-04-01 08:38:31 +0200] notice/IdoMysqlConnection: Reconnect: 'ido-mysql' is now connected to database 'icinga'.
[2019-04-01 08:38:31 +0200] debug/IdoMysqlConnection: Query: SELECT @@global.max_allowed_packet AS max_allowed_packet
[2019-04-01 08:38:31 +0200] debug/IdoMysqlConnection: Query: SELECT version FROM icinga_dbversion WHERE name='idoutils'
[2019-04-01 08:38:31 +0200] debug/IdoMysqlConnection: Query: SELECT instance_id FROM icinga_instances WHERE instance_name = 'default'
[2019-04-01 08:38:31 +0200] debug/IdoMysqlConnection: Query: SELECT UNIX_TIMESTAMP(status_update_time) AS status_update_time, endpoint_name FROM icinga_programstatus WHERE instance_id = 1
[2019-04-01 08:38:31 +0200] information/IdoMysqlConnection: Last update by endpoint 'master2' was 16.3688s ago (< failover timeout of 30s). Retrying.
[2019-04-01 08:38:31 +0200] debug/IdoMysqlConnection: Reconnect: Clearing ID cache.

[2019-04-01 08:38:51 +0200] notice/IdoMysqlConnection: Reconnect: 'ido-mysql' is now connected to database 'icinga'.
[2019-04-01 08:38:51 +0200] debug/IdoMysqlConnection: Query: SELECT @@global.max_allowed_packet AS max_allowed_packet
[2019-04-01 08:38:51 +0200] debug/IdoMysqlConnection: Query: SELECT version FROM icinga_dbversion WHERE name='idoutils'
[2019-04-01 08:38:51 +0200] debug/IdoMysqlConnection: Query: SELECT instance_id FROM icinga_instances WHERE instance_name = 'default'
[2019-04-01 08:38:51 +0200] debug/IdoMysqlConnection: Query: SELECT UNIX_TIMESTAMP(status_update_time) AS status_update_time, endpoint_name FROM icinga_programstatus WHERE instance_id = 1
[2019-04-01 08:38:51 +0200] information/IdoMysqlConnection: Last update by endpoint 'master2' was 36.3446s ago. Taking over 'ido-mysql' in HA zone 'master'.
[2019-04-01 08:38:51 +0200] notice/IdoMysqlConnection: Enabling IDO connection in HA zone.
[2019-04-01 08:38:51 +0200] information/IdoMysqlConnection: MySQL IDO instance id: 1 (schema version: '1.14.3')
[2019-04-01 08:38:51 +0200] debug/IdoMysqlConnection: Query: SET SESSION TIME_ZONE='+00:00'

And so on.

@dnsmichi dnsmichi self-assigned this Apr 1, 2019
@dnsmichi
Copy link
Contributor Author

dnsmichi commented Apr 1, 2019

Screen Shot 2019-04-01 at 09 15 15

@dnsmichi
Copy link
Contributor Author

dnsmichi commented Apr 1, 2019

Additional goodie: Add the last_failover timestamp to the ido CheckCommand output when enable_ha is enabled and the feature is not paused on this instance.

Screen Shot 2019-04-01 at 09 29 31

@dnsmichi dnsmichi merged commit 1e124fd into master Apr 1, 2019
@dnsmichi dnsmichi deleted the bugfix/ido-reconnect-failover branch April 1, 2019 08:30
dnsmichi pushed a commit that referenced this pull request Apr 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/db-ido Database output area/distributed Distributed monitoring (master, satellites, clients) enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant