Icinga2 does not reliably collect all performance data values during reload #6841

Closed
ix-dev opened this issue Dec 17, 2018 · 6 comments · Fixed by #6970
Labels
area/metrics General metrics handling bug Something isn't working ref/NC

Comments

@ix-dev

ix-dev commented Dec 17, 2018

When we deploy a new configuration using the Director, some performance data is lost, even though the checks themselves are executed (either by the satellites or by the core itself).

Expected Behavior

No performance data should be lost during reload of the Icinga2 core.

Current Behavior

Some random performance data is lost when the core reloads its configuration.

Possible Solution

Buffer check results during the reload phase of the Icinga2 core so that no performance data is lost.
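The proposed buffering can be sketched roughly as follows. This is a minimal, single-process Python sketch under stated assumptions: `ReloadBufferingWriter`, `begin_reload`, `end_reload` and `process_check_result` are hypothetical names, not Icinga 2 APIs, and the sketch does not model how the Icinga 2 daemon actually performs a reload.

```python
from collections import deque
from typing import Callable, Deque, List


class ReloadBufferingWriter:
    """Hypothetical sketch: hold check results while a reload is in progress
    and flush them afterwards instead of dropping them."""

    def __init__(self, sink: Callable[[str], None]) -> None:
        self._sink = sink              # final destination, e.g. a perfdata file
        self._reloading = False
        self._buffer: Deque[str] = deque()

    def begin_reload(self) -> None:
        self._reloading = True

    def end_reload(self) -> None:
        self._reloading = False
        # Flush everything that arrived during the reload, oldest first.
        while self._buffer:
            self._sink(self._buffer.popleft())

    def process_check_result(self, line: str) -> None:
        if self._reloading:
            self._buffer.append(line)  # keep the result instead of discarding it
        else:
            self._sink(line)


written: List[str] = []
writer = ReloadBufferingWriter(written.append)
writer.process_check_result("host1 rta=1.2ms")
writer.begin_reload()
writer.process_check_result("host2 rta=0.9ms")  # arrives mid-reload
writer.end_reload()
writer.process_check_result("host3 rta=1.0ms")
print(written)
```

The sketch is deliberately single-threaded; a real implementation would also need locking and a way to carry the buffer across the reload itself.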

Steps to Reproduce (for bugs)

  1. Set check_interval for hostalive checks to 5 minutes
  2. Enable the PerfdataWriter to write performance data to a file
  3. Deploy new configuration using the Director
  4. Check whether all performance data has been written to the file
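Step 4 can be automated by looking for gaps between consecutive timestamps per host. A rough Python sketch, assuming host perfdata lines use tab-separated `KEY::VALUE` fields including `TIMET::` (a Unix timestamp) and `HOSTNAME::`, as in the PerfdataWriter's default host template; `CHECK_INTERVAL`, `TOLERANCE` and the sample lines are illustrative assumptions, not values taken from the attached files.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

CHECK_INTERVAL = 300  # seconds; hostalive check_interval of 5 minutes (step 1)
TOLERANCE = 1.5       # hypothetical slack factor for scheduling jitter


def parse_line(line: str) -> Dict[str, str]:
    """Split a tab-separated 'KEY::VALUE' perfdata line into a dict."""
    fields = {}
    for part in line.rstrip("\n").split("\t"):
        key, sep, value = part.partition("::")
        if sep:
            fields[key] = value
    return fields


def find_gaps(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (host, previous_ts, next_ts) wherever two consecutive results
    for the same host are further apart than the expected interval allows."""
    times: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        f = parse_line(line)
        if "HOSTNAME" in f and "TIMET" in f:
            times[f["HOSTNAME"]].append(int(f["TIMET"]))
    gaps = []
    for host, ts in times.items():
        ts.sort()
        for prev, nxt in zip(ts, ts[1:]):
            if nxt - prev > CHECK_INTERVAL * TOLERANCE:
                gaps.append((host, prev, nxt))
    return gaps


# Invented sample lines: the third check comes 10 minutes after the second.
sample = [
    "DATATYPE::HOSTPERFDATA\tTIMET::1545000000\tHOSTNAME::web1\tHOSTPERFDATA::rta=0.5ms",
    "DATATYPE::HOSTPERFDATA\tTIMET::1545000300\tHOSTNAME::web1\tHOSTPERFDATA::rta=0.4ms",
    "DATATYPE::HOSTPERFDATA\tTIMET::1545000900\tHOSTNAME::web1\tHOSTPERFDATA::rta=0.6ms",
]
print(find_gaps(sample))  # [('web1', 1545000300, 1545000900)]
```

To scan a real file, pass an open file object to `find_gaps`; any reported gap that overlaps a Director deployment would match the behaviour described in this issue.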

Context

We want to collect performance counters in a reliable way.

Your Environment

Version used: Icinga2: r2.10.2-1, IcingaWeb2: 2.5.0, Director: 1.3.1
Operating system: Debian 8

@dnsmichi
Contributor

  • What exactly happens during such a reload? (Logs, etc.)
  • How big is your environment and how long does such a reload take? (Output of icinga2 daemon -C)
  • When do these check events occur, and what evidence is visible in the written performance data files (for example, a gap in the timestamps)?

@dnsmichi dnsmichi added area/metrics General metrics handling needs feedback We'll only proceed once we hear from you again labels Dec 17, 2018
@ix-dev
Author

ix-dev commented Dec 18, 2018

  • Since the logs contain sensitive information, we cannot publish them here. Instead, we could send them to Netways directly.
  • Output of icinga2 daemon -C: icinga2_daemon_c.txt
  • Host perfdata with gap: host-perfdata.txt
    The check interval of the hostalive check is set to 5 minutes.

@lazyfrosch
Contributor

I guess the problem is caused by queued data getting dropped during reload.
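The suspected failure mode, results that are already queued being discarded when the writer stops for a reload, can be illustrated with a toy model. `BatchingWriter`, `batch_size` and `stop(drain=...)` are hypothetical names; this is not Icinga 2's writer code, only a sketch of why draining the queue on shutdown matters.

```python
from typing import List


class BatchingWriter:
    """Hypothetical writer that queues results and writes them in batches."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.pending: List[str] = []  # accepted but not yet written
        self.written: List[str] = []  # reached the backend

    def enqueue(self, result: str) -> None:
        self.pending.append(result)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        self.written.extend(self.pending)
        self.pending.clear()

    def stop(self, drain: bool) -> None:
        if drain:
            self.flush()          # write what is still queued before exiting
        else:
            self.pending.clear()  # queued results are silently lost


def simulate(drain: bool) -> List[str]:
    w = BatchingWriter(batch_size=3)
    for i in range(5):
        w.enqueue(f"result{i}")
    w.stop(drain)  # assumption: a reload stops the old writer at this point
    return w.written


lossy = simulate(drain=False)
safe = simulate(drain=True)
print(len(lossy), len(safe))  # 3 5
```

With a batch size of 3 and five results, the non-draining stop loses the two results still waiting in the queue.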

Al2Klimov added a commit that referenced this issue Jan 8, 2019
@Al2Klimov Al2Klimov self-assigned this Jan 8, 2019
@dnsmichi dnsmichi added bug Something isn't working and removed needs feedback We'll only proceed once we hear from you again labels Jan 9, 2019
@dnsmichi dnsmichi added the needs feedback We'll only proceed once we hear from you again label Feb 11, 2019
@dnsmichi dnsmichi added this to the 2.11.0 milestone Feb 11, 2019
@dnsmichi
Contributor

Scheduling this for 2.11 so we don't forget that the code for it is already in git master.

@ix-dev
Copy link
Author

ix-dev commented Feb 18, 2019

We tested release 2.10.2 with the patches from pull requests #6882 and #6908, but they do not solve the problem yet. We will test again with the current snapshot and provide feedback.

@dnsmichi
Contributor

ref/NC/591065
