Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recognize PENDING -> UNKNOWN as state change #7815

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Al2Klimov
Copy link
Member

@Al2Klimov Al2Klimov commented Feb 5, 2020

... to get notified about that state change.

fixes #7807
fixes #10217

@dnsmichi
Copy link
Contributor

dnsmichi commented Feb 5, 2020

This changes the current behavior, and will possibly influence the way states are counted and notifications are sent. Imho this needs a deeper discussion with @lippserd

@nilmerg nilmerg removed their request for review February 6, 2020 06:25
@dnsmichi dnsmichi self-requested a review February 10, 2020 14:28
Copy link
Contributor

@dnsmichi dnsmichi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've also discussed this with @N-o-X today. Changing the behavior of ProcessCheckResult is not allowed in this regard.

If IcingaDB needs this specific state change from "pending" (not known to Icinga 2 itself) to another state, you can add a new signal where the IcingaDbWriter can subscribe to. Something like

if (!old_cr)
  OnFirstCheckCheckResult(cr, origin);

@Al2Klimov
Copy link
Member Author

Just for understanding: Why exactly is it not allowed?

@dnsmichi
Copy link
Contributor

It is a change in the behavior of the state machine. Now, !cr (pending) -> first state is not treated as a state change. As such, people know exactly when event handlers and notifications are fired. If this is shifted by one, making this !cr (pending) -> first state a first state change, the counting logic changes and leads to unexpected behaviour with regard to event handlers and notifications.

If you really need to know about the change from pending (which isn't a state by definition in Icinga 2) to the first check result, our decision was to add a new signal where IcingaDB and other features may subscribe upon, leaving the current state machine intact.

@Al2Klimov
Copy link
Member Author

For me that notification behavior is absolutely expected, i.e. I if a service with one check attempt changes from a non-problem state (e.g. PENDING) to a problem state (e.g. UNKNOWN) I expect a notification.

By the way we've already changed that state machine indirectly. I opt for merging this at least after v2.12.

@dnsmichi
Copy link
Contributor

I can only repeat myself, pending is not a state in Icinga 2. This is something which has been invented by the web views, and is now slowly moved into the core without re-evaluation. Imho this needs a proper thought and conceptual review on all the affected components.

Anyhow, pinging @Thomas-Gelf since this affects the whole architecture.

@Al2Klimov
Copy link
Member Author

Sigh. Okay, then it's not PENDING, but UNKNOWN w/o any check result so far which have been changed to not a problem by you and me.

As a user I'd like to get notifications on a change from UNKNOWN w/o any check result so far to UNKNOWN the same way as e.g. on a change from UNKNOWN w/o any check result so far to CRITICAL.

@dnsmichi
Copy link
Contributor

Let's simplify this.

Please write a new proposal, similar to a draft concept with the label TBD, how the current state logic works, and how you'd like it being changed. Illustrate that in all its detail in a new issue, including the core and web components. 1000 words should be a good orientation for that. Then everyone can discuss based on your ideas.

That being said, I'll close this PR.

@dnsmichi dnsmichi closed this Feb 11, 2020
@Thomas-Gelf
Copy link
Contributor

IMHO the expected behaviour for related state transitions is as simple as follows:

From To Notify State Change
Pending Ok - x
Pending Up - x
Pending Warning x x
Pending Critical x x
Pending Unknown x x
Pending Down x x

State change should result in IDO/IcingaDB State & History updates, former (last) state fields should be NULL where a transition from Pending/NoState occurred. If you need more words please let me know.

@Al2Klimov
Copy link
Member Author

Thank you, @Thomas-Gelf, for your opinion. Yes, I need a bit more words, but I'll just add them by myself:

  1. Just to be clear, the thing you called Pending is actually UNKNOWN w/o any check result so far on our side
  2. You seem to expect an actual state change on transition from UNKNOWN w/o any check result so far to ANY w/ a check result
  3. This PR does no less and no more that that and should be re-opened for merging in the future

@Al2Klimov
Copy link
Member Author

Re-opening as there seem to be discussion potential. @dnsmichi and @N-o-X ("our decision") are against and @Thomas-Gelf and me are not (2 vs. 2).

@Al2Klimov Al2Klimov reopened this Feb 12, 2020
@N-o-X
Copy link
Contributor

N-o-X commented Feb 12, 2020

I'm not directly against it, I'm just not sure if this change would break stuff. I really think the behaviour @Thomas-Gelf proposed would be nice to have, but might break something during implementation. As @dnsmichi said, it would be nice to have a summary of what that change would touch (Core & Web) and if we're able to implement it without breaking those things.

@Al2Klimov
Copy link
Member Author

Even better. 1 vs. 2.5.

@dnsmichi
Copy link
Contributor

A decision can only be made with a document which covers all the details. Therefore I'd suggest that @Al2Klimov creates a new issue and does exactly that.

@Al2Klimov
Copy link
Member Author

  1. We really don't need more than one issue for one PR.
  2. IMAO @Thomas-Gelf has already covered all details.

@dnsmichi
Copy link
Contributor

I'm talking about a document, which includes the table, and describes the current situation, adds the changes and includes all dependencies. It should also discuss the possible impact on backends, features and the end-user. If you need an inspiration, look into #7814 how to write such a document/concept. Depending on the work load of yours, we should find a time slot in your schedule with @lippserd.

@Thomas-Gelf
Copy link
Contributor

To me this seems "slightly" exaggerated for a fix that adds a missing state change event. I do not know whether the pull request would lead to notifications being treated differently than state changes in the DB, so this is something @Al2Klimov could eventually verify and explain. It is important that we're not going to send notifications when transitioning from "pending" (no former result) to a non-problem state.

Web frontend: @dnsmichi, could you please elaborate why you're expecting this change to break anything there?

IDO: no new state type is being introduced. Fixing the NULL-issue I mentioned would be a change with potential impact. Skip that part if you feel unsure. SLA functions didn't trust this (last_state) column from the beginning, so who cares.

Notification commands: some might work with last_state. Could they fail when sending a notification for an penging-to-unknown transition? Only if they also failed for a pending-to-critical transition. Who is to blame? The script author. Is this worse than sending no notification at all? Definitively no.

@Al2Klimov
Copy link
Member Author

To me this seems "slightly" exaggerated for a fix that adds a missing state change event.

Full ack.

@Al2Klimov Al2Klimov added this to the 2.13.0 milestone Mar 17, 2020
@lippserd
Copy link
Member

lippserd commented Jun 9, 2020

I opt for changing this but we need a test protocol with all possible states, state changes and triggered notifications please.

@lippserd lippserd added the needs feedback We'll only proceed once we hear from you again label Jun 9, 2020
@lippserd lippserd removed their request for review June 9, 2020 08:35
@Al2Klimov Al2Klimov self-assigned this Jun 9, 2020
@lippserd lippserd added the needs feedback We'll only proceed once we hear from you again label Nov 10, 2020
@Al2Klimov
Copy link
Member Author

  1. Yes and no. Now it applies also to PENDING -> UNKNOWN, in addition to PENDING -> X.
  2. Hard state change? Why hard state change? (see below)
  3. -
  4. See Checkable#GetProblem(): consider PENDING not a problem #7685 and 2.
  5. Yes. (But see also 1.)
[2020-11-10 13:20:55 +0100] warning/PluginCheckTask: Check command for object 'Alexanders-MacBook-Pro.local!http' (PID: 88691, arguments: '/Users/aklimov/NET/WS/icinga2/prefix/usr/lib/nagios/plugins/check_http' '-I' '127.0.0.1' '-u' '/') terminated with exit code 128, output: execvpe(/Users/aklimov/NET/WS/icinga2/prefix/usr/lib/nagios/plugins/check_http) failed: No such file or directory

[2020-11-10 13:20:55 +0100] notice/Checkable: State Change: Checkable 'Alexanders-MacBook-Pro.local!http' soft state change from UNKNOWN to UNKNOWN detected.

@Al2Klimov Al2Klimov removed their assignment Nov 10, 2020
@Al2Klimov Al2Klimov removed the needs feedback We'll only proceed once we hear from you again label Nov 10, 2020
@Al2Klimov Al2Klimov force-pushed the bugfix/pending-unknown-state-change-7807 branch from fa758e2 to 4fe677b Compare December 14, 2020 18:03
@lippserd
Copy link
Member

@Al2Klimov I find it a little confusing that the first state without cr is also referred to as UNKNOWN in the logs. Is there any chance that we log something like "first state change to UNKNOWN" instead of "UNKNOWN to UNKNOWN"?

@Al2Klimov Al2Klimov self-assigned this Mar 22, 2021
@Al2Klimov Al2Klimov removed the request for review from lippserd March 22, 2021 17:07
@Al2Klimov Al2Klimov marked this pull request as draft March 22, 2021 17:07
@Al2Klimov Al2Klimov force-pushed the bugfix/pending-unknown-state-change-7807 branch from 4fe677b to 7cd1dd8 Compare March 22, 2021 17:22
@Al2Klimov Al2Klimov marked this pull request as ready for review March 22, 2021 17:22
@Al2Klimov Al2Klimov removed their assignment Mar 22, 2021
@Al2Klimov Al2Klimov requested a review from lippserd March 22, 2021 17:22
@Al2Klimov Al2Klimov modified the milestones: 2.13.0, 2.14.0 Jun 2, 2021
@Al2Klimov
Copy link
Member Author

@cla-bot check

@Al2Klimov
Copy link
Member Author

ref/IP/44082
ref/NC/774135

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants