Throttle Period #14

Nishant23 · 2019-03-29T06:55:43Z

If we configure email or Slack in the action block, it sends an alert each time the monitor is triggered. In Elasticsearch X-Pack this can be controlled by setting throttle_period in the action block: after the first alert, it waits for the throttle_period and then resends the alert if the issue is still not resolved. Can we have the same functionality here?

For more info, you can look at https://www.elastic.co/guide/en/x-pack/current/actions.html#actions-ack-throttle

CarlMeadows · 2019-03-29T23:30:18Z

Noted - thanks for the feedback. We do have the Acknowledge feature which will suspend subsequent notifications until the issue is resolved - but I can see folks also only wanting alerts to go out every 15 or 30 minutes on a 1 minute polling event - even before they can acknowledge it.

vamshin · 2019-04-03T19:35:18Z

Adding my observations for the throttling feature.

We could provide throttling at 3 levels.
Level 1: Monitor
Level 2: Trigger (if there is no trigger-level throttling, it falls back to the monitor)
Level 3: Action (if there is no action-level throttling, it falls back to the trigger)

Why would we need these levels?
A monitor could have multiple triggers. If we want all triggers to have the same throttling, then instead of adding a throttling configuration to each trigger, we could set it once at the monitor level. The same logic holds for triggers and their actions.
Currently X-Pack supports throttling at the trigger level and the action level. But in X-Pack, 1 watch (monitor) = 1 trigger, so its trigger-level throttling covers both level 1 and level 2 described above.

Also, should the throttling be global to all the alerts of the trigger, or should it apply per alert?
If throttling is per alert, then the throttling timer resets when the alert is completed, and a notification is sent on the next alert.
If throttling is global to alerts, then the timer runs across alerts.
One drawback we can think of with the former approach is that if an alert keeps going in and out, throttling never comes into play.

Please share feedback if you see any concerns or if I am missing something here.

Nishant23 · 2019-04-12T06:19:43Z

@CarlMeadows When is this planned to be released?

elfisher · 2019-04-12T19:52:55Z

@Nishant23 we are still working through how we want this mechanic to work. I'll be posting a high-level summary of our plans shortly to this issue thread. Stay tuned.

elfisher · 2019-04-12T19:56:57Z

[RFC] Alert throttling

The purpose of this request for comments (RFC) is to discuss how to enhance Open Distro for Elasticsearch Alerting to include throttling mechanisms on actions and provide users with the ability to undo acknowledgements.

Problem statement

Currently monitors, triggers, and actions all run on the schedule defined in the monitor. Each time a monitor runs, its triggers are evaluated and actions are taken if the trigger thresholds are exceeded. This can create a lot of noise if you have monitors that run often. However, you may not want to reduce the monitor frequency because you still want to check the data often. For example, you may want to check the error rates of application logs in Elasticsearch every 5 minutes, but you might only want to send alerts every 30 minutes while your error rates are high.

Proposed solution

We will introduce a throttling property in actions which will be used to reduce the frequency at which the action is taken. Taking the same example from above, you will be able to run the monitor checking error rates in your application every 5 minutes, but define your alert to at most be sent every 30 minutes.

Note: We are planning to apply throttling to unique alerting events. Let's look at an example of what this means with throttling set to 30 minutes. Let's say a trigger goes into alerting at 10:00 and completes at 10:10. If the trigger goes back to alerting at 10:20, alerts would be sent for the 10:00 event and the 10:20 event because each alert would be a unique event. If the 10:20 alert does not complete, it will then send notifications at 10:50, 11:20, 11:50, and so on until it completes or is acknowledged.
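The timeline in the note can be reproduced with a small simulation. This is only a sketch of the per-event behaviour described above, not the plugin's implementation: the function name `notification_times`, the 5-minute run interval (borrowed from the problem statement), and the date are assumptions for illustration, and acknowledgement is not modelled.

```python
from datetime import datetime, timedelta


def notification_times(run_times, is_alerting, throttle):
    """Return the run times at which a notification would be sent.

    Throttling is applied per alerting event: the timer resets
    whenever the trigger stops alerting (hypothetical model).
    """
    sent = []
    last_sent = None
    for now in run_times:
        if not is_alerting(now):
            last_sent = None  # event completed; the next alert is a new event
        elif last_sent is None or now - last_sent >= throttle:
            sent.append(now)
            last_sent = now
    return sent


day = datetime(2019, 4, 12)  # arbitrary date, for illustration only


def at(hour, minute):
    return day.replace(hour=hour, minute=minute)


def is_alerting(now):
    # Alerting from 10:00 until it completes at 10:10, then again from 10:20 onward.
    return at(10, 0) <= now < at(10, 10) or now >= at(10, 20)


# Monitor runs every 5 minutes from 10:00 to 12:00 inclusive.
runs = [at(10, 0) + timedelta(minutes=5 * i) for i in range(25)]
times = [
    t.strftime("%H:%M")
    for t in notification_times(runs, is_alerting, timedelta(minutes=30))
]
print(times)  # ['10:00', '10:20', '10:50', '11:20', '11:50']
```

The 10:20 notification goes out only 20 minutes after the 10:00 one because the event that started at 10:00 completed in between.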

Example Action configuration

{
  "name": "test-action",
  "destination_id": "RtaaOmkBC25HCRGm0fxi",
  "throttle": {
    "value": 30,
    "unit": "MINUTES"
  },
  ...
}
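As a rough illustration of how a consumer might read this property, the sketch below turns the `throttle` object into a duration. The helper name `throttle_period` is hypothetical, and only the `MINUTES` unit is handled because it is the only unit that appears in this thread; which other units the feature accepts is not stated here.

```python
import json
from datetime import timedelta

# The example action from above, without the elided fields.
ACTION_JSON = """
{
  "name": "test-action",
  "destination_id": "RtaaOmkBC25HCRGm0fxi",
  "throttle": {"value": 30, "unit": "MINUTES"}
}
"""


def throttle_period(action):
    """Return the action's throttle as a timedelta, or None if it is not set."""
    throttle = action.get("throttle")
    if throttle is None:
        return None
    if throttle["unit"] != "MINUTES":
        # Assumption: other units are out of scope for this sketch.
        raise ValueError(f"unsupported unit in this sketch: {throttle['unit']}")
    return timedelta(minutes=throttle["value"])


action = json.loads(ACTION_JSON)
print(throttle_period(action))  # 0:30:00
```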

Additional functionality we are considering

Additionally we are thinking of adding default throttling settings at the trigger and monitor level which you can configure if you want to inherit a default throttling setting from the parent. For example, you may have multiple triggers and actions configured for one monitor, but you want them all to have the same throttling settings. You could use the monitor default_throttling property to set a default that all of its actions will use.

It is important to note that if a child's throttling property is configured, it takes precedence over its parent's. For example, if I configure a default_throttling property on a monitor and a trigger, the trigger's property will be used for its actions. If I configure a throttling property on an action, it overrules its trigger's property. Below are examples of how priority is selected.

Monitor (Throttle configured)	Trigger (Throttle configured)	Action (Throttle configured)	Throttle Used
Yes	Yes	Yes	Action
Yes	Yes	No	Trigger
Yes	No	No	Monitor
No	Yes	Yes	Action
No	Yes	No	Trigger
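The precedence in the table amounts to "most specific level wins". A minimal sketch of that rule follows, assuming each level is a plain dict and using the property names from the examples in this RFC (`throttle` on actions, `default_throttle` on triggers and monitors); the function `effective_throttle` is hypothetical and not part of the plugin.

```python
def effective_throttle(monitor, trigger, action):
    """Return (level, throttle) for the most specific configured throttle.

    Order of precedence: action, then trigger, then monitor.
    Returns (None, None) when no level configures throttling.
    """
    candidates = (
        ("Action", action, "throttle"),
        ("Trigger", trigger, "default_throttle"),
        ("Monitor", monitor, "default_throttle"),
    )
    for level, config, key in candidates:
        if config.get(key) is not None:
            return level, config[key]
    return None, None


ten = {"value": 10, "unit": "MINUTES"}
thirty = {"value": 30, "unit": "MINUTES"}

# Row "Yes / Yes / No" from the table: the trigger's setting is used.
print(effective_throttle(
    monitor={"default_throttle": ten},
    trigger={"default_throttle": thirty},
    action={},
))  # ('Trigger', {'value': 30, 'unit': 'MINUTES'})
```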

Example Monitor Configuration

{
  "name": "test-monitor",
  "type": "monitor",
  "enabled": true,
  "schedule": {
    "period": {
      "interval": 1,
      "unit": "MINUTES"
    }
  },
  "default_throttle": {
    "value": 10,
    "unit": "MINUTES"
  },
  ...
}

Example Trigger Configuration

{
  "id": "StaeOmkBC25HCRGmL_y-",
  "name": "test-trigger",
  "severity": "1",
  "default_throttle": {
    "value": 10,
    "unit": "MINUTES"
  },
  ...
}

elfisher · 2019-04-12T19:57:47Z

@vamshin I've incorporated your thoughts from above into the RFC.

ylwu-amzn · 2019-04-12T20:12:26Z

@elfisher the throttling here is meant to prevent sending too many alerts in a short time; another, manual way to do that is to acknowledge the alert. Can we consider enhancing acknowledge to suppress the alert for a period (for example, suppress it for the next hour)?

ylwu-amzn · 2019-04-12T22:03:22Z

For better tracking, create another issue to track the enhancement of acknowledge for a period of time.

rhyscooper · 2019-04-18T08:44:56Z

This would be a very valuable addition to remove noise from alerting and is something I've been looking for!

It might be useful to be able to configure whether the throttling applies to only single events, or can span multiple events so it is the user's choice whether they want to filter out intermittent 'spikey' alerts, or still receive multiple alerts when the alert condition is fluctuating between true/false.

ylwu-amzn · 2019-05-15T21:56:10Z

Code change merged. Close this issue.

elfisher added the enhancement New feature or request label Apr 2, 2019

elfisher mentioned this issue Apr 10, 2019

unacknowledge an alert #27

Closed

This was referenced May 7, 2019

add throttle on action level #48

Merged

add throttle on action level opendistro-for-elasticsearch/alerting-kibana-plugin#45

Merged

mihirsoni assigned ylwu-amzn May 15, 2019

ylwu-amzn closed this as completed May 15, 2019

adityaj1107 mentioned this issue Jun 2, 2021

unacknowledge an alert opensearch-project/alerting#30

Open
