[BUG] - apscheduler skipping alerts #187
Comments
Yes, ElastAlert can struggle with large numbers of rules, especially if the rule queries aren't tuned adequately or if the ES cluster is underpowered. If you can put together a PR for this new option, we can get it merged in for the next release.
@ferozsalam Monitoring the elastalert_status index, we can see an average of 150-180 out of roughly 330 rules/alerts being run each schedule, and this happens indefinitely. We couldn't tell whether the same rules were being excluded from running every time.
This is very curious! I did some investigation on our setup - we run ~128 rules against our cluster and I can see no evidence of this issue occurring. I think setting the … I would be interested to see if others are also having this issue, and would be interested to see the average …
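(As a rough way of getting at that average: below is a minimal sketch that pulls per-rule run times out of the writeback index. The elastalert_status index name, the rule_name and time_taken fields, and the elasticsearch-py usage are assumptions based on a default ElastAlert setup, not taken from this thread.)

```python
# Sketch: average per-rule run time from ElastAlert's writeback index.
# Index and field names are assumptions for a default setup.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed ES endpoint

resp = es.search(
    index="elastalert_status",
    body={
        "size": 0,
        "aggs": {
            "by_rule": {
                "terms": {"field": "rule_name", "size": 500},
                "aggs": {"avg_time": {"avg": {"field": "time_taken"}}},
            }
        },
    },
)

for bucket in resp["aggregations"]["by_rule"]["buckets"]:
    print(f'{bucket["key"]}: {bucket["avg_time"]["value"]:.2f}s')
```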
Thanks for the prompt reply. One thing worth mentioning is that the …

I'm not certain that this is the source of the problem - I see that you have mentioned that ElastAlert seems to sit waiting with nothing to do - but I think there's a chance it might be making it worse. For the rules that are taking a particularly long time (perhaps >5s), are you able to provide an example of what they look like?
Yeah, like I mentioned initially, 95% of our queries are very basic. Unfortunately we have a few (~15 alerts) that run regexes and take 20-40s, which skews the numbers; all the rest run for about 0.1-0.6s.
Okay, @ferozsalam, you're right. It looks like when the queries take some time, it skips some rules. I find it strange that it skips running some rules and then sits and waits 5 minutes for the next run. Unfortunately there is no way around the regex/slow rules, and I'm sure other people will unknowingly experience the same issue.
The BackgroundScheduler defaults to a 10-thread limit (https://github.com/agronholm/apscheduler/blob/3.0/apscheduler/executors/pool.py), so if your roughly 15 slow rules are using up all those threads for 20-30 seconds, no other threads are available to service the rest of the rules.
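(For reference, apscheduler exposes that pool size when the scheduler is constructed. A minimal sketch, with an illustrative max_workers value rather than anything proposed in this thread:)

```python
# Sketch: raising apscheduler's default 10-thread executor limit.
# The max_workers value here is illustrative only.
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.executors.pool import ThreadPoolExecutor

scheduler = BackgroundScheduler(
    executors={"default": ThreadPoolExecutor(max_workers=30)}
)
scheduler.start()
```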
Could we maybe add an optional configuration parameter to change/raise that number of threads? |
Yes, I think both options would be useful:

- an option to configure the scheduler's misfire_grace_time
- an option to raise the scheduler's thread count
Take a look at PR #192 and let me know if you have any concerns with this change.

Looks good!
Firstly, thanks for maintaining the project.
Elastalert version - latest
Python version - Python 3.8.5
OS - Ubuntu 20.04.1 LTS
Problem description - This problem comes from the original ElastAlert. We noticed that the number of rules actually being run by ElastAlert was different every time it ran; this was observed in the ElastAlert Elasticsearch index.
We never had this issue with a "small" number of rules and only noticed it when a large set of rules was loaded.
In the Elastalert logs you would see this intermittently:
We modified elastalert.py and added `misfire_grace_time` to the job as a hack to ensure all the rules run. The parameter is documented here: https://apscheduler.readthedocs.io/en/stable/modules/job.html
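To illustrate the kind of change being described, here is a minimal apscheduler sketch with misfire_grace_time set on a job; the run_rule function, the 5-minute interval, and the 20-minute grace period are placeholders, not the actual elastalert.py modification.

```python
# Sketch: a job with misfire_grace_time, so a run that starts late
# (e.g. because all executor threads were busy) is still executed
# instead of being skipped. Values below are placeholders.
from apscheduler.schedulers.background import BackgroundScheduler

def run_rule():
    pass  # stand-in for the per-rule query logic

scheduler = BackgroundScheduler()
scheduler.add_job(
    run_rule,
    "interval",
    minutes=5,
    misfire_grace_time=20 * 60,  # seconds a run may be late and still fire
)
scheduler.start()
```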
This is the result of the change: