[Stack Monitoring] [Test Scenario] Out of the box alerting #85841
Comments
Pinging @elastic/stack-monitoring (Team:Monitoring) |
Not quite sure how to test this one. Does this state only happen if you're monitoring multiple clusters? |
@chrisronline Got it, is it expected that the toggle switch doesn't appear if you're only monitoring a single node? |
@Zacqary Negative. That's something we should probably optimize in the near future, but it does not do that now. |
Okay, well I'm not seeing the |
@Zacqary Oh wow, that's weird. Do you mind posting a screenshot of what you are seeing? |
For the threadpool alerts, not sure if I'm doing it right but it doesn't seem to be firing when I:
|
@igoristic Can you advise on #85841 (comment)? |
@chrisronline I'm running 7.11 locally and still don't see the Group By Node toggle. I actually tried on master first by accident and it also didn't show up. |
@Zacqary Oh, my bad! You are unable to see this functionality while in |
Ah that makes sense. Works outside Setup Mode. |
Do I still need to do something to enable Watcher for the legacy alerts? Seems like I'm having trouble getting the license expiration pipeline to set it off |
You'll need trial or higher license and that should be it. I'd double check the watches exist and then you can do something like:
to verify the pipeline is properly changing the document |
Yeah the document's updating, watches don't exist though. |
What license level are you on? If you are on trial+ and using legacy monitoring collection, the watches should be created for you. |
Using Trial and Metricbeat monitoring, alerts are created but no watches. |
Ah, there is a known issue with watch creation when using only Metricbeat monitoring: elastic/elasticsearch#51762 (comment). We're tracking the removal of these watches in 7.12 (#85047), so this bug will be moot, but it's still there for now. To get around it, please enable legacy monitoring (via the cluster setting) so that the watches get created. You can disable legacy monitoring as soon as the watches have been created. |
Remind me how to enable legacy monitoring again? It's a dev tools request, right? |
Yea
|
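For readers following along, here is a rough sketch of what such a Dev Tools request could look like. It is not the snippet from the original comment, which did not survive in this copy of the thread. It assumes the relevant cluster settings are `xpack.monitoring.collection.enabled` and `xpack.monitoring.elasticsearch.collection.enabled`; the helper name `legacy_monitoring_request` is hypothetical and only builds the request text, it does not contact a cluster.

```python
import json


def legacy_monitoring_request(enable: bool) -> str:
    """Build the text of a Dev Tools console request (hypothetical helper).

    Assumption: legacy (internal) collection for Elasticsearch is governed by
    the two settings below. When disabling, only the Elasticsearch-specific
    switch is turned off so that Metricbeat-based monitoring keeps working.
    """
    if enable:
        settings = {
            "xpack.monitoring.collection.enabled": True,
            "xpack.monitoring.elasticsearch.collection.enabled": True,
        }
    else:
        settings = {
            "xpack.monitoring.elasticsearch.collection.enabled": False,
        }
    body = {"persistent": settings}
    return "PUT _cluster/settings\n" + json.dumps(body, indent=2)


if __name__ == "__main__":
    # Paste the printed request into Kibana Dev Tools.
    print(legacy_monitoring_request(enable=True))
```

Once the watches exist, the same helper with `enable=False` produces the request to switch legacy collection back off, as suggested above.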
Marked this as working, but note that alerts can be deleted from the management UI |
@Zacqary Really? I'm able to do it on staging cloud 7.11: |
@chrisronline Yeah, is that not a bug? There doesn't seem to be a way to create them again |
It's a bit of a confusing flow, I'll admit. Users should be able to delete from there, but the creation is handled purely by the Stack Monitoring UI. In the future, we hope to simplify this but we don't imagine it's a huge point of confusion, as users are most likely not deleting these alerts. |
@ravikesarwani Any thoughts on this experience and if we need to change it? |
Unsure how to get this one to fire. Tried turning off Metricbeat for a while, but that just switched the stack to Self-Monitoring mode and didn't fire any alerts. |
I'm not sure if that's a bug with
That should work. The collection method doesn't matter here - I'd just be consistent. Monitor the ES node with either legacy or MB for long enough to see the monitoring data show up, then disable whichever one you used and make sure an alert fires in the Kibana server log. If not, then it sounds like a bug |
Yeah, seems to not be firing, then |
If the user deletes the alerts, don't they get recreated when the Stack Monitoring UI reloads? |
They do, but I think @Zacqary is saying it's not obvious to the user that this will happen |
That actually didn't happen for me the first time I tried it, but it's working now that I've tried it again. If I can find a consistent way to reproduce I'll let y'all know. |
After setting up CCS on Cloud, the remote cluster doesn't seem to show up in the Stack Monitoring UI. This is from creating two Cloud deployments and adding one of them as a remote cluster using the Cloud UI, not through Kibana (where Remote Clusters isn't available in Stack Management). |
I'm not sure how this works. I did this as well and there is no remote info from |
I can mark the CCS case as working if we're not targeting the Cloud CCS for this release, but I'd recommend tracking that as an issue for the future. Still unable to get the two unchecked alerts above (missing monitoring data and threadpool rejection) to fire on my end. |
@Zacqary Thanks.
This one appears to be bugged now so great catch. Fix is #87882 @igoristic Are you able to look into the threadpool rejection alert? |
@Zacqary This won't work for triggering the alert, because that value is actually a cumulative counter. By the time you test the query, the document will have been replaced by a more recent one that has the value set to zero, so you won't get the min/max delta that the query was designed for. The only way to do this is to set the size to 0 on a designated monitored node (see https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html) and then execute a search operation on that specific node. I was only able to do this locally, though, because I couldn't change the thread pool size on Cloud |
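To make the min/max delta point concrete, here is a minimal sketch of the idea, not the actual Stack Monitoring query. It assumes the alert looks at the difference between the largest and smallest value of a cumulative rejection counter within a time window; the function names and the threshold comparison are hypothetical.

```python
from typing import Sequence


def rejection_delta(samples: Sequence[int]) -> int:
    """Rejections observed in a window, taken as max - min of a cumulative counter."""
    if not samples:
        return 0
    return max(samples) - min(samples)


def should_fire(samples: Sequence[int], threshold: int) -> bool:
    """Hypothetical check: fire only if the counter grew by more than `threshold`."""
    return rejection_delta(samples) > threshold


if __name__ == "__main__":
    # Every sampled document reports zero, so there is no delta to alert on.
    print(should_fire([0, 0, 0], threshold=0))
    # The counter grows inside the window, so a delta exists.
    print(should_fire([5, 5, 9], threshold=0))
```

Under this assumption, a window in which every document reports the same counter value yields a delta of zero, which is why real rejections have to accumulate on the monitored node while it is being sampled.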
@Zacqary @chrisronline @igoristic where are we with this? |
Sorry, missed the updates in the transition back to 7.12 work. I can try these test cases again asap. |
Summary
Stack Monitoring provides a set of out-of-the-box alerts, created by simply loading the Stack Monitoring UI within Kibana. The default action for each alert is a server log and the action messaging is controlled by the Stack Monitoring UI code directly.
PRs
Original, and CPU alert: #68805
Disk usage alert: #75419
JVM memory usage alert: #79039
Missing monitoring data alert: #78208
Threadpool rejections alert: #79433
Testing
Creation
Management
UX
Specific alerts
Edge cases