
[Monitoring] Scale alerts in UI #80397

Closed
chrisronline opened this issue Oct 13, 2020 · 21 comments
Labels
enhancement New value added to drive a business result Team:Monitoring Stack Monitoring team

Comments

@chrisronline
Contributor

Screen Shot 2020-10-13 at 3 00 54 PM

Eventually, this will not scale to the large number of alerts we plan to add. We need to think of a way to show only some of them and provide a friendly way to see them all.

@chrisronline chrisronline added enhancement New value added to drive a business result Team:Monitoring Stack Monitoring team labels Oct 13, 2020
@elasticmachine
Contributor

Pinging @elastic/stack-monitoring (Team:Monitoring)

@ravikesarwani
Contributor

We have a few other usability issues that we should look at solving as well.
I'm adding them here so we can have a brainstorming session on how to make this user experience better.

Show this information in a more concise manner.
Screen Shot 2020-10-15 at 2 16 41 PM

We need some organization and unique information to distinguish these alerts.
Screen Shot 2020-07-22 at 10 36 47 AM

@igoristic
Contributor

igoristic commented Oct 20, 2020

For the 8 CPU alert notifications, I think we can group them into 5-minute (or some other interval) time buckets, e.g.:

1 minute ago  
2 High Disk Usage   >
--------------
5 minutes ago  
2 High CPU Usage    >
--------------
1 hour ago  
5 High CPU Usage    >
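
A minimal sketch of how that bucketing could work (the `AlertNotification` shape and the five-minute constant are assumptions for illustration, not the monitoring plugin's actual types):

```ts
// Hypothetical notification shape; the real alert objects differ.
interface AlertNotification {
  alertName: string; // e.g. "High CPU Usage"
  firedAt: number;   // epoch millis
}

const BUCKET_MS = 5 * 60 * 1000; // five-minute buckets, per the suggestion above

// Group notifications into time buckets and count duplicates of the same alert,
// so each bucket can render as "2 High CPU Usage / 5 minutes ago".
function bucketNotifications(notifications: AlertNotification[]) {
  const buckets = new Map<number, Map<string, number>>();
  for (const { alertName, firedAt } of notifications) {
    const bucketStart = Math.floor(firedAt / BUCKET_MS) * BUCKET_MS;
    const counts = buckets.get(bucketStart) ?? new Map<string, number>();
    counts.set(alertName, (counts.get(alertName) ?? 0) + 1);
    buckets.set(bucketStart, counts);
  }
  // Most recent bucket first, matching the mock above.
  return [...buckets.entries()].sort(([a], [b]) => b - a);
}
```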

And we could also do some kind of grouping/categorization when editing alerts, e.g.:

Usage Alerts        >
--------------
Query Alerts        >
--------------
Monitoring Alerts   >

Clicking on Usage Alerts would go to the next menu, e.g.:

CPU Usage      >
--------------
Disk Usage     >
--------------
Memory Usage   >
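
One way to get that two-level drill-down with existing EUI building blocks is nested EuiContextMenu panels. A rough sketch, using the category names from the mock above (everything else is illustrative):

```tsx
import React from 'react';
import { EuiContextMenu, EuiContextMenuPanelDescriptor } from '@elastic/eui';

// Panel 0 lists the categories; choosing "Usage Alerts" drills into panel 1.
const panels: EuiContextMenuPanelDescriptor[] = [
  {
    id: 0,
    title: 'Edit alerts',
    items: [
      { name: 'Usage Alerts', panel: 1 },
      { name: 'Query Alerts', panel: 2 },
      { name: 'Monitoring Alerts', panel: 3 },
    ],
  },
  {
    id: 1,
    title: 'Usage Alerts',
    items: [
      { name: 'CPU Usage', onClick: () => {} },
      { name: 'Disk Usage', onClick: () => {} },
      { name: 'Memory Usage', onClick: () => {} },
    ],
  },
  // Stubs so the other categories also resolve in this sketch.
  { id: 2, title: 'Query Alerts', items: [] },
  { id: 3, title: 'Monitoring Alerts', items: [] },
];

export const EditAlertsMenu = () => (
  <EuiContextMenu initialPanelId={0} panels={panels} />
);
```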

But I really think we should get a UI/UX designer involved to help us out.

@ravikesarwani
Contributor

I will spend some time thinking through these UX issues and propose possible presentation options.

cc: @katrin-freihofner I know the UX team is super busy in the 7.11 timeframe, but at least you can follow the discussion and maybe keep us straight if we are going completely off the rails.

@chrisronline
Contributor Author

I think there is progress we can make until design can help us more, such as grouping alerts better per Igor's comment.

I have opened #81569 to explore some of these ways.

@ravikesarwani
Contributor

Organizing alerts in "setup mode"
As we add more alerts in 7.11, the ES node list will get long.
In 7.11 we are adding the following alerts:

  • Write threadpool rejects
  • Search threadpool rejects
  • CCR Read Exceptions
  • Max shard size

We need to organize the alerts into categories for easy visualization.

Consolidate the alerts on the ES nodes under the following submenus (see the sketch after this list):

  • Cluster health
    - Nodes changed
    - Version mismatch
    - Max shard size
  • Resource utilization
    - CPU usage
    - Disk usage
    - Memory usage (JVM)
  • Errors and exceptions
    - Missing monitoring data
    - Write threadpool rejects
    - Search threadpool rejects
    - CCR Read Exceptions
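
A possible shape for that grouping, as a plain lookup the setup-mode UI could render from (the constant name is illustrative; the alert names are the ones listed above):

```ts
// Illustrative mapping of the 7.11 Elasticsearch node alerts into
// the three categories proposed above.
const ES_NODE_ALERT_CATEGORIES: Record<string, string[]> = {
  'Cluster health': ['Nodes changed', 'Version mismatch', 'Max shard size'],
  'Resource utilization': ['CPU usage', 'Disk usage', 'Memory usage (JVM)'],
  'Errors and exceptions': [
    'Missing monitoring data',
    'Write threadpool rejects',
    'Search threadpool rejects',
    'CCR Read Exceptions',
  ],
};
```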

@ravikesarwani
Contributor

Firing alerts
When looking at the firing alerts on the overview pages we currently show the alerts sorted by timestamp and display the alert name. When multiple nodes of the cluster have the same alert firing, there is no way to distinguish them and help the user drill down to the right one.

When there are 8 items or fewer:
Show a flat list sorted by most recent, with the following 3 values shown at the top level:

  • Timestamp
  • Alert name
  • Node name with link

Screen Shot 2020-10-26 at 10 36 03 AM

When there are more than 8 alerts firing, we group the alerts by node at the top level.
The hypothesis is that investigation and any fix will be done by the admin on a per-node basis, so grouping by node when many alerts are firing can help them focus on fixing issues in a more organized manner.

  • Node with link (# of alerts)
    • Time stamp, Alert

Screen Shot 2020-10-26 at 10 36 43 AM

We should allow the user to switch between a flat, timestamp-sorted list (like right now, with node name and link added) and the grouped-by-node view. We can choose the default view based on the number of firing alerts, but the user can switch between them with a toggle button at any time.
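
A minimal sketch of that default-view rule, assuming the threshold of 8 suggested above (the `FiringAlert` shape and helper names are illustrative, not the monitoring plugin's actual types):

```ts
interface FiringAlert {
  alertName: string;
  nodeName: string;
  timestamp: number; // epoch millis
}

type AlertListView = 'flat' | 'byNode';

// Default to a flat, most-recent-first list for small counts; group by node
// once more than 8 alerts are firing. A toggle lets the user override this.
function defaultView(alerts: FiringAlert[]): AlertListView {
  return alerts.length > 8 ? 'byNode' : 'flat';
}

function flatList(alerts: FiringAlert[]): FiringAlert[] {
  return [...alerts].sort((a, b) => b.timestamp - a.timestamp);
}

function groupByNode(alerts: FiringAlert[]): Map<string, FiringAlert[]> {
  const groups = new Map<string, FiringAlert[]>();
  for (const alert of alerts) {
    const list = groups.get(alert.nodeName) ?? [];
    list.push(alert);
    groups.set(alert.nodeName, list);
  }
  return groups;
}
```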

@ravikesarwani
Contributor

On the node details page we currently show each firing alert, with all the investigative suggestions expanded, at the top. If there are many alerts, that takes up a lot of space at the top, pushing the node metrics lower and requiring more scrolling by the user.
Screen Shot 2020-10-26 at 11 10 08 AM

Can we add an expand/collapse design for the firing alerts?
Each alert is shown on one line with an expand arrow on the right; clicking it expands the details and shows the options for the investigative workflows.
Something like this:
Screen Shot 2020-10-26 at 11 10 34 AM
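
With EUI this could be as simple as wrapping each firing alert in an accordion; a rough sketch, where the `FiringAlertSummary` shape and the component name are assumptions for illustration:

```tsx
import React from 'react';
import { EuiAccordion, EuiText } from '@elastic/eui';

// Hypothetical summary shape for a firing alert on the details page.
interface FiringAlertSummary {
  id: string;
  title: string;         // e.g. "CPU usage alert is firing"
  investigation: string; // the investigative suggestions currently shown expanded
}

// One collapsed row per firing alert; expanding it reveals the investigative
// workflow, keeping the node metrics visible without extra scrolling.
export const CollapsibleAlert = ({ alert }: { alert: FiringAlertSummary }) => (
  <EuiAccordion id={alert.id} buttonContent={alert.title} paddingSize="s">
    <EuiText size="s">
      <p>{alert.investigation}</p>
    </EuiText>
  </EuiAccordion>
);
```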

@chrisronline
Contributor Author

How do we feel about organizing alerts based on severity level? It's a concept we have, but I don't think we are leveraging it much. Right now we show badges in the UI for both warning and danger severity level alerts, but it doesn't seem we are giving much weight to this concept moving forward.

Should we keep it and organize each alert under one of them? Or should we move away from that level of categorization?

@ravikesarwani
Contributor

To me, defining severity to distinguish alerts works only for a very minimal set of use cases.

Is disk capacity a "warning" or a "danger"? I would say it depends on how full the disk is:
80% full is a warning to me, and 85% (when ES will start to behave differently) maybe a danger.
Currently our alerts don't have multiple threshold levels, but when we do, we can tie severity to them and display it visually.
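
If we do add multiple thresholds, the mapping could be as simple as the sketch below, using the 80% / 85% figures from this comment (the function and where it would live are illustrative only):

```ts
type Severity = 'warning' | 'danger';

// Illustrative: derive the severity of a disk usage alert from how far past
// each threshold the node actually is.
function diskUsageSeverity(percentUsed: number): Severity | undefined {
  if (percentUsed >= 85) return 'danger';  // ES starts to behave differently
  if (percentUsed >= 80) return 'warning';
  return undefined; // below both thresholds: nothing fires
}
```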

@chrisronline
Contributor Author

chrisronline commented Nov 11, 2020

I have some screenshots to share from the work I've been doing here. Please let me know if this matches expectations or where we need to make corrections.

Firing mode

Screen Shot 2020-11-11 at 3 31 49 PM

Screen Shot 2020-11-11 at 3 33 10 PM

Screen Shot 2020-11-11 at 3 33 19 PM

Screen Shot 2020-11-11 at 3 33 23 PM

Screen Shot 2020-11-11 at 3 33 29 PM

Screen Shot 2020-11-11 at 3 33 35 PM

Screen Shot 2020-11-11 at 3 33 39 PM

Setup mode

Screen Shot 2020-11-11 at 3 34 52 PM

Screen Shot 2020-11-11 at 3 36 32 PM

Screen Shot 2020-11-11 at 3 36 36 PM

Screen Shot 2020-11-11 at 3 36 41 PM

Screen Shot 2020-11-11 at 3 36 45 PM

@ravikesarwani
Contributor

@chrisronline looks really great. Awesome work here.

One minor question/suggestion I had was around the sorting of the list. For "Firing mode", can we sort the list by most recent alert whenever possible?

@chrisronline
Contributor Author

@ravikesarwani Should we sort within the defined categories? Or keep the categories ordering constant and just sort the list of firing alerts within each category by most recent?

@chrisronline
Contributor Author

I also have an update on the detail pages:

Screen Shot 2020-11-12 at 3 05 08 PM

Screen Shot 2020-11-12 at 3 05 19 PM

Screen Shot 2020-11-12 at 3 05 23 PM

Any thoughts appreciated on the direction here too.

@ravikesarwani
Contributor

I like the "configure" option being added here in the details page.

While the design you propose here will do the job, I see expand/collapse as a little more user friendly (compared to a popup) for the following reasons:

  • It allows users to expand an alert, work on the page, explore graphs, etc., and come back to review the alert details again without having to find and click the right link again (to show the popup).
  • Multiple alerts can be expanded and reviewed at the same time, when needed.
  • We will have more space to work with. This could come in handy as more alerts are added or enhancements are made to improve the investigative workflows.

@ravikesarwani
Contributor

@ravikesarwani Should we sort within the defined categories? Or keep the categories ordering constant and just sort the list of firing alerts within each category by most recent?

Just sorting the list of firing alerts within each category should be a good start.
We do have to think a little bit about dynamic updates and not change the ordering while the user has the popup open.

@chrisronline
Contributor Author

@ravikesarwani How about:

Screen Shot 2020-11-16 at 1 36 52 PM

Screen Shot 2020-11-16 at 1 36 56 PM

Screen Shot 2020-11-16 at 1 37 01 PM

@ravikesarwani
Contributor

Thanks, I like this better and feel it makes for a better customer experience. Great work here!
A few minor comments:

  • Did you take out the "bell" icon? I thought it was a good icon to indicate an alert.
  • Can we make the arrow ">" a little more visible, sort of like a call to action? I don't want the user to miss that action.

@chrisronline
Contributor Author

Can we make the arrow ">" a little more visible, sort of like a call to action? I don't want the user to miss that action.

Unfortunately, we can't, as the component doesn't allow it and the EUI team isn't in favor of making the size configurable.

Here is a new screenshot with the icon back in:

Screen Shot 2020-11-17 at 11 59 52 AM

@ravikesarwani
Contributor

ravikesarwani commented Nov 18, 2020

Let's get this code reviewed, tested & merged. Very helpful usability improvements.

@chrisronline
Contributor Author

Resolved with #81569
