Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APM] JVM list and individual JVM metrics page #43765

Closed
8 tasks
katrin-freihofner opened this issue Aug 22, 2019 · 36 comments · Fixed by #46779
Closed
8 tasks

[APM] JVM list and individual JVM metrics page #43765

katrin-freihofner opened this issue Aug 22, 2019 · 36 comments · Fixed by #46779
Assignees

Comments

@katrin-freihofner
Copy link
Contributor

katrin-freihofner commented Aug 22, 2019

For Java services we want to provide a table showing all JVMs for that service. From there a drill-down to individual JVM pages should be possible.

Screen Shot 2019-09-23 at 10 28 01

Todos:

  • change Metrics tab for a JVMs tab
  • the local filters on this page should be the same as for the other tabs
  • add a table listing all JVMs and system.process.cpu.total.norm.pct (avg), jvm.memory.heap.used (avg), jvm.memory.non_heap.used (avg), jvm.thread.count (max)
  • the table should be sortable
  • the values in the table should be reference the timeframe selected in the datepicker
  • link these JVMs to a new JVM specific page
  • show JVM specific metrics on JVM page (the same charts as currently visible in the Metrics tab)
  • add a Meta data panel for container ID info

Notes:
service.node.name should be used for metrics aggregations (instead of ephemeral_id), and for the label in the JVM table

Marvel prototype:
https://marvelapp.com/63fejha/screen/60465020

Related design/discussion issue: #43009
#36320 depends on this

@katrin-freihofner katrin-freihofner added the Team:APM All issues that need APM UI Team support label Aug 22, 2019
@elasticmachine
Copy link
Contributor

Pinging @elastic/apm-ui

@roncohen
Copy link
Contributor

the values in the table should be reference the timeframe selected in the datepicker

did we align on this? e.g. if you have chosen 24h (default) and one JVM has been acting out for 30m you'll probably not see it because it will be averaging for the full 24h. If possible, it think it's better to show the most recent data point in the time range, or the avg. over the last minute of the time range and then we'd need to clearly state it next to the table

@katrin-freihofner
Copy link
Contributor Author

did we align on this?

Apparently not. As far as I remember, @sqren mentioned that this is the current solution. That is why I have been proposing this approach.
If we want to change to the most recent value as you mentioned, we need to explain the data within the table.

@eyalkoren
Copy link
Contributor

I am here to nag again- what are JVM 1, JVM 2 etc? What are these names/identifiers made of? A combination of host and env? A container ID if available? A manually set instance.id?

If there is an agreement on instance.id, please add it to the todo list as it needs to be implemented by all agents and server, and needs tracking.

@sorenlouv
Copy link
Member

sorenlouv commented Sep 2, 2019

I am here to nag again- what are JVM 1, JVM 2 etc?

I don't think we have ultimately decided what identifies a jvm. It was my understanding that this was something you and @felixbarny would come up with. I don't have any strong opinions.

@felixbarny previously suggestions:

The JVM ID would be consistent across restarts. It's not trivial to come up with a good id which works for all cases but there are some options. For a generic application, we can use $host/$service/$environment. For web applications, we can use $host/$service/$port, where $port would be the javax.servlet.ServletRequest#getLocalPort of the first served request.

And it sounded like you both agreed that it should be overwritable by a user-defined id.

@eyalkoren
Copy link
Contributor

Automatic name discovery would be awesome but not easy to come up with (and maybe impossible in some cases), so not something we want to go into at this point.

The user-defined name is something we can do as part of this feature- I added an APM issue for that.
However, it is going to be optional, so there will be UI logic around how to filter/aggregate and label (eg if user-defined name is set use it, otherwise if a container ID exists..., otherwise host and environment...).

Maybe not related directly to this issue, but worth mentioning- keep this manually set name in mind for the drop-down filters. If a user invests time in setting those, they are guaranteed to be unique and meaningful so the corresponding field should be a filtering criterion. Not sure if the fact that it is optional would complicate that.

@sorenlouv
Copy link
Member

However, it is going to be optional, so there will be UI logic around how to filter/aggregate and label (eg if user-defined name is set use it, otherwise if a container ID exists..., otherwise host and environment...).

I'd prefer this logic residing in the agent so the UI can always rely on a single field as the identifier. That'll make aggregations a lot simpler.

@eyalkoren
Copy link
Contributor

I'd prefer this logic residing in the agent so the UI can always rely on a single field as the identifier. That'll make aggregations a lot simpler.

Yes, this came up in the discussion we had. The are several reasons why I think we shouldn't go this way:

  1. any default we can currently come up with is not guaranteed to be unique, and we are going to document somewhere that this field is kept for unique service instance names.
  2. I'd like this field to contain a service instance human-readable name, not a host, or a combination of host-and-env or a container ID.
  3. When filtering and going over the list of instance names, user should not have ambiguity whether a value there is automatically or manually set
  4. There will be some logic related to how we create the automatic name when no manual is set. Why implement it multiple times in all agents, and then fixing in all when we realize we want to modify this logic?

@sorenlouv
Copy link
Member

sorenlouv commented Sep 2, 2019

There will be some logic related to how we create the automatic name when no manual is set. Why implement it multiple times in all agents, and then fixing in all when we realize we want to modify this logic?

If the logic between all agents is exactly the same, I can see your concerns. However this will require users making custom dashboards to implement this logic too. And as mentioned above, the aggregations are going to be a lot more complex.

If everyone else agrees this is a better approach we can ofc find a way to make it work.

@eyalkoren
Copy link
Contributor

However this will require users making custom dashboards to implement this logic too

This is exactly why this is important:

any default we can currently come up with is not guaranteed to be unique, and we are going to document somewhere that this field is kept for unique service instance names.

Users setting up dashboards on a field they see documented as a service instance unique name will be surprised to discover it is not really such- not a name and not unique...

@sorenlouv
Copy link
Member

any default we can currently come up with is not guaranteed to be unique, and we are going to document somewhere that this field is kept for unique service instance names.

What about documenting that it's only guaranteed to be unique if the user comes up with a name themselves. If not we'll provide a best-effort name.

@eyalkoren
Copy link
Contributor

It is not guaranteed to be unique if the user comes up with the name, it is more of a description of the requirements of what should be stored in this field. So we can say: "use this field to store a meaningful/human-readable unique name for the service instance, otherwise this field will contain a best-effort string which is not guaranteed to be unique nor meaningful". As long as we use a name field and not id (which will probably be defined as unique in ECS) - this is doable.

I expect the logic of determining the name to be pretty simple, so implementing it in the agents is not a big concern.
If this greatly complicates things in UI or make dashboard definition much more complex- let's go with setting in the agent and document that accordingly.

@roncohen I think there is enough info here to decide either way. You shared the same rationale with @sqren as well. Please make a decision and either way- suggest logic for the automatic name generation.

@roncohen
Copy link
Contributor

roncohen commented Sep 4, 2019

OK, sounds like the best thing we can do at the moment is:

  • decide on a field name that the UI will use for the list. The field name will be whatever we decide in ECS issue
  • we set this field to whatever is in host.name in agents initially and let users override it manually
  • taking inspiration from this discussion, if the value is manually set, agents will additionally set a service.instance.configured_name or similar to indicate that it was set manually
  • show a filter for service.instance.configured_name if there are any results that have it set (how difficult is that do to @sqren? and is having filters magically appear consistent enough with what we do elsewhere? )

to move the UI forward, I suggest we use host.name in the UI for now and whenever we have the rest figured out we switch it to the new field. WDYT?

@sorenlouv
Copy link
Member

sorenlouv commented Sep 4, 2019

show a filter for service.instance.configured_name if there are any results that have it set

@roncohen I'm not sure what that filter would do. The table is already showing jvm names (based on the instance name decided in elastic/ecs#538). Clicking on a jvm in the list is essentially the same as filtering by that jvm.
Filtering by service.instance.configured_name sounds like it would always only list a single item in the table. Would that be useful?

I'm probably missing something.

@roncohen
Copy link
Contributor

roncohen commented Sep 4, 2019

Filtering by service.instance.configured_name sounds like it would always only list a single item in the table. Would that be useful?

true, not very useful..

EDIT: it you have 100+ and they don't all fit in the table on the page and you want the specific JVM you've named, it would probably be useful, but we can start without

@eyalkoren
Copy link
Contributor

eyalkoren commented Sep 4, 2019

@sqren @roncohen I agree it is not needed for the metrics, given that the landing page is this table.
However, it may be useful for all other views. Wherever you originally felt filtering for host is interesting, this will probably be at least as useful.

Nevertheless, this is a separate discussion and should be done within the filtering feature probably. Maybe we can focus here only on the aggregation criterion for metrics presentation.

@eyalkoren
Copy link
Contributor

eyalkoren commented Sep 10, 2019

Decision:
Add an optional service.node.name config, defaults to host.name if not configured.
Meaning- it's an optional config, but mandatory in schema.
Start without service.instance.configured_name

@sorenlouv
Copy link
Member

sorenlouv commented Sep 10, 2019

Add this optional config, defaults to host.name if not configured.

Sorry for being thick, what is "this optional config"?

@eyalkoren
Copy link
Contributor

@sqren Sorry, wrote that during a meeting, so not the most clear...
Updated.

@eyalkoren
Copy link
Contributor

After more discussions, updated the suggested approach to the service.node.name issue- elastic/apm#141 (comment).
Main takeaway to the UI side: the service.node.name will be indexed and always available and should be used for metrics aggregations (instead of ephemeral_id), as well as for the label in the JVM table.

@dgieselaar
Copy link
Member

@formgeist @katrin-freihofner I've opened a PR (#46779), just need to add metadata, but not sure what the status is. Is there going to be a design update for the display of metadata, or should I implement the panel?

@nehaduggal
Copy link

We need a design update for the metadata display. The main use case of this page is not filtering, but one of the biggest reason a person would land on the page is because they are investigating a single JVM. The metadata associated with the JVM is important and should feature up top.

@formgeist
Copy link
Contributor

I’ll come back with a proposed design for showing the metadata up top

@formgeist
Copy link
Contributor

Here's a proposed design for adding the metadata up top. I believe we'd need to truncate the container ID in order for it to remain somewhat readable. We can display the full ID in a tooltip upon hover.

02 JVM details

@dgieselaar
Copy link
Member

dgieselaar commented Sep 30, 2019 via email

@dgieselaar
Copy link
Member

@formgeist I was missing the local UI filters on your mockup, and now I am wondering if they even should be there at all. AFAIK, host/container/pod will only have one value anyway?

@formgeist
Copy link
Contributor

@dgieselaar AFAIK the JVM detail page (pictured above) doesn't have local UI filters, similar to our error group detail page. I agree, they would only have 1 value in them.

@sorenlouv
Copy link
Member

I agree, the jvm details page doesn't need the local UI filters.

dgieselaar added a commit to dgieselaar/kibana that referenced this issue Oct 3, 2019
dgieselaar added a commit that referenced this issue Oct 3, 2019
* [APM] JVM List view & JVM metrics page

Closes #43765.

* Add service node metadata

* Use service node terminology

* Use asPercent for CPU

* Tooltip for truncated elements

* Add processor.event filters

* Use service.node.name

* Sort client-side only

* Add processor.event filter to service nodes projection
@zube zube bot reopened this Oct 3, 2019
@zube zube bot closed this as completed Oct 3, 2019
dgieselaar added a commit to dgieselaar/kibana that referenced this issue Oct 3, 2019
* [APM] JVM List view & JVM metrics page

Closes elastic#43765.

* Add service node metadata

* Use service node terminology

* Use asPercent for CPU

* Tooltip for truncated elements

* Add processor.event filters

* Use service.node.name

* Sort client-side only

* Add processor.event filter to service nodes projection
dgieselaar added a commit that referenced this issue Oct 4, 2019
* [APM] JVM List view & JVM metrics page

Closes #43765.

* Add service node metadata

* Use service node terminology

* Use asPercent for CPU

* Tooltip for truncated elements

* Add processor.event filters

* Use service.node.name

* Sort client-side only

* Add processor.event filter to service nodes projection
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants