Add service node name #538

eyalkoren · 2019-09-02T10:01:21Z

Whenever a service is composed of multiple instances, there may be a need to distinguish between them for filtering and aggregation purposes.

The context for this issue is elastic/kibana#43765: a service (e.g. a web application) may be composed of multiple instances. In APM, we recently introduced a dedicated UI for showing JVM-specific metrics. Currently, to drill down to specific JVMs, you would need to use the query bar with other fields that do not always guarantee uniqueness of JVMs (i.e. service instances). We now want to further enhance the Metrics feature and provide better options for drilling down to individual JVMs. For this purpose, we want to store a unique service instance name.

We are already using the agent fields, and in terms of data correctness we could use agent.id. However, that field identifies the agent, whereas what we want to name is the service instance, so we are not keen on using it. Besides, service instance fields may be valid for agent-less scenarios as well.

Would adding service.instance.name be appropriate? If so, I assume it would make sense to add service.instance.id as well, but we currently only need the name (we are going to require uniqueness, but it should be human-readable, hence a name rather than an ID).


ruflin · 2019-09-03T14:49:03Z

A few questions:

- If a service instance is restarted, will the id / name change?
- How is this different from service.id?

If we introduce service.instance.name, we should definitely introduce service.instance.id, and I would even suggest relying on the id field if you expect it to be unique.

webmat · 2019-09-03T17:44:10Z

By "instance", do you mean a host? If so, I would recommend using the following:

- host.hostname (machine's hostname)
- host.name (defaults to hostname, but is user configurable)
- host.id

Or are you trying to capture information about distinct processes, since the JVM can run more than once on a given host?

eyalkoren · 2019-09-04T03:12:09Z

> Or are you trying to capture information about distinct processes, since the JVM can run more than once on a given host?

Yes! There are cases where multiple JVMs of the same service are installed on the same host, e.g. for redundancy purposes, so that you can restart one without affecting the service uptime.

We have two problems with using the hostname: one is uniqueness (as described above) and the other is meaningfulness. host.name solves the meaningfulness problem. If you feel that the uniqueness problem is rare enough (and likely to become rarer) that it does not belong in ECS, we can add a field for our own purposes only. However, if it is a valid case, it is relevant for some Beats as well.
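To illustrate the uniqueness problem, here is a minimal Python sketch assuming two JVMs of the same service running on one host. The documents, the service name checkout, the host web-01 and the metric values are hypothetical, and service.node.name is the field being proposed here, not an existing one. Grouping by host.name merges both JVMs into a single bucket, while grouping by the node name keeps them apart.

```python
from collections import defaultdict

# Hypothetical metric documents: two JVMs of the same service on the same host.
# Keys are flat ECS-style dotted names; service.node.name is the proposed field.
DOCS = [
    {"service.name": "checkout", "host.name": "web-01",
     "service.node.name": "checkout-1", "jvm.memory.heap.used": 512},
    {"service.name": "checkout", "host.name": "web-01",
     "service.node.name": "checkout-2", "jvm.memory.heap.used": 768},
]


def group_by(docs, *keys):
    """Bucket documents by the tuple of values found under the given keys."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[tuple(doc[k] for k in keys)].append(doc)
    return dict(buckets)


by_host = group_by(DOCS, "service.name", "host.name")
by_node = group_by(DOCS, "service.name", "service.node.name")

print(len(by_host))  # 1: both JVMs collapse into a single bucket
print(len(by_node))  # 2: each JVM gets its own bucket
```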

ruflin · 2019-09-04T06:10:36Z

I initially thought service.id could help here, but this is not possible because of our description:

> Unique identifier of the running service. If the service is comprised of many nodes, the service.id should be the same for all nodes.

We faced a similar challenge with monitoring Elasticsearch. You might have multiple Elasticsearch clusters, and each has its own service.id. But each node also needs to be identified uniquely. There we went with the service-specific elasticsearch.node.id key. But I think a generic approach is better here.

We already use instance.id in cloud (ecs/schemas/cloud.yml, line 40 in 34b391c):

    - name: instance.id

I'm +1 on introducing service.instance.id and service.instance.name. We might also need instance.ephemeral_id in addition, like we have for the agent (ecs/schemas/agent.yml, line 63 in 34b391c):

    - name: ephemeral_id

@eyalkoren What happens on restart, will the id stay the same?

eyalkoren · 2019-09-04T06:24:06Z

> What happens on restart, will the id stay the same?

This one, yes; that's its purpose.
We do use agent.ephemeral_id, which fits what we needed it for.

It makes sense to add service.instance.ephemeral_id, for example if you want to run per-service-instance queries without forcing a persistent unique identifier (which is normally harder to produce, or requires user configuration).

ruflin · 2019-09-04T06:25:36Z

+1 from my side to move to the PR stage :-)

roncohen · 2019-09-04T09:47:05Z

instance is super generic. Can we get a little closer? What we really mean in the APM case is app server instance name, but I don't think there's something like an app server concept in ECS. node.name perhaps? Depending on how you look at it, node can be anything. What do you call the smallest abstract unit you have in your infrastructure?

eyalkoren · 2019-09-04T10:24:42Z

This seems to fit any case of a service that is scaled out (not sure why this is APM-specific), so if we think about it as a service cluster, node seems appropriate.

webmat · 2019-09-04T13:44:05Z

So the proposal is to add the following, correct?

- service.node.id
- service.node.name
- service.node.ephemeral_id

I would like to know how these will be populated: what's user-provided and what's derived automatically?

In your implementation, will you be using all 3?

I think service.node.id & service.node.name make a lot of sense.

I'm not sure if we need to make the distinction already between .id and .ephemeral_id, however. If you think so, I'd like to see a concrete example where both are needed.

On how to name this, I think both node and instance could work. However, cloud.instance refers to a VM, whereas here we're referring to a process, of which there could be many on the same machine. So I have a clear preference for node, as I think using a different concept name here would help avoid confusion.
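For reference, ECS field names are dotted paths into nested objects, so the proposed service.node.* fields would sit under the existing service object of an event. A minimal Python sketch, using hypothetical values, that expands flat dotted keys into that nested JSON shape:

```python
import json


def expand_dotted(flat):
    """Expand {"a.b.c": v} into {"a": {"b": {"c": v}}}.

    Assumes no key is both a leaf and a parent (e.g. "service" and "service.name").
    """
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create intermediate objects on first use, reuse them afterwards.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Hypothetical values for the fields proposed in this thread.
event = expand_dotted({
    "service.name": "checkout",
    "service.node.id": "3f2a9c",
    "service.node.name": "checkout-1",
    "service.node.ephemeral_id": "b71e04",
})

print(json.dumps(event, indent=2, sort_keys=True))
```

Elasticsearch treats dotted field names in source documents as object paths when building mappings, so the flat and nested forms end up as the same fields.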

eyalkoren · 2019-09-04T14:08:03Z

- service.node.id: a persistent ID of the node. Should remain the same after a node restart.
- service.node.name: a name for that node. This is currently the only one we plan on using in APM. We are still debating whether it must be user-configured, or take an automatic default if not configured.
- service.node.ephemeral_id: a non-persistent ID, i.e. it changes between restarts. The use case we had for it was a derivative aggregation for JVM metrics. Assume you have a counter metric (i.e. monotonically increasing), but you want a query that yields deltas. This means each data point must know its preceding data point. Now assume two JVMs of the same service send the same metrics; you need some way to know which document precedes each document. The advantage of the ephemeral_id over the persistent ID is that it is really easy to produce: just generate a random one when the node starts.

Makes sense?
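The ephemeral_id use case above can be sketched in Python as a client-side illustration only. It assumes hypothetical documents carrying a counter under a made-up value field; in Elasticsearch one would more likely use a terms aggregation on the ID field with a date_histogram and a derivative pipeline aggregation inside it. The point is that deltas have to be computed within each node's own series: mixing two JVMs' counters yields meaningless, even negative, deltas.

```python
import uuid
from collections import defaultdict


def new_ephemeral_id():
    """Random, non-persistent node ID, generated once when the process starts."""
    return str(uuid.uuid4())


def deltas_per_node(docs, id_field="service.node.ephemeral_id", value_field="value"):
    """Successive deltas of a monotonically increasing counter, computed per node."""
    series = defaultdict(list)
    for doc in docs:
        series[doc[id_field]].append(doc)
    deltas = {}
    for node_id, points in series.items():
        points.sort(key=lambda d: d["@timestamp"])
        deltas[node_id] = [cur[value_field] - prev[value_field]
                           for prev, cur in zip(points, points[1:])]
    return deltas


# Hypothetical counter readings from two JVMs of the same service, interleaved in time.
a, b = new_ephemeral_id(), new_ephemeral_id()
docs = [
    {"@timestamp": 1, "service.node.ephemeral_id": a, "value": 10},
    {"@timestamp": 1, "service.node.ephemeral_id": b, "value": 100},
    {"@timestamp": 2, "service.node.ephemeral_id": a, "value": 15},
    {"@timestamp": 2, "service.node.ephemeral_id": b, "value": 102},
    {"@timestamp": 3, "service.node.ephemeral_id": a, "value": 21},
    {"@timestamp": 3, "service.node.ephemeral_id": b, "value": 110},
]

per_node = deltas_per_node(docs)
print(per_node[a], per_node[b])  # [5, 6] [2, 8]

# Without a node ID, all points fall into one series and the deltas are meaningless.
merged = deltas_per_node([{**d, "service.node.ephemeral_id": "?"} for d in docs])
print(merged["?"])  # [90, -85, 87, -81, 89]
```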

webmat · 2019-09-04T14:38:41Z

All 3 explanations make sense, thank you.

However, you're saying that APM currently intends to use only service.node.name? If that's the case, I think we should only add service.node.name.

The other two also sound like good ideas in theory, but if there's no concrete usage for them, I don't think they belong in the schema.

It's very easy to add things over time (as we need them), but it's much trickier to remove or modify something that's already in. The latter is what risks happening if we add fields we don't intend to use.

ruflin · 2019-09-04T14:49:02Z

If we go with node.id, that will also work nicely with the Elasticsearch module. (@ycombinator )

@eyalkoren It sounds like you want to use node.name but with the properties of node.id as it's expected to be unique. Do I misread this?

eyalkoren · 2019-09-04T15:12:52Z

> It sounds like you want to use node.name but with the properties of node.id as it's expected to be unique

On the one hand we want it to be unique; on the other hand we want it to be human-readable, so neither field is a 100% fit. That's not a problem, though: we can add our own restrictions/instructions on top, as long as we don't violate the ECS definitions, right?

Essentially, it is a name, something we want to be meaningful, but a unique name.
But let's add both- service.node.name for our use and service.node.id for your use 🙂

axw · 2019-09-05T05:48:58Z

> instance is super generic. Can we get a little closer?

Instance is generic; it's only meaningful when attached to something, like "service". Seems fine to me, except that, as others have pointed out, we have "cloud.instance", which isn't really the same thing: it's not an instance of a cloud, but a VM instance within a cloud.

> What we really mean in the APM case is app server instance name, but I don't think there's something like an app server concept in ECS.

What do you mean by app server? Something like WebLogic? That won't carry across to the example of Elasticsearch node ID. Not sure I understand what you're referring to though.

> node.name perhaps? Depending on how you look at it, node can be anything. What do you call the smallest abstract unit you have in your infrastructure?

Node seems reasonable to me, since it's a pretty common term in clustering, which I think is essentially what we're talking about here.

roncohen · 2019-09-05T06:52:14Z

OK, let's start with service.node.name then? Maybe we can have a separate discussion on the elasticsearch node situation. I'd rather not add service.node.id now unless we're going to make the change to the Elasticsearch module to make use of it imminently.

simitt · 2019-09-10T11:31:18Z

From the description of service.name:

    The name of the service is normally user given. This allows if two
    instances of the same service are running on the same machine
    they can be differentiated by the `service.name`.

    Also it allows for distributed services that run on multiple hosts to
    correlate the related instances based on the name.

To me this reads as contradictory. On the one hand, service.name is supposed to be unique for multiple processes on one machine; on the other hand, it should be the same so that services across multiple hosts can be matched.

When adding service.node.name, I suggest changing the description of service.name to remove the first paragraph, as this is exactly what service.node.name is introduced for.

eyalkoren mentioned this issue Sep 2, 2019: Adding service.node.name field elastic/apm#141 (closed)

roncohen mentioned this issue Sep 4, 2019: [APM] JVM list and individual JVM metrics page elastic/kibana#43765 (closed)

simitt mentioned this issue Sep 11, 2019: Add new service field to Intake API + ES elastic/apm-server#2696 (closed)

eyalkoren mentioned this issue Sep 18, 2019: Adding service.node.name #565 (merged)

eyalkoren changed the title from "Add service instance name" to "Add service node name" Sep 18, 2019

eyalkoren closed this as completed in #565 Sep 25, 2019
