Add service node name #538

Closed
eyalkoren opened this issue Sep 2, 2019 · 16 comments · Fixed by #565

Comments

@eyalkoren
Contributor

Whenever a service is composed of multiple instances, there may be a need to distinguish between them for filtering and aggregation purposes.

The context for this issue is elastic/kibana#43765: a service (e.g. a web application) may be composed of multiple instances. In APM, we recently introduced a dedicated UI for showing JVM-specific metrics. Currently, to drill down to specific JVMs, you need to use the query bar with other fields, which do not always guarantee uniqueness of JVMs (i.e. service instances). We now want to further enhance the Metrics feature and provide better options for drilling down to individual JVMs. For this purpose, we want to store a unique service instance name.

We already use the agent fields, and in terms of data correctness we could use agent.id. However, that would not reflect the meaning of the value, which names the service instance rather than the agent, so we are not keen on using it. Besides, service instance fields may be valid for agent-less scenarios as well.

Would adding service.instance.name be proper? If so, I assume it would make sense to add service.instance.id as well, but we currently only need the name (we are going to require uniqueness, but it should be a human readable name, therefore - a name rather than an ID).
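
As a rough sketch of the drill-down described in this comment, a filter on a unique node name could be expressed as an Elasticsearch bool query body. The field name service.node.name is the one from this issue's final title; the helper name node_filter_query and the service and node values are hypothetical and only serve the illustration.

```python
import json


def node_filter_query(service_name, node_name):
    """Build a query body matching documents from one node of a service.

    Hypothetical helper: the field names follow this thread, the values
    passed in are made up for the example.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"service.name": service_name}},
                    {"term": {"service.node.name": node_name}},
                ]
            }
        }
    }


body = node_filter_query("checkout", "checkout-a")
print(json.dumps(body, indent=2))
```

Filtering on both fields keeps the query scoped to one service while selecting a single JVM within it, which is what the query bar cannot guarantee with fields that are not unique per instance.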

@ruflin
Member

ruflin commented Sep 3, 2019

Few questions:

  • If a service instance is restarted, will the id / name change?
  • How is this different from service.id?

If we introduce service.instance.name we should definitely introduce service.instance.id as well, and I would even suggest relying on the id field if you expect it to be unique.

@webmat
Contributor

webmat commented Sep 3, 2019

By "instance", do you mean a host? If so, I would recommend using the following:

host.hostname (machine's hostname)
host.name (defaults to hostname, but is user configurable)
host.id

Or are you trying to capture information about distinct processes, since the JVM can run more than once on a given host?

@eyalkoren
Contributor Author

Or are you trying to capture information about distinct processes, since the JVM can run more than once on a given host?

Yes! There are cases where multiple JVMs of the same service are installed on the same host, eg for redundancy purposes, so that you can restart one without affecting the service uptime.

We have two problems with using the hostname: one is uniqueness (as described above) and the other is meaningfulness. host.name solves the meaningfulness problem. If you feel that the uniqueness problem is rare enough (and is likely to become rarer) not to be included in ECS, we can use our own field for our purposes only. However, if it is a valid case, it is relevant for some Beats as well.
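
The uniqueness problem described above can be sketched in a few lines of Python. The events, host names, and node names below are hypothetical; service.node.name is the field name from this issue's final title, and group_by is a helper invented for the example.

```python
from collections import defaultdict

# Hypothetical metric events: two JVMs of the same service share host web-01.
events = [
    {"host.name": "web-01", "service.name": "checkout",
     "service.node.name": "checkout-a", "heap_used": 100},
    {"host.name": "web-01", "service.name": "checkout",
     "service.node.name": "checkout-b", "heap_used": 250},
    {"host.name": "web-02", "service.name": "checkout",
     "service.node.name": "checkout-c", "heap_used": 180},
]


def group_by(events, field):
    """Group events by the value of a flat field key."""
    groups = defaultdict(list)
    for event in events:
        groups[event[field]].append(event)
    return dict(groups)


by_host = group_by(events, "host.name")
by_node = group_by(events, "service.node.name")

print(len(by_host))  # 2: the two JVMs on web-01 collapse into one group
print(len(by_node))  # 3: each JVM gets its own group
```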

@ruflin
Member

ruflin commented Sep 4, 2019

I initially thought service.id could help here but this is not possible because of our description:

Unique identifier of the running service. If the service is comprised of many nodes, the service.id should be the same for all nodes.

We faced a similar challenge with monitoring Elasticsearch. You might have multiple Elasticsearch clusters, each with its own service.id, but each node also needs to be identified uniquely. There we went with the service-specific elasticsearch.node.id key. But I think having a generic approach here is better.

We already use instance.id in cloud:

    - name: instance.id

I'm +1 on introducing service.instance.id and service.instance.name. It might also be that we need instance.ephemeral_id in addition, like we have for the agent:

    - name: ephemeral_id

@eyalkoren What happens on restart, will the id stay the same?

@eyalkoren
Contributor Author

What happens on restart, will the id stay the same?

This one, yes; that's its purpose.
We do use agent.ephemeral_id, which fits what we needed it for.

It makes sense to add service.instance.ephemeral_id, for example if you want to be able to do per-service-instance query without forcing a persistent unique identifier (which is normally harder to produce, or requires user's configuration).

@ruflin
Member

ruflin commented Sep 4, 2019

+1 from my side to move to the PR stage :-)

@roncohen
Contributor

roncohen commented Sep 4, 2019

instance is super generic. Can we get a little closer? What we really mean in the APM case is app server instance name, but I don't think there's something like an app server concept in ECS. node.name perhaps? Depending on how you look at it, node can be anything. What do you call the smallest abstract unit you have in your infrastructure?

@eyalkoren
Contributor Author

This seems to fit any case of a service that is scaled out (not sure why this is APM-specific), so if we think about it as a service cluster, node seems appropriate.

@webmat
Contributor

webmat commented Sep 4, 2019

So the proposal is to add the following, correct?

  • service.node.id
  • service.node.name
  • service.node.ephemeral_id

I would like to know how these will be populated. What's user-provided & what's derived automatically.

In your implementation, will you be using all 3?

I think service.node.id & service.node.name make a lot of sense.

I'm not sure if we need to make the distinction already between .id and .ephemeral_id, however. If you think so, I'd like to see a concrete example where both are needed.

On how to name this, I think both node and instance could work. Although in the case of cloud.instance, it's referring to a VM. Here we're referring to a process, of which there could be many on the same machine. So I have a clear preference for node, as I think using a different concept name here would help avoid confusion.

@eyalkoren
Contributor Author

  • service.node.id - a persistent ID of the node. Should remain the same after node restart.
  • service.node.name - a name for that node. This is currently the only one we plan on using in APM. We are still debating whether it must be user-configured or should take an automatic default if not configured.
  • service.node.ephemeral_id - a non-persistent ID, i.e. one that changes between restarts. Our use case for it was a derivative aggregation for JVM metrics: assume you have a counter metric (i.e. monotonically increasing), but you want a query that yields deltas. This means each data point must know its preceding data point. Now assume you have two JVMs of the same service sending the same metrics data: you need some way to know which document precedes each document. The advantage of the ephemeral_id over the persistent ID is that it is really easy to produce: just generate a random one when the node is started.

Makes sense?
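
The delta use case for service.node.ephemeral_id described above can be sketched as follows. This is a minimal illustration, not the APM implementation: the metric name gc_count, the timestamps, and the ID values are hypothetical, and the function counter_deltas is invented for the example.

```python
from collections import defaultdict


def counter_deltas(points):
    """Return per-run deltas of a monotonically increasing counter.

    Each point is paired with its predecessor from the same node run,
    identified by service.node.ephemeral_id. The first point of a run
    has no predecessor and therefore yields no delta.
    """
    series = defaultdict(list)
    for point in points:
        series[point["service.node.ephemeral_id"]].append(point)

    deltas = {}
    for ephemeral_id, run in series.items():
        run.sort(key=lambda p: p["timestamp"])
        deltas[ephemeral_id] = [
            later["gc_count"] - earlier["gc_count"]
            for earlier, later in zip(run, run[1:])
        ]
    return deltas


# Hypothetical data: two JVMs of the same service report the same counter.
points = [
    {"timestamp": 1, "service.node.ephemeral_id": "a1", "gc_count": 10},
    {"timestamp": 1, "service.node.ephemeral_id": "b7", "gc_count": 500},
    {"timestamp": 2, "service.node.ephemeral_id": "a1", "gc_count": 14},
    {"timestamp": 2, "service.node.ephemeral_id": "b7", "gc_count": 503},
    {"timestamp": 3, "service.node.ephemeral_id": "a1", "gc_count": 15},
]

print(counter_deltas(points))  # {'a1': [4, 1], 'b7': [3]}
```

Without the per-run key, the interleaved points from both JVMs would be subtracted from each other and produce meaningless deltas.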

@webmat
Contributor

webmat commented Sep 4, 2019

All 3 explanations make sense, thank you.

However you're saying that APM currently intends to use only service.node.name? If that's the case, I think we should only add service.node.name.

The other two also sound like good ideas in theory, but if there's no concrete usage for them, I don't think they belong in the schema.

It's very easy to add things over time (as we need them), but it's much trickier to remove or modify something that's already in. The latter is what risks happening if we add fields we don't intend to use.

@ruflin
Member

ruflin commented Sep 4, 2019

If we go with node.id, that will also work nicely with the Elasticsearch module. (@ycombinator )

@eyalkoren It sounds like you want to use node.name but with the properties of node.id as it's expected to be unique. Do I misread this?

@eyalkoren
Contributor Author

It sounds like you want to use node.name but with the properties of node.id as it's expected to be unique

On the one hand we want it to be unique; on the other hand we want it to be human-readable, so neither field is a 100% fit. That's not a problem, though: we can add our own restrictions/instructions on those, as long as we don't violate the ECS definitions, right?

Essentially, it is a name, something we want to be meaningful, but a unique name.
But let's add both- service.node.name for our use and service.node.id for your use 🙂

@axw
Member

axw commented Sep 5, 2019

instance is super generic. Can we get a little closer?

Instance is generic, it's only meaningful when attached to something, like "service". Seems fine to me, except as others have pointed out we have "cloud.instance", which isn't really the same thing. i.e. it's not an instance of a cloud, but it's a VM instance within a cloud.

What we really mean in the APM case is app server instance name, but I don't think there's something like an app server concept in ECS.

What do you mean by app server? Something like WebLogic? That won't carry across to the example of Elasticsearch node ID. Not sure I understand what you're referring to though.

node.name perhaps? Depending on how you look at it, node can be anything. What do you call the smallest abstract unit you have in your infrastructure?

Node seems reasonable to me, since it's a pretty common term in clustering, which I think is essentially what we're talking about here.

@roncohen
Contributor

roncohen commented Sep 5, 2019

OK, let's start with service.node.name then? Maybe we can have a separate discussion on the elasticsearch node situation. I'd rather not add service.node.id now unless we're going to make the change to the Elasticsearch module to make use of it imminently.

@simitt
Contributor

simitt commented Sep 10, 2019

From the description of service.name:

    The name of the service is normally user given. This allows if two
    instances of the same service are running on the same machine
    they can be differentiated by the `service.name`.

    Also it allows for distributed services that run on multiple hosts to
    correlate the related instances based on the name.

To me this reads as contradictory. On the one hand, service.name is supposed to be unique across multiple processes on one machine; on the other hand, it should be the same so that services running on multiple hosts can be matched.

When adding service.node.name, I suggest changing the description of service.name to remove the first paragraph, as this is exactly what service.node.name is being introduced for.

eyalkoren changed the title from "Add service instance name" to "Add service node name" on Sep 18, 2019