Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ECS: Set host.name along with host.hostname in exported data #2502

Closed
simianhacker opened this issue Jul 29, 2019 · 19 comments · Fixed by #2531
Closed

ECS: Set host.name along with host.hostname in exported data #2502

simianhacker opened this issue Jul 29, 2019 · 19 comments · Fixed by #2531
Assignees
Milestone

Comments

@simianhacker
Copy link
Member

APM server should be shipping host.name along with host.hostname per ECS. host.name should default to host.hostname but allow the user to override it with their host identifier if it's different. This is necessary to ensure cross-linking between Infra UI and APM is consistent.

@roncohen
Copy link
Contributor

Related: elastic/ecs#498 (comment)

@roncohen
Copy link
Contributor

We should probably set this in agents instead of in APM Server

@graphaelli
Copy link
Member

In what situation does cross-linking happen based on host.name?

@simianhacker
Copy link
Member Author

Everything except APM. In the Infra UI we have to put exceptions in for APM because they using host.hostname instead of host.name. As a company, we really need to standardize on one field across everything to represent an ID for a host. For Kubernetes Pods and Docker Containers we are all using the same ID fields. Although, it looks like there is host.id in the ECS documentation but Metricbeat doesn't use it for some reason.

Work Around Examples:

@elastic/beats @ruflin Any reason why we are not setting host.id in Metricbeat/Filebeat? Seems like that should be the blessed identifier field for Hosts instead of host.name

@graphaelli graphaelli added discuss and removed bug labels Jul 29, 2019
@simitt
Copy link
Contributor

simitt commented Jul 30, 2019

Linking to some context around discussions of host.name vs. host.hostname: #1609 (comment). Haven't followed changes on host since then, but it seemed fair to leave the field empty if not user overwritten (which the agents don't support yet afaik).

@graphaelli
Copy link
Member

Thanks for digging that up @simitt, I had forgotten some of that context.

We can certainly copy host.hostname to host.name for APM, and let agents override that value. We'd still need to keep some workaround in place as the curated UIs should fall back to linking based on host.hostname if host.name is not present, to support older 7.x data.

host.id would be great if multiple processes could arrive at the same value for it independently - that's why kubernetes and docker links work so well - but I don't think there is a portable way to do that well.

@graphaelli graphaelli added this to the 7.4 milestone Jul 30, 2019
@simitt simitt self-assigned this Aug 7, 2019
simitt added a commit to simitt/apm-server that referenced this issue Aug 7, 2019
Use information from `system.name` when provided, otherwise derive
data from `host.hostname`.`

fixes elastic#2502
simitt added a commit to simitt/apm-server that referenced this issue Aug 7, 2019
Use information from `system.name` when provided, otherwise derive
data from `host.hostname`.`

fixes elastic#2502
simitt added a commit that referenced this issue Aug 7, 2019
Use information from `system.name` when provided, otherwise derive data from `host.hostname`.`

fixes #2502
simitt added a commit to simitt/apm-server that referenced this issue Aug 7, 2019
Use information from `system.name` when provided, otherwise derive data from `host.hostname`.`

fixes elastic#2502
simitt added a commit to simitt/apm-server that referenced this issue Aug 7, 2019
Use information from `system.name` when provided, otherwise derive data from `host.hostname`.`

fixes elastic#2502
simitt added a commit that referenced this issue Aug 8, 2019
Use information from `system.name` when provided, otherwise derive data from `host.hostname`.`

fixes #2502
@simitt simitt reopened this Aug 8, 2019
@simitt
Copy link
Contributor

simitt commented Aug 8, 2019

Turns out agent collected information in system.hostname can already be overwritten by the users.

Current behavior server side (after merging #2531 ):

  • map system.hostname to host.hostname if system.kubernetes is empty, otherwise set to system.kubernetes.namespace or to nil if not available.
  • map system.name to host.name if available, otherwise copy value from host.hostname.

We can change the behavior in the agents to send auto retrieved information in system.hostname and customized information in system.name only if it is actually user overwritten. That would theoretically require a major version bump or could lead to inconsistent data depending on agent version. I tend to consider this a bug fix.

Other option would be to change behavior in the server to map system.hostname to host.name and introduce new field for what should be stored as host.hostname. In case kubernetes information is sent up there would be no way for information sent from current agents to identify whether or not system.hostname is customized and therefore should overrule the kubernetes information or not.

I therefore suggest to change agent behavior and consider it a bug fix, as the customized information should have never ended up in the host.hostname information in the first place.

Suggested behavior:

  • send auto retrieved host name information in system.hostname. Server keeps current logic of using this information if no kubernetes information is provided.
  • send customized host name information only if overwritten by user configuration in system.name. Server stores this information if given, otherwise falls back to store information from host.hostname.

cc @elastic/apm-agent-devs

@watson
Copy link
Contributor

watson commented Aug 8, 2019

If the agents are updated to send the real hostname in system.hostname and the optional user-provided hostname in system.name, wouldn't we then lose the user-provided hostname in case the agent is talking to an older APM Server that doesn't yet know to read system.name?

@SergeyKleyman
Copy link
Contributor

SergeyKleyman commented Aug 8, 2019

Maybe we should leave existing system.hostname alone for backward compatibility and introduce two fields with clearer names? For example system.detected_hostname and system.configured_hostname? Then in time we can phase out system.hostname.

@simitt
Copy link
Contributor

simitt commented Aug 8, 2019

Agree with @SergeyKleyman 's proposal. In case at least one of the two new fields are set, information in system.hostname would be ignored.

simitt added a commit to simitt/apm-server that referenced this issue Aug 8, 2019
Deprecate existing `system.hostname` and add `system.detected_hostname`
and `system.configured_hostname` to the Intake API.

follow up on elastic#2502
@graphaelli
Copy link
Member

I agree on leaving system.hostname alone but would prefer shorter names for the new information, especially if they can match the agent configuration. Will the configuration for this information will be name on the agent side? How about name instead of configured_hostname and add detected_hostname, since the user will never interact with anything called that directly.

@SergeyKleyman
Copy link
Contributor

Will the configuration for this information will be name on the agent side?

According to Elastic APM Backend Agent Config Comparison the name of the configuration option is hostname (ELASTIC_APM_HOSTNAME environment variable)

@graphaelli
Copy link
Member

Thanks, missed that somehow. I'd still prefer name and detected_hostname instead of configured_hostname but am not too fussed about the intake naming if anyone feels really strongly otherwise.

@SergeyKleyman
Copy link
Contributor

I don't think it's that important but the main reason I suggested more descriptive names is to make it clear that system.detected_hostname and system.configured_hostname are independent and there is no logic on agent side trying to deduce one from the other thus deferring this logic to APM Server.

How about name instead of configured_hostname and add detected_hostname, since the user will never interact with anything called that directly.

Do you think the connection between hostname configuration option, system.name in Intake API and host.name Elasticsearch field is clear? To me the meaning of system.name would very hard to deduce without referencing the spec because it's not very clear to which entity system object refers. It seems that system object is used as an umbrella for everything outside monitored application/service.

Also I'm not sure how much of an optimization shortening system.configured_hostname would be. If most of the users rely on system.detected_hostname and don't use hostname configuration option (I don't know if it's indeed the case) then system.configured_hostname will be omitted from the payload anyway.

What is the motivation for shorter names in Intake API if the Elasticsearch fields will be different anyway? Is it mostly to reduce network traffic volume?

@graphaelli
Copy link
Member

What is the motivation for shorter names in Intake API if the Elasticsearch fields will be different anyway? Is it mostly to reduce network traffic volume?

Yes, mostly that. Also (as you well know) more verbosity makes larger events which we need to split (per api_request_size) among more payloads. Agreed it's minor in this case.

Also agreed with the rest of what you wrote. Only agent authors see the intake api so I can see making the field names make the most sense for that audience, and consider it a bonus when the persistence matches as well.

@simitt
Copy link
Contributor

simitt commented Aug 19, 2019

IMO system.detected_hostname and system.configured_hostname is way more clear than system.name and system.detected_hostname, but as already mentioned, the API is mainly used by agent devs and as long as there is a good comment explaining the meaning, the name should not matter too much.

@graphaelli want to make a final call on the naming, so we can move on?

@graphaelli
Copy link
Member

I'm convinced, let's proceed as proposed, thanks.

@simitt
Copy link
Contributor

simitt commented Aug 21, 2019

I created a dedicated issue for further discussions around host.id to keep the two issues separated #2608.

simitt added a commit that referenced this issue Aug 23, 2019
Deprecate existing `system.hostname` and add `system.detected_hostname` and `system.configured_hostname` to the Intake API.

follow up on #2502
@simitt
Copy link
Contributor

simitt commented Aug 23, 2019

Fixed by #2540

@simitt simitt closed this as completed Aug 23, 2019
simitt added a commit to simitt/apm-server that referenced this issue Aug 23, 2019
Deprecate existing `system.hostname` and add `system.detected_hostname` and `system.configured_hostname` to the Intake API.

follow up on elastic#2502
simitt added a commit that referenced this issue Aug 23, 2019
Deprecate existing `system.hostname` and add `system.detected_hostname` and `system.configured_hostname` to the Intake API.

follow up on #2502
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants