Cannot start Beats - fails with error: could not get FQDN #34910
Comments
Pinging @elastic/elastic-agent (Team:Elastic-Agent) |
Isn't this implementation behind a feature flag (FF)? Or did you enable it, @andrewkroh? |
Anyway, marking this as a P0 as it seems like a degradation. |
I did not enable the feature. I didn't use a config file or specify any options. It failed with only the |
What is interesting is that we do have a system test for the version sub-command using mockbeat that apparently isn't catching this: https://github.com/elastic/beats/blob/main/libbeat/tests/system/test_cmd_version.py Whatever the cause turns out to be, we should make sure that test can catch this before fixing it. |
@andrewkroh I just built Filebeat (OSS) from
Same with Metricbeat (OSS):
[EDIT] I also tried running both commands with no network connection, since the error you posted looks like it's doing some kind of DNS lookup. Both commands still worked for me. Hmmm... trying to figure out what else might be different between our environments so we can narrow down exactly what needs to be set up during an automated test for this issue. What do |
|
Thanks, here's the same output for mine:
Looks like the only difference is your machine's hostname is a short one (no domain component). Let me mess around with that setup and see if I can reproduce your error locally. Thanks again! |
Yep, there it is:
Okay, I'll work on a test PR that sets up a short hostname and makes sure it fails on the current build, then work on the fix. |
This is the stack of the main goroutine when the error originates.
FWIW I think the algorithm used to determine the FQDN could use some godoc to explain what it does and why. I think this is what it's doing.
In addition, I think we need to document the prerequisites for the FQDN feature to work. That will help when we need to support users who ask why their Beat is not reporting the FQDN of the machine after enabling the feature flag. |
Agreed, happy to document the what and why of the algorithm. However, the "why" of the algorithm is unclear to me. In particular, why do we need to do the CNAME lookup followed by the reverse lookup? Why don't we "just" run |
Had a good chat with @leehinman about the FQDN lookup algorithm. We agreed that the algorithm is fine as-is, in that it tries to look up the FQDN and reports an error if it fails. However, the consumers of the FQDN should not fail on error. In other words, they should treat the FQDN lookup as a "best effort" and, if it fails, log an error so we are not blind to the failure. After logging the error, they should fall back to the OS-reported hostname and continue execution. This change in behavior will require code changes in |
Added a PR to make the FQDN lookup algorithm a bit more testable: elastic/go-sysinfo#158. |
Added another PR to report FQDN errors from lookup algorithm separately so consumers can handle them with the severity they desire: elastic/go-sysinfo#159. |
I'm unable to start Metricbeat and Filebeat.
For confirmed bugs, please report: