metricbeat monitoring of Elasticsearch and Kibana broke on recent 7.17.0-SNAPSHOT rpm #29920
The same test passed on 8.0.0-rc2-c0b0e70d, which was built today, and also on the latest 8.1.0 snapshot build.
I'm just guessing here, but I'm thinking maybe this PR shouldn't have been backported to 7.17? #29869 But in both the passing and failing cases I can see that the legacy templates for monitoring are there. UPDATE: Scratch this, that PR was docs-only.
Pinging @elastic/integrations (Team:Integrations)

cc: @sayden
Those configs show elasticsearch and kibana as:

```yaml
- module: elasticsearch
  xpack.enabled: true
  period: 10s
  hosts: ["http://localhost:9200"]
```

More info: https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-metricbeat.html
I changed those false values to true and restarted metricbeat, but still no .monitoring-es index is created. Is there a way to enable debug logging? I'm surprised there's nothing in the metricbeat log about which modules it's using.
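For context (a minimal sketch of the standard libbeat logging options, not something posted in this thread), debug logging is normally enabled in metricbeat.yml:

```yaml
# metricbeat.yml — standard libbeat logging settings
logging.level: debug
# "*" enables all debug selectors; narrow this list to reduce noise
logging.selectors: ["*"]
```

Alternatively, running the binary in the foreground with `metricbeat -e -d "*"` prints debug output for all selectors without editing the config.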
I'm back on the earlier 7.17.0-766e7ca2 build now. Those values in metricbeat.yml are false, but I do have monitoring indices created. Notice I have indices for logstash and beats monitoring even though there isn't a section for them in metricbeat.yml; those work based on the module file not being disabled.
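As an aside (a hedged illustration assuming the default rpm layout, not output captured in this thread), which modules Metricbeat loads is visible directly from the file names in modules.d; files carrying the .disabled suffix are ignored:

```sh
# Active module files are the ones without the ".disabled" suffix
ls -1 /etc/metricbeat/modules.d/
# e.g. elasticsearch-xpack.yml        <- enabled
#      logstash-xpack.yml.disabled    <- ignored
```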
FYI @marius-dr who set some of this up for the Kibana-QA team.
I was a little confused by this too. But what I see now is that the section in
It is a bit confusing, without doubt, more so in the case of Stack Monitoring; my apologies in advance. Generally speaking you want to maintain "cluster config" in metricbeat.yml and the per-module configuration in the modules.d files. Maybe give it a try setting a barebones module configuration for elasticsearch in the modules.d directory. Thanks! 🙂
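A minimal sketch of such a barebones modules.d entry, reusing the localhost endpoint from earlier in the thread (an assumption, not the exact file being suggested):

```yaml
# modules.d/elasticsearch-xpack.yml — barebones Stack Monitoring collection
- module: elasticsearch
  xpack.enabled: true
  period: 10s
  hosts: ["http://localhost:9200"]
```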
When it fails, the templates are there but no indices are created. I figured out how to turn debug logging on in metricbeat, so I'll see if I get more info from that and update here.
I think the ECK operator e2e tests are also seeing this issue https://devops-ci.elastic.co/job/cloud-on-k8s-e2e-tests-snapshot-versions/498//testReport with similar symptoms: the monitoring indices are not created.
This is using the stack monitoring feature in ECK https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-stack-monitoring.html
Similar to my case, I see that @pebrc had a passing result on Jan 18th and failures on Jan 19th and in all jobs since then.
I think this is related to the ECK operator, not to Stack Monitoring. I have just done a full test and it works:

The metricbeat.yml:

```yaml
metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.template.settings:
  index.number_of_shards: 1
  index.codec: best_compression

setup.kibana:

output.elasticsearch:
  hosts: ["localhost:9200"]
  username: "elastic"
  password: "changeme"
```

The module configuration:

```yaml
# Module: elasticsearch
# Docs: https://www.elastic.co/guide/en/beats/metricbeat/7.16/metricbeat-module-elasticsearch.html
- module: elasticsearch
  xpack.enabled: true
  period: 10s
  hosts: ["http://localhost:9200"]
  username: "elastic"
  password: "changeme"

# Module: kibana
# Docs: https://www.elastic.co/guide/en/beats/metricbeat/7.16/metricbeat-module-kibana.html
- module: kibana
  xpack.enabled: true
  period: 10s
  hosts: ["http://localhost:5601"]
  #basepath: ""
  username: "elastic"
  password: "changeme"
```

Versions deployed:

```
CONTAINER ID   IMAGE                                                            COMMAND                  CREATED       STATUS                    PORTS                                NAMES
1737912bed50   docker.elastic.co/beats/elastic-agent-complete:7.17.0-SNAPSHOT   "/usr/bin/tini -- /u…"   2 hours ago   Up 25 minutes (healthy)                                        elastic-package-stack_elastic-agent_1
2e787f3a9475   docker.elastic.co/beats/elastic-agent-complete:7.17.0-SNAPSHOT   "/usr/bin/tini -- /u…"   2 hours ago   Up 25 minutes (healthy)   127.0.0.1:8220->8220/tcp             elastic-package-stack_fleet-server_1
fc1157ca2e73   docker.elastic.co/kibana/kibana:7.17.0-SNAPSHOT                  "/bin/tini -- /usr/l…"   2 hours ago   Up 25 minutes (healthy)   127.0.0.1:5601->5601/tcp             elastic-package-stack_kibana_1
d49b643b7369   elastic-package-stack_package-registry                           "./package-registry …"   2 hours ago   Up 25 minutes (healthy)   127.0.0.1:8080->8080/tcp             elastic-package-stack_package-registry_1
d1298174fc36   docker.elastic.co/elasticsearch/elasticsearch:7.17.0-SNAPSHOT    "/bin/tini -- /usr/l…"   2 hours ago   Up 25 minutes (healthy)   127.0.0.1:9200->9200/tcp, 9300/tcp   elastic-package-stack_elasticsearch_1
```

Cat response:

```
sayden➜~» curl -XGET localhost:9200/_cat/indices?v=true -u "elastic:changeme" | grep monitoring
green open .monitoring-kibana-7-mb-2022.01.26 BzT8cZ2LTiClQvH-PZsw7A 1 0   60 0 144kb 144kb
green open .monitoring-es-7-mb-2022.01.26     _FZULYn4QICDaI9_pQrzGQ 1 0 4231 0 3.3mb 3.3mb
```

Metricbeat version:

```
sayden➜elastic/beats/metricbeat(7.17✗)» ./metricbeat version                   [13:58:27]
metricbeat version 7.17.0 (amd64), libbeat 7.17.0 [unknown built unknown]
```

As you can see, those are default configs and they slightly differ from the ones you posted. If it's something in the config, it's probably a problem in libbeat rather than in Stack Monitoring, affecting all other modules as well, which is unlikely but definitely possible.
Any chance someone could share the metricbeat logs for when the ingestion is failing? The indices are created when the first doc arrives, so if the index can't be created and data can't be ingested, this should be visible in the logs (the 30s stats log output is helpful). If the data cannot be collected, we should see this too in the logs. The full log file of the failing metricbeat test suite would be very helpful. My initial guess was that it might be related to a permission change made in Elasticsearch; one happened for 8.0 but not 7.17, AFAIK. I'm mentioning this here in case someone has an idea of what could have changed in 7.17 related to permissions.
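As a hedged pointer for pulling those lines out (default rpm log location assumed, not a command used in this thread), both the ingestion errors and the periodic 30-second stats land in the Metricbeat log:

```sh
# Grab recent errors plus the periodic "Non-zero metrics in the last 30s" stats lines
grep -E "ERROR|Non-zero metrics" /var/log/metricbeat/metricbeat* | tail -n 50
```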
I can share full logs (maybe not on GH though, as it is quite a lot and contains sensitive bits?). Initially Elasticsearch does not seem to be available yet (but it comes online a few moments later), and then the logs contain variations of the last two messages from this excerpt repeated ad infinitum:

I can also share the metricbeat configuration if that helps.
Thanks @pebrc, the metricbeat configuration would be useful as well. I understand the important part of the error message that repeats itself is:
Yes. The Metricbeat configuration is as follows:
++ I only see 4 commits between the first good beats commit and the second one where my test started failing: https://github.com/elastic/beats/commits/5d841312f81bb2a16e03a2feb7e2508718680aaa
Ah... I think this is the same issue I opened up for cloud #30044
It seems like 7.17 will work initially but probably breaks after rollover, or possibly in the event metricbeat restarts? I'd like to focus on the filebeat issue for 8.0, so I haven't dug in yet. The monitoring indices aren't expected to be aliases; they're daily-rotated raw indices in 7.x and data streams in 8.x.
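A hedged way to check which form is present (localhost endpoint and elastic:changeme credentials assumed, as in the earlier test): the 7.x daily indices are visible via _cat/indices, while on 8.x the monitoring data lands in data streams:

```sh
# 7.x: daily-rotated raw indices named like .monitoring-es-7-mb-YYYY.MM.dd
curl -s -u elastic:changeme "localhost:9200/_cat/indices/.monitoring-*?v=true"

# 8.x: monitoring data is written to data streams instead
curl -s -u elastic:changeme "localhost:9200/_data_stream/.monitoring-*?pretty"
```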
I have just submitted a PR: #30055

The PR reverting the change has been merged to 7.17.
Please include configurations and logs if available.
For confirmed bugs, please report:
Monitoring worked on the 7.17.0-766e7ca2 SNAPSHOT build from Jan 18, 2022 10:02 AM
but fails on the 7.17.0-68da5d12 SNAPSHOT build from Jan 18, 2022 5:38 PM
and also fails on the 7.17.0-1bd53ff7 BC2 build.
I don't find anything of interest in the metricbeat log. I don't see anything about monitoring in the log with either the passing or failing tests.
All of the configuration is the same between the build that passes and the builds that fail. I've run the passing and failing builds locally with the same results as on Jenkins.
/etc/metricbeat/metricbeat.yml:
The beats_internal user in the config has;

And the beats_reader role is;

And the beats_writer role is;
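If the user and role definitions need to be double-checked, as a hedged aside they can be retrieved from the Elasticsearch security API (user and role names taken from the issue; endpoint and credentials assumed):

```sh
# Inspect the monitoring user and the roles assigned to it
curl -s -u elastic:changeme "localhost:9200/_security/user/beats_internal?pretty"

# Inspect the two custom roles referenced above
curl -s -u elastic:changeme "localhost:9200/_security/role/beats_reader,beats_writer?pretty"
```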