metricbeat monitoring of Elasticsearch and Kibana broke on recent 7.17.0-SNAPSHOT rpm #29920

Closed
LeeDr opened this issue Jan 19, 2022 · 25 comments · Fixed by #30058
Labels: blocker, bug, Metricbeat, regression, Team:Integrations

Comments


LeeDr commented Jan 19, 2022

For confirmed bugs, please report:

  • Version: 7.17.0-SNAPSHOT
  • Operating System: CentOS 7
  • Steps to Reproduce:
  1. Install metricbeat and enable modules for monitoring Elasticsearch, Kibana, Logstash, and Beats

Monitoring worked on the 7.17.0-766e7ca2 SNAPSHOT build from Jan 18, 2022 10:02 AM
but fails on the 7.17.0-68da5d12 SNAPSHOT build from Jan 18, 2022 5:38 PM
and also fails on the 7.17.0-1bd53ff7 BC2 build.

I don't see anything of interest in the metricbeat log; nothing about monitoring appears there for either the passing or failing runs.

All of the configuration is identical between the build that passes and the builds that fail. I've run the passing and failing builds locally with the same results as on Jenkins.

/etc/metricbeat/metricbeat.yml:

metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.period: 10s
  reload.enabled: false
metricbeat.max_start_delay: 10s
metricbeat.modules:
- module: system
  enabled: true
  period: 10s
  metricsets:
    - cpu
    - load
    - memory
    - network
    - process
    - process_summary
  processes: ['.*']
- module: system
  enabled: true
  period: 1m
  metricsets:
    - filesystem
    - fsstat
- module: kibana
  metricsets: ["status"]
  enabled: false
  period: 10s
  hosts: ["localhost:5601"]
- module: elasticsearch
  metricsets: ["node", "node_stats"]
  enabled: false
  period: 10s
  hosts: ["localhost:9200"]


output.elasticsearch:
  hosts: ["localhost:9200"]
  protocol: https
  username: "beats_internal"
  password: "changeme"
  ssl.enabled: true
  ssl.certificate_authorities: ["/etc/metricbeat/ca.crt"]
  ssl.certificate: /etc/metricbeat/elasticsearch.crt
  ssl.key: /etc/metricbeat/elasticsearch.key

#setup.dashboards.snapshot: true

setup.kibana:
  protocol: https
  host: localhost:5601
  username: "beats_internal"
  password: "changeme"
  ssl.enabled: true
  ssl.certificate_authorities: ["/etc/metricbeat/ca.crt"]
  ssl.certificate: /etc/metricbeat/elasticsearch.crt
  ssl.key: /etc/metricbeat/elasticsearch.key

xpack.monitoring:
  enabled: false
  elasticsearch:
    url: "https://localhost:9200"
    username: elastic
    password: changeit
http.enabled: true
http.port: 5068
[root@localhost vagrant]# ls -l /etc/metricbeat/modules.d/ | grep -v disabled
total 272
-rw-r--r-- 1 root root  295 Jan 19 22:31 beat-xpack.yml
-rw-r--r-- 1 root root  456 Jan 19 22:31 elasticsearch-xpack.yml
-rw-r--r-- 1 root root  461 Jan 19 22:31 kibana-xpack.yml
-rw-r--r-- 1 root root  251 Jan 19 22:31 logstash-xpack.yml
-rw-r--r-- 1 root root  956 Jan 19 14:31 system.yml

The beats_internal user in the config has:

curl -k https://elastic:changeit@localhost:9200/_security/user/beats_internal
{"beats_internal":{"username":"beats_internal","roles":["beats_writer","beats_reader","kibana_admin","machine_learning_admin","remote_monitoring_agent"],"full_name":"beats_internal","email":"[email protected]","metadata":{},"enabled":true}}

And the beats_reader role is:

[root@localhost vagrant]# curl -k https://elastic:changeit@localhost:9200/_security/role/beats_reader
{"beats_reader":{"cluster":[],"indices":[{"names":["*beat-*"],"privileges":["read","view_index_metadata"],"allow_restricted_indices":false}],"applications":[],"run_as":[],"metadata":{},"transient_metadata":{"enabled":true}}}

And the beats_writer role is:

[root@localhost vagrant]# curl -k https://elastic:changeit@localhost:9200/_security/role/beats_writer
{"beats_writer":{"cluster":["manage_index_templates","monitor","manage_ilm"],"indices":[{"names":["filebeat-*","metricbeat-*","heartbeat-*","packetbeat-*","auditbeat-*","winlogbeat-*",".kibana*"],"privileges":["write","delete","create_index","manage","manage_ilm"],"allow_restricted_indices":false}],"applications":[],"run_as":[],"metadata":{},"transient_metadata":{"enabled":true}}}
botelastic bot added the needs_team label on Jan 19, 2022

LeeDr commented Jan 19, 2022

The same test passed on 8.0.0-rc2-c0b0e70d, which was built today, and also on the latest 8.1.0 snapshot build.


LeeDr commented Jan 20, 2022

I'm just guessing here, but maybe this PR shouldn't have been backported to 7.17? #29869

But in both the passing and failing cases I can see that the legacy templates for monitoring are there.

UPDATE: Scratch this; that PR was docs-only.

ChrsMark added the Team:Integrations label on Jan 20, 2022
elasticmachine commented:

Pinging @elastic/integrations (Team:Integrations)

botelastic bot removed the needs_team label on Jan 20, 2022
ChrsMark commented:

cc: @sayden


sayden commented Jan 20, 2022

Those configs show elasticsearch and kibana as enabled: false. Usually you just want to add xpack.enabled: true like this:

  - module: elasticsearch
    xpack.enabled: true
    period: 10s
    hosts: ["http://localhost:9200"] 

More info: https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-metricbeat.html


LeeDr commented Jan 20, 2022

I changed those false values to true and restarted metricbeat, but still no .monitoring-es index is created.

Is there a way to enable debug logging? I'm surprised there's nothing in the metricbeat log about which modules it's using.
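
For reference, debug logging can be turned on through the standard libbeat logging settings in metricbeat.yml (or by running metricbeat in the foreground with -e -d "*"). A minimal sketch, assuming the stock 7.x logging options:

# Sketch: enable debug logging in /etc/metricbeat/metricbeat.yml
logging.level: debug
logging.selectors: ["*"]   # all debug selectors; narrow to e.g. ["monitoring", "esclientleg"] to cut noise
logging.to_files: true
logging.files:
  path: /var/log/metricbeat
  name: metricbeat
  keepfiles: 7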


LeeDr commented Jan 20, 2022

I'm back on the earlier 7.17.0-766e7ca2 build again. Those values in metricbeat.yml are false:

- module: kibana
  metricsets: ["status"]
  enabled: false
  period: 10s
  hosts: ["localhost:5601"]
- module: elasticsearch
  metricsets: ["node", "node_stats"]
  enabled: false
  period: 10s
  hosts: ["localhost:9200"]

But I do have monitoring indices created. Notice I have indices for logstash and beats monitoring even though there isn't a section in metricbeat.yml for them; those work because their module files in modules.d are not disabled.

green  open .monitoring-beats-7-mb-2022.01.20    i5J1UjhJRY-4dEn23G-lIw 1 0     8    0  98.2kb  98.2kb
green  open .monitoring-logstash-7-mb-2022.01.20 gUqEMkSKQvKEPQuQfl9rzA 1 0    45    0 114.6kb 114.6kb
green  open .monitoring-kibana-7-mb-2022.01.20   SLszlLApQZG80p0pynSArw 1 0    36    0 121.6kb 121.6kb
green  open .monitoring-es-7-mb-2022.01.20       9QEcQ8CEQM68TB3LrG1xgg 1 0   751    0     1mb     1mb


LeeDr commented Jan 20, 2022

FYI @marius-dr, who set some of this up for the Kibana-QA team.


LeeDr commented Jan 20, 2022

> Those configs show elasticsearch and kibana as enabled: false. Usually you just want to add xpack.enabled: true like this:
>
>   - module: elasticsearch
>     xpack.enabled: true
>     period: 10s
>     hosts: ["http://localhost:9200"]
>
> More info: https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-metricbeat.html

I was a little confused by this too. But what I see now is that the section in /etc/metricbeat/metricbeat.yml looks a lot like the section in /etc/metricbeat/modules.d/elasticsearch-xpack.yml. In metricbeat.yml that xpack.enabled is false, but in elasticsearch-xpack.yml it's true.


sayden commented Jan 20, 2022

It is a bit confusing, without a doubt, and even more so in the case of Stack Monitoring; my apologies in advance. The .monitoring templates are installed, as an exception, by Elasticsearch itself. The most important thing to check is the number of docs in those indices.

Generally speaking, you want to keep "cluster config" in metricbeat.yml (elasticsearch credentials, kibana, security, etc.) and module config in the files under modules.d (the elasticsearch, logstash, and kibana modules, where "module" has the same meaning as the mysql module, for example), because one of them will override the other (I think metricbeat.yml overrides everything in modules.d).

Maybe try setting a barebones module configuration for elasticsearch in the elasticsearch-xpack.yml file under modules.d, as in the sketch below, and removing everything related to the elasticsearch module from metricbeat.yml. If that doesn't work, please share the config you have in the active files in modules.d.
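
A minimal sketch of such a barebones modules.d/elasticsearch-xpack.yml (the credentials here are placeholders):

- module: elasticsearch
  xpack.enabled: true
  period: 10s
  hosts: ["http://localhost:9200"]
  username: "elastic"    # placeholder credentials
  password: "changeme"   # placeholder credentials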

Thanks! 🙂


LeeDr commented Jan 20, 2022

When it fails, the templates are there but no indices are created. I figured out how to turn on debug logging in metricbeat, so I'll see if I get more info from that and update here.


pebrc commented Jan 21, 2022

I think the ECK operator e2e tests are also seeing this issue (https://devops-ci.elastic.co/job/cloud-on-k8s-e2e-tests-snapshot-versions/498//testReport), with similar symptoms: the monitoring indices are not created.

  "version" : {
    "number" : "7.17.0-SNAPSHOT",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "2ae2bdd5f08e95fb247fb85b002786da15304a0b",
    "build_date" : "2022-01-18T17:43:19.069887443Z",
    "build_snapshot" : true,
    "lucene_version" : "8.11.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },

This is using the stack monitoring feature in ECK: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-stack-monitoring.html
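
For context, that feature is enabled on the Elasticsearch resource roughly as in the sketch below (field names per the ECK stack monitoring docs linked above; the resource and monitoring-cluster names are hypothetical):

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: monitored-cluster          # hypothetical
spec:
  version: 7.17.0
  monitoring:
    metrics:
      elasticsearchRefs:
      - name: monitoring-cluster   # hypothetical name of the cluster that receives the metrics
  nodeSets:
  - name: default
    count: 1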


LeeDr commented Jan 24, 2022

Similar to my case, I see that @pebrc had a passing result on Jan 18th and failures on Jan 19th and in all jobs since then.

LeeDr added the blocker label on Jan 25, 2022

sayden commented Jan 26, 2022

I think this is related to the ECK operator, not to Stack Monitoring. I have just done a full test and it works:

metricbeat.yml

metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml

  reload.enabled: false

setup.template.settings:
  index.number_of_shards: 1
  index.codec: best_compression

setup.kibana:

output.elasticsearch:
  hosts: ["localhost:9200"]

  username: "elastic"
  password: "changeme"

elasticsearch-xpack.yml

# Module: elasticsearch
# Docs: https://www.elastic.co/guide/en/beats/metricbeat/7.16/metricbeat-module-elasticsearch.html

- module: elasticsearch
  xpack.enabled: true
  period: 10s
  hosts: ["http://localhost:9200"]
  username: "elastic"
  password: "changeme"

kibana-xpack.yml

# Module: kibana
# Docs: https://www.elastic.co/guide/en/beats/metricbeat/7.16/metricbeat-module-kibana.html

- module: kibana
  xpack.enabled: true
  period: 10s
  hosts: ["http://localhost:5601"]
  #basepath: ""
  username: "elastic"
  password: "changeme"

Versions deployed:

CONTAINER ID   IMAGE                                                            COMMAND                  CREATED       STATUS                    PORTS                                NAMES
1737912bed50   docker.elastic.co/beats/elastic-agent-complete:7.17.0-SNAPSHOT   "/usr/bin/tini -- /u…"   2 hours ago   Up 25 minutes (healthy)                                        elastic-package-stack_elastic-agent_1
2e787f3a9475   docker.elastic.co/beats/elastic-agent-complete:7.17.0-SNAPSHOT   "/usr/bin/tini -- /u…"   2 hours ago   Up 25 minutes (healthy)   127.0.0.1:8220->8220/tcp             elastic-package-stack_fleet-server_1
fc1157ca2e73   docker.elastic.co/kibana/kibana:7.17.0-SNAPSHOT                  "/bin/tini -- /usr/l…"   2 hours ago   Up 25 minutes (healthy)   127.0.0.1:5601->5601/tcp             elastic-package-stack_kibana_1
d49b643b7369   elastic-package-stack_package-registry                           "./package-registry …"   2 hours ago   Up 25 minutes (healthy)   127.0.0.1:8080->8080/tcp             elastic-package-stack_package-registry_1
d1298174fc36   docker.elastic.co/elasticsearch/elasticsearch:7.17.0-SNAPSHOT    "/bin/tini -- /usr/l…"   2 hours ago   Up 25 minutes (healthy)   127.0.0.1:9200->9200/tcp, 9300/tcp   elastic-package-stack_elasticsearch_1

Cat response:

sayden➜~» curl -XGET localhost:9200/_cat/indices?v=true -u "elastic:changeme"  | grep monitoring          
                                                                          
green  open   .monitoring-kibana-7-mb-2022.01.26                                BzT8cZ2LTiClQvH-PZsw7A   1   0         60            0      144kb          144kb
green  open   .monitoring-es-7-mb-2022.01.26                                    _FZULYn4QICDaI9_pQrzGQ   1   0       4231            0      3.3mb          3.3mb

Metricbeat version:

sayden➜elastic/beats/metricbeat(7.17✗)» ./metricbeat version    [13:58:27]
metricbeat version 7.17.0 (amd64), libbeat 7.17.0 [unknown built unknown]

As you can see, those are default configs, and they differ slightly from the ones you posted. If it's something in the config, it's probably a problem in libbeat rather than in Stack Monitoring, and it would affect all other modules, which is unlikely but definitely possible.


ruflin commented Jan 26, 2022

Any chance someone could share the metricbeat logs from when the ingestion is failing? The indices are created when the first doc arrives, so if the index can't be created and data can't be ingested, this should be visible in the logs (the 30s stats log output is helpful). If the data cannot be collected, we should see that in the logs too. The full log file of the failing metricbeat test suite would be very helpful.

My initial guess was that it might be related to a permission change made in Elasticsearch. One happened for 8.0 but not 7.17, AFAIK. I'm mentioning this here in case someone has an idea of what could have changed in 7.17 related to permissions.


pebrc commented Jan 26, 2022

I can share full logs (though maybe not on GH, as it is quite a lot and contains sensitive bits). Initially Elasticsearch does not seem to be available yet (but it comes online a few moments later), and then the logs contain variations of the last two messages from this excerpt, repeated ad infinitum:

2022-01-26T13:27:11.404Z        INFO    instance/beat.go:686    Home path: [/usr/share/metricbeat] Config path: [/usr/share/metricbeat] Data path: [/usr/share/metricbeat/data] Logs path: [/usr/share/metricbeat/logs] Hostfs Path: [/]
2022-01-26T13:27:11.408Z        INFO    instance/beat.go:694    Beat ID: b8f63808-ac58-41f2-8b3b-568f5db7e48a
2022-01-26T13:27:11.411Z        WARN    [add_cloud_metadata]    add_cloud_metadata/provider_aws_ec2.go:95       error when check request status for getting IMDSv2 token: http request status 405. No token in the metadata request will be used.
2022-01-26T13:27:11.413Z        INFO    [seccomp]       seccomp/seccomp.go:124  Syscall filter successfully installed
2022-01-26T13:27:11.413Z        INFO    [beat]  instance/beat.go:1040   Beat info       {"system_info": {"beat": {"path": {"config": "/usr/share/metricbeat", "data": "/usr/share/metricbeat/data", "home": "/usr/share/metricbeat", "logs": "/usr/share/metricbeat/logs"}, "type": "metricbeat", "uuid": "b8f63808-ac58-41f2-8b3b-568f5db7e48a"}}}
2022-01-26T13:27:11.413Z        INFO    [beat]  instance/beat.go:1049   Build info      {"system_info": {"build": {"commit": "b55f1ceac49d295c6050c02e336b56911f580abd", "libbeat": "7.17.0", "time": "2022-01-23T01:51:29.000Z", "version": "7.17.0"}}}
2022-01-26T13:27:11.413Z        INFO    [beat]  instance/beat.go:1052   Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":8,"version":"go1.17.5"}}}
2022-01-26T13:27:11.414Z        INFO    [beat]  instance/beat.go:1056   Host info       {"system_info": {"host": {"architecture":"x86_64","boot_time":"2022-01-04T15:48:06Z","containerized":true,"name":"test-es-mon-a-fftl-es-masterdata-0","ip":["127.0.0.1/8","10.73.50.171/24"],"kernel_version":"5.4.150+","mac":["d2:b5:d2:f3:3a:af"],"os":{"type":"linux","family":"debian","platform":"ubuntu","name":"Ubuntu","version":"20.04.3 LTS (Focal Fossa)","major":20,"minor":4,"patch":3,"codename":"focal"},"timezone":"UTC","timezone_offset_sec":0}}}
2022-01-26T13:27:11.414Z        INFO    [beat]  instance/beat.go:1085   Process info    {"system_info": {"process": {"capabilities": {"inheritable":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"permitted":null,"effective":null,"bounding":["chown","dac_override","fowner","fsetid","kill","setgid","setuid","setpcap","net_bind_service","net_raw","sys_chroot","mknod","audit_write","setfcap"],"ambient":null}, "cwd": "/usr/share/metricbeat", "exe": "/usr/share/metricbeat/metricbeat", "name": "metricbeat", "pid": 7, "ppid": 1, "seccomp": {"mode":"filter","no_new_privs":true}, "start_time": "2022-01-26T13:27:10.570Z"}}}
2022-01-26T13:27:11.414Z        INFO    instance/beat.go:328    Setup Beat: metricbeat; Version: 7.17.0
2022-01-26T13:27:11.414Z        INFO    [index-management]      idxmgmt/std.go:184      Set output.elasticsearch.index to 'metricbeat-7.17.0' as ILM is enabled.
2022-01-26T13:27:11.415Z        WARN    [cfgwarn]       tlscommon/config.go:100 DEPRECATED: Treating the CommonName field on X.509 certificates as a host name when no Subject Alternative Names are present is going to be removed. Please update your certificates if needed. Will be removed in version: 8.0.0
2022-01-26T13:27:11.415Z        INFO    [esclientleg]   eslegclient/connection.go:102   elasticsearch url: https://test-es-mon-metrics-p2rd-es-http.e2e-mercury.svc:9200
2022-01-26T13:27:11.415Z        INFO    [publisher]     pipeline/module.go:113  Beat name: test-es-mon-a-fftl-es-masterdata-0
2022-01-26T13:27:11.428Z        INFO    [add_cloud_metadata]    add_cloud_metadata/add_cloud_metadata.go:105    add_cloud_metadata: hosting provider type detected as gcp, metadata={"cloud":{"account":{"id":"..."},"availability_zone":"europe-west1-c","instance":{"id":"2933194355478483660","name":"gke-pebrc-dev-cluster-default-pool-12721212-l8gk"},"machine":{"type":"n1-standard-8"},"project":{"id":"elastic-cloud-dev"},"provider":"gcp","service":{"name":"GCE"}},"orchestrator":{"cluster":{"name":"pebrc-dev-cluster","url":"https://...."}}}
2022-01-26T13:27:11.447Z        INFO    [monitoring]    log/log.go:142  Starting metrics logging every 30s
2022-01-26T13:27:11.449Z        INFO    instance/beat.go:492    metricbeat start running.
2022-01-26T13:27:13.583Z        ERROR   module/wrapper.go:259   Error fetching data for metricset elasticsearch.shard: error determining if connected Elasticsearch node is master: error making http request: Get "https://localhost:9200/_nodes/_local/nodes": dial tcp 127.0.0.1:9200: connect: connection refused
2022-01-26T13:27:13.788Z        ERROR   module/wrapper.go:259   Error fetching data for metricset elasticsearch.index_summary: error determining if connected Elasticsearch node is master: error making http request: Get "https://localhost:9200/_nodes/_local/nodes": dial tcp [::1]:9200: connect: cannot assign requested address
2022-01-26T13:27:14.584Z        INFO    [publisher_pipeline_output]     pipeline/output.go:143  Connecting to backoff(elasticsearch(https://test-es-mon-metrics-p2rd-es-http.e2e-mercury.svc:9200))
2022-01-26T13:27:14.584Z        INFO    [publisher]     pipeline/retry.go:219   retryer: send unwait signal to consumer
2022-01-26T13:27:14.584Z        INFO    [publisher]     pipeline/retry.go:223     done
2022-01-26T13:27:14.712Z        INFO    [esclientleg]   eslegclient/connection.go:282   Attempting to connect to Elasticsearch version 7.17.0-SNAPSHOT
2022-01-26T13:27:14.794Z        INFO    [esclientleg]   eslegclient/connection.go:282   Attempting to connect to Elasticsearch version 7.17.0-SNAPSHOT
2022-01-26T13:27:14.821Z        INFO    [index-management]      idxmgmt/std.go:261      Auto ILM enable success.
2022-01-26T13:27:14.885Z        INFO    [index-management.ilm]  ilm/std.go:180  ILM policy metricbeat successfully created.
2022-01-26T13:27:14.885Z        INFO    [index-management]      idxmgmt/std.go:401      Set setup.template.name to '{metricbeat-7.17.0 {now/d}-000001}' as ILM is enabled.
2022-01-26T13:27:14.885Z        INFO    [index-management]      idxmgmt/std.go:406      Set setup.template.pattern to 'metricbeat-7.17.0-*' as ILM is enabled.
2022-01-26T13:27:14.885Z        INFO    [index-management]      idxmgmt/std.go:440      Set settings.index.lifecycle.rollover_alias in template to {metricbeat-7.17.0 {now/d}-000001} as ILM is enabled.
2022-01-26T13:27:14.885Z        INFO    [index-management]      idxmgmt/std.go:444      Set settings.index.lifecycle.name in template to {metricbeat {"policy":{"phases":{"hot":{"actions":{"rollover":{"max_age":"30d","max_size":"50gb"}}}}}}} as ILM is enabled.
2022-01-26T13:27:14.892Z        INFO    template/load.go:197    Existing template will be overwritten, as overwrite is enabled.
2022-01-26T13:27:15.321Z        INFO    template/load.go:131    Try loading template metricbeat-7.17.0 to Elasticsearch
2022-01-26T13:27:15.544Z        ERROR   module/wrapper.go:259   Error fetching data for metricset elasticsearch.ml_job: error determining if connected Elasticsearch node is master: error making http request: Get "https://localhost:9200/_nodes/_local/nodes": dial tcp 127.0.0.1:9200: connect: connection refused
2022-01-26T13:27:15.644Z        INFO    template/load.go:123    Template with name "metricbeat-7.17.0" loaded.
2022-01-26T13:27:15.644Z        INFO    [index-management]      idxmgmt/std.go:297      Loaded index template.
2022-01-26T13:27:15.884Z        ERROR   module/wrapper.go:259   Error fetching data for metricset elasticsearch.index: error determining if connected Elasticsearch node is master: error making http request: Get "https://localhost:9200/_nodes/_local/nodes": dial tcp 127.0.0.1:9200: connect: connection refused
2022-01-26T13:27:16.302Z        INFO    [index-management.ilm]  ilm/std.go:140  Index Alias metricbeat-7.17.0 successfully created.
2022-01-26T13:27:16.304Z        INFO    [publisher_pipeline_output]     pipeline/output.go:151  Connection to backoff(elasticsearch(https://test-es-mon-metrics-p2rd-es-http.e2e-mercury.svc:9200)) established
2022-01-26T13:27:17.078Z        ERROR   module/wrapper.go:259   Error fetching data for metricset elasticsearch.index_recovery: error determining if connected Elasticsearch node is master: error making http request: Get "https://localhost:9200/_nodes/_local/nodes": dial tcp 127.0.0.1:9200: connect: connection refused
2022-01-26T13:27:18.327Z        ERROR   module/wrapper.go:259   Error fetching data for metricset elasticsearch.enrich: error determining if connected Elasticsearch node is master: error making http request: Get "https://localhost:9200/_nodes/_local/nodes": dial tcp 127.0.0.1:9200: connect: connection refused
2022-01-26T13:27:18.546Z        ERROR   module/wrapper.go:259   Error fetching data for metricset elasticsearch.cluster_stats: error determining if connected Elasticsearch node is master: error making http request: Get "https://localhost:9200/_nodes/_local/nodes": dial tcp [::1]:9200: connect: cannot assign requested address
2022-01-26T13:27:18.604Z        ERROR   module/wrapper.go:259   Error fetching data for metricset elasticsearch.ccr: error determining if connected Elasticsearch node is master: error making http request: Get "https://localhost:9200/_nodes/_local/nodes": dial tcp [::1]:9200: connect: cannot assign requested address
2022-01-26T13:27:20.691Z        ERROR   module/wrapper.go:259   Error fetching data for metricset elasticsearch.node_stats: error making http request: Get "https://localhost:9200/_nodes/_local/stats": dial tcp [::1]:9200: connect: cannot assign requested address
2022-01-26T13:27:23.584Z        ERROR   module/wrapper.go:259   Error fetching data for metricset elasticsearch.shard: error determining if connected Elasticsearch node is master: error making http request: Get "https://localhost:9200/_nodes/_local/nodes": dial tcp 127.0.0.1:9200: connect: connection refused
2022-01-26T13:27:23.790Z        ERROR   module/wrapper.go:259   Error fetching data for metricset elasticsearch.index_summary: error determining if connected Elasticsearch node is master: error making http request: Get "https://localhost:9200/_nodes/_local/nodes": dial tcp 127.0.0.1:9200: connect: connection refused
2022-01-26T13:27:25.544Z        ERROR   module/wrapper.go:259   Error fetching data for metricset elasticsearch.ml_job: error determining if connected Elasticsearch node is master: error making http request: Get "https://localhost:9200/_nodes/_local/nodes": dial tcp 127.0.0.1:9200: connect: connection refused
2022-01-26T13:27:25.886Z        ERROR   module/wrapper.go:259   Error fetching data for metricset elasticsearch.index: error determining if connected Elasticsearch node is master: error making http request: Get "https://localhost:9200/_nodes/_local/nodes": dial tcp 127.0.0.1:9200: connect: connection refused
2022-01-26T13:27:28.443Z        WARN    [elasticsearch] elasticsearch/client.go:422     Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(2022, time.January, 26, 13, 27, 27, 79486756, time.Local), Meta:{"index":".monitoring-es-7-mb"}, Fields:{"agent":{"ephemeral_id":"623de1bf-2611-45a5-86de-dab0355950e7","hostname":"test-es-mon-a-fftl-es-masterdata-0","id":"b8f63808-ac58-41f2-8b3b-568f5db7e48a","name":"test-es-mon-a-fftl-es-masterdata-0","type":"metricbeat","version":"7.17.0"},"cloud":{"account":{"id":"...."},"availability_zone":"europe-west1-c","instance":{"id":"2933194355478483660","name":"gke-pebrc-dev-cluster-default-pool-12721212-l8gk"},"machine":{"type":"n1-standard-8"},"project":{"id":"elastic-cloud-dev"},"provider":"gcp","service":{"name":"GCE"}},"cluster_uuid":"xW1xX2dKQXuIOMK3scivoA","ecs":{"version":"1.12.0"},"event":{"dataset":"elasticsearch.index.recovery","duration":354978007,"module":"elasticsearch"},"host":{"architecture":"x86_64","containerized":true,"hostname":"test-es-mon-a-fftl-es-masterdata-0","ip":["10.73.50.171"],"mac":["d2:b5:d2:f3:3a:af"],"name":"test-es-mon-a-fftl-es-masterdata-0","os":{"codename":"focal","family":"debian","kernel":"5.4.150+","name":"Ubuntu","platform":"ubuntu","type":"linux","version":"20.04.3 LTS (Focal Fossa)"}},"index_recovery":{"shards":[{"id":0,"index":{"files":{"percent":"100.0%","recovered":0,"reused":16,"total":16},"size":{"percent":"100.0%","recovered_from_snapshot_in_bytes":0,"recovered_in_bytes":0,"reused_in_bytes":42290519,"total_in_bytes":42290519},"source_throttle_time_in_millis":0,"target_throttle_time_in_millis":0,"total_time_in_millis":4},"index_name":".geoip_databases","primary":true,"source":{"bootstrap_new_history_uuid":false},"stage":"DONE","start_time_in_millis":1643203646750,"stop_time_in_millis":1643203647213,"target":{"host":"10.73.50.171","id":"bH8-2OueRnWIKg2Ve0awYg","ip":"10.73.50.171","name":"test-es-mon-a-fftl-es-masterdata-0","transport_address":"10.73.50.171:9300"},"total_time_in_millis":463,"translog":{"percent":"100.0%","recovered":0,"total":0,"total_on_start":0,"total_time_in_millis":419},"type":"EXISTING_STORE","verify_index":{"check_index_time_in_millis":0,"total_time_in_millis":0}}]},"interval_ms":10000,"metricset":{"name":"index_recovery","period":10000},"orchestrator":{"cluster":{"name":"pebrc-dev-cluster","url":"https://...."}},"service":{"address":"https://localhost:9200","type":"elasticsearch"},"timestamp":"2022-01-26T13:27:27.434Z","type":"index_recovery"}, Private:interface {}(nil), TimeSeries:true}, Flags:0x0, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=404): {"type":"index_not_found_exception","reason":"no such index [.monitoring-es-7-mb-2022.01.26] and [require_alias] request flag is [true] and [.monitoring-es-7-mb-2022.01.26] is not an alias","index_uuid":"_na_","index":".monitoring-es-7-mb-2022.01.26"}, dropping event!
2022-01-26T13:27:28.443Z        WARN    [elasticsearch] elasticsearch/client.go:422     Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(2022, time.January, 26, 13, 27, 28, 328201889, time.Local), Meta:{"index":".monitoring-es-7-mb"}, Fields:{"agent":{"ephemeral_id":"623de1bf-2611-45a5-86de-dab0355950e7","hostname":"test-es-mon-a-fftl-es-masterdata-0","id":"b8f63808-ac58-41f2-8b3b-568f5db7e48a","name":"test-es-mon-a-fftl-es-masterdata-0","type":"metricbeat","version":"7.17.0"},"cloud":{"account":{"id":"...."},"availability_zone":"europe-west1-c","instance":{"id":"2933194355478483660","name":"gke-pebrc-dev-cluster-default-pool-12721212-l8gk"},"machine":{"type":"n1-standard-8"},"project":{"id":"elastic-cloud-dev"},"provider":"gcp","service":{"name":"GCE"}},"cluster_uuid":"xW1xX2dKQXuIOMK3scivoA","ecs":{"version":"1.12.0"},"enrich_coordinator_stats":{"executed_searches_total":0,"node_id":"bH8-2OueRnWIKg2Ve0awYg","queue_size":0,"remote_requests_current":0,"remote_requests_total":0},"event":{"dataset":"elasticsearch.enrich","duration":102546677,"module":"elasticsearch"},"host":{"architecture":"x86_64","containerized":true,"hostname":"test-es-mon-a-fftl-es-masterdata-0","ip":["10.73.50.171"],"mac":["d2:b5:d2:f3:3a:af"],"name":"test-es-mon-a-fftl-es-masterdata-0","os":{"codename":"focal","family":"debian","kernel":"5.4.150+","name":"Ubuntu","platform":"ubuntu","type":"linux","version":"20.04.3 LTS (Focal Fossa)"}},"interval_ms":10000,"metricset":{"name":"enrich","period":10000},"orchestrator":{"cluster":{"name":"pebrc-dev-cluster","url":"https://....."}},"service":{"address":"https://localhost:9200","type":"elasticsearch"},"timestamp":"2022-01-26T13:27:28.430Z","type":"enrich_coordinator_stats"}, Private:interface {}(nil), TimeSeries:true}, Flags:0x0, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=404): {"type":"index_not_found_exception","reason":"no such index [.monitoring-es-7-mb-2022.01.26] and [require_alias] request flag is [true] and [.monitoring-es-7-mb-2022.01.26] is not an alias","index_uuid":"_na_","index":".monitoring-es-7-mb-2022.01.26"}, dropping event!

I can also share the metricbeat configuration if that helps.

andresrc commented:

Thanks @pebrc, the metricbeat configuration would be useful as well.

I understand that the important part of the error message that repeats itself is:

{"type":"index_not_found_exception","reason":"no such index [.monitoring-es-7-mb-2022.01.26] and [require_alias] request flag is [true] and [.monitoring-es-7-mb-2022.01.26] is not an alias"


pebrc commented Jan 26, 2022

> I understand that the important part of the error message that repeats itself is:
>
> {"type":"index_not_found_exception","reason":"no such index [.monitoring-es-7-mb-2022.01.26] and [require_alias] request flag is [true] and [.monitoring-es-7-mb-2022.01.26] is not an alias"

Yes

The Metricbeat configuration is as follows:

metricbeat:
  modules:
  - hosts:
    - https://localhost:9200
    metricsets:
    - ccr
    - cluster_stats
    - enrich
    - index
    - index_recovery
    - index_summary
    - ml_job
    - node_stats
    - pending_tasks
    - shard
    module: elasticsearch
    password: ----
    period: 10s
    ssl:
      certificate_authorities:
      - /mnt/elastic-internal/kb-monitoring/e2e-mercury/test-es-mon-a-fftl/certs/ca.crt
      verification_mode: certificate
    username: elastic-internal-monitoring
    xpack:
      enabled: true
output:
  elasticsearch:
    hosts:
    - https://test-es-mon-metrics-p2rd-es-http.e2e-mercury.svc:9200
    password: ----
    ssl:
      certificate_authorities:
      - /mnt/elastic-internal/es-monitoring-association/e2e-mercury/test-es-mon-metrics-p2rd/certs/ca.crt
    username: e2e-mercury-test-es-mon-a-fftl-e2e-mercury-test-es-mon-metrics-p2rd-beat-es-mon-user
processors:
- add_cloud_metadata: null
- add_host_metadata: null


ruflin commented Jan 26, 2022

This seems to come from #29879 @kvch

kvch self-assigned this on Jan 26, 2022

LeeDr commented Jan 26, 2022

++ I only see 4 commits between the first good beats commit and the one where my test started failing:
beats 4c4400e (good)
beats 5d84131 (failing)

https://github.com/elastic/beats/commits/5d841312f81bb2a16e03a2feb7e2508718680aaa


matschaffer commented Jan 27, 2022

Ah... I think this is the same issue I opened for cloud: #30044

matschaffer commented:

It seems like 7.17 will work initially but probably breaks after rollover, or possibly when metricbeat restarts?

I'd like to focus on the filebeat issue for 8.0, so I haven't dug in yet.

The monitoring indices aren't expected to be aliases. They're daily-rotated raw indices on 7.x, then data streams on 8.

Mpdreamz commented:

@ruflin @kvch do you have an update on this issue? Do we have a handle on the root cause and a possible fix?


kvch commented Jan 27, 2022

I have just submitted a PR: #30055


kvch commented Jan 27, 2022

The PR reverting the change has been merged to 7.17.
