elasticsearch plugin: add a tag for node role #2158

animageofmine · 2016-12-14T01:58:53Z

Bug report

We have an Elasticsearch cluster of 9 nodes: 5 data nodes, 3 master nodes, and 1 client node. We use KairosDB to store Telegraf data and Grafana for graphs. One problem we are facing is grouping metrics by role (master, client, or data). It looks like each node in the Elasticsearch cluster returns a payload for the whole cluster. For example, if I want to monitor JVM heap (mem_heap_used_in_bytes) for only the data nodes, I can't find a way to do that, because each node returns JVM heap for all nodes, including data, master, and client nodes (each node is cluster-aware via Zen Discovery).

I'm not sure if I am doing something wrong here or if my understanding is incorrect, but I wanted to check whether there is a way to deal with this problem (I really hope I am doing something silly). Please see telegraf.conf below.

Relevant telegraf.conf:

[agent]
  hostname = "<OneoftheNodesInESCluster>"
  interval = "30s"
  round_interval = true
  metric_buffer_limit = 1000
  flush_buffer_when_full = true
  collection_jitter = "1s"
  flush_interval = "30s"
  flush_jitter = "5s"
  debug = false
  quiet = false

# OUTPUTS:
[[outputs.opentsdb]]
  debug = false
  host = "<somekairosdbhost>"
  port = 4244
  prefix = "telegraf."

# INPUTS:
[[inputs.docker]]
  interval = "2m"
  timeout = "30s"
[[inputs.elasticsearch]]
  cluster_health = true
  servers = ["http://localhost:9200"]
[[inputs.statsd]]
  allowed_pending_messages = 10000

System info:

Linux elasticsearchNodeData1 3.10.0-327.36.1.el7.x86_64 #1 SMP Sun Sep 18 13:04:29 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Telegraf version: Telegraf - version 1.0.0-beta3
All the nodes are dockerized using a Debian-based build.

Steps to reproduce:

Master has 8GB with 4 cores
Client node has 4GB with 2 cores
Data has 32GB with 16 cores

If we fetch some metric, say "mem_heap_used_in_bytes", this seems to fetch data from all the node types (data, client and master). Can't seem to find a way to isolate stats from each role.

There is a config option called "local = true", but I'm not sure what it is for.

Please let me know if you have any questions or need more info. Thank you.

sparrc · 2016-12-14T10:03:43Z

If you set local = true then I believe that the plugin will only collect stats on the host specified, rather than on the entire cluster.

This plugin doesn't tag metrics by role. Unfortunately, Elasticsearch doesn't make the node role available via its cluster API. From what I can tell, we might be able to get this info via a "cat nodes" query.
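
For illustration, the effect of local can be thought of as a choice between two Elasticsearch node stats endpoints. This is a minimal sketch in Python rather than the plugin's Go code. stats_url is a hypothetical helper, and the mapping of local = true to /_nodes/_local/stats (versus /_nodes/stats otherwise) is an assumption that this thread does not confirm.

```python
def stats_url(server: str, local: bool) -> str:
    """Return the node stats URL for one Elasticsearch server.

    Assumption: local=True maps to the _local node filter, which asks
    Elasticsearch for stats of only the node that receives the request.
    """
    path = "/_nodes/_local/stats" if local else "/_nodes/stats"
    return server.rstrip("/") + path


if __name__ == "__main__":
    print(stats_url("http://localhost:9200", local=True))
    print(stats_url("http://localhost:9200", local=False))
```

With the _local filter, each Telegraf agent would report only the node it runs next to, so a per-host tag would already separate the nodes even without a role tag.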

animageofmine · 2016-12-14T16:45:12Z

Thank you for looking into this. Actually the cluster/node API does expose the role. See example below (look for roles).

BTW, would turning on local flag use a different API or the same?

"uZ8dyuLbQnG-ljlT35RQgA": {
         "timestamp": 1481673501955,
         "name": "es-datanode1",
         "transport_address": "10.10.10.26:9300",
         "host": "10.10.10.26",
         "ip": "10.10.10.26:9300",
         "roles": [
            "data",
            "ingest"
         ],
         "indices": {
            "docs": {
               "count": 2601120,
               "deleted": 0
            },
            "store": {
               "size_in_bytes": 3983729207,
               "throttle_time_in_millis": 0
            },

sparrc · 2016-12-14T17:05:19Z

@animageofmine, that is just a blob of JSON... where did it come from? Which API? Can you provide a full request/response example?

animageofmine · 2016-12-14T18:08:41Z

@sparrc Sure. Here is the information:

Query: curl localhost:9200/_nodes/_local
Payload: I ran this on my local box, since the payload from the cluster was too large to paste.

Let me know if you need more info. BTW, I can't seem to find a metric that reports cluster health status (green, yellow, red). Any idea?

{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "elasticsearch",
  "nodes": {
    "ZOwb1f4DTVCQbuQpVu1jrw": {
      "name": "elk4node01",
      "transport_address": "10.2.240.172:9300",
      "host": "10.2.240.172",
      "ip": "10.2.240.172",
      "version": "5.0.1",
      "build_hash": "080bb47",
      "total_indexing_buffer": 426010214,
      "roles": [
        "master",
        "data",
        "ingest"
      ],
      "settings": {
        "pidfile": "/var/run/elasticsearch/elasticsearch.pid",
        "cluster": {
          "name": "elasticsearch"
        },
        "node": {
          "name": "elk4node01"
        },
        "path": {
          "conf": "/etc/elasticsearch",
          "data": [
            "/var/lib/elasticsearch"
          ],
          "logs": "/var/log/elasticsearch",
          "home": "/usr/share/elasticsearch"
        },
        "client": {
          "type": "node"
        },
        "http": {
          "type": {
            "default": "netty4"
          }
        },
        "transport": {
          "type": {
            "default": "netty4"
          }
        },
        "network": {
          "host": "0.0.0.0",
          "publish_host": "10.2.240.172"
        }
      },
      "os": {
        "refresh_interval_in_millis": 1000,
        "name": "Linux",
        "arch": "amd64",
        "version": "4.4.27-moby",
        "available_processors": 4,
        "allocated_processors": 4
      },
      "process": {
        "refresh_interval_in_millis": 1000,
        "id": 45,
        "mlockall": false
      },
      "jvm": {
        "pid": 45,
        "version": "1.8.0_111",
        "vm_name": "OpenJDK 64-Bit Server VM",
        "vm_version": "25.111-b14",
        "vm_vendor": "Oracle Corporation",
        "start_time_in_millis": 1481701191724,
        "mem": {
          "heap_init_in_bytes": 4294967296,
          "heap_max_in_bytes": 4260102144,
          "non_heap_init_in_bytes": 2555904,
          "non_heap_max_in_bytes": 0,
          "direct_max_in_bytes": 4260102144
        },
        "gc_collectors": [
          "ParNew",
          "ConcurrentMarkSweep"
        ],
        "memory_pools": [
          "Code Cache",
          "Metaspace",
          "Compressed Class Space",
          "Par Eden Space",
          "Par Survivor Space",
          "CMS Old Gen"
        ],
        "using_compressed_ordinary_object_pointers": "true"
      },
      "thread_pool": {
        "force_merge": {
          "type": "fixed",
          "min": 1,
          "max": 1,
          "queue_size": -1
        },
        "fetch_shard_started": {
          "type": "scaling",
          "min": 1,
          "max": 8,
          "keep_alive": "5m",
          "queue_size": -1
        },
        "listener": {
          "type": "fixed",
          "min": 2,
          "max": 2,
          "queue_size": -1
        },
        "index": {
          "type": "fixed",
          "min": 4,
          "max": 4,
          "queue_size": 200
        },
        "refresh": {
          "type": "scaling",
          "min": 1,
          "max": 2,
          "keep_alive": "5m",
          "queue_size": -1
        },
        "generic": {
          "type": "scaling",
          "min": 4,
          "max": 128,
          "keep_alive": "30s",
          "queue_size": -1
        },
        "warmer": {
          "type": "scaling",
          "min": 1,
          "max": 2,
          "keep_alive": "5m",
          "queue_size": -1
        },
        "search": {
          "type": "fixed",
          "min": 7,
          "max": 7,
          "queue_size": 1000
        },
        "flush": {
          "type": "scaling",
          "min": 1,
          "max": 2,
          "keep_alive": "5m",
          "queue_size": -1
        },
        "fetch_shard_store": {
          "type": "scaling",
          "min": 1,
          "max": 8,
          "keep_alive": "5m",
          "queue_size": -1
        },
        "management": {
          "type": "scaling",
          "min": 1,
          "max": 5,
          "keep_alive": "5m",
          "queue_size": -1
        },
        "get": {
          "type": "fixed",
          "min": 4,
          "max": 4,
          "queue_size": 1000
        },
        "bulk": {
          "type": "fixed",
          "min": 4,
          "max": 4,
          "queue_size": 50
        },
        "snapshot": {
          "type": "scaling",
          "min": 1,
          "max": 2,
          "keep_alive": "5m",
          "queue_size": -1
        }
      },
      "transport": {
        "bound_address": [
          "[::]:9300"
        ],
        "publish_address": "10.2.240.172:9300",
        "profiles": {}
      },
      "http": {
        "bound_address": [
          "[::]:9200"
        ],
        "publish_address": "10.2.240.172:9200",
        "max_content_length_in_bytes": 104857600
      },
      "plugins": [
        {
          "name": "repository-s3",
          "version": "5.0.1",
          "description": "The S3 repository plugin adds S3 repositories",
          "classname": "org.elasticsearch.plugin.repository.s3.S3RepositoryPlugin"
        }
      ],
      "modules": [
        {
          "name": "aggs-matrix-stats",
          "version": "5.0.1",
          "description": "Adds aggregations whose input are a list of numeric fields and output includes a matrix.",
          "classname": "org.elasticsearch.search.aggregations.matrix.MatrixAggregationPlugin"
        },
        {
          "name": "ingest-common",
          "version": "5.0.1",
          "description": "Module for ingest processors that do not require additional security permissions or have large dependencies and resources",
          "classname": "org.elasticsearch.ingest.common.IngestCommonPlugin"
        },
        {
          "name": "lang-expression",
          "version": "5.0.1",
          "description": "Lucene expressions integration for Elasticsearch",
          "classname": "org.elasticsearch.script.expression.ExpressionPlugin"
        },
        {
          "name": "lang-groovy",
          "version": "5.0.1",
          "description": "Groovy scripting integration for Elasticsearch",
          "classname": "org.elasticsearch.script.groovy.GroovyPlugin"
        },
        {
          "name": "lang-mustache",
          "version": "5.0.1",
          "description": "Mustache scripting integration for Elasticsearch",
          "classname": "org.elasticsearch.script.mustache.MustachePlugin"
        },
        {
          "name": "lang-painless",
          "version": "5.0.1",
          "description": "An easy, safe and fast scripting language for Elasticsearch",
          "classname": "org.elasticsearch.painless.PainlessPlugin"
        },
        {
          "name": "percolator",
          "version": "5.0.1",
          "description": "Percolator module adds capability to index queries and query these queries by specifying documents",
          "classname": "org.elasticsearch.percolator.PercolatorPlugin"
        },
        {
          "name": "reindex",
          "version": "5.0.1",
          "description": "The Reindex module adds APIs to reindex from one index to another or update documents in place.",
          "classname": "org.elasticsearch.index.reindex.ReindexPlugin"
        },
        {
          "name": "transport-netty3",
          "version": "5.0.1",
          "description": "Netty 3 based transport implementation",
          "classname": "org.elasticsearch.transport.Netty3Plugin"
        },
        {
          "name": "transport-netty4",
          "version": "5.0.1",
          "description": "Netty 4 based transport implementation",
          "classname": "org.elasticsearch.transport.Netty4Plugin"
        }
      ],
      "ingest": {
        "processors": [
          {
            "type": "append"
          },
          {
            "type": "convert"
          },
          {
            "type": "date"
          },
          {
            "type": "date_index_name"
          },
          {
            "type": "dot_expander"
          },
          {
            "type": "fail"
          },
          {
            "type": "foreach"
          },
          {
            "type": "grok"
          },
          {
            "type": "gsub"
          },
          {
            "type": "join"
          },
          {
            "type": "json"
          },
          {
            "type": "lowercase"
          },
          {
            "type": "remove"
          },
          {
            "type": "rename"
          },
          {
            "type": "script"
          },
          {
            "type": "set"
          },
          {
            "type": "sort"
          },
          {
            "type": "split"
          },
          {
            "type": "trim"
          },
          {
            "type": "uppercase"
          }
        ]
      }
    }
  }
}
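
As a rough sketch of how the roles could be read out of a response shaped like the one above, the snippet below queries /_nodes/_local and maps each node name to its roles. The helper names (http_get_json, local_node_roles) are hypothetical and are not part of Telegraf. The HTTP fetch is injectable so the parsing can be tried without a running cluster.

```python
import json
from urllib.request import urlopen


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body (plain stdlib HTTP GET)."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


def local_node_roles(base_url: str, get_json=http_get_json) -> dict:
    """Map node name -> list of roles from GET /_nodes/_local.

    get_json is injectable so the parsing can be exercised without a
    running cluster.
    """
    payload = get_json(base_url.rstrip("/") + "/_nodes/_local")
    return {
        node["name"]: node.get("roles", [])
        for node in payload.get("nodes", {}).values()
    }
```

Against the node shown above, local_node_roles("http://localhost:9200") would be expected to map "elk4node01" to its three roles, assuming the node is reachable on that address.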

Akshaykapoor · 2017-01-13T04:58:24Z

+1
It would be good to have the node_roles value in tags.

sybrandy · 2017-01-18T19:30:31Z

The /_nodes/stats endpoint has the roles as well. It gets the information for all of the nodes in the cluster.
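
To make the original request concrete, here is a minimal sketch that groups JVM heap usage by role from a /_nodes/stats style payload. It assumes each node entry carries "name", "roles", and jvm.mem.heap_used_in_bytes. The function name and the sample payload are hypothetical, and the values are made up.

```python
from collections import defaultdict


def heap_used_by_role(stats: dict) -> dict:
    """Group jvm.mem.heap_used_in_bytes per node under each of its roles."""
    grouped = defaultdict(dict)
    for node in stats.get("nodes", {}).values():
        heap = node.get("jvm", {}).get("mem", {}).get("heap_used_in_bytes")
        if heap is None:
            continue  # skip nodes that did not report heap usage
        for role in node.get("roles", []):
            grouped[role][node["name"]] = heap
    return dict(grouped)


# Hypothetical, heavily trimmed payload; names and values are made up.
sample = {
    "nodes": {
        "id-1": {"name": "es-datanode1", "roles": ["data", "ingest"],
                 "jvm": {"mem": {"heap_used_in_bytes": 1000}}},
        "id-2": {"name": "es-master1", "roles": ["master"],
                 "jvm": {"mem": {"heap_used_in_bytes": 200}}},
    }
}

data_heap = heap_used_by_role(sample)["data"]
```

A node with several roles appears under each of them, which is the same ambiguity a single node_roles tag has to deal with.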

MatthewOHaraTR · 2017-01-18T20:07:38Z

@sparrc Part of the problem is that the parser was set up to ignore strings and only process numeric data as metrics. In my recent merge that you accepted, I added the capability for the plugin to get the string data too (I needed it in the new API calls I was making). But I kept the node stats unchanged to avoid changing the behavior for anyone else. Maybe another plugin option could be used to control this behavior?

sparrc · 2017-01-21T22:59:18Z

We don't need to add a config option to add a tag to the metrics.

eesprit · 2018-03-28T08:01:36Z

A node role tag would be really useful. What is actually blocking it from being added?
Does it need a PR?

danielnelson · 2018-03-28T18:10:17Z

@eesprit Yes, I think we just need a PR

cyberaa · 2018-05-14T10:08:51Z

+1
Would be a great addition to have, since roles are more common now.

dupondje · 2019-07-02T08:04:24Z

Pushed a possible fix in the commit above (tests not adjusted yet).
Is it this kind of output we want? Cause a node can have multiple roles, so we need a tag with multiple values here.

I don't know if we want an option to include/exclude them (and, if so, what default we should use).

danielnelson · 2019-07-02T19:08:23Z

> Is it this kind of output we want? Cause a node can have multiple roles, so we need a tag with multiple values here.

I don't like it, but I think this is our only/best option. Would be really nice if we could send multiple values for a tag. My only suggestion is to sort the roles in the list so they will be in a stable order.
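
The sorting suggestion can be sketched as follows. This is an illustrative Python version, not the Go code from the commit. Both helper names are hypothetical. The escaping follows the InfluxDB line protocol rule that commas, equals signs, and spaces in tag values are backslash-escaped.

```python
def node_roles_tag(roles) -> str:
    """Join roles into one tag value, sorted so the order is stable."""
    return ",".join(sorted(roles))


def escape_tag_value(value: str) -> str:
    """Escape commas, equals signs, and spaces for InfluxDB line protocol."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


if __name__ == "__main__":
    tag = node_roles_tag(["master", "data", "ingest"])
    print("node_roles=" + escape_tag_value(tag))
```

Sorting matters because the same set of roles reported in a different order would otherwise produce a different tag value, and therefore a separate series.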

sparrc changed the title from "Group metrics by node role" to "elasticsearch plugin: add a tag for node role" Dec 14, 2016

sparrc added this to the Future Milestone milestone Dec 14, 2016

danielnelson added the area/elasticsearch label May 8, 2017

danielnelson removed this from the Future Milestone milestone Jun 14, 2017

dupondje pushed a commit to dupondje/telegraf that referenced this issue Jul 2, 2019

Add node_roles as tag (Closes influxdata#2158)

ae16792

This adds node_roles as a tag to the exported elasticsearch metrics. For example: node_roles=master\,data\

dupondje mentioned this issue Jul 3, 2019

Elasticsearch: add node_roles tag #6064 (Merged)

danielnelson closed this as completed in #6064 Jul 3, 2019

danielnelson added this to the 1.12.0 milestone Jul 3, 2019
