This repository has been archived by the owner on Jul 27, 2023. It is now read-only.

Split ELK role into standalone Elasticsearch and Kibana roles #1481

Merged
merged 5 commits into master from feature/split-elk-role
Jun 7, 2016

Conversation

ryane
Contributor

@ryane ryane commented May 24, 2016

  • Installs cleanly on a fresh build of the most recent master branch
  • Upgrades cleanly from the most recent release
  • Updates documentation relevant to the changes

This separates the ELK role into standalone Elasticsearch and Kibana roles. The ELK role is now just a meta role that includes the Elasticsearch, Kibana, and Logstash roles. This allows users to more easily deploy a standalone Elasticsearch cluster for purposes other than the standard Mantl log collection with the ELK stack. This can be used for the System Assurance Elasticsearch cluster.


Testing

Testing with the default configuration will require at least 4 worker nodes, each with at least 1 full CPU and 1 GB of memory available to Mesos. In addition, each worker node will need at least 5 GB of free disk space.

Install an Elasticsearch cluster

ansible-playbook -e @security.yml addons/elasticsearch.yml

After several minutes, you should see:

  1. A healthy mantl/elasticsearch app in marathon

  2. A healthy mantl/elasticsearch-client app in marathon

  3. An elasticsearch.mantl task running in Mesos. This is the Elasticsearch Mesos framework.

  4. 3 elasticsearch-executor-mantl tasks running in Mesos. These are the 3 Elasticsearch nodes in your cluster.

  5. An elasticsearch-client.mantl task running in Mesos. This is an Elasticsearch client node that acts as a smart load balancer for the Elasticsearch cluster. It will listen on the well-known Elasticsearch ports 9200 (http) and 9300 (transport). You can verify the health of the Elasticsearch cluster by running a command like:

    $ curl -s elasticsearch-client-mantl.service.consul:9200/_cluster/health | jq .
    {
      "cluster_name": "mantl",
      "status": "green",
      "timed_out": false,
      "number_of_nodes": 4,
      "number_of_data_nodes": 3,
      "active_primary_shards": 5,
      "active_shards": 15,
      "relocating_shards": 0,
      "initializing_shards": 0,
      "unassigned_shards": 0,
      "delayed_unassigned_shards": 0,
      "number_of_pending_tasks": 0,
      "number_of_in_flight_fetch": 0,
      "task_max_waiting_in_queue_millis": 0,
      "active_shards_percent_as_number": 100
    }
  6. The following healthy services are registered in Consul:

    • elasticsearch-mantl (the Elasticsearch Mesos framework)
    • elasticsearch-executor-mantl (the Elasticsearch nodes launched by the Mesos framework)
      • each service will also have a client_port and a transport_port tag that can be used to discover the corresponding ports
    • elasticsearch-client-mantl (the Elasticsearch client node)

    Consul can be used to discover the IPs and ports of the different services if needed (see the tag lookup example after this list). Otherwise, elasticsearch-client-mantl.service.consul:9200 is available as a convenient entry point into the cluster.

  7. The Elasticsearch Mesos framework UI is available via Mantl UI (requires browser refresh).
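
If you need the individual Elasticsearch executor ports, you can also query Consul by tag. This is a hedged sketch: it assumes the Consul HTTP API is reachable over plain HTTP on port 8500 (use https and your certificates if Consul TLS is enabled), and that the tag names match the forms shown above (they may be registered uppercased by mesos-consul):

    curl -s 'http://consul.service.consul:8500/v1/health/service/elasticsearch-executor-mantl?tag=transport_port&passing' \
      | jq '.[].Service | {Address, Port}'

Each entry maps an executor's IP to its transport port; swap transport_port for client_port to get the HTTP ports.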

Install Kibana

ansible-playbook -e @security.yml addons/kibana.yml

After several minutes, you should see:

  1. A healthy mantl/kibana app in marathon
  2. A kibana.mantl task running in Mesos. This is the Kibana Mesos framework.
  3. A kibana-mantl.task running in Mesos. This is the actual Kibana application running in Mesos.
  4. The following healthy services are registered in Consul:
    • kibana-mantl (the Kibana Mesos framework)
    • kibana-mantl-task (the Kibana application)
  5. The Kibana UI is available via Mantl UI (requires browser refresh). By default, Kibana connects to an Elasticsearch client node identified by the Consul service named elasticsearch-client-mantl. You may see an error in the Kibana UI since the Elasticsearch cluster does not contain any indexes yet (see the example after this list).
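
The "no indexes" message goes away once Elasticsearch has at least one index for Kibana to work with. A hedged sketch of creating a throwaway index through the client node (the index and type names here are arbitrary and not part of the role):

    curl -XPOST elasticsearch-client-mantl.service.consul:9200/kibana-smoke-test/doc \
      -d '{"message": "hello from mantl", "@timestamp": "2016-06-01T00:00:00Z"}'

After refreshing Kibana and pointing it at a kibana-smoke-test index pattern, the error should clear. The index can be removed afterwards with curl -XDELETE against elasticsearch-client-mantl.service.consul:9200/kibana-smoke-test.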

Uninstall Kibana

ansible-playbook -e @security.yml -e 'kibana_uninstall=true' addons/kibana.yml

After a few minutes, you should see that:

  1. The mantl/kibana app is no longer running in marathon.
  2. The kibana.mantl and kibana.mantl.task tasks should no longer be running in Mesos.
  3. The kibana-mantl and kibana-mantl-task services should no longer be registered in Consul.
  4. The Kibana UI should no longer be visible in Mantl UI (requires browser refresh).

Uninstall Elasticsearch

ansible-playbook -e @security.yml -e 'elasticsearch_uninstall=true elasticsearch_remove_data=true' addons/elasticsearch.yml

After a few minutes, you should see that:

  1. The mantl/elasticsearch app is no longer running in marathon.

  2. The mantl/elasticsearch-client app is no longer running in marathon.

  3. The elasticsearch.mantl, elasticsearch-executor-mantl, and elasticsearch-client.mantl tasks should no longer be running in Mesos.

  4. The elasticsearch-mantl, elasticsearch-executor-mantl, and elasticsearch-client-mantl services should no longer be registered in Consul (see the check after this list).

  5. The Elasticsearch Mesos framework UI should no longer be visible in Mantl UI (requires browser refresh).

  6. This example includes elasticsearch_remove_data=true which will also remove the Elasticsearch data from every node. You can verify that the directory is removed with the following command:

    ansible all -s -m shell -a 'ls -al /var/lib/mesos/slave/elasticsearch/mantl'

    You should get No such file or directory for every node. You can also test without elasticsearch_remove_data set (or set to false) and those directories should still exist on a few of your worker nodes after the uninstall is complete.
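
In addition to the data directory check, you can confirm that the Consul services from step 4 are gone. A hedged sketch, assuming the Consul HTTP API is reachable over plain HTTP on port 8500 (adjust for TLS if security is enabled):

    curl -s http://consul.service.consul:8500/v1/catalog/service/elasticsearch-executor-mantl

An empty JSON array ([]) means the executor services have been deregistered; the same check works for elasticsearch-mantl and elasticsearch-client-mantl.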

Install the full ELK stack

ansible-playbook -e @security.yml addons/elk.yml

This is a meta role that installs the Elasticsearch, Kibana, and Logstash roles at one time. After several minutes, you should see that:

  1. An Elasticsearch cluster is installed. See "Install an Elasticsearch cluster" for the Elasticsearch verification steps.
  2. Kibana is installed. See "Install Kibana" for the Kibana verification steps.
  3. Logstash should be running on every node (verify with systemctl status logstash locally on each node, or with Ansible; see the example after this list)
  4. When you visit the Kibana UI, you should see that it is receiving logs from each node.
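
One way to check Logstash on every node at once is an Ansible ad-hoc command in the same style as the data directory check above. A sketch, assuming the same inventory and sudo setup as the other ansible commands in this PR:

    ansible all -s -m shell -a 'systemctl is-active logstash'

Every node should report active; anything else points at a node where the Logstash role did not converge.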

Uninstall the full ELK stack

ansible-playbook -e @security.yml -e 'elk_uninstall=true elasticsearch_remove_data=true' addons/elk.yml

After a few minutes, you should see:

  1. That everything included in the "Uninstall Kibana" and "Uninstall Elasticsearch" sections was completed.

Install a custom Elasticsearch cluster

ansible-playbook -e @security.yml -e 'elasticsearch_nodes=4' addons/elasticsearch.yml

In this example, we are launching 4 Elasticsearch data nodes via the Mesos framework. You can verify everything in the "Install an Elasticsearch cluster" section. The only difference is that there should be 4 elasticsearch-executor-mantl tasks running in Mesos and visible in the Elasticsearch Mesos framework UI. View the Elasticsearch role documentation for all of the configuration variables. You can uninstall this cluster by running:

ansible-playbook -e @security.yml -e 'elasticsearch_uninstall=true elasticsearch_remove_data=true' addons/elasticsearch.yml

@tpolekhin
Contributor

@ryane hello. Tried to install elasticsearch on a fresh mantl cluster. The scheduler came up and the executors spawned fine, but elasticsearch-client-mantl fails every 3 minutes and the proxy doesn't work.
stderr:

+ echo 'attempt: 1'
+ sleep 30
+ wait_for_service
+ /usr/local/bin/consul-template -config /consul-template/config.d -log-level warn -wait 30s:60s -once -consul consul.service.consul:8500 -ssl -ssl-verify=false
2016/05/25 10:47:06 [WARN] (runner) disabling consul SSL verification
2016/05/25 10:47:06 [ERR] (view) "service(transport_port.elasticsearch-executor-mantl [any])" health services: error fetching: Get https://consul.service.consul:8500/v1/health/service/elasticsearch-executor-mantl?stale=&tag=transport_port&wait=60000ms: http: server gave HTTP response to HTTPS client
2016/05/25 10:47:06 [ERR] (runner) watcher reported error: health services: error fetching: Get https://consul.service.consul:8500/v1/health/service/elasticsearch-executor-mantl?stale=&tag=transport_port&wait=60000ms: http: server gave HTTP response to HTTPS client
Consul Template returned errors:
health services: error fetching: Get https://consul.service.consul:8500/v1/health/service/elasticsearch-executor-mantl?stale=&tag=transport_port&wait=60000ms: http: server gave HTTP response to HTTPS client+ grep discovery.zen.ping.unicast.hosts /usr/share/elasticsearch/config/elasticsearch.yml
+ '[' 2 -eq 5 ']'
+ echo 'waiting for transport_port.elasticsearch-executor-mantl service...'
+ echo 'attempt: 2'

stdout:

--container="mesos-25aa81dc-6108-4478-8421-4ef831b7e24d-S2.3d2dde0b-6fbb-4d22-a7ad-97681ecdc447" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/var/lib/mesos/slaves/25aa81dc-6108-4478-8421-4ef831b7e24d-S2/frameworks/25aa81dc-6108-4478-8421-4ef831b7e24d-0000/executors/mantl_elasticsearch-client.e1de1ca5-2265-11e6-b612-0242ef758ce7/runs/3d2dde0b-6fbb-4d22-a7ad-97681ecdc447" --stop_timeout="0ns"
--container="mesos-25aa81dc-6108-4478-8421-4ef831b7e24d-S2.3d2dde0b-6fbb-4d22-a7ad-97681ecdc447" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/var/lib/mesos/slaves/25aa81dc-6108-4478-8421-4ef831b7e24d-S2/frameworks/25aa81dc-6108-4478-8421-4ef831b7e24d-0000/executors/mantl_elasticsearch-client.e1de1ca5-2265-11e6-b612-0242ef758ce7/runs/3d2dde0b-6fbb-4d22-a7ad-97681ecdc447" --stop_timeout="0ns"
Registered docker executor on mantl-worker-002
Starting task mantl_elasticsearch-client.e1de1ca5-2265-11e6-b612-0242ef758ce7
waiting for transport_port.elasticsearch-executor-mantl service...
attempt: 0
waiting for transport_port.elasticsearch-executor-mantl service...
attempt: 1
waiting for transport_port.elasticsearch-executor-mantl service...
attempt: 2
waiting for transport_port.elasticsearch-executor-mantl service...
attempt: 3
waiting for transport_port.elasticsearch-executor-mantl service...
attempt: 4
transport_port.elasticsearch-executor-mantl not found.

mantl-control-01:

[cloud-user@mantl-control-01 ~]$ curl 'http://consul.service.consul:8500/v1/catalog/service/elasticsearch-executor-mantl'
[{"Node":"mantl-worker-001","Address":"10.10.10.68","ServiceID":"mesos-consul:10.10.10.68:elasticsearch-executor-mantl:4000","ServiceName":"elasticsearch-executor-mantl","ServiceTags":["CLIENT_PORT"],"ServiceAddress":"10.10.10.68","ServicePort":4000,"ServiceEnableTagOverride":false,"CreateIndex":1972,"ModifyIndex":1972},{"Node":"mantl-worker-001","Address":"10.10.10.68","ServiceID":"mesos-consul:10.10.10.68:elasticsearch-executor-mantl:4001","ServiceName":"elasticsearch-executor-mantl","ServiceTags":["TRANSPORT_PORT"],"ServiceAddress":"10.10.10.68","ServicePort":4001,"ServiceEnableTagOverride":false,"CreateIndex":1973,"ModifyIndex":1973},{"Node":"mantl-worker-004","Address":"10.10.10.67","ServiceID":"mesos-consul:10.10.10.67:elasticsearch-executor-mantl:4000","ServiceName":"elasticsearch-executor-mantl","ServiceTags":["CLIENT_PORT"],"ServiceAddress":"10.10.10.67","ServicePort":4000,"ServiceEnableTagOverride":false,"CreateIndex":1969,"ModifyIndex":1969},{"Node":"mantl-worker-004","Address":"10.10.10.67","ServiceID":"mesos-consul:10.10.10.67:elasticsearch-executor-mantl:4001","ServiceName":"elasticsearch-executor-mantl","ServiceTags":["TRANSPORT_PORT"],"ServiceAddress":"10.10.10.67","ServicePort":4001,"ServiceEnableTagOverride":false,"CreateIndex":1970,"ModifyIndex":1970},{"Node":"mantl-worker-005","Address":"10.10.10.66","ServiceID":"mesos-consul:10.10.10.66:elasticsearch-executor-mantl:4000","ServiceName":"elasticsearch-executor-mantl","ServiceTags":["CLIENT_PORT"],"ServiceAddress":"10.10.10.66","ServicePort":4000,"ServiceEnableTagOverride":false,"CreateIndex":1978,"ModifyIndex":1978},{"Node":"mantl-worker-005","Address":"10.10.10.66","ServiceID":"mesos-consul:10.10.10.66:elasticsearch-executor-mantl:4001","ServiceName":"elasticsearch-executor-mantl","ServiceTags":["TRANSPORT_PORT"],"ServiceAddress":"10.10.10.66","ServicePort":4001,"ServiceEnableTagOverride":false,"CreateIndex":1979,"ModifyIndex":1979}]

@ryane
Contributor Author

ryane commented May 25, 2016

2016/05/25 10:47:06 [ERR] (view) "service(transport_port.elasticsearch-executor-mantl [any])" health services: error fetching: Get https://consul.service.consul:8500/v1/health/service/elasticsearch-executor-mantl?stale=&tag=transport_port&wait=60000ms: http: server gave HTTP response to HTTPS client
2016/05/25 10:47:06 [ERR] (runner) watcher reported error: health services: error fetching: Get https://consul.service.consul:8500/v1/health/service/elasticsearch-executor-mantl?stale=&tag=transport_port&wait=60000ms: http: server gave HTTP response to HTTPS client

It is failing to talk to Consul to discover the Elasticsearch cluster due to SSL issues. Are you using the default Mantl security settings?
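
A quick way to confirm which protocol the Consul HTTP API is actually serving (a hedged sketch; the port and the consul.service.consul name are taken from the logs above, and -k only skips certificate verification for this check):

    curl -sk https://consul.service.consul:8500/v1/status/leader   # should answer only if Consul TLS is enabled
    curl -s  http://consul.service.consul:8500/v1/status/leader    # should answer only if it is disabled

Whichever of the two returns the leader address tells you how the consul-template flags inside the container need to be set.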

@tpolekhin
Contributor

@ryane no, I set up Mantl with ./security-setup --enable=false

@ryane
Contributor Author

ryane commented May 25, 2016

OK, looks like you found an issue with the elasticsearch-client app when Consul SSL is turned off. I created CiscoCloud/mantl-universe#32 with a fix. If you want to test it, you can update the Consul KV entry at mantl-install/repository/0/repo/packages/E/elasticsearch-client/0/marathon.json with the contents of https://raw.githubusercontent.com/CiscoCloud/mantl-universe/c733e688509f2f0f7b7847c2dd40d90c9e3b09d1/repo/packages/E/elasticsearch-client/1/marathon.json, delete the mantl/elasticsearch-client app from Marathon, and then re-run the addons/elasticsearch.yml playbook.
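
For reference, a minimal sketch of that KV update using Consul's HTTP API, run from a control node. It assumes the API is reachable on localhost:8500 without TLS; with security enabled you would need the https endpoint and your certificates instead:

    curl -s https://raw.githubusercontent.com/CiscoCloud/mantl-universe/c733e688509f2f0f7b7847c2dd40d90c9e3b09d1/repo/packages/E/elasticsearch-client/1/marathon.json \
      | curl -s -X PUT --data-binary @- \
          http://localhost:8500/v1/kv/mantl-install/repository/0/repo/packages/E/elasticsearch-client/0/marathon.json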

Meanwhile, I'll set up a clean test on my end.

@tpolekhin
Contributor

@ryane Is that by design: -Xms256m -Xmx1g for a 512 MB container? :)

@ryane
Contributor Author

ryane commented May 25, 2016

No, just the default. We should expose it in the configuration. What would you suggest as a better default?

@tpolekhin
Contributor

@ryane as long as this node will be an actual search node for the ES cluster, I suggest it have at least 1 CPU and 1-2 GB RAM

@tpolekhin
Contributor

@ryane seems like the Kibana addon needs this SSL fix too

+ echo 'attempt: 0'
+ sleep 10
Unable to launch health process: Only command health check is supported now.
+ wait_for_config
+ /usr/local/bin/consul-template -config /consul-template/config.d/kibana.cfg -log-level warn -wait 2s:10s -once -consul consul.service.consul:8500 -ssl -ssl-verify=false
2016/05/25 14:45:07 [WARN] (runner) disabling consul SSL verification
2016/05/25 14:45:07 [ERR] (view) "service(elasticsearch-client-mantl [any])" health services: error fetching: Get https://consul.service.consul:8500/v1/health/service/elasticsearch-client-mantl?stale=&wait=60000ms: http: server gave HTTP response to HTTPS client
2016/05/25 14:45:07 [ERR] (runner) watcher reported error: health services: error fetching: Get https://consul.service.consul:8500/v1/health/service/elasticsearch-client-mantl?stale=&wait=60000ms: http: server gave HTTP response to HTTPS client
Consul Template returned errors:
health services: error fetching: Get https://consul.service.consul:8500/v1/health/service/elasticsearch-client-mantl?stale=&wait=60000ms: http: server gave HTTP response to HTTPS client+ grep elasticsearch.url /opt/kibana/config/kibana.yml
+ '[' 1 -eq 6 ']'
+ echo 'waiting for Kibana configuration...'
+ cat /opt/kibana/config/kibana.yml
+ echo 'attempt: 1'

@ryane
Contributor Author

ryane commented May 25, 2016

ok, working on these changes...

- can set JAVA_OPTS for elasticsearch and elasticsearch-client
- elasticsearch-client defaults to 1 cpu + 1 gb mem
- fixes issue where install tasks might not run on correct node
@ryane
Contributor Author

ryane commented May 26, 2016

Kibana should work when SSL is disabled now, and java_opts are configurable. If you want to test on an existing cluster, you should be able to resync the repository (run from a control node):

consul-cli kv-delete --recurse mantl-install
curl -XPOST http://localhost:18080/v2/apps/mantl-api/restart

let me know if you see anything else that needs tweaking.
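
For anyone testing the new java_opts settings, a sketch of what the override might look like; the variable name elasticsearch_client_java_opts is a guess here, so check the role defaults for the actual names:

    ansible-playbook -e @security.yml \
      -e 'elasticsearch_client_java_opts="-Xms1g -Xmx1g"' \
      addons/elasticsearch.yml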

@tpolekhin
Contributor

@ryane looks good. The only "issue" I see is that Kibana tries to display the logstash-* index from Elasticsearch on load by default, but there isn't one because I didn't install the logstash role. I think we should update the Kibana role so that it doesn't load this by default, and move this pre-defined view into the logstash role. Is this possible? Thanks

@ryane
Contributor Author

ryane commented May 26, 2016

Yep, good call, it should be possible. will post a new commit when ready

it is turned on when you install the elk addon but it is off if you install
the standalone kibana addon
@ryane
Contributor Author

ryane commented May 27, 2016

This is now controlled by the kibana_logstash_config variable. By default, the logstash config is not applied if you install the standalone kibana role, but it is applied if you install the full elk addon.
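
To opt in to the logstash config with a standalone Kibana install, pass the variable explicitly (a sketch, assuming it is a simple boolean that can be set on the command line like the other flags in this PR):

    ansible-playbook -e @security.yml -e 'kibana_logstash_config=true' addons/kibana.yml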

@SergeyNosko

@ryane, it seems there is a feature/bug in the kibana role. If you apply the role and then uninstall Kibana, the indexes added by Kibana to Elasticsearch are not removed. Stumbled upon this during the upgrade to the last commit ("kibana logstash config is optional"). Might be a good idea to add an option to the uninstall that deletes the Kibana indexes from Elasticsearch.

@kbroughton
Contributor

I've been playing with logstash.conf to support logging to S3 and more inputs. I noticed we are running logstash 1.5.3, which is pretty old.

It would also be nice to get x-pack as an optional install. It is still alpha, but it has most of the features required for production. https://www.elastic.co/v5

Could we make this update forward looking to include easy version bumping and x-pack support?

Also, I'm still getting kibana.mantl failures every hour or so on a 3-week-old mantl deployment. I only have 3 workers, so that might be a problem (4 are recommended?). And I believe destroying kibana in marathon may trigger the docker hangs I've experienced. Perhaps we could test that as well with the new refactor.

@ryane
Contributor Author

ryane commented Jun 2, 2016

@SergeyNosko I'd want to be careful about deleting indexes or any other data on uninstall. Perhaps we could make it optional and not the default. Or, at the very least, document the process. Do you mind opening a separate issue for it?

@kbroughton we are going to be replacing the logstash agent on all nodes with filebeat. Another team is working on creating a logstash role (possibly a mesos framework) that can be used as a central place for processing before sending to Elasticsearch (see #1203). So, it's likely that you'll be able to do something like filebeat -> elasticsearch or filebeat -> logstash -> elasticsearch depending on what you want, your scale, etc.

x-pack looks nice but it also looks like it might not be compatible with the version of elasticsearch and kibana we are currently deploying. can you create another issue for that?

Finally, I wonder if your kibana instance is running out of memory. Are you using the default settings? Can you try increasing the amount of memory assigned to the application in marathon and see if you have better results?
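
If you want to try that without reinstalling, Marathon's REST API accepts partial updates, so the memory can be bumped in place. A hedged sketch: the mantl/kibana app id comes from the verification steps above, but the plain-HTTP endpoint is an assumption; on a security-enabled cluster you would use the https endpoint with credentials:

    curl -X PUT -H 'Content-Type: application/json' \
      -d '{"mem": 1024}' \
      http://marathon.service.consul:8080/v2/apps/mantl/kibana

Only the mem field changes and Marathon redeploys the app with the new allocation.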

@langston-barrett
Contributor

langston-barrett commented Jun 7, 2016

Test results

Tested on AWS with security enabled, four m3.xlarge worker nodes, three m3.large control nodes, one m3.medium edge node, and one m3.large kubeworker.

Install an Elasticsearch cluster

  • A healthy mantl/elasticsearch app in marathon
  • A healthy mantl/elasticsearch-client app in marathon
  • An elasticsearch.mantl task running in Mesos. This is the Elasticsearch Mesos framework.
  • 3 elasticsearch-executor-mantl tasks running in Mesos. These are the 3 Elasticsearch nodes in your cluster.
  • An elasticsearch-client.mantl task running in Mesos. This is an Elasticsearch client node that acts as a smart load balancer for the Elasticsearch cluster. It listens on the well-known Elasticsearch ports 9200 (http) and 9300 (transport).
  • The following healthy services are registered in Consul:
    • elasticsearch-mantl (the Elasticsearch Mesos framework)
    • elasticsearch-executor-mantl (the Elasticsearch nodes launched by the Mesos framework)
      • each service will also have a client_port and a transport_port tag that can be used to discover the corresponding ports
    • elasticsearch-client-mantl (the Elasticsearch client node)
  • The Elasticsearch Mesos framework UI is available via Mantl UI (requires browser refresh).

Kibana

  • A healthy mantl/kibana app in marathon
  • A kibana.mantl task running in Mesos. This is the Kibana Mesos framework.
  • A kibana-mantl.task running in Mesos. This is the actual Kibana application running in Mesos.
  • The following healthy services are registered in Consul:
    • kibana-mantl (the Kibana Mesos framework)
    • kibana-mantl-task (the Kibana application)
  • The Kibana UI is available via Mantl UI (requires browser refresh). By default, Kibana connects to an Elasticsearch client node identified by the Consul service named elasticsearch-client-mantl. You may see an error in the Kibana UI since the Elasticsearch cluster does not contain any indexes.

Uninstall Kibana

  • The mantl/kibana app is no longer running in marathon.
  • The kibana.mantl and kibana.mantl.task tasks should no longer be running in Mesos.
  • The kibana-mantl and kibana-mantl-task services should no longer be registered in Consul.
  • The Kibana UI should no longer be visible in Mantl UI (requires browser refresh).

Uninstall Elasticsearch

  • The mantl/elasticsearch app is no longer running in marathon.
  • The mantl/elasticsearch-client app is no longer running in marathon.
  • The elasticsearch.mantl, elasticsearch-executor-mantl, and elasticsearch-client.mantl tasks should no longer be running in Mesos.
  • The elasticsearch-mantl, elasticsearch-executor-mantl, and elasticsearch-client-mantl services should no longer be registered in Consul.
  • The Elasticsearch Mesos framework UI should no longer be visible in Mantl UI (requires browser refresh).
  • This example includes elasticsearch_remove_data=true which will also remove the Elasticsearch data from every node.

ELK stack

  • An Elasticsearch cluster is installed. See "Install an Elasticsearch cluster" for the Elasticsearch verification steps.
  • Kibana is installed. See "Install Kibana" for the Kibana verification steps.
  • Logstash should be running on every node (verify with systemctl status logstash locally on each node or with ansible)
  • When you visit the Kibana UI, you should see that it is receiving logs from each node.

Uninstall the full ELK stack

  • That everything included in the "Uninstall Kibana" and "Uninstall Elasticsearch" sections was completed.

Install a custom Elasticsearch cluster

ansible-playbook -e @security.yml -e 'elasticsearch_nodes=4' addons/elasticsearch.yml

In this example, we are launching 4 Elasticsearch data nodes via the Mesos framework. You can verify everything in the "Install an Elasticsearch cluster" section. The only difference is that there should be 4 elasticsearch-executor-mantl tasks running in Mesos and visible in the Elasticsearch Mesos framework UI. View the Elasticsearch role documentation for all of the configuration variables. You can uninstall this cluster by running:

ansible-playbook -e @security.yml -e 'elasticsearch_uninstall=true elasticsearch_remove_data=true' addons/elasticsearch.yml

@langston-barrett
Contributor

langston-barrett commented Jun 7, 2016

While almost everything worked, there is an intermittent problem with removing frameworks. The mantl-api logs look like this:

time="2016-06-07T14:09:40Z" level=debug msg="DELETE /1/install" 
time="2016-06-07T14:09:40Z" level=debug msg="GET https://marathon.service.consul:8080/v2/apps/" 
time="2016-06-07T14:09:40Z" level=debug msg="DELETE https://marathon.service.consul:8080/v2/apps/mantl/elasticsearch-client" 
time="2016-06-07T14:09:40Z" level=debug msg="DELETE /1/install" 
time="2016-06-07T14:09:40Z" level=debug msg="GET https://marathon.service.consul:8080/v2/apps/" 
time="2016-06-07T14:09:40Z" level=debug msg="DELETE https://marathon.service.consul:8080/v2/apps/mantl/elasticsearch" 
time="2016-06-07T14:09:40Z" level=debug msg="Looking for mantl/elasticsearch framework" 
time="2016-06-07T14:09:40Z" level=debug msg="GET http://lb0-control-02:15050/master/state.json" 
time="2016-06-07T14:09:40Z" level=debug msg="Framework mantl/elasticsearch not active" 

See CiscoCloud/mantl-api#46.

I'm not sure if we want to consider that a blocker for this PR; it seems like an issue in mantl-api that would affect current deployments just as often.
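
As a stopgap when mantl-api leaves a framework registered, the Mesos master teardown endpoint can be hit directly. A hedged sketch: the master address and port are taken from the log above, <framework-id> is whatever id the Mesos UI or state.json reports, and a secured cluster may require credentials:

    curl -X POST http://lb0-control-02:15050/master/teardown \
      -d 'frameworkId=<framework-id>'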

@langston-barrett langston-barrett merged commit dd6267d into master Jun 7, 2016
@langston-barrett langston-barrett deleted the feature/split-elk-role branch June 7, 2016 15:29
ryane added a commit that referenced this pull request Jun 9, 2016
@crumley
Contributor

crumley commented Jun 14, 2016

fyi #1539

@tpolekhin
Contributor

tpolekhin commented Jun 15, 2016

Kibana fails to launch on mantl master

+ exec java -Xms32m -Xmx128m -jar /tmp/mesosframework.jar --spring.application.name=kibana-mantl --mesos.framework.name=kibana-mantl --mesos.master=zk://sa20-control-01:2181,sa20-control-02:2181,sa20-control-03:2181,sa20-control-04:2181,sa20-control-05:2181/mesos --mesos.zookeeper.server=sa20-control-01:2181,sa20-control-02:2181,sa20-control-03:2181,sa20-control-04:2181,sa20-control-05:2181 --mesos.resources.cpus=0.50 --mesos.resources.mem=512 --mesos.resources.count=1 --mesos.resources.ports.UI_5601.host=ANY --mesos.resources.ports.UI_5601.container=5601 --mesos.docker.image=ciscocloud/mantl-kibana:4.3.2.1 --mesos.docker.network=BRIDGE '--mesos.command=export ELASTICSEARCH_SERVICE=elasticsearch-client-mantl; export KIBANA_SERVICE=kibana-mantl-task; export KIBANA_LOGSTASH_CONFIG=false; tini -s -- /launch.sh' --logging.level.com.containersolutions.mesos=WARN --elasticsearch.http=http://elasticsearch-executor.service.consul:4000 --server.port=31100

Not sure why it is --elasticsearch.http=http://elasticsearch-executor.service.consul:4000
Shouldn't it be something like --elasticsearch.http=http://elasticsearch-client-mantl.service.consul:9200 ?

@ryane
Contributor Author

ryane commented Jun 15, 2016

See #1550. What error are you getting?

@tpolekhin
Contributor

@ryane no, I have a different one. It looked like the framework was dying due to a timeout.
I tried increasing resources on both the scheduler and the executor and it helped!
So I suggest reviewing the default resources for the Kibana framework.

Right now it works for me with 1 CPU and 1024 MB RAM, but I'm sure we can crank it down a bit.

@tpolekhin
Contributor

Also, @ryane, I just saw that you're using docker container mode for the elasticsearch framework.
I'm strongly suggesting switching to non-docker mode for the elasticsearch framework.
I've been using the ES framework for over half a year now, and it's not stable at all in docker mode.
BTW, I noticed this just because 2 of my nodes died after only 10 hours of uptime, with a 72 msg/sec load.

Again, please consider switching to non-docker framework mode.
2 of my SA clusters run the non-docker ES framework with more than 2 months of uptime.

@tpolekhin
Contributor

FYI, another 2 instances of ES just failed.
So we have a total of 4 instances out of 5 failed in 13 hours :)
[screenshot: 2016-06-16 3:19 PM]

@ryane
Contributor Author

ryane commented Jun 17, 2016

What do you see in the logs when the nodes fail? I have seen the same thing, but only due to resource issues: a node tries to use more memory than we allotted for it in the framework and Mesos stops it. Also, can you open a new issue for this so that we can track it better?

@ryane ryane mentioned this pull request Jun 17, 2016
@tpolekhin
Contributor

@ryane I've seen this so many times on the SA cluster: no matter how many resources you specify for the node, it will eventually die to the OOM killer. It's just a matter of time. I had a 9-node ES cluster running with 4 CPUs and 16 GB RAM each, and I got a failure every day or two on average.

Non-docker mode, on the other hand, has been stable for several months now.

@SergeyNosko

@ryane What is the endpoint on the Kibana side? The one registered in Marathon looks like https://mantl-worker-003:31100
1st - it seems it's not httpS, it's http
2nd - it gives me a 404

$ curl http://mantl-worker-003:31100
{"timestamp":1466412718912,"status":404,"error":"Not Found","message":"No message available","path":"/"}

If I hit Kibana from the Mantl GUI (https://mantl-control-01/kibana) it works like a charm and proxies to Kibana. I'm working on adding quicklinks to pangea and would like to understand where I can find the endpoint for the Kibana GUI.
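
For reference, the registered endpoint can also be pulled out of Consul rather than Marathon. A hedged sketch: kibana-mantl-task is the Consul service name from the verification steps earlier in this PR, and the DNS server and port assume the default Consul DNS setup:

    dig +short @consul.service.consul -p 8600 kibana-mantl-task.service.consul SRV

The SRV answer gives the worker host and the dynamically assigned port (31100 in the example above); note that it serves plain HTTP, so the https scheme Marathon shows can be misleading.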

@langston-barrett
Contributor

@tymofii-polekhin @SergeyNosko Please make separate issues, rather than posting on this (closed) PR.

ryane added a commit that referenced this pull request Jun 20, 2016
langston-barrett pushed a commit that referenced this pull request Jun 22, 2016