Split ELK role into standalone Elasticsearch and Kibana roles #1481
Conversation
@ryane hello. Tried to install elasticsearch on a fresh mantl cluster. The scheduler went up and the executors spawned well, but stdout from mantl-control-01:

It is failing to talk to Consul to discover the Elasticsearch cluster due to SSL issues. Are you using the default Mantl security settings?
@ryane no, I set up mantl with
ok, looks like you found an issue with the elasticsearch-client app running when Consul SSL is turned off. I created CiscoCloud/mantl-universe#32 with a fix. If you want to test it, you can just update the Consul KVP. Meanwhile, I'll set up a clean test on my end.
@ryane Is that by design?
no, just the default. We should expose it in the configuration. What would you suggest as a better default?
@ryane since this node will be an actual search node for the ES cluster, I suggest giving it at least 1 CPU and 1-2 GB of RAM
@ryane seems like the Kibana addon needs this SSL fix too

ok, working on these changes...
- can set JAVA_OPTS for elasticsearch and elasticsearch-client
- elasticsearch-client defaults to 1 cpu + 1 gb mem
- fixes issue where install tasks might not run on correct node
kibana should work when ssl is disabled now, and java_opts are configurable. If you want to test on an existing cluster, you should be able to resync the repository (run from a control node):

consul-cli kv-delete --recurse mantl-install
curl -XPOST http://localhost:18080/v2/apps/mantl-api/restart

Let me know if you see anything else that needs tweaking.
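As a hedged sketch of how the new Java options might be overridden when running the addon playbook: the variable name elasticsearch_java_opts below is only an illustrative guess (check the role defaults for the real name), and the heap flags are just an example.

# hypothetical variable name; heap sizes shown only as an example
ansible-playbook -e @security.yml -e 'elasticsearch_java_opts="-Xms1g -Xmx1g"' addons/elasticsearch.yml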
@ryane looks good. The only "issue" I see is that Kibana tries to display the logstash-* index from Elasticsearch on load by default, but there isn't one, because I didn't install the logstash role. I think we should update the Kibana role to load as default, and then move this pre-defined view load to the logstash role. Is this possible? Thanks
Yep, good call, it should be possible. Will post a new commit when ready.
it is turned on when you install the elk addon but it is off when you install the standalone kibana addon

this is now controlled by a configuration variable
@ryane, seems there is a feature/bug in the kibana role. If you apply the role and then uninstall kibana, the indexes added by kibana to elasticsearch are not flushed. Stumbled upon this during the upgrade to the last commit - "kibana logstash config is optional". Might be a good idea to add an option to the uninstall that flushes the indexes in elasticsearch
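In the meantime, a manual cleanup sketch: Kibana keeps its saved objects in the .kibana index by default, so after an uninstall it can be removed by hand through the client endpoint mentioned later in this PR (adjust the host/port for your setup).

# delete the index Kibana created for its saved objects
curl -XDELETE http://elasticsearch-client-mantl.service.consul:9200/.kibana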
I've been playing with logstash.conf to support logging to S3 and more inputs. I noticed we are running logstash 1.5.3, which is pretty old. It would also be nice to get x-pack as an optional install. Still alpha, but it has most of the features required for production. https://www.elastic.co/v5 Could we make this update forward-looking to include easy version bumping and x-pack support? Also, I'm still getting kibana.mantl failures every hour or so on a 3-week-old mantl deployment. I only have 3 workers, so that might be a problem (4 are recommended?). And I believe destroying kibana in marathon may trigger the docker hangs I've experienced. Perhaps we could test that as well with the new refactor.
@SergeyNosko I'd want to be careful about deleting indexes or any other data on uninstall. Perhaps we could make it optional and not the default. Or, at the very least, document the process. Do you mind opening a separate issue for it?

@kbroughton we are going to be replacing the logstash agent on all nodes with filebeat. Another team is working on creating a logstash role (possibly a mesos framework) that can be used as a central place for processing before sending to Elasticsearch (see #1203). So, it's likely that you'll be able to do something along those lines. x-pack looks nice, but it also looks like it might not be compatible with the version of elasticsearch and kibana we are currently deploying. Can you create another issue for that?

Finally, I wonder if your kibana instance is running out of memory. Are you using the default settings? Can you try increasing the amount of memory assigned to the application in marathon and see if you have better results?
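For the memory experiment, one option is a partial app update through Marathon's REST API. This is only a sketch: it reuses the mantl/kibana app id and the Marathon port that appear elsewhere in this thread, and a secured cluster may require authentication.

# bump the kibana app to 1 GB of memory; Marathon redeploys the app with the new value
curl -X PUT -H 'Content-Type: application/json' -d '{"mem": 1024}' http://localhost:18080/v2/apps/mantl/kibana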
Test results

Tested on AWS with security enabled: four m3.xlarge worker nodes, three m3.large control nodes, one m3.medium edge node, and one m3.large kubeworker.

Tested scenarios:
- Install an Elasticsearch cluster
- Kibana
- Uninstall Kibana
- Uninstall Elasticsearch
- ELK stack
- Uninstall the full ELK stack
- Install a custom Elasticsearch cluster

For the custom cluster:

ansible-playbook -e @security.yml -e 'elasticsearch_nodes=4' addons/elasticsearch.yml

In this example, we are launching 4 Elasticsearch data nodes via the Mesos framework. You can verify everything in the "Install an Elasticsearch cluster" section. The only difference is that there should be 4 elasticsearch-executor-mantl tasks running in Mesos. Uninstall with:

ansible-playbook -e @security.yml -e 'elasticsearch_uninstall=true elasticsearch_remove_data=true' addons/elasticsearch.yml
While almost everything worked, there is an intermittent problem with removing frameworks. The mantl-api logs look like this:

I'm not sure if we want to consider that a blocker for this PR; it seems like an issue in mantl-api that would affect current deployments just as often.
fixes issue described in #1481 (comment)
fyi #1539
Kibana fails to launch on mantl master
not sure why
see #1550. what error are you getting?
@ryane no, I have a different one. It looked like the framework was dying due to a timeout. Right now it works for me with 1 CPU and 1024 MB RAM, but I'm sure we can crank it down a bit.
also, @ryane, I just saw that you're using docker container mode for the elasticsearch framework. Again, please consider switching to non-docker framework mode.
what do you see in the logs when the nodes fail? I have seen the same thing, but only due to resource issues - a node tries to use more memory than we allotted for it in the framework and mesos stops it. Also, can you open a new issue for this so that we can track it better?
@ryane I've seen this so many times on the SA cluster - no matter how many resources you specify for the node, it will eventually die to the OOM killer. It's just a matter of time. I had a 9-node ES cluster running with 4 CPUs and 16 GB RAM each, and I got a failure every day or two on average. Non-docker mode, on the other hand, has been stable for several months now.
@ryane What is the endpoint on the Kibana side? The one registered in marathon looks like https://mantl-worker-003:31100. If I hit Kibana from the Mantl GUI (https://mantl-control-01/kibana) it works like a charm and proxies to kibana. I'm working on adding quicklinks to pangea and would like to understand where I can find the endpoint for the Kibana GUI.
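If the raw host/port is needed rather than the Mantl UI proxy, one way is to ask Consul for the registered service address. A sketch, run from any node with a local Consul agent, using Consul's default DNS port 8600 and the kibana-mantl service name that appears later in this PR (there is also a kibana-mantl-task service for the task itself):

# SRV records include the dynamically assigned port
dig +short -p 8600 @localhost kibana-mantl.service.consul SRV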
@tymofii-polekhin @SergeyNosko Please make separate issues, rather than posting on this (closed) PR. |
fixes issue described in #1481 (comment)
fixes issue described in #1481 (comment)
This separates the ELK role into standalone Elasticsearch and Kibana roles. The ELK role is now just a meta role that includes the Elasticsearch, Kibana, and Logstash roles. This allows users to more easily deploy a standalone Elasticsearch cluster for purposes other than the standard Mantl log collection with the ELK stack. This can be used for the System Assurance Elasticsearch cluster.
Testing
Testing with the default configuration will require at least 4 worker nodes, each having at least 1 full CPU and 1 GB of memory available to Mesos. In addition, each worker node will need to have at least 5 GB of free disk space.
Install an Elasticsearch cluster
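The install command itself did not survive in this copy of the description; based on the custom-cluster example at the end of this PR, it is presumably the addon playbook with defaults:

ansible-playbook -e @security.yml addons/elasticsearch.yml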
After several minutes, you should see:

- A healthy mantl/elasticsearch app in marathon
- A healthy mantl/elasticsearch-client app in marathon
- An elasticsearch.mantl task running in Mesos. This is the Elasticsearch Mesos framework.
- 3 elasticsearch-executor-mantl tasks running in Mesos. These are the 3 Elasticsearch nodes running in your cluster.
- An elasticsearch-client.mantl task running in Mesos. This is an Elasticsearch client node that acts as a smart load balancer for the Elasticsearch cluster. It will be listening on the well-known Elasticsearch ports 9200 (http) and 9300 (transport). You can verify the health of the Elasticsearch cluster by running a command like the one sketched after this list.
- The following healthy services registered in consul. Consul can be used to discover the IPs and ports of the different services if needed. Otherwise, elasticsearch-client-mantl.service.consul:9200 is available as a convenient entry point into the cluster.
- The Elasticsearch Mesos framework UI is available via Mantl UI (requires browser refresh).
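A sketch of such a health check, combining the client endpoint noted above with Elasticsearch's standard _cluster/health API (the exact command from the original description was not preserved here):

curl 'http://elasticsearch-client-mantl.service.consul:9200/_cluster/health?pretty'

With the three data nodes up, the reported status should eventually be green.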
Install Kibana
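As above, the install command was not preserved; mirroring the uninstall command below, it is presumably:

ansible-playbook -e @security.yml addons/kibana.yml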
After several minutes, you should see:

- A healthy mantl/kibana app in marathon
- A kibana.mantl task running in Mesos. This is the Kibana Mesos framework.
- A kibana-mantl.task running in Mesos. This is the actual Kibana application running in Mesos.
- Kibana connected to the Elasticsearch cluster via elasticsearch-client-mantl. You may see an error in the Kibana UI since the Elasticsearch cluster does not contain any indexes.

Uninstall Kibana
ansible-playbook -e @security.yml -e 'kibana_uninstall=true' addons/kibana.yml
After a few minutes, you should see that:

- The mantl/kibana app is no longer running in marathon.
- The kibana.mantl and kibana.mantl.task tasks should no longer be running in Mesos.
- The kibana-mantl and kibana-mantl-task services should no longer be registered in Consul.

Uninstall Elasticsearch
ansible-playbook -e @security.yml -e 'elasticsearch_uninstall=true elasticsearch_remove_data=true' addons/elasticsearch.yml
After a few minutes, you should see that:

- The mantl/elasticsearch app is no longer running in marathon.
- The mantl/elasticsearch-client app is no longer running in marathon.
- The elasticsearch.mantl, elasticsearch-executor-mantl, and elasticsearch-client.mantl tasks should no longer be running in Mesos.
- The elasticsearch-mantl, elasticsearch-executor-mantl, and elasticsearch-client-mantl services should no longer be registered in Consul.
- The Elasticsearch Mesos framework UI should no longer be visible in Mantl UI (requires browser refresh).
This example includes elasticsearch_remove_data=true, which will also remove the Elasticsearch data from every node. You can verify that the directory is removed with the following command:

ansible all -s -m shell -a 'ls -al /var/lib/mesos/slave/elasticsearch/mantl'

You should get No such file or directory for every node. You can also test without elasticsearch_remove_data set (or set to false); those directories should still exist on a few of your worker nodes after the uninstall is complete.

Install the full ELK stack
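The install command for the meta role was also not preserved here; following the pattern of the other addon playbooks, it is presumably:

ansible-playbook -e @security.yml addons/elk.yml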
This is a meta role that installs the Elasticsearch, Kibana, and Logstash roles at one time. After several minutes, you should see that:

- Logstash is running on every node (you can check with systemctl status logstash locally on each node or with ansible)
locally on each node or with ansible)Uninstall the full ELK stack
ansible-playbook -e @security.yml -e 'elk_uninstall=true elasticsearch_remove_data=true' addons/elk.yml
After a few minutes, you should see:
Install a custom Elasticsearch cluster
ansible-playbook -e @security.yml -e 'elasticsearch_nodes=4' addons/elasticsearch.yml
In this example, we are launching 4 Elasticsearch data nodes via the Mesos framework. You can verify everything in the "Install an Elasticsearch cluster" section. The only difference is that there should be 4 elasticsearch-executor-mantl tasks running in Mesos and visible in the Elasticsearch Mesos framework UI. View the Elasticsearch role documentation for all of the configuration variables. You can uninstall this cluster by running:

ansible-playbook -e @security.yml -e 'elasticsearch_uninstall=true elasticsearch_remove_data=true' addons/elasticsearch.yml