ES java very high CPU usage #4288
Hey there, checking out your hot threads output, the last third of it is the important part: as you can clearly see, a lot of queries are eating up your CPU. Do you run some complex queries, including fuzzy queries? Does your system act differently if you do not query it at all? And a last question to get a better overview: what is your setup (number of nodes, etc.)? It looks like the cluster state might not be up to date on the node where the client is trying to rejoin (due to load), but this is just a wild assumption, not backed by facts.
I agree with @spinscale, this seems very much like you are firing some queries against Elasticsearch and they are causing the load. Can you identify whether that is abnormal?
I suspected the same, so I dropped all traffic to ES and the load came down right away; when I enabled the traffic again, the load jumped back up. My users query ES via the Kibana interface. I am not familiar with the fuzzy query concept; ES is totally new to me. I am running just one node with two shards, and I am about to add another node. Any suggestions as to how I should profile my ES setup in order to size new hardware?
Hey, there is a slow search log mechanism in Elasticsearch, see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-slowlog.html#search-slow-log. Furthermore, you should talk to the Kibana users and see what kind of queries they execute (almost every Kibana widget can reveal the query being executed). Also make sure that they are not reloading their data in Kibana too often; maybe that already helps a bit (without getting to the cause of the problem, it may be just one query, so you would have to try the queries out one by one). There is no way to give hardware recommendations without knowing data, configuration, mapping, documents, query load, index load, etc. Adding more nodes and replicating the same data to them will most likely at least spread the load.
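For reference, a minimal sketch of enabling the search slow log on a single index through the settings API; the index name and thresholds below are made up for illustration:

```bash
# Hypothetical index name and thresholds; adjust to your own setup.
curl -XPUT 'http://localhost:9200/logstash-2013.11.28/_settings' -d '{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "2s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}'
```

Queries that exceed a threshold are then written to the index search slowlog file, which makes it easier to trace which Kibana widget is responsible.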
I have the same problem with high CPU usage. Here are some tips: set index.number_of_shards: 1. With one index of 185k docs, my CPU load is 2.5-5% for the ES Java process. Also, plugins cause a HUGE performance reduction. I think the yml config is the solution. P.S. I also have a server (Ubuntu 12.04, OpenJDK 7, 2 Xeon cores at 2.4 GHz and 2 GB RAM) with the default config and with plugins (marvel and analysis-morphology) which works fine with 1,500,000 docs in an index.
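As an illustration of the shard-count tip above, the setting can also be applied per index at creation time; a sketch with a hypothetical index name:

```bash
# Hypothetical index "myindex": one primary shard, no replicas.
curl -XPUT 'http://localhost:9200/myindex' -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'
```

Note that the number of shards cannot be changed after the index is created; only the number of replicas can.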
Don't forget, if you installed Marvel, to delete all of its indices after you remove it, since it doesn't do that itself.
@gentunian did you have the errors while the Marvel UI was open in a browser, or also without it? It would be great to get the output of http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-hot-threads.html#cluster-nodes-hot-threads to see what the node is doing.
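For anyone following along, a sketch of pulling that output; the host, thread count, and interval are placeholders:

```bash
# Capture the hottest threads on every node; tweak threads/interval as needed.
curl 'http://localhost:9200/_nodes/hot_threads?threads=10&interval=500ms'
```

If search threads dominate the output, the slow query log mentioned earlier is the next place to look.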
@bleskes in both cases I ran out of heap space. I could see in
I had a lot of shards, maybe more than 500 (Logstash defaults to 5 shards for every index and I have more than 100 indices). In the Marvel stats, the Marvel indices were at the top of each column (index rate, search query, etc.; sorry if the names are wrong, I'm typing this from memory). All of this is in QA. Tomorrow, after figuring out the CPU usage, I'm planning a production deployment.
@gentunian I can think of a couple of things that may be going on. Maybe open a new issue with your details, as your primary concern is memory and not CPU?
Hi @ACV2 This issue is closed. I suggest you follow the advice given above (i.e. check the hot threads output and enable the slow query log), and see what happens at the time of day when your cluster goes wild. If the answer isn't immediately obvious, I'd suggest asking about it in the forum: discuss.elastic.co
@ACV2 Did you learn anything since your post about this? We're seeing more or less the same problem. Sometimes daily, sometimes only once a week (it's that random), one of our ES hosts decides to go crazy. We usually restart the instance when we see this because it affects our end users' response times, but it's starting to be a problem. In our situation, these hosts aren't idle; they're trucking along at 25-30% CPU and then BAM, 70-100%. Sometimes they correct themselves after 24-48 hours, other times they don't. I was just curious if you learned anything that could help point us in the right direction. I'll see about checking the hot threads output and slow logs.
Hey there, @travis. Actually it was a journey... Elasticsearch is everything but human friendly. After this post I checked every single configuration trying to optimize the performance and avoid this behavior, without success. I reduced the query time from 12 seconds to 2-3 seconds over a period of 10-15 minutes, which is still huge and unacceptable, but we are working on optimizing our back end to stabilize the performance. That being said, my findings:
- the merge process
Can you post your server configuration so I can help you a little bit more? :) Regards
Hi @ACV2 Thanks for the reply. We noticed this was usually tied to when we ran some big re-indexes, when the indices were doing a lot of merging. During the re-indexes we still have a full request load. At the end of the re-index I added a task to run an optimize on the index and, lo and behold, we haven't seen this problem crop up in the past ~12 days. Forcing an optimize to fix the runaway CPU problem doesn't feel like an actual fix, but it does alleviate the issue for us. ES seems to get stuck in some kind of a weird state after these re-indexes, and the optimize seems to kick it out of that state.
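For context, a sketch of the kind of post-re-index optimize call described above; the index name and segment count are illustrative:

```bash
# Force-merge the freshly re-indexed index down to a single segment.
# (_optimize was later renamed _forcemerge in Elasticsearch 2.1+.)
curl -XPOST 'http://localhost:9200/myindex/_optimize?max_num_segments=1'
```

The call is expensive in I/O and CPU while it runs, so it is best scheduled outside peak hours.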
I had the same issue, high CPU and high memory; for me it was Marvel creating a big new index every day.
Yes, the daily optimize reduced our CPU by a similar amount (~60% ➞ ~10%). We still run our daily optimize task and haven't seen this issue since enabling it at the beginning of June.
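If it helps anyone, a sketch of what such a daily task could look like as a cron entry; the host, index naming scheme, and schedule are assumptions:

```bash
# Hypothetical crontab entry: optimize yesterday's logstash index at 02:00.
# Note the escaped % signs, which cron would otherwise treat as line breaks.
0 2 * * * curl -s -XPOST "http://localhost:9200/logstash-$(date -d yesterday +\%Y.\%m.\%d)/_optimize?max_num_segments=1" >/dev/null
```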
Has anyone tried upgrading to ES 2.x? Does it fix this CPU issue?
Hi @Alino, for us the problem was actually the fact that we were using G1GC. As soon as we switched to CMS, the problem went away (our ES cluster has been up for 51 days now!). What I mentioned in my last post ^ was still true: it greatly reduced how often this happened, but the runaway CPU problems continued to occur every ~7 days or so. There's a long and interesting thread of some other users and me discussing this on the Elastic discussion board here: https://discuss.elastic.co/t/indexing-performance-degrading-over-time/40229
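For anyone wondering how to make the same switch, a sketch of the JVM flags involved; where you set them depends on your ES version (e.g. ES_JAVA_OPTS, bin/elasticsearch.in.sh, or jvm.options on 5.x+), so treat the exact placement as an assumption about your setup:

```bash
# Remove any -XX:+UseG1GC flag and start the node with CMS
# (CMS was the Elasticsearch default collector at the time).
export ES_JAVA_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly"
./bin/elasticsearch
```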
@travisbell Hi Travis, I've been seeing very similar patterns. Everything runs perfectly for a full week, then all of a sudden we get really high CPU usage and Elasticsearch is unresponsive. Most operations fail. Can you expand a bit on what you did to fix this issue? How do you switch from G1GC to CMS? I have a feeling that this is caused by a node running out of Java heap space, but I would imagine that would lead to a crash, not 100% CPU, and I don't see OutOfMemory errors in the logs, so I'm not sure.
I see the following in the logs, right before I get 100% CPU, which is why I suspect it's an out of memory issue.
(formatted for easier reading)
Not quite, but very close.
This indicates a completely ineffective garbage collection cycle lasting for over a minute. The JVM is consuming CPU trying to collect when collecting is ineffective. You'll likely hit the GC overhead limit soon, at which point you will see OutOfMemoryError: GC overhead limit exceeded. It's best to open a new post on the Elastic Discourse forum.
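One quick way to confirm this pattern before opening a forum post is to watch heap usage and collection counts from the nodes stats API (the host below is a placeholder):

```bash
# Per-node heap usage and GC counts; old-gen collections climbing while the
# heap stays near 100% matches the ineffective-GC pattern described above.
curl 'http://localhost:9200/_nodes/stats/jvm?pretty'
```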
@jasontedor I've started a discussion here. Any help on the matter would be appreciated.
@ACV2 what tool are you using for the visual monitoring data?
@ACV2 this issue also happens to me when searching.
Btw, I am using the Elasticsearch Service on AWS.
I also hit this problem, and the _nodes/hot_threads command returns the following: 62.4% (312ms out of 500ms) cpu usage by thread 'elasticsearch[dmsnode-192][transport_client_worker][T#15]{New I/O worker #15}'
I am having similar issues to the ones described in this thread: #1940
Although the leap second already happened in June, I tried resetting the time anyway. It didn't work.
This is causing a lot of frustration. Any help is much appreciated.
I have upgraded ES to the latest version (elasticsearch-0.90.7-1) and have restarted ES, the front end, and Logstash. No joy at all.
java -version
java version "1.7.0_45"
OpenJDK Runtime Environment (rhel-2.4.3.2.el6_4-x86_64 u45-b15)
OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
After the upgrade I am seeing these in the logs, which seems like a totally different issue.
[2013-11-28 03:48:32,796][WARN ][discovery.zen ] [LA Logger] received a join request for an existing node [[Night Thrasher][ecqZvZhDSTGiVkrjj6G_hw][inet[/192.168.128.146:9300]]{client=true, data=false}]
[2013-11-28 03:48:36,006][WARN ][discovery.zen ] [LA Logger] received a join request for an existing node [[Night Thrasher][ecqZvZhDSTGiVkrjj6G_hw][inet[/192.168.128.146:9300]]{client=true, data=false}]
Hot threads output: https://gist.github.com/ydnitin/7687098