packetbeat stops logging data with too many open files for pids #226
I see a missing close in FindSocketsOfPid. Which publisher have you configured? Can you check with lsof whether your packetbeat instance holds loads of file descriptors into /proc?
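For context, a descriptor leak of the shape described above looks like the following Go sketch. The function name and details are hypothetical, not the actual packetbeat source; the point is the deferred `Close`, without which every call leaks one file descriptor until the process hits "too many open files":

```go
// Hypothetical sketch of the kind of leak described above, not the
// actual packetbeat code: a file under /proc is opened on every call,
// so a missing Close leaks one descriptor per call.
package main

import (
	"bufio"
	"fmt"
	"os"
)

// countSocketLines opens /proc/<pid>/net/tcp and counts its lines.
// The deferred Close is the important part: without it, each call
// would leak the descriptor.
func countSocketLines(pid int) (int, error) {
	f, err := os.Open(fmt.Sprintf("/proc/%d/net/tcp", pid))
	if err != nil {
		return 0, err
	}
	defer f.Close() // always release the descriptor

	n := 0
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		n++
	}
	return n, scanner.Err()
}

func main() {
	n, err := countSocketLines(os.Getpid())
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("socket table lines:", n)
}
```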
@urso, checking ... well, I just restarted it about 25 minutes ago and I don't see a lot of open files to /proc. Here is the lsof output:
BTW, I am using a 6-node ES cluster, but I see 7 connections; the first node has two connections to it.
@urso, on one of my other nodes I see close to 100 open FDs to the ES instances. We have had to reboot some of the ES nodes; perhaps the connections are not being closed down properly?
For sending requests we use the golang net/http package. The publisher tries to put data round robin on the configured nodes. If one node becomes unavailable, round-robin load balancing happens on the remaining nodes. I'm not sure whether actual open sockets are reused or a new connection is made if one insert takes much too long. If one packetbeat has 100+ parallel running connections, it might be an indicator of packetbeat producing more data than ES can consume right now. Makes me wonder about the overall balance in your system. Are these 100 open FDs mostly to ES? Are they about evenly distributed between your ES instances, or is one getting slow? Can you also check your ES instances for number of open sockets, memory, and CPU usage? Have you considered this kind of setup: https://www.elastic.co/guide/en/beats/packetbeat/current/packetbeat-logstash.html?
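As an illustration of the round-robin behavior described above, host selection could look like the minimal sketch below. This uses an atomic counter and a caller-supplied set of down hosts; it is not the actual libbeat publisher, just the general shape:

```go
// Minimal round-robin host selection sketch, not the actual libbeat
// implementation. Hosts reported as down are skipped until every host
// has been tried once.
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

type roundRobin struct {
	hosts []string
	next  uint64
}

// pick returns the next host in rotation, skipping any listed in down.
func (r *roundRobin) pick(down map[string]bool) (string, error) {
	for i := 0; i < len(r.hosts); i++ {
		h := r.hosts[atomic.AddUint64(&r.next, 1)%uint64(len(r.hosts))]
		if !down[h] {
			return h, nil
		}
	}
	return "", errors.New("no hosts available")
}

func main() {
	rr := &roundRobin{hosts: []string{"es1:9200", "es2:9200", "es3:9200"}}
	down := map[string]bool{"es2:9200": true} // es2 is unavailable
	for i := 0; i < 4; i++ {
		if h, err := rr.pick(down); err == nil {
			fmt.Println(h) // rotates over es1 and es3 only
		}
	}
}
```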
For the six nodes, here are the totals, not quite 100, but currently 77:
We just rebooted 172.18.40.3 yesterday, so that may be why there is only one connection to that node. The ES instances are mostly idle, though at times there are spurts of indexing going on, but it is not too heavy. Here is the open-socket distribution on the ES instances to clients:
Here is the memory and CPU usage, very low:
We have considered that kind of setup, the redis --> logstash --> ES pipeline; we are probably going to use Kafka instead of redis at this point.
That's a load of connections. It might be that the beat is creating new HTTP connections all the time to push data concurrently due to network latencies (a TCP connection per request can become quite expensive).
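Since the publisher uses Go's net/http, whether connections are reused comes down to the Transport configuration and to response bodies being fully drained. A hedged sketch of capping connection growth follows; the limits and URL are illustrative values, not packetbeat's actual settings:

```go
// Sketch of capping and reusing HTTP connections with Go's net/http
// Transport. Values are illustrative, not packetbeat's configuration.
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	transport := &http.Transport{
		MaxConnsPerHost:     1,                // hard cap on connections per host (Go 1.11+)
		MaxIdleConnsPerHost: 1,                // keep that one connection pooled for reuse
		IdleConnTimeout:     90 * time.Second, // recycle idle connections eventually
	}
	client := &http.Client{Transport: transport, Timeout: 30 * time.Second}

	resp, err := client.Get("http://localhost:9200/")
	if err != nil {
		fmt.Println(err)
		return
	}
	// Draining and closing the body is what lets net/http return the
	// TCP connection to the pool instead of dialing a new one next time.
	io.Copy(io.Discard, resp.Body)
	resp.Body.Close()
}
```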
@urso, we had to restart our ES cluster due to hardware maintenance, so I no longer have it running in this state. I'll keep an eye out for this state in the future, and post updates if you think it will be helpful. Thanks!
Thanks. If possible, check for excessive HTTP connections (TCP SYN and FIN packets between packetbeat and the elasticsearch servers).
@urso, roger that.
@urso, an update: I have not seen problems with packetbeat for the past month. I suspect that the original problems were due to losing connections to the Elasticsearch instances. Still watching; you can close this issue if you'd like.
@portante Good to hear. I closed the issue. Please reopen it if the problems occur again.
I'd like to keep this open. packetbeat seems to be misbehaving by opening an unbounded number of connections to elasticsearch, basically bypassing TCP congestion control by generating ever more traffic. After some time packetbeat is killed by the OS for resource exhaustion (number of file descriptors). That is, a bad network state or a failing elasticsearch instance can bring down packetbeat. Recent additions to libbeat for lumberjack can be leveraged to implement better load-balancing behavior, with exactly one connection per configured elasticsearch host.
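The shape proposed here — exactly one connection per configured host, with backpressure blocking the producer instead of spawning new sockets — can be sketched as one goroutine per host draining a shared bounded channel. This is an illustration of the idea, not the libbeat implementation:

```go
// One worker (and hence one connection) per configured host, all fed
// from a single bounded queue. When every host is slow, senders block
// on the channel instead of opening more sockets. Illustrative only.
package main

import (
	"fmt"
	"sync"
)

func main() {
	hosts := []string{"es1:9200", "es2:9200", "es3:9200"}
	events := make(chan string, 100) // bounded: producers block when full

	var wg sync.WaitGroup
	for _, h := range hosts {
		wg.Add(1)
		go func(host string) {
			defer wg.Done()
			// A real publisher would own exactly one connection to
			// `host` here and push bulk requests over it.
			for ev := range events {
				fmt.Printf("%s <- %s\n", host, ev)
			}
		}(h)
	}

	for i := 0; i < 10; i++ {
		events <- fmt.Sprintf("event-%d", i)
	}
	close(events)
	wg.Wait()
}
```

With a bounded queue, a slow or failing ES node slows the pipeline down rather than exhausting file descriptors.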
@urso, I'll recheck my packetbeat instances today to see what their current open connections are like.
I have fifteen nodes running packetbeat to a 6-node ES cluster, where I configured each packetbeat host to talk to all 6 ES nodes directly. I am also running packetbeat on the 6 ES nodes themselves, and those talk only to their localhost instance. I would expect 1 connection to each of the 6 nodes from each non-local packetbeat instance, and only 1 connection from each local packetbeat instance. However, I am finding that there are 7 connections established on all the non-local hosts, where the first node in the configuration list is connected twice. And on the local hosts, I am seeing three established connections. There is one exception to the above: one host has 13 connections, two to each ES node, with the first ES node in the configuration list connected three times.
@portante I kinda expected this. So whenever ES or the network generates some backpressure, packetbeat opens a new connection to ES (without limiting the number of connections).
@urso, and it appears to do this right from the beginning, with the first connection.
maybe due to TCP slow-start?
I just updated to -beta4 and I am now seeing errors like:
I see 500+ files open for /proc/<pid>/fd and /proc/<pid>/net/tcp from one or two of the instances of packetbeat deployed.
@urso, I'll open a new issue for the "too many open files" problem. As for this issue, I have not seen any of the extra connections.
ok, thanks.
@portante can you link the new issue here? |
Referenced commit: respect * debug selector in IsDebug
Saw this on my install of packetbeat: "Packetbeat version 1.0.0-beta2 (amd64)" (the amd64 is wrong, it is an x86_64 CPU).