v4.2 does not reconnect to logstash in some cases #85
Are you actively logging messages, or using keep alive (via …)? The LogstashTcpSocketAppender won't detect that the socket has been disconnected until it attempts to send new events or sends the keep alive message. I've also noticed that it might take sending a couple of messages before Java realizes the socket is disconnected. Unfortunately, there's nothing the TCP appender can do about that.
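To illustrate why keep alive matters here: an idle connection never gets a write, so it never gets a chance to fail. The following is a minimal sketch of the idea, using a hypothetical `KeepAliveTimer` class (not the appender's actual code): a keep alive message is due only when no event has been written for at least the keep alive duration.

```java
import java.time.Duration;

/**
 * Hypothetical sketch: decides when an idle connection needs a keep alive write.
 * Not the LogstashTcpSocketAppender implementation.
 */
public class KeepAliveTimer {
    private final long keepAliveNanos;
    private long lastWriteNanos;

    public KeepAliveTimer(Duration keepAlive, long nowNanos) {
        this.keepAliveNanos = keepAlive.toNanos();
        this.lastWriteNanos = nowNanos;
    }

    /** Call after every successful event write. */
    public void recordWrite(long nowNanos) {
        lastWriteNanos = nowNanos;
    }

    /** True if nothing has been written for at least the keep alive duration. */
    public boolean keepAliveDue(long nowNanos) {
        return nowNanos - lastWriteNanos >= keepAliveNanos;
    }

    public static void main(String[] args) {
        KeepAliveTimer timer = new KeepAliveTimer(Duration.ofSeconds(5), 0L);
        System.out.println(timer.keepAliveDue(Duration.ofSeconds(3).toNanos())); // false
        System.out.println(timer.keepAliveDue(Duration.ofSeconds(5).toNanos())); // true
        timer.recordWrite(Duration.ofSeconds(5).toNanos());
        System.out.println(timer.keepAliveDue(Duration.ofSeconds(7).toNanos())); // false
    }
}
```

Timestamps are passed in explicitly (for example from `System.nanoTime()`) so the decision logic can be checked without sleeping.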
I generate log messages at a rate of about 5 per second. (I also added …) If I shut down logstash and then start it up again immediately, the LogstashTcpSocketAppender still does not detect the disconnected socket:
# On the client connecting to logstash
$ sudo netstat -npt | grep 4560
tcp 1 376785 ::ffff:10.0.1.184:33470 ::ffff:10.0.0.48:4560 CLOSE_WAIT 8476/java
I waited about 13 minutes, continuously generating log events at a rate of about 5 per second, and then the logs say:
So it seems something eventually made the client detect that the socket was closed. A full buffer triggering a write? What could it be? Any ideas?
The buffer should be flushed fairly often (far more often than every 13 minutes); by default, it is flushed after every event. The appender will only detect that the socket is closed, and attempt to reconnect, if an exception occurs while writing to the socket. If Java's socket doesn't detect that the socket is closed, there's not much the appender can do. I'm not sure exactly what is going on in your case, and I haven't been able to reproduce it. From what I have seen, the appending thread detects the socket disconnect after an event or two (when Java's socket throws an exception on write), then reconnects and continues appending. My suggestion for a next step would be to take a few thread dumps after you have cycled logstash, to see what the appending thread is doing. By default, the appending thread is named … Use … Alternatively, if you are comfortable with a Java debugger, you could hook one up to see what's going on.
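If attaching an external tool or debugger is inconvenient, a thread dump can also be taken from inside the application. This is a minimal sketch using only the JDK's `Thread.getAllStackTraces()`; the name filter is a hypothetical parameter, since the appending thread's default name is not spelled out here.

```java
import java.util.Map;

public class ThreadDumper {
    /** Builds a text dump of every live thread whose name contains nameFilter. */
    public static String dump(String nameFilter) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (!t.getName().contains(nameFilter)) {
                continue;
            }
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        String filter = args.length > 0 ? args[0] : "";
        // Take a few dumps some time apart to see whether the thread is making progress.
        for (int i = 0; i < 3; i++) {
            System.out.print(dump(filter));
            Thread.sleep(1000);
        }
    }
}
```

Comparing several dumps matters more than any single one: a thread that shows the same blocking frame in every dump is stuck, while one that waits on the ring buffer between events is normal.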
When I do …
Additionally, I ran …
So, is the socket accepting writes into its send queue even though its state is CLOSE_WAIT?
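That would be consistent with how TCP works: CLOSE_WAIT only means the peer has sent its FIN, and the local kernel still accepts writes into the send queue. The failure only surfaces on a later write, after the peer answers the data with a reset. A minimal sketch against a local loopback server (exact behaviour is OS-dependent, and this is not logstash itself) that counts how many writes succeed after the peer has closed:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class CloseWaitDemo {
    /**
     * Connects to a local server that immediately closes its side, waits until the
     * client is in CLOSE_WAIT, then writes events until one fails.
     * Returns the 1-based index of the first failing write, or -1 if none failed.
     */
    public static int firstFailingWrite(int maxWrites) throws IOException, InterruptedException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            // The "logstash" side accepts and closes at once, which sends a FIN.
            server.accept().close();

            // read() returns -1 once the FIN has arrived: the client is now in CLOSE_WAIT.
            if (client.getInputStream().read() != -1) {
                throw new IllegalStateException("expected end of stream");
            }

            OutputStream out = client.getOutputStream();
            for (int i = 1; i <= maxWrites; i++) {
                try {
                    out.write(("event " + i + "\n").getBytes(StandardCharsets.UTF_8));
                    out.flush();
                } catch (IOException e) {
                    return i; // the reset from the peer has finally been reported
                }
                Thread.sleep(20); // give the peer's RST time to arrive
            }
            return -1;
        }
    }

    public static void main(String[] args) throws Exception {
        int n = firstFailingWrite(50);
        System.out.println(n == -1 ? "no write failed" : "first failing write: #" + n);
    }
}
```

The first write after the FIN does not throw; it is simply queued. Only a write after the reset fails.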
I also tried removing logback from the equation and connecting to the logstash service with only nc:
$ nc -v 10.0.0.48 4560
Connection to 10.0.0.48 4560 port [tcp/*] succeeded!
# released immediately after logstash service is shut down
Are there any places where an error or exception on a write is not handled? I could not find any obvious problem when reading the code.
The stack showing...
...is actually perfectly normal under working conditions. With a low frequency of log messages, the thread will spend most of its time waiting for events to be added to the ring buffer. However, that does not rule out problems such as...
I've been trying this for a while using netcat as the server locally and haven't been able to reproduce the problem. Every time I shut down netcat, the TCP appender detects it after an event or two and attempts to reconnect. The socket stays in CLOSE_WAIT until writing an event causes the socket to throw an exception, which causes the appender to detect the problem and go into a reconnect loop.
Can you try adding a status listener to your config to see any status messages from the appender?
<statusListener class="ch.qos.logback.core.status.OnConsoleStatusListener" />
The following message (from that listener) indicates that the appender has detected a broken socket...
After the broken socket is detected, if the server is still down, you should see the following message...
If you still don't see any messages after adding the status listener, … Specifically, I'd like to know what happens in AbstractLogstashTcpSocketAppender on Line 298 and Line 307. If neither of those lines is throwing an exception, then the appender will not detect that the socket is broken.
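Put differently, the only signal the appender has is an exception from the write or flush. A minimal sketch of that pattern, with a hypothetical `WriteFailureDetector` class rather than the appender's real code: if `write()`/`flush()` never throw, the "broken" branch is simply never taken.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: a broken connection is only noticed when write/flush throws. */
public class WriteFailureDetector {
    private final OutputStream out;
    private boolean connected = true;

    public WriteFailureDetector(OutputStream out) {
        this.out = out;
    }

    /** Returns true if the stream accepted the event, false if the connection is (now) broken. */
    public boolean send(String event) {
        if (!connected) {
            return false; // caller is expected to reconnect first
        }
        try {
            out.write(event.getBytes(StandardCharsets.UTF_8));
            out.flush();
            return true; // "sent" only means accepted locally, not received by the server
        } catch (IOException e) {
            connected = false; // the one place where a broken connection is detected
            try {
                out.close();
            } catch (IOException ignored) {
                // already broken; nothing more to do
            }
            return false;
        }
    }

    public boolean isConnected() {
        return connected;
    }

    public static void main(String[] args) {
        WriteFailureDetector healthy = new WriteFailureDetector(new ByteArrayOutputStream());
        System.out.println(healthy.send("event\n")); // true

        OutputStream broken = new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                throw new IOException("Broken pipe");
            }
        };
        WriteFailureDetector failing = new WriteFailureDetector(broken);
        System.out.println(failing.send("event\n") + " " + failing.isConnected()); // false false
    }
}
```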
I've created a debug branch that has extra status messages for:
If you want to build and use it for debugging, it will show more clearly what the code is doing. Note that you must have the …
I'll give that a try. I do, however, think that we will see all events being handled and successfully written to the socket, since the Send-Q fills up. I think the problem is that the writer should have detected that the connection is down and stopped writing. So, are there any places to add debugging to know whether exceptions or return values from writing to the socket are handled properly? And yep, my … Thanks for your help.
I've added debugging in the places that I'm aware of. Writing to the socket is done inside … If that throws an exception, then you should see the "unable to send event" status line from here. If it doesn't throw an exception, then you should see the "Sent event" status line from here. The only way that I know of to detect whether the "other end" has closed the socket is to write to it, which is exactly what is happening. If it's not throwing an exception when writing, then I'm stumped. I've been trying to reproduce this and have not been successful; if I could reproduce it, I could debug it.
By the way, when compiling the library I found that it will not run on Java 6, since System.lineSeparator() does not exist in that version: http://docs.oracle.com/javase/6/docs/api/java/lang/System.html.
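A Java 6-compatible replacement reads the `line.separator` system property, whose initial value is what `System.lineSeparator()` (added in Java 7) returns. A minimal sketch; the `LineSeparators` helper name is hypothetical:

```java
public class LineSeparators {
    /** Java 6-compatible equivalent of System.lineSeparator(). */
    public static String lineSeparator() {
        return System.getProperty("line.separator");
    }

    public static void main(String[] args) {
        // Print the separator with escapes so it is visible.
        System.out.println("line separator: "
                + lineSeparator().replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

One caveat: the property can be changed at runtime with `System.setProperty`, whereas `System.lineSeparator()` always returns the initial value.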
I just ran the debug branch of the code, with printouts, while shutting down the logstash service and bringing it up again. Also note that keepAlive is not enabled here. These printouts are from around the time the Logstash service is shut down. Events up to CCC seem to be sent, but from DDD onward the events are never dequeued.
These printouts are from about 12 minutes later, when the appender for some reason realises it cannot send anything.
Was the writing thread stuck while sending the event? Does this help in any way? Is there something else I should run through?
I did another run using only keepAlive events, i.e. no events other than keepAlive events were sent from the client to the Logstash service.
Hi @kretz, sorry for the delay; I've been away for a while. I've removed the call to System.lineSeparator(), so Java 6 should work now. I've still been trying to reproduce this, without success. I know this isn't much help, but every time I try, the appender is able to reconnect and sends the enqueued messages. For example:
The main difference I see is that I receive … I'm a little puzzled by your log when you are not using keepAlive. A few questions:
Regarding this question...
The writing thread will attempt reconnection by polling. The writing thread is the thread that logs the …
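Reconnection "by polling" here means retrying the connection at a fixed delay until it succeeds. A minimal sketch of that loop with plain JDK sockets; `ReconnectPoller`, the 1-second connect timeout, and the attempt limit are hypothetical choices, not the appender's settings:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical sketch of reconnecting by polling: retry at a fixed delay until connected. */
public class ReconnectPoller {
    /** Returns a connected socket, or null if every attempt failed (or the thread was interrupted). */
    public static Socket connectWithRetry(InetAddress host, int port, long delayMillis, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(host, port), 1000); // 1 s connect timeout
                return socket;
            } catch (IOException e) {
                closeQuietly(socket);
                System.err.println("connection attempt " + attempt + " failed: " + e.getMessage());
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status and give up
                    return null;
                }
            }
        }
        return null;
    }

    private static void closeQuietly(Socket socket) {
        try {
            socket.close();
        } catch (IOException ignored) {
            // nothing useful to do
        }
    }

    public static void main(String[] args) {
        // 4560 is the port used in this thread; adjust for your setup.
        Socket socket = connectWithRetry(InetAddress.getLoopbackAddress(), 4560, 1000, 5);
        System.out.println(socket != null ? "connected" : "gave up");
        if (socket != null) {
            closeQuietly(socket);
        }
    }
}
```

Polling only starts once something has flagged the connection as broken, which is why the write-exception question above matters.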
In this specific test I can't remember if I ever saw the … And the reason you do not see any …
Hopefully this will stop it getting killed
Hi @kretz, are you still having this issue? Do you have any additional details that might help me reproduce this?
Hi, I tried zeroing in on what the problem might be and getting additional data, … And recently I have not had the time to look at it, and accepted the …
On Mon, May 18, 2015 at 3:40 AM philsttr wrote:
I'm seeing this with 4.3 as well. (1) If logstash restarts, it loses the connection immediately and does not reconnect.
Hi @phillro, since I haven't been able to reproduce this myself, I'm going to need to gather some information from you...
In addition, let me know about anything special on your network between your application and logstash (load balancer, etc.). Thanks!
I am experiencing similar behavior with 4.4 on AWS, with logstash behind an ELB with a TCP forwarder on 4560. When I redeploy logstash, the logs stop reaching logstash until I restart the application and it reconnects. I've waited half an hour and nothing. What about a workaround? Every few minutes, create a new connection, replace the existing one with it, and terminate the old one. Then even if a connection does get stuck, recovering won't require human intervention.
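The proposed workaround can be sketched as "open the new connection first, swap it in, then close the old one" on a schedule. This is a minimal illustration with a hypothetical `RotatingConnection` class, not something the appender provides; a real writer thread would also have to re-fetch the socket after a swap and retry a write that fails on the old one.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of the workaround: periodically replace the connection. */
public class RotatingConnection implements AutoCloseable {
    private final InetSocketAddress address;
    private final AtomicReference<Socket> current = new AtomicReference<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public RotatingConnection(InetSocketAddress address) throws IOException {
        this.address = address;
        current.set(open());
    }

    private Socket open() throws IOException {
        Socket s = new Socket();
        s.connect(address, 1000); // 1 s connect timeout
        return s;
    }

    /** Opens a new connection first, swaps it in, then closes the old one. */
    public void rotate() {
        try {
            Socket fresh = open();
            closeQuietly(current.getAndSet(fresh));
        } catch (IOException e) {
            // Keep the existing connection if a new one could not be opened.
            System.err.println("rotation failed: " + e.getMessage());
        }
    }

    /** Rotates every period, e.g. startRotating(5, TimeUnit.MINUTES). */
    public void startRotating(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(this::rotate, period, period, unit);
    }

    public Socket socket() {
        return current.get();
    }

    private static void closeQuietly(Socket s) {
        if (s != null) {
            try {
                s.close();
            } catch (IOException ignored) {
                // nothing useful to do
            }
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
        closeQuietly(current.getAndSet(null));
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 10, InetAddress.getLoopbackAddress());
             RotatingConnection conn = new RotatingConnection(
                     new InetSocketAddress(server.getInetAddress(), server.getLocalPort()))) {
            Socket before = conn.socket();
            conn.rotate();
            System.out.println("old closed: " + before.isClosed() + ", replaced: " + (conn.socket() != before));
        }
    }
}
```

This bounds how long a stuck connection can go unnoticed to one rotation period, but it hides the root cause rather than fixing it, and events still queued on a silently dead old connection can be lost.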
Hi Avery, I'll consider implementing the workaround. However, I'd really like to know the root cause. Can you help me debug, since I haven't been able to reproduce this? I've added the 4 steps that will help me in the comment above. In addition, if you have Java debugging experience, I'd really appreciate it if you could step through the application and let me know what you see, particularly in … Thanks
…ed the end of the stream. This fixes a condition where the server-side terminates the connection, but the client-side wasn't detecting it, leaving the client socket in CLOSE_WAIT.
To detect that the server side has closed the socket, I have added a reader thread that just reads from the socket input stream (the server side's output stream). When the input stream reaches the end of the stream, the code now assumes that the socket has been closed on the server side (rather than just being in a half-open state) and reconnects to the server. This will work as long as logstash's tcp input doesn't shut down its output stream while keeping its input stream open. I have tested this against Amazon's ELB, and it seems to solve the problem, i.e. the appender reconnects when an instance behind the ELB closes. I tested this with "Connection Draining" disabled on the ELB; I'm not sure whether that makes a difference. The fix will be in version 4.5, which I'll release in the next week or so.
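The reader-thread approach can be sketched with plain JDK sockets. This is a minimal illustration under the same assumption the fix makes (end of stream means the server closed the whole connection); `EndOfStreamWatcher`, the thread name, and the callback are hypothetical, not the appender's actual code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class EndOfStreamWatcher {
    /**
     * Starts a daemon thread that reads (and discards) anything the server sends.
     * When read() returns -1, the server has closed its side, so onServerClosed runs.
     */
    public static Thread watch(Socket socket, Runnable onServerClosed) {
        Thread reader = new Thread(() -> {
            byte[] buffer = new byte[1024];
            try {
                InputStream in = socket.getInputStream();
                while (in.read(buffer) != -1) {
                    // The server is not expected to send anything; discard it.
                }
                onServerClosed.run(); // FIN received: assume the server closed the socket
            } catch (IOException e) {
                // A reset by the peer also means the connection is gone. A local close()
                // needs no callback, because the closing code already knows.
                if (!socket.isClosed()) {
                    onServerClosed.run();
                }
            }
        }, "eos-watcher");
        reader.setDaemon(true);
        reader.start();
        return reader;
    }

    /** Demo: a local "server" accepts and then closes; returns true if the watcher noticed. */
    public static boolean detectsServerClose() throws IOException, InterruptedException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            CountDownLatch closed = new CountDownLatch(1);
            watch(client, closed::countDown);
            server.accept().close(); // the "logstash" side goes away
            return closed.await(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("server close detected: " + detectsServerClose());
    }
}
```

The watcher notices the FIN without any write, which is exactly the case the write-only detection missed. In a real appender, the callback would trigger the reconnect path.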
Released in 4.5
I have a problem with the LogstashTcpSocketAppender - it does not detect that the logstash service has been shut down. (This could be a duplicate of #76, but since I'm not sure, I'm creating a new issue.)
If I start my service using LogstashTcpSocketAppender without a logstash service running and then start the logstash service the logs (correctly) say:
However, when I then shut down the logstash service, there is nothing in the logs. I guess it is because LogstashTcpSocketAppender thinks the logstash service is still there.
One thing to add here is that I close the logstash service very abruptly - it runs in a docker container which is killed - so the socket might look like it is connected from the LogstashTcpSocketAppender side.
Another thing to add is that I run on AWS with my logstash service on EC2, my services deployed via Elastic Beanstalk, communicating directly with logstash over private IPs without ELB.
How can I help to debug this?