
LogstashTcpSocketAppender fails to shutdown #114

Closed
sawyercade opened this issue Jul 22, 2015 · 6 comments

@sawyercade

I have logstash running on a remote server, and a simple test app to verify that logs are being written:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import ch.qos.logback.classic.LoggerContext;

public class App {
    private static final Logger LOGGER = LoggerFactory.getLogger(App.class);

    public static void main(String[] args) throws Exception {
        LOGGER.info("Info message");
        LOGGER.warn("Warn message");
        LOGGER.error("Error message");

        // Stop the logback context so all appenders are flushed and shut down
        LoggerContext loggerContext = (LoggerContext) LoggerFactory.getILoggerFactory();
        loggerContext.stop();
    }
}

logback.xml excerpt:

<appender name="STASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
    <remoteHost>megalodon01.ttagg.com</remoteHost>
    <port>4712</port>
    <keepAliveDuration>5 minutes</keepAliveDuration>

    <!-- encoder is required -->
    <encoder class="net.logstash.logback.encoder.LogstashEncoder" />
</appender>

<root level="debug">
    <appender-ref ref="STDOUT" />
    <appender-ref ref="STASH" />
</root>

Logs are being written as expected, but App hangs during the loggerContext.stop() call. Here are the last few lines of my log output:

14:13:45,043 |-INFO in net.logstash.logback.appender.LogstashTcpSocketAppender[STASH] - Log destination megalodon01.ttagg.com:4712: connection established.
14:13:49.982 [main] INFO  com.datarank.App - Info message
14:13:49.985 [main] WARN  com.datarank.App - Warn message
14:13:49.985 [main] ERROR com.datarank.App - Error message
// Long pause/hang here, then:
14:14:50,040 |-WARN in net.logstash.logback.appender.LogstashTcpSocketAppender[STASH] - Some queued events have not been logged due to requested shutdown

Like I said, all logs are being written just fine, but it would be nice if shutting down didn't cause such a long hang and a WARN. Am I shutting down incorrectly or something?

@philsttr
Collaborator

You are shutting down correctly. It should not pause/hang during shutdown.

Can you take a thread stack dump during the hang/pause, and post it here?

jstack <pid>

@sawyercade
Author

I think I timed this right!

Charybdis:~ sawyer$ jstack 10761
2015-07-22 15:40:11
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.31-b07 mixed mode):

"Attach Listener" #12 daemon prio=9 os_prio=31 tid=0x00007fb1e28d0800 nid=0x5303 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"logback-async-disruptor-appender-megalodon01.ttagg.com:4712-2" #11 daemon prio=5 os_prio=31 tid=0x00007fb1e1937000 nid=0x5103 waiting on condition [0x000000012365e000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x000000076bf5ff98> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

"logback-async-disruptor-appender-megalodon01.ttagg.com:4712-1" #10 daemon prio=5 os_prio=31 tid=0x00007fb1e28d1800 nid=0x5007 waiting on condition [0x000000012355b000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x000000076bf5ff98> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

"Service Thread" #8 daemon prio=9 os_prio=31 tid=0x00007fb1e189c000 nid=0x4703 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread2" #7 daemon prio=9 os_prio=31 tid=0x00007fb1e1029000 nid=0x4503 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" #6 daemon prio=9 os_prio=31 tid=0x00007fb1e101f800 nid=0x4303 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=31 tid=0x00007fb1e101f000 nid=0x4103 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=31 tid=0x00007fb1e101e000 nid=0x3413 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=31 tid=0x00007fb1e1018000 nid=0x2d03 in Object.wait() [0x000000012119b000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x000000076ab062f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:142)
    - locked <0x000000076ab062f8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:158)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" #2 daemon prio=10 os_prio=31 tid=0x00007fb1e1807000 nid=0x2b03 in Object.wait() [0x0000000121098000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x000000076ab05d68> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:157)
    - locked <0x000000076ab05d68> (a java.lang.ref.Reference$Lock)

"main" #1 prio=5 os_prio=31 tid=0x00007fb1e1009800 nid=0x1303 waiting on condition [0x000000010661f000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x000000076bf60058> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1465)
    at net.logstash.logback.appender.AsyncDisruptorAppender.stop(AsyncDisruptorAppender.java:339)
    at net.logstash.logback.appender.AbstractLogstashTcpSocketAppender.stop(AbstractLogstashTcpSocketAppender.java:572)
    - locked <0x000000076b3b49f8> (a net.logstash.logback.appender.LogstashTcpSocketAppender)
    at ch.qos.logback.core.spi.AppenderAttachableImpl.detachAndStopAllAppenders(AppenderAttachableImpl.java:104)
    at ch.qos.logback.classic.Logger.detachAndStopAllAppenders(Logger.java:209)
    at ch.qos.logback.classic.Logger.recursiveReset(Logger.java:338)
    at ch.qos.logback.classic.LoggerContext.reset(LoggerContext.java:213)
    at ch.qos.logback.classic.LoggerContext.stop(LoggerContext.java:336)
    at com.datarank.LoggingTest.main(LoggingTest.java:22)

"VM Thread" os_prio=31 tid=0x00007fb1e2805000 nid=0x2903 runnable

"GC task thread#0 (ParallelGC)" os_prio=31 tid=0x00007fb1e1015800 nid=0x2103 runnable

"GC task thread#1 (ParallelGC)" os_prio=31 tid=0x00007fb1e1800800 nid=0x2303 runnable

"GC task thread#2 (ParallelGC)" os_prio=31 tid=0x00007fb1e1809000 nid=0x2503 runnable

"GC task thread#3 (ParallelGC)" os_prio=31 tid=0x00007fb1e1806800 nid=0x2703 runnable

"VM Periodic Task Thread" os_prio=31 tid=0x00007fb1e2813000 nid=0x4903 waiting on condition

JNI global references: 56

philsttr added a commit that referenced this issue Jul 27, 2015
Configure the ScheduledThreadPoolExecutor to not wait for cancelled tasks when it is shut down.
This only works on Java 1.7+.
Java 1.6 and lower will continue to wait on cancelled tasks or until the timeout occurs, whichever comes first.

Also upgrade to disruptor 3.3.2
@philsttr
Collaborator

This only occurred when keepAlive was enabled.

The pause during shutdown was caused by the ScheduledThreadPoolExecutor waiting for cancelled tasks (which it does by default) during shutdown.

I fixed this by configuring ScheduledThreadPoolExecutor.setRemoveOnCancelPolicy(true) so that cancelled tasks are removed from its internal queue, which means that it won't wait on them before shutting down.

This fix will only work when running on Java 1.7+, since that is when the setRemoveOnCancelPolicy method was added.

When running on Java 1.6 or earlier, the shutdown logic will continue waiting until it times out (after 1 minute).
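(For illustration only: a minimal, self-contained sketch of the JDK mechanism this fix relies on. It is not the appender's actual code; the class name, the keep-alive runnable, and the timings are made up for the example.)

import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RemoveOnCancelExample {
    public static void main(String[] args) throws InterruptedException {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);

        // Java 7+ only: cancelled tasks are removed from the work queue at
        // cancellation time instead of lingering until their delay expires.
        executor.setRemoveOnCancelPolicy(true);

        // Stand-in for the appender's keep-alive task (hypothetical runnable).
        ScheduledFuture<?> keepAlive = executor.schedule(new Runnable() {
            @Override
            public void run() {
                System.out.println("keep alive");
            }
        }, 5, TimeUnit.MINUTES);

        // Shutdown sequence: cancel the pending task, then shut down and wait.
        // With the remove-on-cancel policy the queue is already empty at this
        // point, so awaitTermination returns promptly instead of waiting on
        // the cancelled task, as described above.
        keepAlive.cancel(true);
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
    }
}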

The fix will be available in the next release (4.5).

Note that I also saw that the BatchEventProcessor thread used in the disruptor fails to shut down if it was never started prior to shutdown. I could not find a way to fix this other than inserting a sleep after startup. :/ But this is extremely rare, and only occurs in applications that terminate very quickly.

@MikeN123

Do you have an idea of when 4.5 will be released? We're waiting for this fix, and I'd like to avoid creating our own build if you'll be releasing it soon.

@philsttr
Collaborator

Hi Mike, I plan on releasing 4.5 before Sept 1st. I'm planning on completing #115 for 4.5, but nothing else unless a bug report comes in before then.

@philsttr
Collaborator

Released in 4.5
