Jetty with HTTP/2 enabled has CLOSE_WAIT Sockets and High CPU Issue #7072

Closed
naveenk26 opened this issue Nov 2, 2021 · 13 comments
Labels: Bug (For general bugs on Jetty side)

@naveenk26

We are using Jetty 9.4.40 with Dropwizard on Java 1.8.0_302 and are trying to enable HTTP/2. When we enable HTTP/2 support, the application works without any issues for 5 to 6 hours; after that, we see a lot of connections in CLOSE_WAIT and CPU usage also goes up.
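
For context, this is roughly what enabling HTTP/2 amounts to with embedded Jetty 9.4. This is only a minimal sketch with hypothetical keystore values; our real setup goes through Dropwizard's connector configuration, with the jetty-alpn-server and http2-server dependencies on the classpath.

import org.eclipse.jetty.alpn.server.ALPNServerConnectionFactory;
import org.eclipse.jetty.http2.server.HTTP2ServerConnectionFactory;
import org.eclipse.jetty.server.HttpConfiguration;
import org.eclipse.jetty.server.HttpConnectionFactory;
import org.eclipse.jetty.server.SecureRequestCustomizer;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;
import org.eclipse.jetty.server.SslConnectionFactory;
import org.eclipse.jetty.util.ssl.SslContextFactory;

public class Http2ServerSketch
{
    public static void main(String[] args) throws Exception
    {
        Server server = new Server();

        // HTTPS configuration shared by HTTP/1.1 and HTTP/2.
        HttpConfiguration httpsConfig = new HttpConfiguration();
        httpsConfig.addCustomizer(new SecureRequestCustomizer());

        HttpConnectionFactory http1 = new HttpConnectionFactory(httpsConfig);
        HTTP2ServerConnectionFactory http2 = new HTTP2ServerConnectionFactory(httpsConfig);

        // ALPN negotiates h2 vs http/1.1 on the TLS connection.
        ALPNServerConnectionFactory alpn = new ALPNServerConnectionFactory();
        alpn.setDefaultProtocol(http1.getProtocol());

        SslContextFactory.Server sslContextFactory = new SslContextFactory.Server();
        sslContextFactory.setKeyStorePath("/path/to/keystore");  // hypothetical keystore
        sslContextFactory.setKeyStorePassword("secret");         // hypothetical password
        SslConnectionFactory tls = new SslConnectionFactory(sslContextFactory, alpn.getProtocol());

        // TLS -> ALPN -> (h2 | http/1.1) on a single connector.
        ServerConnector connector = new ServerConnector(server, tls, alpn, http2, http1);
        connector.setPort(8443);
        connector.setIdleTimeout(30_000); // connector idle timeout in milliseconds
        server.addConnector(connector);

        server.start();
        server.join();
    }
}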

Jetty is behind an ELB. When we analyzed the thread dump, the thread that was consuming high CPU had the following stack trace:

stackTrace:
java.lang.Thread.State: RUNNABLE
at org.eclipse.jetty.http2.parser.HeadersBodyParser.parse(HeadersBodyParser.java:225)
at org.eclipse.jetty.http2.parser.Parser.parseBody(Parser.java:198)
at org.eclipse.jetty.http2.parser.Parser.parse(Parser.java:127)
at org.eclipse.jetty.http2.parser.ServerParser.parse(ServerParser.java:115)
at org.eclipse.jetty.http2.HTTP2Connection$HTTP2Producer.produce(HTTP2Connection.java:261)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:360)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:184)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135)
at org.eclipse.jetty.http2.HTTP2Connection.produce(HTTP2Connection.java:183)
at org.eclipse.jetty.http2.HTTP2Connection.onFillable(HTTP2Connection.java:138)
at org.eclipse.jetty.http2.HTTP2Connection$FillableCallback.succeeded(HTTP2Connection.java:361)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:540)
at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:395)
at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:161)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.eclipse.jetty.util.thread.Invocable.invokeNonBlocking(Invocable.java:69)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.invokeTask(EatWhatYouKill.java:348)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:303)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
at java.lang.Thread.run(Thread.java:748)

The CLOSE_WAIT connections and the CPU usage do not increase gradually; it happens within a minute. The host does not recover from this problem, so we are forced to restart the application.

naveenk26 added the Bug (For general bugs on Jetty side) label Nov 2, 2021
@sbordet
Contributor

sbordet commented Nov 2, 2021

we see a lot of CLOSE_WAIT connections

Where do you see those CLOSE_WAIT connections? Between the ELB and Jetty? On the ELB side or on the Jetty side?

The stack trace seems irrelevant; it shows that Jetty is parsing a HEADERS frame.

It would be great if you could grab DEBUG logs that show what happens to a connection that remains in CLOSE_WAIT, or a network trace with Wireshark.
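
With Dropwizard, one way to get those DEBUG logs is to raise the org.eclipse.jetty logger to DEBUG, either in the YAML logging section or programmatically. A minimal sketch of the programmatic route, assuming logback-classic is the SLF4J backend (the Dropwizard default):

import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import org.slf4j.LoggerFactory;

public final class JettyDebugLogging
{
    // Call once at application startup, before reproducing the issue.
    public static void enable()
    {
        // The cast is valid only when logback-classic is the SLF4J backend.
        Logger jetty = (Logger)LoggerFactory.getLogger("org.eclipse.jetty");
        jetty.setLevel(Level.DEBUG);
    }
}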

@joakime
Contributor

joakime commented Nov 2, 2021

Release version 9.4.40 has some security advisories listed against it.
See https://www.eclipse.org/jetty/security_reports.php

Are you using logback-access by any chance?
See the other issue where someone reports a CPU usage issue on Amazon - #6973 (comment)
Can you upgrade to at least 9.4.44 and see if this is still happening?
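
To double check which Jetty version actually ends up on the classpath at runtime (transitive dependencies sometimes pin an older jetty-server), a quick sketch:

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.util.Jetty;

public class PrintJettyVersion
{
    public static void main(String[] args)
    {
        // Both report the Jetty version found on the classpath at runtime.
        System.out.println("Server.getVersion(): " + Server.getVersion());
        System.out.println("Jetty.VERSION:       " + Jetty.VERSION);
    }
}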

@naveenk26
Author

We see the CLOSE_WAIT connections between the ELB and Jetty, on the Jetty side. I added the stack trace because that thread was consuming 90%+ of the CPU.

@sbordet
Contributor

sbordet commented Nov 2, 2021

We see the CLOSE_WAIT connections between ELB and Jetty on the Jetty side.

Then we need the DEBUG logs or network trace.
What's your idle timeout setting?

@naveenk26
Author

The idle timeout is 30 seconds; I will try to get the debug logs.

@naveenk26
Author

We just checked: we are not using logback-access.

@naveenk26
Author

naveenk26 commented Nov 4, 2021

I am in the process of extracting the debug logs. The threads that were taking the most CPU time last printed the following lines, and they stayed in the RUNNABLE state for a long time:

Thread 1: dw-81361
2021-11-03T22:13:24Z
EatWhatYouKill@7bac2477/HTTP2Producer@753d8dea/PRODUCING/p=false/InstrumentedQueuedThreadPool[dw]@11ab8594{STARTED,1<=92<=2048,i=13,r=-1,q=0}[ReservedThreadExecutor@e66575{s=2/2,p=0}][pc=0,pic=0,pec=0,epc=0]@2021-11-03T22:13:24.684Z tryProduce false

Thread 2: dw-81576
2021-11-03T22:13:24Z
EatWhatYouKill@51af255b/HTTP2Producer@2ed9df2a/PRODUCING/p=false/InstrumentedQueuedThreadPool[dw]@11ab8594{STARTED,1<=92<=2048,i=10,r=-1,q=0}[ReservedThreadExecutor@e66575{s=0/2,p=2}][pc=0,pic=0,pec=0,epc=0]@2021-11-03T22:13:24.878Z tryProduce false

@sbordet
Contributor

sbordet commented Nov 4, 2021

Given the stack trace you reported and these lines, Jetty is actively parsing HEADERS frames from the network.

The fact that the threads consume CPU is normal, if there is traffic and bytes to parse.

These connections, however, have [pc=0,pic=0,pec=0,epc=0], which means that no task has been produced as a result of parsing those frames, and that is strange.

A theory to explain this could be that the load balancer sends bad HEADERS frames, but again we need logs or a network trace to tell.

@naveenk26
Author

Hi,

Here is the Jetty debug log for the issue: https://github.com/naveenk26/jetty-debug
A couple of observations we made:

  1. The host reaches 100% CPU after processing ~120K HTTP/2 requests.
  2. When the CPU is at 100%, we stopped requests from entering that host; in a couple of scenarios the host recovered, but not always.
  3. We tested with 9.4.44 and the same issue is observed.

@sbordet
Contributor

sbordet commented Nov 15, 2021

@naveenk26 I don't see any evidence of threads spinning, nor of close issues in the logs that you provided.
Are you sure this is a Jetty issue?

We need a clear correlation between the logs and at least one of the threads that is spinning the CPU at 100%, and a clear correlation between the logs and at least one of the sockets in CLOSE_WAIT.

In the logs there are 29 opens for sockets on port 8081 and 29 closes, as well as 251 opens for ports 900x and 271 closes.
In CLOSE_WAIT scenarios, we typically see many opens and almost no closes.

Regarding CPU usage, if a Jetty thread were spinning, I would have seen millions of log lines for that thread and far fewer lines for the other, non-spinning threads, but that is not the case: the threads in the logs are more or less equally distributed, with no thread having logged the majority of the lines.

Please provide the correlation. If you can get the thread ID of the spinning thread and correlate it with the DEBUG logs, that would be great.
Same for the CLOSE_WAIT connection: please find the connection in CLOSE_WAIT and verify that there is a match in the DEBUG logs.

Please attach new logs with their correlations, and we'll take a look.
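
Per-thread CPU time can also be sampled from inside the server JVM with the standard ThreadMXBean API. A sketch (not Jetty-specific, and meant to be invoked inside the running application, for example from an admin endpoint); the printed thread name, like dw-81361, is what to grep for in the DEBUG logs:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public final class ThreadCpuReport
{
    // Must run inside the server JVM to see the server's threads.
    public static void dump()
    {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (long id : threads.getAllThreadIds())
        {
            ThreadInfo info = threads.getThreadInfo(id);
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if unsupported
            if (info != null && cpuNanos > 0)
            {
                System.out.printf("%-48s id=%d cpu=%dms%n",
                    info.getThreadName(), id, cpuNanos / 1_000_000);
            }
        }
    }
}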

@naveenk26
Author

@sbordet Thank you for the response; I will get back to you with new debug logs and thread IDs.

@janbartel
Contributor

@naveenk26 nudge? Any progress on this issue?

@naveenk26
Author

We resolved the issue by disabling the GraalVM just-in-time compilation.

@sbordet sbordet closed this as completed Jan 19, 2022