This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

Excavator: Upgrade dependencies #5233

Merged
merged 18 commits on Feb 8, 2021

Conversation

svc-excavator-bot
Collaborator

excavator is a bot for automating changes across repositories.

Changes produced by the roomba/versions-props-latest check.

To enable or disable this check, please contact the maintainers of Excavator.

@jkozlowski
Contributor

So as far as I can see tests are probably flaking because of this:

server-9088 WARN  [2021-02-05 15:11:45,082] com.palantir.paxos.PaxosQuorumChecker: We encountered an error while trying to request an acknowledgement from another paxos node. This could mean the node is down, or we cannot connect to it for some other reason.
! org.apache.hc.core5.http.NoHttpResponseException: The target server failed to respond
! at org.apache.hc.core5.http.impl.io.DefaultHttpResponseParser.createConnectionClosedException(DefaultHttpResponseParser.java:87)
! at org.apache.hc.core5.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:243)
! at org.apache.hc.core5.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:53)
! at org.apache.hc.core5.http.impl.io.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:199)
! at com.palantir.dialogue.hc5.InstrumentedManagedHttpClientConnection.receiveResponseHeader(InstrumentedManagedHttpClientConnection.java:105)
! at org.apache.hc.core5.http.impl.io.HttpRequestExecutor.execute(HttpRequestExecutor.java:175)
! at org.apache.hc.core5.http.impl.io.HttpRequestExecutor.execute(HttpRequestExecutor.java:218)
! at org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager$InternalConnectionEndpoint.execute(PoolingHttpClientConnectionManager.java:596)
! at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.execute(InternalExecRuntime.java:215)
! at org.apache.hc.client5.http.impl.classic.MainClientExec.execute(MainClientExec.java:107)
! at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
! at org.apache.hc.client5.http.impl.classic.ExecChainElement$1.proceed(ExecChainElement.java:57)
! at org.apache.hc.client5.http.impl.classic.ConnectExec.execute(ConnectExec.java:181)
! at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
! at org.apache.hc.client5.http.impl.classic.ExecChainElement$1.proceed(ExecChainElement.java:57)
! at org.apache.hc.client5.http.impl.classic.ProtocolExec.execute(ProtocolExec.java:172)
! at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
! at org.apache.hc.client5.http.impl.classic.ExecChainElement$1.proceed(ExecChainElement.java:57)
! at org.apache.hc.client5.http.impl.classic.RedirectExec.execute(RedirectExec.java:116)
! at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
! at org.apache.hc.client5.http.impl.classic.ExecChainElement$1.proceed(ExecChainElement.java:57)
! at com.palantir.dialogue.hc5.TracingExecChainHandler.execute(TracingExecChainHandler.java:36)
! at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
! at org.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:178)
! at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:75)
! at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:89)
! at com.palantir.dialogue.hc5.ApacheHttpClientBlockingChannel.execute(ApacheHttpClientBlockingChannel.java:94)
! at com.palantir.dialogue.blocking.BlockingChannelAdapter$BlockingChannelAdapterChannel$BlockingChannelAdapterTask.run(BlockingChannelAdapter.java:139)
! at com.palantir.tracing.Tracers$TracingAwareRunnable.run(Tracers.java:501)
! at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
! at java.util.concurrent.FutureTask.run(FutureTask.java:266)
! at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
! at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
! at com.palantir.tritium.metrics.TaggedMetricsThreadFactory$InstrumentedTask.run(TaggedMetricsThreadFactory.java:72)

I wonder if it's this change somehow? https://github.com/palantir/dialogue/pull/1089/files

@jkozlowski
Contributor

jkozlowski commented Feb 5, 2021

These particular tests run through WireMock (I believe, but need to double check). The conjure-java-runtime 5->6 bump replaces the feign okhttp client with dialogue. When the server unexpectedly and prematurely closes a persistent connection, the dialogue http client will throw a NoHttpResponseException. This should be retried (if we're not retrying it, we should).

It kinda feels like the fact that some of these tests use WireMock proxy might be the smoking gun here. The hypothesis is that WireMock is killing all connections in a weird way and that blows through our retries?

@jkozlowski
Contributor

@carterkozak suggested that clearing the clients after killing the server should do it, and indeed it seems to have. I suspect jetty (dropwizard) does not cleanly close connections upon service stop?

@carterkozak
Contributor

The java socket api doesn’t give us a way to tell if a socket has been closed by the peer without doing a read or write. I expect the connections are being closed gracefully; we just don’t (can’t) notice until we try to use them.
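To illustrate the point about the socket API, here is a minimal sketch using only java.net on the loopback interface. The class name StaleSocketDemo and the probe() helper are made up for this example and are not part of dialogue or the tests in this PR. It shows that Socket.isClosed() and Socket.isConnected() only describe local state, so a graceful close by the peer is visible only once the client attempts a read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not taken from dialogue or this repository.
class StaleSocketDemo {
    /**
     * Returns {isClosed, isConnected, sawEndOfStream} as observed by the client
     * after the peer has closed its end of the connection.
     */
    static boolean[] probe() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
                Socket client = new Socket(loopback, server.getLocalPort());
                Socket accepted = server.accept()) {
            accepted.close(); // the "server" side closes gracefully

            // Both flags describe local state only, so neither reflects the peer's close.
            boolean isClosed = client.isClosed();
            boolean isConnected = client.isConnected();

            client.setSoTimeout(5_000); // guard against hanging if the close is delayed
            boolean sawEndOfStream = client.getInputStream().read() == -1;
            return new boolean[] {isClosed, isConnected, sawEndOfStream};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = probe();
        System.out.println("isClosed=" + r[0] + " isConnected=" + r[1] + " eofOnRead=" + r[2]);
    }
}
```

In this sketch the client only learns about the close when read() returns -1, which is the same position a pooled HTTP connection is in when it is handed out after the server has bounced.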
We may be able to purge all pooled connections to a route that has bounced if we receive a couple of consecutive no-http-response-exceptions. The pool is fifo, so it’s relatively safe to assume remaining connections are less likely to be alive than the last. We need to be careful to avoid scenarios where a single connection can flake and we trash an otherwise healthy connection pool.
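A rough sketch of the purge heuristic described above, assuming a per-route counter of consecutive no-response failures. RouteFailureTracker, its method names, and the threshold are all hypothetical; this is not dialogue's API or an implemented change, only one way the idea could be expressed. A success resets the counter, so a single flaky connection does not trigger a purge.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of dialogue: decides when a route's pool should be purged.
class RouteFailureTracker {
    private final int threshold;
    private final Map<String, Integer> consecutiveFailures = new HashMap<>();

    RouteFailureTracker(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.threshold = threshold;
    }

    /**
     * Records a no-response failure for the route. Returns true when the number of
     * consecutive failures reaches the threshold, meaning the caller should purge
     * the pooled connections for that route. The counter is reset when that happens.
     */
    synchronized boolean recordNoResponse(String route) {
        int count = consecutiveFailures.merge(route, 1, Integer::sum);
        if (count >= threshold) {
            consecutiveFailures.remove(route);
            return true;
        }
        return false;
    }

    /** A successful exchange shows the route is healthy, so the streak is cleared. */
    synchronized void recordSuccess(String route) {
        consecutiveFailures.remove(route);
    }
}
```

With a threshold of 2, an isolated failure followed by a success never asks for a purge, while two failures in a row on the same route do. Choosing the threshold is the trade-off mentioned above: too low and one flake trashes a healthy pool, too high and the retries are spent before the purge happens.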

@jkozlowski
Contributor

Yeah, I just tried bumping the graceful shutdown, that didn't work.

@jkozlowski
Contributor

I suppose not counting broken connections as a retry doesn't really work?

@carterkozak
Contributor

It would be interesting to record the connection pool's idle connection count when we get a NoHttpResponseException. If there are more pooled closed connections than retries, we’re doomed.
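The arithmetic behind "more pooled closed connections than retries" can be sketched with a toy model. The assumptions here are illustrative and not measured from dialogue: every attempt takes the next pooled connection, a stale connection fails the attempt and is discarded, and an empty pool means a fresh, working connection is opened. StalePoolSimulation and requestSucceeds are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: real pools, retry policies, and backoff are more involved.
class StalePoolSimulation {
    /**
     * Simulates one request against a pool holding only stale connections.
     * Returns true if some attempt (the initial one or a retry) reaches a live connection.
     */
    static boolean requestSucceeds(int staleConnections, int maxRetries) {
        Deque<Boolean> pool = new ArrayDeque<>();
        for (int i = 0; i < staleConnections; i++) {
            pool.add(Boolean.FALSE); // FALSE marks a connection the server already closed
        }
        int attempts = 1 + maxRetries; // the initial attempt plus the retries
        for (int attempt = 0; attempt < attempts; attempt++) {
            Boolean pooled = pool.poll(); // null means the pool is empty
            boolean alive = pooled == null || pooled; // empty pool: open a fresh connection
            if (alive) {
                return true;
            }
            // A stale connection was consumed and discarded; fall through to the next attempt.
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("stale=2 retries=2 -> " + requestSucceeds(2, 2));
        System.out.println("stale=3 retries=2 -> " + requestSucceeds(3, 2));
    }
}
```

Under these assumptions the request succeeds exactly when the number of stale pooled connections does not exceed the number of retries, which is why logging the idle connection count alongside the exception would help confirm or rule out the hypothesis.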

@jkozlowski
Contributor

@sudiksha27 / @gmaretic can you approve? I have committed a workaround, and we'll see if we can make changes to dialogue, but I wouldn't want to block on that.

Contributor

@gmaretic gmaretic left a comment


LGTM, thanks for digging into this!

@gmaretic gmaretic merged commit 0af4a67 into develop Feb 8, 2021
@svc-autorelease
Collaborator

Released 0.293.0

@delete-merged-branch delete-merged-branch bot deleted the roomba/versions-props-latest branch February 8, 2021 14:16