Server restart exhausts retries #1119

Open
jkozlowski opened this issue Feb 8, 2021 · 3 comments

Comments

@jkozlowski
Contributor

What happened?

Tests in palantir/atlasdb#5233 fail with NoHttpResponseException. The test gracefully shuts down servers to check resiliency to that scenario. The dialogue clients are set up with a single URL only, meaning that all previously open connections are closed, and the request fails because it blows through its retry limit.

What did you want to happen?

The request should succeed: we know the server is up because the tests wait for that.

@jkozlowski
Contributor Author

jkozlowski commented Feb 8, 2021

Idea: if we get a NoHttpResponseException and the request took less than 100ms, we should retry on another connection. Capped at 100 attempts? Up to 1 second in total?

This would basically live at the per-host channel level: we'd need a retry loop there that is separate from the RetryingChannel.
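A rough sketch of what such a loop might look like, purely to illustrate the idea. Nothing here is dialogue's actual API: `FastFailRetry`, `execute` and `StaleConnectionException` are hypothetical names, the exception is a stand-in for Apache's `NoHttpResponseException` so the snippet compiles without dependencies, and the sketch assumes each call to `request` picks up a different connection.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

final class FastFailRetry {
    /** Hypothetical stand-in for org.apache.http.NoHttpResponseException. */
    public static final class StaleConnectionException extends IOException {
        public StaleConnectionException(String message) {
            super(message);
        }
    }

    // Limits floated in the comment above; none of them are settled.
    private static final long FAST_FAILURE_NANOS = 100_000_000L;   // 100 ms
    private static final long TOTAL_BUDGET_NANOS = 1_000_000_000L; // 1 s
    private static final int MAX_ATTEMPTS = 100;

    private FastFailRetry() {}

    /**
     * Runs the request, retrying immediately (no backoff) while it keeps
     * failing fast with a stale-connection error. Assumes every call to
     * {@code request} is handed a different connection.
     */
    public static <T> T execute(Callable<T> request) throws Exception {
        long start = System.nanoTime();
        for (int attempt = 1; ; attempt++) {
            long attemptStart = System.nanoTime();
            try {
                return request.call();
            } catch (StaleConnectionException e) {
                long now = System.nanoTime();
                boolean failedFast = now - attemptStart < FAST_FAILURE_NANOS;
                boolean withinBudget = now - start < TOTAL_BUDGET_NANOS;
                if (!failedFast || !withinBudget || attempt >= MAX_ATTEMPTS) {
                    // Slow failure or limits exhausted: give up and let the
                    // caller's normal retry handling take over.
                    throw e;
                }
                // Otherwise loop around and try the next connection.
            }
        }
    }
}
```

Only the fast stale-connection failure is retried here; any other exception propagates on the first attempt, so the existing backoff behaviour would stay untouched.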

@Regan-Koopmans

@jkozlowski I wonder whether we could use exponential backoff to increase the retry delay?

@jkozlowski
Contributor Author

We already have exponential backoff in RetryingChannel; the problem here is that all the connections are dead because the server restarted. Sadly, we cannot know that without doing a read.

I think this isn't much of a problem in prod, because you'll most likely land on another host, but in this test it fails.
