Skip to content
This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

Fix race condition causing 'pool not open' errors #5882

Merged
merged 4 commits into from
Jan 27, 2022
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@
import com.palantir.logsafe.exceptions.SafeIllegalStateException;
import com.palantir.logsafe.logger.SafeLogger;
import com.palantir.logsafe.logger.SafeLoggerFactory;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.HashSet;
Expand All @@ -50,6 +51,7 @@
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import org.apache.cassandra.thrift.NotFoundException;
import org.apache.cassandra.thrift.TokenRange;

Expand Down Expand Up @@ -318,10 +320,16 @@ private void setServersInPoolTo(Set<InetSocketAddress> desiredServers) {
serversToRemove.forEach(cassandra::removePool);

if (!(serversToAdd.isEmpty() && serversToRemove.isEmpty())) { // if we made any changes
// Log IP addresses along with hostnames
Set<InetAddress> addressesToAdd =
serversToAdd.stream().map(InetSocketAddress::getAddress).collect(Collectors.toSet());
Set<InetAddress> addressesToRemove =
serversToRemove.stream().map(InetSocketAddress::getAddress).collect(Collectors.toSet());

log.info(
"Servers to add and remove, inside the if block",
SafeArg.of("serversToAdd", serversToAdd),
SafeArg.of("serversToRemove", serversToRemove));
"Added and removed servers from the client pool",
SafeArg.of("serversAdded", addressesToAdd),
SafeArg.of("serversRemoved", addressesToRemove));
sanityCheckRingConsistency();
cassandra.refreshTokenRangesAndGetServers();
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -399,15 +399,16 @@ public void addPool(InetSocketAddress server) {

public void removePool(InetSocketAddress removedServerAddress) {
blacklist.remove(removedServerAddress);
CassandraClientPoolingContainer containerToRemove = currentPools.get(removedServerAddress);
currentPools.remove(removedServerAddress);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove returns the value removed, so we shouldn't need the get as well.

try {
currentPools.get(removedServerAddress).shutdownPooling();
containerToRemove.shutdownPooling();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically doesn't fully solve the issue: what if there are in-flight requests on the pool-to-be-closed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, yes - ideally we'd have something that will:
a) block new requests from being started
b) wait for existing requests to complete (or time out)
c) clean up by shutting down pooling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like once we've called borrowObject (in CassandraClientPoolingContainer.runWithGoodResource), we should be good?

I'm not averse to adding something like this if it's proven to be necessary (and I can follow up this PR with logging to help us to determine this), but for now I think in-flight requests should continue to work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's pretty nice, actually!

} catch (Exception e) {
log.warn(
"While removing a host ({}) from the pool, we were unable to gently cleanup resources.",
SafeArg.of("removedServerAddress", CassandraLogHelper.host(removedServerAddress)),
e);
}
currentPools.remove(removedServerAddress);
}

public void cacheInitialCassandraHosts() {
Expand Down
6 changes: 6 additions & 0 deletions changelog/@unreleased/pr-5882.v2.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
type: fix
fix:
description: Fixed a race condition where we would potentially attempt to use a
Cassandra client pool that we had already closed.
links:
- https://github.com/palantir/atlasdb/pull/5882