Skip to content
This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

Try every host when attempting to get the ip to hostname mapping if a failure is hit. #6014

Merged
merged 10 commits into from
May 3, 2022

Conversation

Sam-Kramer
Copy link
Contributor

Goals (and why):
==COMMIT_MSG==
Try every host when attempting to get the ip to hostname mapping if a failure is hit.
==COMMIT_MSG==

Implementation Description (bullets):

  • Obtain the list of Cassandra servers which are alive
  • Iterate through the servers, until (most likely 1st) returns a valid response (i.e. not a TimeOutException)
  • Return result

Testing (What was existing testing like? What have you done to improve it?):

  • Unit tests, which should be feasible enough to test that retry occurs on multiple hosts..

Concerns (what feedback would you like?):

  • Trying every host can be expensive, and querying a quorum of hosts is not much better. In the worst case scenario, it could take timeout * N for the task to fail to update the addresses. This would be problematic on very large clusters, running in Kubernetes (as the IP changes are significant here), restarting an entire rack a time. However, this failure condition is so unlikely, I'd argue it should be overlooked (for now).
    As basically: N * timeout > time to start all nodes in rack Y, but that assumes we are hitting the timeout exception, rather than something else (i.e. connection refused, which is almost instant).

Where should we start reviewing?:

Priority (whenever / two weeks / yesterday):
Yesterday

@changelog-app
Copy link

changelog-app bot commented Apr 29, 2022

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Try every host when attempting to get the ip to hostname mapping if a failure is hit.

Check the box to generate changelog(s)

  • Generate changelog entry

@@ -104,7 +105,8 @@ public CassandraService(
this.blacklist = blacklist;
this.poolMetrics = poolMetrics;

Supplier<Map<String, String>> hostnamesByIpSupplier = new HostnamesByIpSupplier(this::getRandomGoodHost);
Supplier<Map<String, String>> hostnamesByIpSupplier =
new HostnamesByIpSupplier(this::getAllHostsUnlessBlacklisted);
this.hostnameByIpSupplier = Suppliers.memoizeWithExpiration(hostnamesByIpSupplier::get, 2, TimeUnit.MINUTES);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this 2 minute memoization factor into cassandra nodes coming into and leaving cluster, especially during rolling restarts and upgrades?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have the historical context on why the 2 minutes was picked here, but from what I have seen, it takes roughly 2 (usually 3) minutes for a Cassandra node to leave/re-join a cluster, so this should be safe. Although, maybe we knock this down to 1 minute? Question for the Atlas folks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinions on the memoisation length - it was probably picked arbitrarily, honestly. Yeah we can make this 1.

Comment on lines 66 to 67
.map(Optional::get)
.findFirst()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're grabbing the first result, should we filter out any empty responses to grab the first non-empty view of hostnamesByIp?

Suggested change
.map(Optional::get)
.findFirst()
.map(Optional::get)
.filter(hostnamesByIp -> !hostnamesByIp.isEmpty())
.findFirst()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this originally, but the (subtle) case that an empty map is a valid response, in the case that they mapping keyspace doesn't exist, table doesn't exist, or there are no rows in the map.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a result, you end up querying every single node, which isn't needed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does the mapping keyspace or table not exist for a given node (e.g. is it only on bootstrap)? Aside from duration, why wouldn't we want to continue querying nodes to try and find a non-empty view from other nodes?

Related to the duration & 60 second timeout mentioned above, it feels like we might want to have something managing cluster state in background thread and possibly Refreshable mechanisms for updating & subscribing suppliers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the keyspace/table are an optional dependency, and are only needed in the case that AtlasDb is ran in a split environment (i.e. in two separate VPCs). As a result, we can expect that sometimes these tables do not exist, and in those cases, the IP addresses should be sufficient for communicating with.

As for querying other nodes for non-empty views, the read itself is done at quorum, so there should be no difference in querying the 1 or 2 nodes, as that is what we're effectively doing under-the-hood.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the duration & 60 second timeout mentioned above, it feels like we might want to have something managing cluster state in background thread and possibly Refreshable mechanisms for updating & subscribing suppliers Agreed -- this is already ran in a background thread, so this doesn't block any critical read/write paths.

…cassandra/pool/CassandraService.java

Co-authored-by: David Schlosnagle <[email protected]>
log.warn("Could not get hostnames by ip from Cassandra", e);
return ImmutableMap.of();
}
return hosts.get().stream()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should unroll this into a loop, and track a best-effort timeout here. I.e., if we take over 60s for this to run, we should just fail, in the case that we prevent an actual update occurring where hostnames are not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also @jeremyk-91 I decided to go through each host rather than a quorum of hosts, as the blacklist filter removes an unknown number of hosts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the potential expense of resolving these, it seems like we might want this all to be updating & heartbeating in a background thread as opposed to potentially impacting on regular atlas transaction execution thread

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good flag - I think this should be on a background thread already though, as following up the places where this supplier is called seems to all point back to the background task that updates the token ring (unless this PR changed that, of course).

…cassandra/pool/HostnamesByIpSupplier.java

Co-authored-by: David Schlosnagle <[email protected]>
@@ -367,6 +369,12 @@ public Optional<CassandraClientPoolingContainer> getRandomGoodHostForPredicate(
return randomLivingHost.map(pools::get);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a separate consideration, but for com.palantir.atlasdb.keyvalue.cassandra.pool.CassandraService#getReachableProxies should we be including the inputHost as one of the reachable proxies as that may be the client addressable hostname/IP? I was thinking something along the lines of:

    private Set<InetSocketAddress> getReachableProxies(String inputHost) throws UnknownHostException {
        int knownPort = getKnownPort();

        ImmutableSet.Builder<InetSocketAddress> proxies = ImmutableSet.builder();
        proxies.add(new InetSocketAddress(inputHost, knownPort));
        for (InetAddress address : InetAddress.getAllByName(inputHost)) {
            // It is okay to have reachable proxies that do not have a hostname
            proxies.add(new InetSocketAddress(address, knownPort));
        }
        return proxies.build();
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems fine to me, though not super aware of the network-related implications

@@ -104,7 +105,8 @@ public CassandraService(
this.blacklist = blacklist;
this.poolMetrics = poolMetrics;

Supplier<Map<String, String>> hostnamesByIpSupplier = new HostnamesByIpSupplier(this::getRandomGoodHost);
Supplier<Map<String, String>> hostnamesByIpSupplier =
new HostnamesByIpSupplier(this::getAllHostsUnlessBlacklisted);
this.hostnameByIpSupplier = Suppliers.memoizeWithExpiration(hostnamesByIpSupplier::get, 2, TimeUnit.MINUTES);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinions on the memoisation length - it was probably picked arbitrarily, honestly. Yeah we can make this 1.

@@ -367,6 +369,14 @@ public Optional<CassandraClientPoolingContainer> getRandomGoodHostForPredicate(
return randomLivingHost.map(pools::get);
}

public List<PoolingContainer<CassandraClient>> getAllHostsUnlessBlacklisted() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: getAllNonBlacklistedHosts? but yeah, the main part looks fine

log.warn("Could not get hostnames by ip from Cassandra", e);
return ImmutableMap.of();
}
return hosts.get().stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good flag - I think this should be on a background thread already though, as following up the places where this supplier is called seems to all point back to the background task that updates the token ring (unless this PR changed that, of course).

@@ -367,6 +369,12 @@ public Optional<CassandraClientPoolingContainer> getRandomGoodHostForPredicate(
return randomLivingHost.map(pools::get);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems fine to me, though not super aware of the network-related implications

Sam-Kramer and others added 3 commits April 29, 2022 18:31
…cassandra/pool/HostnamesByIpSupplier.java

Co-authored-by: David Schlosnagle <[email protected]>
@jeremyk-91
Copy link
Contributor

👍 from my end (no strong preference on the duration compares, though I think I agree with @schlosna on that one)

Sam-Kramer and others added 2 commits May 3, 2022 13:08
@Sam-Kramer Sam-Kramer merged commit b2aa074 into develop May 3, 2022
@delete-merged-branch delete-merged-branch bot deleted the skramer/try-all-hosts-if-hostname-query-fails branch May 3, 2022 13:36
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants