[PDS-321180] Permit Cluster Switches/Additions, Where New Nodes Agree With Old Nodes #6415
Conversation
PR looks good -- we can still get hit by this bug if new hosts join and we restart all nodes at exactly the same time as the new hosts join. This is not a realistic edge case, so I think we can ignore it. Left some style nits/preferences, but obviously feel free to ignore!
Optional<ConsistentClusterTopology> maybeTopology = maybeGetConsistentClusterTopology(
                newServersWithoutSoftFailures)
        .agreedTopology();
if (maybeTopology.isPresent()) {
nit: You can still use Optional::map here if you'd like :)
Or even Optional.ifPresent(pastConsistentTopology::set); return maybeTopology.map(topology -> Sets.difference(newServersWithoutSoftFailures.keySet(), topology.serversInConsensus())).orElseGet(newServersWithoutSoftFailures::keySet);
Yeah, I think the second version is good. I find these can get very hard to read, especially when pipelines have side effects, but declaring that upfront is fine.
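For readability, the suggested pipeline might be laid out like this. This is only a sketch reusing the identifiers from the quoted diff; the explicit type argument is there because Sets.difference returns a SetView, which would otherwise not line up with the keySet() supplier.

// Sketch of the suggested pipeline; identifiers come from the quoted diff and may not
// match the final code exactly. The explicit <Set<CassandraServer>> type argument keeps
// Sets.difference's SetView return type compatible with orElseGet.
maybeTopology.ifPresent(pastConsistentTopology::set);
return maybeTopology
        .<Set<CassandraServer>>map(topology -> Sets.difference(
                newServersWithoutSoftFailures.keySet(), topology.serversInConsensus()))
        .orElseGet(newServersWithoutSoftFailures::keySet);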
    return newServersWithoutSoftFailures.keySet();
}
ConsistentClusterTopology newNodesAgreedTopology = maybeTopology.get();
if (!newNodesAgreedTopology
Gut check: will ImmutableSet compare each element regardless of order? If we really want to make this future-proof, I guess we could do Sets.difference instead? Totally up to you.
Yep - I assume your concern here is that two sets with the same elements but different order might end up not being equal, but equals() is actually defined to avoid this kind of issue:
Compares the specified object with this set for equality. Returns true if the specified object is also a set, the two sets have the same size, and every member of the specified set is contained in this set (or equivalently, every member of this set is contained in the specified set). This definition ensures that the equals method works properly across different implementations of the Set interface.
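A quick standalone illustration of that contract (not code from this PR):

// Two sets holding the same elements in different iteration orders, backed by different
// implementations (Guava ImmutableSet vs. java.util.TreeSet), still compare equal under
// the Set.equals contract quoted above.
Set<String> insertionOrder = ImmutableSet.of("id-1", "id-2", "id-3");
Set<String> reverseOrder = new TreeSet<>(Comparator.reverseOrder());
reverseOrder.addAll(insertionOrder);

boolean equal = insertionOrder.equals(reverseOrder);      // true
boolean symmetric = reverseOrder.equals(insertionOrder);  // also true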
Nice, TIL!
}

@Value.Immutable
public interface ClusterTopologyResult {
We had something similar in a previous version of this PR, where we instead had ConsistentClusterTopology have the type, and provided the static constructors there.
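For context, here is one way such a result type can be modelled with Immutables. This is a hypothetical sketch only: the field, factory, and enum names below are guesses for illustration, not the PR's actual definition.

// Hypothetical sketch: a result type carrying both an outcome and, when consensus was
// reached, the agreed topology. Names and shape are assumptions based on the discussion above.
@Value.Immutable
public interface ClusterTopologyResult {
    ClusterTopologyResultType type();

    Optional<ConsistentClusterTopology> agreedTopology();

    static ClusterTopologyResult consensus(ConsistentClusterTopology topology) {
        return ImmutableClusterTopologyResult.builder()
                .type(ClusterTopologyResultType.CONSENSUS)
                .agreedTopology(topology)
                .build();
    }

    static ClusterTopologyResult noQuorum() {
        return ImmutableClusterTopologyResult.builder()
                .type(ClusterTopologyResultType.NO_QUORUM)
                .build();
    }

    enum ClusterTopologyResultType {
        CONSENSUS,
        NO_QUORUM
    }
}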
Map<CassandraServer, CassandraClientPoolingContainer> newHosts = EntryStream.of(allHosts)
        .filterKeys(key -> NEW_HOSTS.contains(key.cassandraHostName()))
        .toMap();
Set<CassandraServer> oldCassandraServers = filterServers(allHosts, OLD_HOSTS::contains);
nit: you can just do oldHosts.keySet()
you're right! yeah I kind of rushed the tests a bit, admittedly
Set<CassandraServer> oldCassandraServers = filterServers(allHosts, OLD_HOSTS::contains);
Set<CassandraServer> newCassandraServers = filterServers(allHosts, NEW_HOSTS::contains);
Set<String> hostsOffline = ImmutableSet.of(OLD_HOST_ONE);
setHostIds(filterContainers(allHosts, hostsOffline::contains), HostIdResult.hardFailure());
To remove filterContainers you could just do oldHosts.get(OLD_HOST_ONE)
Annoyingly OLD_HOST_ONE is a String though, and the map is keyed on CassandraServer, so not sure that'll work :(
ugh yes that's right, this makes sense then!
Set<String> hostsOffline = ImmutableSet.of(OLD_HOST_ONE);
setHostIds(filterContainers(allHosts, hostsOffline::contains), HostIdResult.hardFailure());

setHostIds(filterContainers(newHosts, server -> !hostsOffline.contains(server)), HostIdResult.success(UUIDS));
ah fair you need it here! you can also do Predicate.not(hostsOffline::contains)
Nice! Yeah makes sense
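For reference, the suggested form would read roughly like this (a sketch reusing the test helpers from the snippet above; Predicate.not requires Java 11+):

// java.util.function.Predicate.not replaces the explicit negated lambda.
setHostIds(
        filterContainers(newHosts, Predicate.not(hostsOffline::contains)),
        HostIdResult.success(UUIDS));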
public CassandraTopologyValidator(CassandraTopologyValidationMetrics metrics) {
    this.metrics = metrics;
    this.pastConsistentTopology = new AtomicReference<>();
We should update the comment on line 59 indicating the change in our algorithm.
we can still get hit by this bug if new hosts join and we restart all nodes at exactly the same time as the new hosts join
Yeah, that's unfortunate: don't have a good way around that :( Thanks for the review!
Took another look, no more comments, lgtm!
General
Before this PR:
If a Cassandra cluster [A, B, C] changes its IPs to [D, E, F] within two minutes (the Cassandra client pool refresh interval), the client pool can enter a stuck state from which it cannot recover without bouncing the service. The pool correctly re-discovers [D, E, F] from configuration, but our host validation check rejects the change because there is no quorum across the cluster (three topology proposals from the new hosts against three unreachable old hosts).
After this PR:
==COMMIT_MSG==
In the event there is no quorum, we check the new nodes for their view of what the cluster should look like. If the new nodes agree among themselves and their view matches the host IDs that the old nodes previously agreed on, we consider the topology to be consistent.
==COMMIT_MSG==
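A rough sketch of that fallback, assembled from the snippets quoted in the review above. The surrounding method name and exact signatures are assumptions, not the final code:

// Hypothetical sketch of the no-quorum fallback. Field and method names
// (pastConsistentTopology, maybeGetConsistentClusterTopology, serversInConsensus, hostIds)
// mirror the diff snippets quoted in this review; signatures are guesses.
private Set<CassandraServer> validateNewNodesWithoutQuorum(
        Map<CassandraServer, CassandraClientPoolingContainer> newServersWithoutSoftFailures) {
    ConsistentClusterTopology pastTopology = pastConsistentTopology.get();
    if (pastTopology == null) {
        // No previously agreed topology to compare against: treat all new servers as unvalidated.
        return newServersWithoutSoftFailures.keySet();
    }

    Optional<ConsistentClusterTopology> maybeTopology =
            maybeGetConsistentClusterTopology(newServersWithoutSoftFailures).agreedTopology();
    if (maybeTopology.isEmpty()) {
        // The new nodes do not even agree among themselves.
        return newServersWithoutSoftFailures.keySet();
    }

    ConsistentClusterTopology newNodesAgreedTopology = maybeTopology.get();
    if (!newNodesAgreedTopology.hostIds().equals(pastTopology.hostIds())) {
        // The new nodes agree with each other, but on different host IDs than the old nodes did.
        return newServersWithoutSoftFailures.keySet();
    }

    // New nodes agree among themselves and match the old nodes' last agreed host IDs:
    // only servers outside that consensus remain unvalidated.
    return Sets.difference(
            newServersWithoutSoftFailures.keySet(), newNodesAgreedTopology.serversInConsensus());
}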
Priority: P2 - there is a temporary workaround for today's burning issue around preventing fast rolls, but this is still pretty high priority.
Concerns / possible downsides (what feedback would you like?):
Is documentation needed?: No
Compatibility
Does this PR create any API breaks (e.g. at the Java or HTTP layers) - if so, do we have compatibility?: No
Does this PR change the persisted format of any data - if so, do we have forward and backward compatibility?: No
The code in this PR may be part of a blue-green deploy. Can upgrades from previous versions safely coexist? (Consider restarts of blue or green nodes.): Yes - Cassandra client pooling logic is in-memory on each node.
Does this PR rely on statements being true about other products at a deployment - if so, do we have correct product dependencies on these products (or other ways of verifying that these statements are true)?: Not really?
Does this PR need a schema migration? No
Testing and Correctness
What, if any, assumptions are made about the current state of the world? If they change over time, how will we find out?: Not much
What was existing testing like? What have you done to improve it?: Added unit tests for relevant cases.
If this PR contains complex concurrent or asynchronous code, is it correct? The onus is on the PR writer to demonstrate this.: No complex asynchronous code here. The topology validator is called from a single-threaded executor.
If this PR involves acquiring locks or other shared resources, how do we ensure that these are always released?: It doesn't?
Execution
How would I tell this PR works in production? (Metrics, logs, etc.): We can recover from an ABC -> DEF change; also, there's plenty of logging.
Has the safety of all log arguments been decided correctly?: Yes, I think so.
Will this change significantly affect our spending on metrics or logs?: No
How would I tell that this PR does not work in production? (monitors, etc.): An ABC -> DEF IP switch still results in a stuck client pool.
If this PR does not work as expected, how do I fix that state? Would rollback be straightforward?: Rollback
If the above plan is more complex than “recall and rollback”, please tag the support PoC here (if it is the end of the week, tag both the current and next PoC): -
Scale
Would this PR be expected to pose a risk at scale? Think of the shopping product at our largest stack.: No
Would this PR be expected to perform a large number of database calls, and/or expensive database calls (e.g., row range scans, concurrent CAS)?: No
Would this PR ever, with time and scale, become the wrong thing to do - and if so, how would we know that we need to do something differently?: No
Development Process
Where should we start reviewing?: CassandraTopologyValidator
If this PR is in excess of 500 lines excluding versions lock-files, why does it not make sense to split it?: No
Please tag any other people who should be aware of this PR:
@jeremyk-91
@sverma30
@raiju