-
Notifications
You must be signed in to change notification settings - Fork 15
[Timelock Corruption]: Final Wiring part 1 #5071
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just picked up some nits in a drive-by.
import java.util.List; | ||
|
||
public class CorruptionHealthCheck { | ||
private final List<CorruptionDetector> corruptionDetectors; | ||
private final LocalCorruptionDetector localCorruptionDetector; | ||
private final List<CorruptionDetector> localAndRemoteCorruptionDetectors; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this feels a bit weird (although maybe it's an Atlas thing?) - that is, using a List
to represent two and exactly two distinct things here. I'd prefer if you just made a small Immutables to hold the local and remote, then you only need one field and can call something like corruptionDetectors.local()
(although naturally this doesn't lend itself trivially to your streams).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a look at LocalAndRemotes
? It's a bit more powerful than what you need here, but has a nice way of accessing all or specifically the local.
@@ -31,46 +32,63 @@ | |||
|
|||
private final ScheduledExecutorService executor = PTExecutors.newSingleThreadScheduledExecutor( | |||
new NamedThreadFactory(CORRUPTION_DETECTOR_THREAD_PREFIX, true)); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unnecessary diff.
@@ -32,7 +32,7 @@ public JerseyCorruptionFilter(CorruptionHealthCheck healthCheck) { | |||
|
|||
@Override | |||
public void filter(ContainerRequestContext requestContext) { | |||
if (!healthCheck.isHealthy()) { | |||
if (healthCheck.shootTimeLock()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really a fan of shootTimelock
- it feels pretty ambiguous (particularly as this is a boolean!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks correct. Main things:
shootTimeLock()
looks like a command, I'd suggest making it clearer that it is a predicate (e.g.shouldRejectUserRequests()
)- I don't really like the findFirst() when identifying rogue clients, it seems we should just report all failing checks when multiple are unhappy.
@@ -17,5 +17,5 @@ | |||
package com.palantir.timelock.corruption.detection; | |||
|
|||
public interface CorruptionDetector { | |||
boolean hasDetectedCorruption(); | |||
boolean shootTimeLock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe shouldRejectRequests()
? The impression I get if I come across a method like this is that it actually makes the server stop requests and returns true if successful/false if not
import java.util.List; | ||
|
||
public class CorruptionHealthCheck { | ||
private final List<CorruptionDetector> corruptionDetectors; | ||
private final LocalCorruptionDetector localCorruptionDetector; | ||
private final List<CorruptionDetector> localAndRemoteCorruptionDetectors; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a look at LocalAndRemotes
? It's a bit more powerful than what you need here, but has a nice way of accessing all or specifically the local.
HistoryAnalyzer::extractNamespaceAndUseCase, | ||
HistoryAnalyzer::violatedCorruptionChecksForNamespaceAndUseCase)); | ||
|
||
SetMultimap<CorruptionCheckViolation, NamespaceAndUseCase> entryEntrySetMultimap = KeyedStream.stream( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's name this based on what the map actually does: namespacesExhibitingViolations
or something like this.
@VisibleForTesting | ||
static List<CorruptionCheckViolation> violatedCorruptionChecksForNamespaceAndUseCase( | ||
static CorruptionCheckViolation violatedCorruptionChecksForNamespaceAndUseCase( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should return List or some Collection given the name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the issue is that (and I missed this while writing #5057) these checks are to be executed in an order, e.g. the check for acceptor quorum for a learned value does not hold value if the learners have diverged as we anyway do not have a learner value to compare accepted values against.
Function<CompletePaxosHistoryForNamespaceAndUseCase, CorruptionCheckViolation> divergedLearnedCheck = | ||
HistoryAnalyzer::divergedLearners; | ||
Function<CompletePaxosHistoryForNamespaceAndUseCase, CorruptionCheckViolation> learnedValueWithoutQuorum = | ||
HistoryAnalyzer::learnedValueWithoutQuorum; | ||
Function<CompletePaxosHistoryForNamespaceAndUseCase, CorruptionCheckViolation> greatestAcceptedValueNotLearned = | ||
HistoryAnalyzer::greatestAcceptedValueNotLearned; | ||
|
||
return Stream.of(divergedLearnedCheck, learnedValueWithoutQuorum, greatestAcceptedValueNotLearned) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth extracting a constant for the set of checks
|
||
private void processLocalHealthReport() { | ||
localCorruptionState = getLocalCorruptionState(localCorruptionReport); | ||
if (localCorruptionState.shootTimeLock()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as flagged elsewhere, this gets confusing, I think because it looks like a command, but it's not
assertThat(HistoryAnalyzer.corruptionHealthReportForHistory(historyForAll) | ||
.statusesToNamespaceAndUseCase() | ||
.size()) | ||
.isEqualTo(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Prefer isEmpty()
.isEqualTo(CorruptionCheckViolation.NONE); | ||
|
||
assertThat(HistoryAnalyzer.corruptionHealthReportForHistory(historyForAll) | ||
.statusesToNamespaceAndUseCase() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we rename this violatingStatuses...
or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed: yeah ordered checks make a lot of sense!
Released 0.257.0 |
Goals (and why):
Wire corruption components
Implementation Description (bullets):
Testing (What was existing testing like? What have you done to improve it?):
Modified existing tests
Concerns (what feedback would you like?):
Is it correct to initialize the PaxosHistoryProvider in PaxosResourceFactory?
Did I miss something?
Where should we start reviewing?:
Priority (whenever / two weeks / yesterday):
ASAP 🏃♀️