-
Notifications
You must be signed in to change notification settings - Fork 36
add test evaluating anomaly results #13
add test evaluating anomaly results #13
Conversation
private void verifyAnomaly(String datasetName, int intervalMinutes, int trainTestSplit, int shingleSize, | ||
double minPrecision, double minRecall, double maxError) throws Exception { | ||
|
||
RestClient client = client(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How many nodes do we have for this cluster? We want both single node and multi-node cluster to test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default setting is a single node. I will add a multi-node test case in a separate pr.
public class DetectionResultEvalutationIT extends ESRestTestCase { | ||
|
||
public void testDataset() throws Exception { | ||
verifyAnomaly("synthetic", 1, 1500, 8, .9, .9, 10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How long does the test take to run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
about a minute
Map<String, Object> response = entityAsMap(client.performRequest(request)); | ||
double anomalyGrade = (double)response.get("anomalyGrade"); | ||
if (anomalyGrade > 0) { | ||
System.out.println("LLL," + begin + "," + anomalyGrade); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace with log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed in the update
double precision = positives > 0 ? truePositives / positives : 1; | ||
assertTrue(precision >= minPrecision); | ||
|
||
double recall = anomalies.size() > 0 ? positiveAnomalies / anomalies.size() : 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment to explain how you compute precision and recall? It's not easy to understand as precision is computed based on points and recall is based on windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added comments showing the definitions in the update
+ " \"Feature1\": { \"type\": \"double\" }, \"Feature2\": { \"type\": \"double\" } } } }"; | ||
request.setJsonEntity(requestBody); | ||
client.performRequest(request); | ||
Thread.sleep(1_000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that sometimes sleeping 1 second is not enough? Have you tried to run the test multiple times (say 100 times)? If yes, would it always pass? I am afraid concurrency and inter-process communication(if you are running the test in a simulated multi-node cluster) would sometimes cause the test to fail.
Also, would https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have run it for many times (certainly more than 30 times and less than 100 times) without failure. And it passes github checks multiple times. From my experience, it has been more reliable than some randomized unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. A few minor suggestions.
for (int i = trainTestSplit; i < data.size(); i++) { | ||
Instant begin = Instant.from(DateTimeFormatter.ISO_INSTANT.parse(data.get(i).get("timestamp").getAsString())); | ||
Instant end = begin.plus(intervalMinutes, ChronoUnit.MINUTES); | ||
String requestBody = String.format("{ \"period_start\": %d, \"period_end\": %d }", begin.toEpochMilli(), end.toEpochMilli()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To remove duplicated code, it may be worth adding a function that runs a detector given a detectorID, period_start, and period end and returns the response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
e.printStackTrace(); | ||
} | ||
} | ||
return new double[] {positives, truePositives, positiveAnomalies.size(), errors}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will truePositives and positiveAnomalies.size() always be the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, because truePositives is a count of correct data points. positiveAnomalies is a count of correctly found anomaly windows. The former is no less than the latter.
} catch (Exception e ) { | ||
throw new RuntimeException(e); | ||
} }); | ||
Thread.sleep(1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 1_000
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
This change adds an anomaly detection test running on rest api and a synthetic dataset with known anomalies to verify the machine learning algorithms in the anomaly detection core along with some basic functionality such as creating detectors.