
Adding UT for FieldDataCacheRca
khushbr committed Jul 6, 2020
1 parent ebdbc7e commit cdec05b
Showing 6 changed files with 192 additions and 18 deletions.
@@ -45,19 +45,21 @@ import java.util.concurrent.TimeUnit;
* healthy. The dimension we are using for this analysis is cache eviction count.
* Note : For Field Data Cache, Hit and Miss metrics aren't available.
*
* Cache eviction within Elastisearch happens in following scenarios :
* 1. Mutation to Cache (Entry Insertion/Promotion and Manual Invalidation)
* 2. Explicit call to refresh()
* <p>Cache eviction within Elasticsearch happens in following scenarios :
* <ol>
* <li>Mutation to Cache (Entry Insertion/Promotion and Manual Invalidation)
* <li>Explicit call to refresh()
* </ol>
*
* The Cache Eviction requires that either the cache weight exceeds OR the entry TTL is expired.
* <p>The Cache Eviction requires that either the cache weight exceeds OR the entry TTL is expired.
* For Field Data Cache, no expire setting is present, so only in case of cache_weight exceeding the
* max_cache_weight, eviction(removal from Cache Map and LRU linked List, entry updated to EVICTED)
* happens.
*
* The Entry Invalidation is performed manually on cache clear() and index close() invocation, with
* <p>The Entry Invalidation is performed manually on cache clear() and index close() invocation, with
* removalReason as INVALIDATED and a force eviction is performed to ensure cleanup.
*
* This RCA reads 'fieldDataCacheEvictions' from upstream metrics and maintains a collector which
* <p>This RCA reads 'fieldDataCacheEvictions' from upstream metrics and maintains a collector which
* keeps track of the time window period (tp) where we repeatedly see evictions for the last tp duration.
* This RCA is marked as unhealthy if we find that tp is above the threshold (300 seconds).
*/
@@ -151,7 +153,7 @@ public class FieldDataCacheRca extends Rca<ResourceFlowUnit<HotNodeSummary>> {
}

for (Record record : flowUnit.getData()) {
double evictionCount = record.getValue(MetricsDB.SUM, Double.class);
double evictionCount = record.getValue(MetricsDB.MAX, Double.class);
if (!Double.isNaN(evictionCount)) {
if (evictionCount > 0) {
if (!hasEvictions) {
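The FieldDataCacheRca javadoc above describes the eviction bookkeeping in words. The following is a minimal, self-contained sketch of that time-window logic; the class and member names are illustrative only and are not the RCA's actual collector, while the 300-second threshold is taken from the javadoc.

import java.util.concurrent.TimeUnit;

// Illustrative sketch only: not the actual collector inside FieldDataCacheRca.
class EvictionWindowTracker {
  private static final long THRESHOLD_MILLIS = TimeUnit.SECONDS.toMillis(300);

  private boolean hasEvictions = false;
  private long windowStartMillis = 0L;

  // Record the eviction count read from the current flow unit.
  void collect(double evictionCount, long currTimestampMillis) {
    if (evictionCount > 0) {
      if (!hasEvictions) {
        // First eviction after a quiet period: open a new window.
        windowStartMillis = currTimestampMillis;
        hasEvictions = true;
      }
    } else {
      // A sample with no evictions closes the window.
      hasEvictions = false;
    }
  }

  // Unhealthy once evictions have been seen continuously for longer than the threshold.
  boolean isUnhealthy(long currTimestampMillis) {
    return hasEvictions && (currTimestampMillis - windowStartMillis) >= THRESHOLD_MILLIS;
  }
}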
@@ -45,25 +45,27 @@
* Shard Request Cache RCA is to identify when the cache is unhealthy(thrashing) and otherwise,
* healthy. The dimension we are using for this analysis is cache eviction and hit count.
*
* Cache eviction within Elastisearch happens in following scenarios :
* 1. Mutation to Cache (Entry Insertion/Promotion and Manual Invalidation)
* 2. Explicit call to refresh()
* <p>Cache eviction within Elasticsearch happens in following scenarios:
* <ol>
* <li> Mutation to Cache (Entry Insertion/Promotion and Manual Invalidation)
* <li> Explicit call to refresh()
* </ol>
*
* The Cache Eviction requires that either the cache weight exceeds OR the entry TTL is expired.
* <p>The Cache Eviction requires that either the cache weight exceeds OR the entry TTL is expired.
* For Shard Request Cache, TTL is defined via `indices.requests.cache.expire` setting which is never used
* in production clusters and only provided for backward compatibility, so we will ignore time based evictions.
* The weight based evictions(removal from Cache Map and LRU linked List, entry updated to EVICTED) occur when
* the cache_weight exceeds the max_cache_weight.
*
* The Entry Invalidation is performed manually on cache clear(), index close() invocation and for cached results
* <p>The Entry Invalidation is performed manually on cache clear(), index close() invocation and for cached results
* from timed-out requests. A scheduled runnable, running every 10 minutes, cleans up all the invalidated entries
* which have not been read/written to since invalidation.
*
* The Cache Hit and Eviction metric presence implies cache is undergoing frequent load and eviction or undergoing
* <p>The Cache Hit and Eviction metric presence implies cache is undergoing frequent load and eviction or undergoing
* scheduled cleanup for entries which had timed-out during execution. The 300 second time window threshold ensures
* the latter is taken care of.
*
* This RCA reads 'shardRequestCacheEvictions' and 'shardRequestCacheHits' from upstream metrics and maintains
* <p>This RCA reads 'shardRequestCacheEvictions' and 'shardRequestCacheHits' from upstream metrics and maintains
* collectors which keep track of the time window period (tp) where we repeatedly see evictions and hits for the
* last tp duration. This RCA is marked as unhealthy if we find that tp is above the threshold (300 seconds).
*/
@@ -166,7 +168,7 @@ public void collect(final long currTimestamp) {
}

for (Record record : flowUnit.getData()) {
double evictionCount = record.getValue(MetricsDB.SUM, Double.class);
double evictionCount = record.getValue(MetricsDB.MAX, Double.class);
if (!Double.isNaN(evictionCount)) {
if (evictionCount > 0) {
if (!hasMetric) {
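For the shard request cache, one reading of the javadoc above is that the unhealthy signal needs both sustained evictions and sustained hits. A hedged sketch of that composition, reusing the hypothetical EvictionWindowTracker from the previous sketch; again, these are illustrative names, not the RCA's real internals.

// Illustrative composition only; the real ShardRequestCacheRca keeps its own collectors.
class ShardRequestCacheHealthSketch {
  private final EvictionWindowTracker evictionTracker = new EvictionWindowTracker();
  private final EvictionWindowTracker hitTracker = new EvictionWindowTracker();

  void collect(double evictionCount, double hitCount, long currTimestampMillis) {
    evictionTracker.collect(evictionCount, currTimestampMillis);
    hitTracker.collect(hitCount, currTimestampMillis);
  }

  boolean isUnhealthy(long currTimestampMillis) {
    // Requiring both signals over the 300-second window filters out the periodic
    // cleanup of timed-out entries, which evicts without a sustained hit pattern.
    return evictionTracker.isUnhealthy(currTimestampMillis)
        && hitTracker.isUnhealthy(currTimestampMillis);
  }
}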
@@ -0,0 +1,29 @@
/*
* Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License").
* You may not use this file except in compliance with the License.
* A copy of the License is located at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* or in the "license" file accompanying this file. This file is distributed
* on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
* express or implied. See the License for the specific language governing
* permissions and limitations under the License.
*/

package com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.store.rca.cluster;

import com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.framework.api.Rca;
import com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.framework.api.flow_units.ResourceFlowUnit;
import com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.framework.api.summaries.HotNodeSummary;

public class FieldDataCacheClusterRca extends BaseClusterRca {
public static final String RCA_TABLE_NAME = FieldDataCacheClusterRca.class.getSimpleName();

public <R extends Rca<ResourceFlowUnit<HotNodeSummary>>> FieldDataCacheClusterRca(
final int rcaPeriod, final R hotNodeRca) {
super(rcaPeriod, hotNodeRca);
}
}
@@ -0,0 +1,29 @@
/*
* Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License").
* You may not use this file except in compliance with the License.
* A copy of the License is located at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* or in the "license" file accompanying this file. This file is distributed
* on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
* express or implied. See the License for the specific language governing
* permissions and limitations under the License.
*/

package com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.store.rca.cluster;

import com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.framework.api.Rca;
import com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.framework.api.flow_units.ResourceFlowUnit;
import com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.framework.api.summaries.HotNodeSummary;

public class ShardRequestCacheClusterRca extends BaseClusterRca {
public static final String RCA_TABLE_NAME = ShardRequestCacheClusterRca.class.getSimpleName();

public <R extends Rca<ResourceFlowUnit<HotNodeSummary>>> ShardRequestCacheClusterRca(
final int rcaPeriod, final R hotNodeRca) {
super(rcaPeriod, hotNodeRca);
}
}
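The two cluster-level RCAs above only forward their constructor arguments to BaseClusterRca. Below is a hedged sketch of how the field data cache pair might be instantiated, using only constructors and package names that appear in this commit; the MetricTestHelper source and the rcaPeriod value of 1 are placeholders borrowed from the unit test, not production wiring.

import com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.framework.api.metrics.MetricTestHelper;
import com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.store.rca.cache.FieldDataCacheRca;
import com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.store.rca.cluster.FieldDataCacheClusterRca;

class ClusterRcaWiringSketch {
  static FieldDataCacheClusterRca buildFieldDataCacheClusterRca() {
    // Placeholder metric source borrowed from the unit test; a real graph would pass
    // the upstream 'fieldDataCacheEvictions' metric node instead.
    MetricTestHelper fieldDataCacheEvictions = new MetricTestHelper(5);
    FieldDataCacheRca fieldDataCacheRca = new FieldDataCacheRca(1, fieldDataCacheEvictions);
    return new FieldDataCacheClusterRca(1, fieldDataCacheRca);
  }
}

ShardRequestCacheClusterRca would be wired the same way from a node-level ShardRequestCacheRca instance.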
3 changes: 0 additions & 3 deletions src/main/proto/inter_node_rpc_service.proto
@@ -89,13 +89,10 @@ enum MetricEnum {

// threadpool
QUEUE_REJECTION = 6 [(additional_fields).name = "queue rejection", (additional_fields).description = "rejection period in second"];
<<<<<<< HEAD

// cache
CACHE_EVICTION = 7 [(additional_fields).name = "cache eviction", (additional_fields).description = "cache eviction count"];
CACHE_HIT = 8 [(additional_fields).name = "cache hit", (additional_fields).description = "cache hit count"];
=======
>>>>>>> master
}

/*
@@ -0,0 +1,115 @@
/*
* Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License").
* You may not use this file except in compliance with the License.
* A copy of the License is located at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* or in the "license" file accompanying this file. This file is distributed
* on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
* express or implied. See the License for the specific language governing
* permissions and limitations under the License.
*/

package com.amazon.opendistro.elasticsearch.performanceanalyzer.store.rca.cache;

import static com.amazon.opendistro.elasticsearch.performanceanalyzer.metrics.AllMetrics.ShardStatsDerivedDimension.INDEX_NAME;
import static com.amazon.opendistro.elasticsearch.performanceanalyzer.metrics.AllMetrics.ShardStatsDerivedDimension.SHARD_ID;
import static java.time.Instant.ofEpochMilli;

import com.amazon.opendistro.elasticsearch.performanceanalyzer.metricsdb.MetricsDB;
import com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.GradleTaskForRca;
import com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.framework.api.flow_units.ResourceFlowUnit;
import com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.framework.api.metrics.MetricTestHelper;
import com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.framework.api.summaries.HotNodeSummary;
import com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.framework.api.summaries.HotResourceSummary;
import com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.framework.api.summaries.ResourceUtil;
import com.amazon.opendistro.elasticsearch.performanceanalyzer.rca.store.rca.cache.FieldDataCacheRca;
import com.amazon.opendistro.elasticsearch.performanceanalyzer.reader.ClusterDetailsEventProcessorTestHelper;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
import org.junit.experimental.categories.Category;

import java.time.Clock;
import java.time.Duration;
import java.time.ZoneId;
import java.util.Arrays;
import java.util.List;

@Category(GradleTaskForRca.class)
public class FieldDataCacheRcaTest {

private MetricTestHelper fieldDataCacheEvictions;
private FieldDataCacheRca fieldDataCacheRca;
private List<String> columnName;

@Before
public void init() throws Exception {
fieldDataCacheEvictions = new MetricTestHelper(5);
fieldDataCacheRca = new FieldDataCacheRca(1, fieldDataCacheEvictions);
columnName = Arrays.asList(INDEX_NAME.toString(), SHARD_ID.toString(), MetricsDB.MAX);
ClusterDetailsEventProcessorTestHelper clusterDetailsEventProcessorTestHelper = new ClusterDetailsEventProcessorTestHelper();
clusterDetailsEventProcessorTestHelper.addNodeDetails("node1", "127.0.0.0", false);
clusterDetailsEventProcessorTestHelper.generateClusterDetailsEvent();
}

/**
* generate flowunit and bind the flowunit to metrics, sample record:
*
* | IndexName | ShardID | SUM | AVG | MIN | MAX |
* -----------------------------------------------
* | .kibana_1 | 0 | 15.0 | 8.0 | 2.0 | 9.0 |
*
*/
private void mockFlowUnits(int cacheEvictionCnt) {
fieldDataCacheEvictions.createTestFlowUnits(columnName,
Arrays.asList("index_1", "0", String.valueOf(cacheEvictionCnt)));
}

@Test
public void testFieldDataCache() {
ResourceFlowUnit<HotNodeSummary> flowUnit;
Clock constantClock = Clock.fixed(ofEpochMilli(0), ZoneId.systemDefault());

mockFlowUnits(0);
fieldDataCacheRca.setClock(constantClock);
flowUnit = fieldDataCacheRca.operate();
Assert.assertFalse(flowUnit.getResourceContext().isUnhealthy());

mockFlowUnits(0);
fieldDataCacheRca.setClock(Clock.offset(constantClock, Duration.ofMinutes(3)));
flowUnit = fieldDataCacheRca.operate();
Assert.assertFalse(flowUnit.getResourceContext().isUnhealthy());

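// First non-zero eviction count at minute 4 opens the eviction tracking window.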
mockFlowUnits(1);
fieldDataCacheRca.setClock(Clock.offset(constantClock, Duration.ofMinutes(4)));
flowUnit = fieldDataCacheRca.operate();
Assert.assertFalse(flowUnit.getResourceContext().isUnhealthy());

mockFlowUnits(1);
fieldDataCacheRca.setClock(Clock.offset(constantClock, Duration.ofMinutes(7)));
flowUnit = fieldDataCacheRca.operate();
Assert.assertFalse(flowUnit.getResourceContext().isUnhealthy());

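// Evictions have been seen continuously since minute 4; by minute 10 the window is
// roughly 360 seconds, above the 300-second threshold, so the RCA should report unhealthy.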
mockFlowUnits(1);
fieldDataCacheRca.setClock(Clock.offset(constantClock, Duration.ofMinutes(10)));
flowUnit = fieldDataCacheRca.operate();
Assert.assertTrue(flowUnit.getResourceContext().isUnhealthy());

Assert.assertTrue(flowUnit.hasResourceSummary());
HotNodeSummary nodeSummary = flowUnit.getSummary();
Assert.assertEquals(1, nodeSummary.getNestedSummaryList().size());
Assert.assertEquals(1, nodeSummary.getHotResourceSummaryList().size());
HotResourceSummary resourceSummary = nodeSummary.getHotResourceSummaryList().get(0);
Assert.assertEquals(ResourceUtil.FIELD_DATA_CACHE_EVICTION, resourceSummary.getResource());
Assert.assertEquals(6.0, resourceSummary.getValue(), 0.01);

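// A flow unit with zero evictions closes the window and the RCA recovers to healthy.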
mockFlowUnits(0);
fieldDataCacheRca.setClock(Clock.offset(constantClock, Duration.ofMinutes(12)));
flowUnit = fieldDataCacheRca.operate();
Assert.assertFalse(flowUnit.getResourceContext().isUnhealthy());
}
}
