Move caching of the store to IndexShard. #30817

jpountz · 2018-05-23T15:38:55Z

In spite of the existing caching, I have seen hot threads output from a number
of nodes where one thread was spending all its CPU on computing the size of a
directory. I am proposing to move the caching of the store size to IndexShard
so that it has access to the existing logic regarding whether a shard is active
or not, in order to be able to cache the store size more aggressively.

The tricky bit is that an inactive shard might still be merged, which may have
a significant impact on the store size.

This should be especially useful for time-based data since most indices are
typically inactive.
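
A minimal sketch of the idea described above, not the code in this PR. `StoreSizeCache` and its four suppliers are hypothetical stand-ins for the expensive directory-size computation, the shard's active flag, and the merge counters. It assumes that the size of an inactive shard only changes through merges, which ignores other sources of change.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Hypothetical illustration; none of these names are Elasticsearch classes. */
final class StoreSizeCache {
    private final LongSupplier sizeComputer;   // expensive: e.g. sums file lengths in the directory
    private final BooleanSupplier shardActive; // true while the shard receives writes
    private final LongSupplier currentMerges;  // number of merges running right now
    private final LongSupplier totalMerges;    // number of merges completed so far

    private boolean mustRecompute = true;      // true until a value was computed while idle
    private long mergesAtLastCompute;
    private long cachedSize;

    StoreSizeCache(LongSupplier sizeComputer, BooleanSupplier shardActive,
                   LongSupplier currentMerges, LongSupplier totalMerges) {
        this.sizeComputer = sizeComputer;
        this.shardActive = shardActive;
        this.currentMerges = currentMerges;
        this.totalMerges = totalMerges;
    }

    synchronized long get() {
        // Read the counter before computing, so that a merge finishing during the
        // computation is noticed on the next call.
        long merges = totalMerges.getAsLong();
        boolean busy = shardActive.getAsBoolean() || currentMerges.getAsLong() > 0;
        if (mustRecompute || busy || merges != mergesAtLastCompute) {
            cachedSize = sizeComputer.getAsLong();
            mergesAtLastCompute = merges;
            // A value computed while the shard was busy may already be outdated,
            // so force one more computation once the shard is idle.
            mustRecompute = busy;
        }
        return cachedSize;
    }
}
```

With this shape, an inactive shard with no merge activity serves every call from the cache, while an active or merging shard recomputes on every call. A real implementation would presumably keep a time-based cache for the busy case as well.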


elasticmachine · 2018-05-23T15:38:58Z

Pinging @elastic/es-distributed

jpountz · 2018-05-23T15:39:07Z

WIP because of the lack of tests.

s1monw

I like that this is much simpler, yet I think it's incorrect or has too many corner cases. For example, if a reader gets closed we don't refresh the stats, since we don't see the deletes. It's not good enough. Sorry for pushing you down that route.

s1monw · 2018-05-24T07:53:44Z

server/src/main/java/org/elasticsearch/index/engine/ElasticsearchConcurrentMergeScheduler.java


    private final MeanMetric totalMerges = new MeanMetric();
    private final CounterMetric totalMergesNumDocs = new CounterMetric();
    private final CounterMetric totalMergesSizeInBytes = new CounterMetric();
-    private final CounterMetric currentMerges = new CounterMetric();
+    private final AtomicLong currentMerges = new AtomicLong();


why did this change?

s1monw · 2018-05-24T08:13:06Z

server/src/main/java/org/elasticsearch/index/engine/ElasticsearchConcurrentMergeScheduler.java

@@ -66,11 +69,14 @@
    private final Set<OnGoingMerge> onGoingMerges = ConcurrentCollections.newConcurrentSet();
    private final Set<OnGoingMerge> readOnlyOnGoingMerges = Collections.unmodifiableSet(onGoingMerges);
    private final MergeSchedulerConfig config;
+    private volatile long lastMergeMillis;


I don't think we need these changes. I guess it would be enough to check Engine#getMergeStats() and then do:

mergeStats.current > 0 || mergeStats.total != previousStats.total?

s1monw · 2018-05-24T08:21:24Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+            }
+
+            @Override
+            protected boolean needsRefresh() {


I am not sure why this is so complex, wouldn't it be enough to override needsRefresh()?

    private MergeStats previousStats = new MergeStats();

    if (super.needsRefresh()) {
        boolean refresh = false;
        if (isActive()) {
            MergeStats mergeStats = getEngine().getMergeStats();
            refresh = mergeStats.current > 0 || mergeStats.total != previousStats.total;
            previousStats = mergeStats;
        }
        return refresh;
    }

s1monw · 2018-05-24T08:22:47Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+            if (active.get() == false) {
+                // We refresh when transitioning to an inactive state to make
+                // it easier to cache the store size.
+                refresh("transition to inactive");


-1, we should not refresh anything except the internal reader. Visibility guarantees are important.

jpountz · 2018-05-30T14:42:16Z

Closed in favor of the original PR.

jpountz added the >enhancement, WIP, :Distributed Indexing/Store, v7.0.0 and v6.4.0 labels May 23, 2018

jpountz requested a review from s1monw May 23, 2018 15:38

jpountz mentioned this pull request May 23, 2018

Move caching of the size of a directory to StoreDirectory. #30581

Merged

iter

74410c7

s1monw reviewed May 24, 2018


jpountz closed this May 30, 2018

jpountz removed the :Distributed Indexing/Store, >enhancement, WIP, v6.4.0 and v7.0.0 labels May 30, 2018

jpountz deleted the store_size_cache branch May 30, 2018 14:42
