
Use CacheService Persisted Cache Size during Searchable Snapshot Shard Allocation #66237

Conversation

original-brownbear (Member) commented Dec 13, 2020:

This adds a searchable snapshot allocator that reaches out to all data nodes to get the cached size for a shard, similar to how it's done for normal shard Stores, but simpler: for now we only care about the exact byte size, do not inject the size into the disk threshold allocator, and leave out a few more tricks (see TODOs) that we do for normal allocation.

The obvious short-term TODO left here is more tests.
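As a rough illustration of the idea only (not the actual Elasticsearch code; the types below are simplified stand-ins), once the per-node cached byte counts for the shard have been fetched, the allocator prefers the eligible node with the most cached data:

import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class CachedSizeAllocationSketch {

    // Returns the eligible node holding the most cached bytes for the shard, or
    // empty if no eligible node has any cached data (in which case allocation is
    // left to the balanced shards allocator).
    static Optional<String> pickNode(Map<String, Long> cachedBytesPerNode, Set<String> eligibleNodes) {
        return cachedBytesPerNode.entrySet().stream()
            .filter(e -> eligibleNodes.contains(e.getKey()) && e.getValue() > 0L)
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }
}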

import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertAcked;

@ESIntegTestCase.ClusterScope(scope = ESIntegTestCase.Scope.TEST, numDataNodes = 0)
public class SearchableSnapshotAllocationIntegTests extends BaseSearchableSnapshotsIntegTestCase {
original-brownbear (Member, Author):

Obviously this could use a few more tests (especially around various mixes of multiple shards in the cache and exception handling); I'm happy to add those in a follow-up. I think it's tricky to get exhaustive testing up and running today, and it would make this even longer to review.

@@ -90,8 +150,60 @@ private AllocateUnassignedDecision decideAllocation(RoutingAllocation allocation
return AllocateUnassignedDecision.no(UnassignedInfo.AllocationStatus.FETCHING_SHARD_DATA, null);
}

// let BalancedShardsAllocator take care of allocating this shard
// TODO: once we have persistent cache, choose a node that has existing data
final AsyncShardFetch.FetchResult<NodeCacheFilesMetadata> fetchedCacheData = fetchData(shardRouting, allocation);
original-brownbear (Member, Author):

The whole implementation here is intentionally kept close, code-wise, to how the replica and primary shard allocators work today. Compared to going for the shortest, most specific possible implementation, I think this leaves lots of room for drying things up later (especially if/when we want to tackle the enhancement TODOs in here, which would otherwise duplicate a lot of logic from those allocators).

return fetchingDataNodes.size() > 0 ? null : Map.copyOf(data);
}

synchronized int numberOfInFlightFetches() {
original-brownbear (Member, Author):

There are certainly better (as in more efficient) ways to build the synchronization here, but I decided to keep it as simple as possible in the interest of saving some time today.
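For illustration, a minimal standalone sketch of that "keep it simple" approach (stand-in code, not the actual implementation): plain synchronized methods guard the per-node state, and the data map is only exposed once no fetches are in flight, mirroring the snippet above.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class SimpleAsyncFetchSketch {

    // node id -> cached byte count, filled in as responses arrive
    private final Map<String, Long> data = new HashMap<>();
    // nodes we have sent a request to but not yet heard back from
    private final Set<String> fetchingNodes = new HashSet<>();

    synchronized void markFetching(String nodeId) {
        fetchingNodes.add(nodeId);
    }

    synchronized void onResponse(String nodeId, long cachedBytes) {
        fetchingNodes.remove(nodeId);
        data.put(nodeId, cachedBytes);
    }

    // null while any fetch is still in flight, mirroring the snippet above
    synchronized Map<String, Long> data() {
        return fetchingNodes.isEmpty() ? Map.copyOf(data) : null;
    }

    synchronized int numberOfInFlightFetches() {
        return fetchingNodes.size();
    }
}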

@@ -364,4 +370,19 @@ protected XPackLicenseState getLicenseState() {
CACHE_PREWARMING_THREAD_POOL_SETTING
) };
}

public static final class CacheServiceSupplier implements Supplier<CacheService> {
original-brownbear (Member, Author):

This is the best solution I could think of for passing the cache to the transport action (which is instantiated on both master and data nodes but only ever handles the fan-out request on data nodes) without instantiating the cache on master nodes as well.
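A minimal generic sketch of the pattern (stand-in code, not the actual CacheServiceSupplier): the supplier is registered on every node, but only data nodes ever set a value, so a master-only node simply gets null back.

import java.util.function.Supplier;

// Generic stand-in for the idea behind CacheServiceSupplier: registered on
// every node, but only data nodes ever call set().
final class LazySupplier<T> implements Supplier<T> {

    private volatile T value;

    void set(T value) {
        this.value = value;
    }

    @Override
    public T get() {
        return value; // never set on master-only nodes, so null there
    }
}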


public class SearchableSnapshotAllocatorTests extends ESAllocationTestCase {

public void testAllocateToNodeWithLargestCache() {
original-brownbear (Member, Author):

Just like the IT, there are obviously lots of tests that could still be added here; I'd push those to a follow-up.

@original-brownbear original-brownbear marked this pull request as ready for review December 15, 2020 09:42
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Dec 15, 2020
elasticmachine (Collaborator):

Pinging @elastic/es-distributed (Team:Distributed)

henningandersen (Contributor) left a comment:

LGTM, but I would like @tlrx to also have a look.

About the disk decider: I think that is more or less a separate concern. I do wonder whether we need a separate PR to ensure that the reserved size returned after a node restart is correct, but not in this PR.

// TODO: in the following logic, we do not account for existing cache size when handling disk space checks, should and can we
// reliably do this in a world of concurrent cache evictions or are we ok with the cache size just being a best effort hint
// here?
Tuple<Decision, Map<String, NodeAllocationResult>> result = canBeAllocatedToAtLeastOneNode(shardRouting, allocation);
Contributor:

I think part of the purpose here is to not fetch data unnecessarily, so this should go before fetchData above? Looks like that is the case in ReplicaShardAllocator too.
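Roughly, the suggested ordering is something like this (a sketch with simplified stand-in types, not the allocator's actual signatures):

// Sketch only: booleans and strings stand in for the real Decision, FetchResult
// and AllocateUnassignedDecision types; the point is the order of the checks.
final class DeciderOrderingSketch {

    static String decideAllocation(boolean canBeAllocatedToAtLeastOneNode, boolean fetchStillInFlight) {
        if (canBeAllocatedToAtLeastOneNode == false) {
            return "NO"; // no eligible node, so never start the cache-size fetch
        }
        if (fetchStillInFlight) {
            return "FETCHING_SHARD_DATA"; // async cache-size responses not all in yet
        }
        return "ALLOCATE_TO_NODE_WITH_MOST_CACHED_BYTES";
    }
}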

original-brownbear (Member, Author):

++ sorry about that oversight

);
final DiscoveryNodes nodes = allocation.nodes();
final AsyncCacheStatusFetch asyncFetch = asyncFetchStore.computeIfAbsent(shardId, sid -> new AsyncCacheStatusFetch());
final DiscoveryNode[] dataNodes = asyncFetch.addFetches(nodes.getDataNodes().values().toArray(DiscoveryNode.class));
Contributor:

I am curious why we are not using the existing AsyncShardFetch for this? No need to change anything, and I did not spot any issues, so this is purely a question to figure out whether we need a follow-up later.

original-brownbear (Member, Author):

Yeah, we certainly could now. An earlier version of this worked a little differently, which made the existing AsyncShardFetch not a great fit, but with the way it works now I think we can simplify this in a follow-up for sure.

ensureGreen(restoredIndex);
internalCluster().startDataOnlyNodes(randomIntBetween(1, 4));

setAllocation(EnableAllocationDecider.Allocation.NONE);
Contributor:

Can we randomly use NEW_PRIMARIES too?
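For example, something along these lines might do it (a sketch; it assumes the usual randomFrom helper from ESTestCase and the setAllocation helper used above):

// randomly exercise both allocation settings
setAllocation(randomFrom(EnableAllocationDecider.Allocation.NONE, EnableAllocationDecider.Allocation.NEW_PRIMARIES));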

)
)
.numberOfShards(1)
.numberOfReplicas(0)
Contributor:

Let us follow up with either a unit test or an integration test validating that this all works for replicas too.

tlrx (Member) left a comment:

I left a few comments so that you can move forward. I'm still digesting the changes in the allocator.

private AsyncShardFetch.FetchResult<NodeCacheFilesMetadata> fetchData(ShardRouting shard, RoutingAllocation allocation) {
final ShardId shardId = shard.shardId();
final Settings indexSettings = allocation.metadata().index(shard.index()).getSettings();
final SnapshotId snapshotId = new SnapshotId(
Member:

Should we assert SearchableSnapshotsConstants.isSearchableSnapshotStore(indexSettings), just in case?
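I.e. something like this (a sketch of the suggested assertion, reusing the indexSettings variable from the excerpt above):

assert SearchableSnapshotsConstants.isSearchableSnapshotStore(indexSettings)
    : "expected a searchable snapshot store but got index settings " + indexSettings;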

* @return number of bytes cached
*/
public long getCachedSize(ShardId shardId, SnapshotId snapshotId) {
return persistentCache.getCacheSize(shardId, snapshotId);
Member:

I think we should check that the shard does not correspond to a shard that has just been deleted (see ShardEviction and evictedShards), but that would also mean transporting the snapshot index name.

tlrx (Member) left a comment:

LGTM. I left another comment. I'm not 100% confident in the allocator changes but I wasn't able to spot any obvious issue.

original-brownbear (Member, Author):

Thanks so much Tanguy and Henning! I'll have to push a few of the points here to a follow-up for lack of time today, so that I can get the backport in today. There were some unexpected hiccups with a few of them; I'll open a follow-up first thing tomorrow.

@original-brownbear original-brownbear merged commit 7caa471 into elastic:master Dec 15, 2020
@original-brownbear original-brownbear deleted the allocate-where-existing-data-lives branch December 15, 2020 17:10
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Dec 15, 2020
Use CacheService Persisted Cache Size during Searchable Snapshot Shard Allocation (elastic#66237)
original-brownbear added a commit that referenced this pull request Dec 15, 2020
Use CacheService Persisted Cache Size during Searchable Snapshot Shard Allocation (#66237) (#66383)
@original-brownbear original-brownbear restored the allocate-where-existing-data-lives branch January 4, 2021 01:11
Labels
:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.11.0 v8.0.0-alpha1