Watcher: Changing .watches index primary/replica configuration may end up with non distributed watch execution #32636
Comments
Pinging @elastic/es-core-infra
Looks really interesting 👍 I'll work on this one and let you know.
@spinscale I've added your test case to my fork and tweaked it slightly to use variable names I'm more comfortable with. The fundamental change I've made is at this line: here I select the generated shard identifier, whereas you picked the current shard identifier. Could you tell me why you picked the current shard identifier bitset?
I've referenced the wrong issue in the Git log message in my previous commit, sorry for the noise.
Your variant is fine as well. However, with randomization there might be cases where some shards simply do not contain any watches, and that's fine (i.e. with 10 shards and 10 documents there is no guarantee of even distribution, by design).
Got it, thank you, I wasn't aware of this not-always-even document distribution among shards. I've removed the randomization to avoid this corner case in the unit test. I may have missed a point, but my main issue now is that this test doesn't fail. Should it?
This has been open for quite a while and hasn't had a lot of interest. For now I'm going to close this as something we aren't planning on implementing. We can re-open it later if needed.
If you configure the `.watches` index to have a certain primary/replica configuration, the distributed execution of the watches only happens on one shard copy.
Note: this is not the default configuration and requires some steps to actually achieve (as the index template cannot easily be overwritten, for example).
The basic premise is that the number of shards (primaries) equals the number of replicas plus one, for example two primary shards and one replica (see the settings sketch below).
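A minimal sketch of such a configuration, assuming two primaries and one replica so that the copies per shard (1 replica + 1 primary = 2) match the primary count (in practice the `.watches` template makes applying these settings non-trivial, as noted above):

```java
import org.elasticsearch.common.settings.Settings;

// Sketch: settings for an index whose copy count (primary + replicas)
// equals its primary count -- the configuration that triggers the issue.
// 2 primaries, 1 replica => 2 copies per shard == 2 primary shards.
Settings settings = Settings.builder()
    .put("index.number_of_shards", 2)    // primaries
    .put("index.number_of_replicas", 1)  // 1 replica + 1 primary = 2 copies
    .build();
```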
Explanation
Explaining this requires understanding how watcher decides whether a watch should be executed on a replica or on the primary shard. It reuses the logic that decides where a document should be indexed (which is not watcher specific).
By hashing the id and taking the result modulo the number of primary shards, we decide which primary shard the document is indexed into, in this example shard zero.
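A simplified sketch of that routing decision; the real implementation lives in `OperationRouting` and uses a Murmur3 hash of the id, so `String.hashCode()` below is only a stand-in:

```java
public class RoutingSketch {
    // Simplified stand-in for Elasticsearch's routing: the real code uses a
    // Murmur3 hash of the id inside OperationRouting, not String.hashCode().
    static int routeToPrimary(String id, int numberOfPrimaries) {
        return Math.floorMod(id.hashCode(), numberOfPrimaries);
    }

    public static void main(String[] args) {
        // For a 2-primary index, every id deterministically maps to shard 0 or 1.
        System.out.println(routeToPrimary("my-watch", 2));
    }
}
```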
If the number of shard copies (the replicas plus the primary) is the same as the number of primary shards, we use the same hashing function to find out where a watch should be executed. This results in the exact same calculation, which returns the same value for every document on that shard (otherwise the document would not have ended up in that shard bucket in the first place).
We cannot tell which copy (primary or replica) will be used for execution (the allocation ids determine the order), but in this particular case it is only ever going to be one of them.
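A sketch of why the two modulo operations collapse into one, reusing the simplified stand-in hash from above (the real watcher logic orders the shard copies by allocation id):

```java
public class CollisionSketch {
    public static void main(String[] args) {
        int primaries = 2;        // number of primary shards
        int copiesPerShard = 2;   // 1 primary + 1 replica == primaries
        String[] ids = {"watch-a", "watch-b", "watch-c", "watch-d"};
        for (String id : ids) {
            int hash = id.hashCode();                       // stand-in hash
            int shard = Math.floorMod(hash, primaries);     // where the watch is indexed
            int copy = Math.floorMod(hash, copiesPerShard); // which copy executes it
            // Because primaries == copiesPerShard, shard == copy for every id:
            // all watches of shard s run on the copy at position s, never spread out.
            System.out.printf("%s -> shard %d, executing copy %d%n", id, shard, copy);
        }
    }
}
```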
An easy workaround for now is to avoid configurations where the number of shards equals the number of replicas plus one.
There is a unit test to reproduce this in 6.x/master (added to `WatcherIndexingListenerTests`).
A possible idea for fixing this might be to change the decision logic to use something more than the id of the watch (just adding a prefix or reversing the string might be enough for the hash function to return a different result). In order to remain BWC we could check whether all nodes in the cluster have surpassed a certain version and only then use the new logic.
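A minimal sketch of the string-reversal idea (hypothetical helper names, not the actual proposed patch), again using the simplified stand-in hash:

```java
public class FixSketch {
    // Hypothetical: derive the execution slot from a transformed id so it no
    // longer collides with the routing result computed from the raw id.
    static int executionCopy(String watchId, int copiesPerShard) {
        String transformed = new StringBuilder(watchId).reverse().toString();
        return Math.floorMod(transformed.hashCode(), copiesPerShard);
    }

    public static void main(String[] args) {
        int primaries = 2, copiesPerShard = 2;
        String id = "my-watch";
        int shard = Math.floorMod(id.hashCode(), primaries);
        // With the transformed id the executing copy is decoupled from the
        // shard number, so execution can land on either copy again.
        System.out.printf("shard %d, executing copy %d%n",
            shard, executionCopy(id, copiesPerShard));
    }
}
```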