
Replace health request with a state observer. #88641

Merged · 6 commits · Jul 21, 2022

Conversation

idegtiarenko (Contributor):

This was using a health request, assuming a yellow index is one with all primaries
initialized. This is not true, as a newly created index remains yellow while
primary allocation is throttled. This was discovered while working on a new
shards allocator.
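To illustrate why "yellow" is a weaker signal than "all primaries started", here is a simplified, hypothetical model (names and rules are illustrative, not the Elasticsearch implementation) of the health rule that reports an inactive primary of a freshly created index as YELLOW rather than RED. Two very different cluster states produce the same YELLOW answer, which is why observing the shard routing state directly is more reliable:

```java
// Simplified model of index health vs. primary shard state.
// Names and rules are illustrative; the real logic lives in
// Elasticsearch's cluster health computation.
import java.util.List;

public class YellowIsNotStarted {

    enum Health { GREEN, YELLOW, RED }

    enum UnassignedReason { INDEX_CREATED, NODE_LEFT }

    // One primary/replica pair of a single-shard index.
    record ShardPair(boolean primaryActive, boolean replicaActive, UnassignedReason primaryUnassignedReason) {}

    // Health rule sketch: an inactive primary normally means RED, but a
    // just-created index (whose allocation may merely be throttled) reports YELLOW.
    static Health indexHealth(ShardPair s) {
        if (!s.primaryActive()) {
            return s.primaryUnassignedReason() == UnassignedReason.INDEX_CREATED ? Health.YELLOW : Health.RED;
        }
        return s.replicaActive() ? Health.GREEN : Health.YELLOW;
    }

    // What a state observer can check instead of health status.
    static boolean allPrimariesActive(List<ShardPair> shards) {
        return shards.stream().allMatch(ShardPair::primaryActive);
    }

    public static void main(String[] args) {
        // New index, primary allocation throttled: YELLOW, primary NOT started.
        ShardPair throttled = new ShardPair(false, false, UnassignedReason.INDEX_CREATED);
        // Primary started, replica still unassigned: also YELLOW.
        ShardPair started = new ShardPair(true, false, null);

        System.out.println(indexHealth(throttled)); // YELLOW
        System.out.println(indexHealth(started));   // YELLOW
        System.out.println(allPrimariesActive(List.of(throttled))); // false
        System.out.println(allPrimariesActive(List.of(started)));   // true
    }
}
```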

@idegtiarenko added labels >bug, :Distributed Coordination/Allocation, Team:Distributed (Obsolete), v8.4.0 on Jul 20, 2022
@idegtiarenko idegtiarenko requested a review from DaveCTurner July 20, 2022 11:05
elasticsearchmachine (Collaborator):

Hi @idegtiarenko, I've created a changelog YAML for you.

DaveCTurner (Contributor) left a comment:

LGTM but I'd like Tim to take a look too.

@DaveCTurner DaveCTurner requested a review from Tim-Brooks July 20, 2022 11:28
DaveCTurner (Contributor):

Actually I think we should have a test that exposes this change. It looks to me like GetGlobalCheckpointsActionIT#testWaitOnPrimaryShardsReady tries to work around the problem we encountered by creating a red index and then moving it to yellow later. With this change it should be possible to remove that I think?

Tim-Brooks (Contributor) left a comment:

The changes look fine to me. Agree with David's comment.

elasticsearchmachine (Collaborator):

Pinging @elastic/es-distributed (Team:Distributed)

```java
            Settings.builder().putNull(CLUSTER_ROUTING_ALLOCATION_NODE_INITIAL_PRIMARIES_RECOVERIES_SETTING.getKey()).build()
        )
        .get();
}
```
idegtiarenko (Contributor, Author):

The inline diff is totally confusing; consider reviewing in split mode.

In short, this test:

  • sets cluster.routing.allocation.node_initial_primaries_recoveries=0 so that every created index is throttled
  • creates an index
  • executes GetGlobalCheckpointsAction with a timeout
  • asserts that it complains about unavailable shards rather than anything else
  • reverts the setting to its default value
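For reference, the first step corresponds to a cluster settings update such as the following body sent to `PUT _cluster/settings` (0 allows no concurrent initial primary recoveries per node, so every new primary allocation is throttled); the last step reverts it by setting the value to `null`:

```json
{
  "persistent": {
    "cluster.routing.allocation.node_initial_primaries_recoveries": 0
  }
}
```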

idegtiarenko (Contributor, Author):

I think I could also rewrite it to revert the setting after executing GetGlobalCheckpointsAction and assert that it gets a result back, but I am not sure that would make the test better.

DaveCTurner (Contributor):

Yes I think we want that. As it stands the test suite still passes on master today (i.e. reverting the production-code changes in this PR). I think we have to check the success case here to see the value in this change.

@idegtiarenko idegtiarenko requested a review from DaveCTurner July 21, 2022 07:11
```java
    .cluster()
    .prepareUpdateSettings()
    .setPersistentSettings(
        Settings.builder().put(CLUSTER_ROUTING_ALLOCATION_NODE_INITIAL_PRIMARIES_RECOVERIES_SETTING.getKey(), 0).build()
```
DaveCTurner (Contributor):

Hmm it seems like a bug that we even permit 0 here. I don't see a good reason to do this in production and it would be pretty harmful to do this accidentally. Ok for now, but if we fixed this bug we'd need to find some other way to delay allocation.
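The reviewer's point about not permitting 0 could be enforced with a lower bound at parse time. A hypothetical, self-contained sketch (the real setting would declare its minimum through Elasticsearch's settings infrastructure rather than a hand-rolled check; the method name here is illustrative):

```java
public class BoundedSetting {
    // Hypothetical validator: reject values that would disable initial
    // primary recoveries entirely (anything below 1).
    static int parseInitialPrimariesRecoveries(int value) {
        if (value < 1) {
            throw new IllegalArgumentException("value [" + value + "] must be >= 1");
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseInitialPrimariesRecoveries(4)); // accepted
        try {
            parseInitialPrimariesRecoveries(0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With such a bound in place, a test that needs delayed allocation would have to find another mechanism, as the comment above notes.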

```java
}

public void testWaitOnPrimaryShardsReady() throws Exception {
```
DaveCTurner (Contributor):

I think we don't want to remove this test. It's still valid isn't it?

I was expecting us to strengthen this test to verify that creating the index concurrently with running the action still works. It turns out we already have that test, so we can leave this one as-is IMO.

@idegtiarenko idegtiarenko requested a review from DaveCTurner July 21, 2022 09:07
DaveCTurner (Contributor) left a comment:

LGTM

@idegtiarenko idegtiarenko merged commit 3e85195 into elastic:master Jul 21, 2022
@idegtiarenko idegtiarenko deleted the fix_testWaitOnIndexCreated branch July 21, 2022 13:36
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Jul 22, 2022