Replace health request with a state observer. #88641
Conversation
This was using a health request, assuming a yellow index is one with all primaries initialized. This is not true: a newly created index remains yellow while primaries allocation is throttled. This was discovered while working on a new shards allocator.
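For context, the state-observer pattern the title refers to typically waits on the cluster state until every primary of the index is active. The following is only an illustrative sketch of that pattern, not the PR's actual code; the helper name, parameters, and callbacks are assumptions.

import org.apache.logging.log4j.Logger;
import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.ClusterStateObserver;
import org.elasticsearch.cluster.routing.IndexRoutingTable;
import org.elasticsearch.cluster.service.ClusterService;
import org.elasticsearch.common.util.concurrent.ThreadContext;
import org.elasticsearch.core.TimeValue;

// Hypothetical helper: wait until all primaries of `index` are started, or time out.
static void waitForActivePrimaries(ClusterService clusterService, ThreadContext threadContext,
                                   Logger logger, String index, TimeValue timeout,
                                   Runnable onReady, Runnable onUnavailable) {
    ClusterStateObserver observer = new ClusterStateObserver(clusterService, timeout, logger, threadContext);
    observer.waitForNextChange(new ClusterStateObserver.Listener() {
        @Override
        public void onNewClusterState(ClusterState state) {
            onReady.run();        // all primaries are active; continue with the real work
        }

        @Override
        public void onClusterServiceClose() {
            onUnavailable.run();  // node is shutting down; fail the pending request
        }

        @Override
        public void onTimeout(TimeValue elapsed) {
            onUnavailable.run();  // primaries never became active; report unavailable shards
        }
    }, state -> {
        // Predicate evaluated on each cluster state: true once every primary is started,
        // instead of inferring readiness from a yellow health status.
        IndexRoutingTable indexRoutingTable = state.routingTable().index(index);
        return indexRoutingTable != null && indexRoutingTable.allPrimaryShardsActive();
    });
}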
Hi @idegtiarenko, I've created a changelog YAML for you.
LGTM but I'd like Tim to take a look too.
Actually I think we should have a test that exposes this change. It looks to me like …
The changes look fine to me. Agree with David's comment.
Pinging @elastic/es-distributed (Team:Distributed)
Settings.builder().putNull(CLUSTER_ROUTING_ALLOCATION_NODE_INITIAL_PRIMARIES_RECOVERIES_SETTING.getKey()).build()
)
.get();
}
Inline diff is totally confusing, consider reviewing in a split mode.
In short, this test (sketched below):
- sets cluster.routing.allocation.node_initial_primaries_recoveries=0 so that every created index is throttled
- creates an index
- executes GetGlobalCheckpointsAction with a timeout
- asserts it is complaining about unavailable shards rather than anything else
- reverts the setting to a default value
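A rough sketch of that shape, assuming ESIntegTestCase helpers; the GetGlobalCheckpointsAction.Request construction and the exception type are illustrative assumptions, not the PR's actual test code.

public void testUnavailableShardsReportedWhilePrimariesThrottled() {
    // 1. Throttle primary allocation so the new index's primary stays unassigned.
    client().admin()
        .cluster()
        .prepareUpdateSettings()
        .setPersistentSettings(
            Settings.builder().put(CLUSTER_ROUTING_ALLOCATION_NODE_INITIAL_PRIMARIES_RECOVERIES_SETTING.getKey(), 0).build()
        )
        .get();
    try {
        // 2. Create an index without waiting for its (throttled) primary to start.
        client().admin()
            .indices()
            .prepareCreate("index1")
            .setWaitForActiveShards(ActiveShardCount.NONE)
            .get();

        // 3 + 4. Execute GetGlobalCheckpointsAction with a short timeout and assert the
        // failure is about unavailable shards rather than anything else.
        // (Request constructor and exception type are assumptions, not the PR's exact code.)
        var request = new GetGlobalCheckpointsAction.Request(
            "index1", true, true, new long[0], TimeValue.timeValueMillis(100)
        );
        expectThrows(
            UnavailableShardsException.class,
            () -> client().execute(GetGlobalCheckpointsAction.INSTANCE, request).actionGet()
        );
    } finally {
        // 5. Revert the setting to its default so the rest of the suite is unaffected.
        client().admin()
            .cluster()
            .prepareUpdateSettings()
            .setPersistentSettings(
                Settings.builder().putNull(CLUSTER_ROUTING_ALLOCATION_NODE_INITIAL_PRIMARIES_RECOVERIES_SETTING.getKey()).build()
            )
            .get();
    }
}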
I think I could also rewrite it to revert the setting after executing GetGlobalCheckpointsAction and assert that it then gets a result back, but I am not sure it would make the test better.
Yes I think we want that. As it stands, the test suite still passes on master today (i.e. with the production-code changes in this PR reverted). I think we have to check the success case here to see the value in this change.
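For illustration, the suggested success-case check could look roughly like the following at the end of such a test. ensureGreen and the Request construction are assumptions, not the PR's actual code; the point is only that the call should now complete instead of reporting unavailable shards.

// Hypothetical success path: un-throttle allocation, wait for the primary to start,
// then assert the action now succeeds.
client().admin()
    .cluster()
    .prepareUpdateSettings()
    .setPersistentSettings(
        Settings.builder().putNull(CLUSTER_ROUTING_ALLOCATION_NODE_INITIAL_PRIMARIES_RECOVERIES_SETTING.getKey()).build()
    )
    .get();
ensureGreen("index1");

// Request construction is illustrative; a failure here would indicate the action still
// cannot handle the index it was waiting on.
client().execute(
    GetGlobalCheckpointsAction.INSTANCE,
    new GetGlobalCheckpointsAction.Request("index1", true, true, new long[0], TimeValue.timeValueSeconds(30))
).actionGet();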
.cluster()
.prepareUpdateSettings()
.setPersistentSettings(
Settings.builder().put(CLUSTER_ROUTING_ALLOCATION_NODE_INITIAL_PRIMARIES_RECOVERIES_SETTING.getKey(), 0).build()
Hmm it seems like a bug that we even permit 0 here. I don't see a good reason to do this in production and it would be pretty harmful to do this accidentally. Ok for now, but if we fixed this bug we'd need to find some other way to delay allocation.
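For reference, a lower bound on the setting declaration would be the usual way to reject 0. This is only a sketch of how that could look; the default value and properties shown are assumptions about the existing declaration, and the minimum of 1 is the hypothetical fix.

// Hypothetical: Setting.intSetting(key, defaultValue, minValue, properties...) enforces
// the bound whenever the setting is set, so 0 would be rejected.
public static final Setting<Integer> CLUSTER_ROUTING_ALLOCATION_NODE_INITIAL_PRIMARIES_RECOVERIES_SETTING =
    Setting.intSetting(
        "cluster.routing.allocation.node_initial_primaries_recoveries",
        4,  // default (assumed here)
        1,  // hypothetical minimum that would make 0 invalid
        Setting.Property.Dynamic,
        Setting.Property.NodeScope
    );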
}
public void testWaitOnPrimaryShardsReady() throws Exception { |
I think we don't want to remove this test. It's still valid, isn't it?
I was expecting us to strengthen this test to verify that creating the index concurrently with running the action still works. It turns out we already have that test, so we can leave this one as-is IMO.
...gin/fleet/src/main/java/org/elasticsearch/xpack/fleet/action/GetGlobalCheckpointsAction.java
LGTM
* upstream/master: (40 commits)
  Fix CI job naming
  [ML] disallow autoscaling downscaling in two trained model assignment scenarios (elastic#88623)
  Add "Vector Search" area to changelog schema
  [DOCS] Update API key API (elastic#88499)
  Enable the pipeline on the feature branch (elastic#88672)
  Adding the ability to register a PeerFinderListener to Coordinator (elastic#88626)
  [DOCS] Fix transform painless example syntax (elastic#88364)
  [ML] Muting InternalCategorizationAggregationTests testReduceRandom (elastic#88685)
  Fix double rounding errors for disk usage (elastic#88683)
  Replace health request with a state observer. (elastic#88641)
  [ML] Fail model deployment if all allocations cannot be provided (elastic#88656)
  Upgrade to OpenJDK 18.0.2+9 (elastic#88675)
  [ML] make bucket_correlation aggregation generally available (elastic#88655)
  Adding cardinality support for random_sampler agg (elastic#86838)
  Use custom task instead of generic AckedClusterStateUpdateTask (elastic#88643)
  Reinstate test cluster throttling behavior (elastic#88664)
  Mute testReadBlobWithPrematureConnectionClose
  Simplify plugin descriptor tests (elastic#88659)
  Add CI job for testing more job parallelism
  [ML] make deployment infer requests fully cancellable (elastic#88649)
  ...