Refactor GatewayService #99994
Merged
elasticsearchmachine merged 19 commits into elastic:main from ywangd:es-89310-gateway-service on Oct 10, 2023
Commits (19)
786062a Refactor GatewayService (ywangd)
571db2d address feedback to capture term (ywangd)
f4a1632 Merge remote-tracking branch 'origin/main' into es-89310-gateway-service (ywangd)
0a1e935 Tidy up and restructure based on comments (ywangd)
2be8f92 tweak (ywangd)
f9c0f4e adopt proposed changes by dct (ywangd)
54a4ed2 compilation (ywangd)
25bbeba Add task submitted tracking (ywangd)
9b99cf5 Merge remote-tracking branch 'origin/main' into es-89310-gateway-service (ywangd)
bf8f21a add tests (ywangd)
18ec15c tweak logging (ywangd)
4de9081 Merge remote-tracking branch 'origin/main' into es-89310-gateway-service (ywangd)
da050b4 Remove usage of DiscoveryNode.createLocal. Relates: #100281 (ywangd)
01cf04b Use real objects for tests (ywangd)
a48d98a tweak (ywangd)
3e326e3 tweak (ywangd)
19f89ba remove case (ywangd)
b0d7da2 Merge remote-tracking branch 'origin/main' into es-89310-gateway-service (ywangd)
b5da2d1 Do not store clusterService as instance field. (ywangd)
@DaveCTurner There are still some complexities with this block of code and other related areas.
1. The check that `scheduledRecovery` is `null`: there can still be edge cases where we schedule more than once due to a race between checking `scheduledRecovery` and resetting it back to `null`. If submitting multiple update tasks isn't an issue, we may also choose not to check it at all and just always schedule?
2. If `scheduledRecovery` is reset back to `null` in the scheduled runnable, it needs to be made `volatile` as well.

Do we need to address the 1st point? If so, it seems we need another state variable for it. I forgot to mention it during the sync, but this was one of the original complexities. Also, because the `ClusterStateUpdateTask` may not run inside `execute` due to dataNodeSize dropping again, the state needs to be reset from within the task, which brings back the need to pass a "runAfter" into the task. To simplify things, I think we don't want to check dataNodeSize again inside the task. It's an edge case anyway and dropping it makes things simpler. But we will still need some other state management if we want to address the 1st point. What do you think?
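To make the first point concrete, here is a minimal sketch of the check-then-act hazard, assuming a nullable field guards scheduling. The class and method names are hypothetical and not taken from GatewayService; the two callers are interleaved by hand so the outcome is deterministic.

```java
// Hypothetical sketch: names are illustrative, not taken from GatewayService.
final class CheckThenActSketch {
    // Stand-in for a scheduled recovery handle. volatile gives visibility, not atomicity.
    private volatile Object scheduledRecovery;
    private int scheduleCount;

    // Step 1 of the racy pattern: observe whether nothing is scheduled yet.
    boolean nothingScheduled() {
        return scheduledRecovery == null;
    }

    // Step 2: act on the (possibly stale) observation.
    void scheduleIf(boolean observedNull) {
        if (observedNull) {
            scheduledRecovery = new Object();
            scheduleCount++;
        }
    }

    int timesScheduled() {
        return scheduleCount;
    }

    public static void main(String[] args) {
        CheckThenActSketch sketch = new CheckThenActSketch();
        // Both callers check before either one acts, as two racing threads could.
        boolean callerA = sketch.nothingScheduled();
        boolean callerB = sketch.nothingScheduled();
        sketch.scheduleIf(callerA);
        sketch.scheduleIf(callerB);
        System.out.println("scheduled " + sketch.timesScheduled() + " times"); // prints "scheduled 2 times"
    }
}
```

Making the field `volatile` only ensures each caller sees the latest write; it does not stop both callers from observing `null` before either writes.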
Good point. I think we should only ever submit one cluster state update task per term, so we ought to track this with a flag within the per-term state.
I would not expect any races here, or rather I think if we keep track of whether we've submitted the cluster state update task then that solves those races.
Good point, although since we only do that when actually submitting the task again I think the solution is to make this submission a once-only thing.
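One way to read the "flag within the per-term state" idea is sketched below. This is an assumption-laden illustration, not the code from the PR: `TermState`, `maybeSubmit`, and the counter standing in for the actual cluster state update task submission are all hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of once-per-term submission; not the PR's implementation.
final class PerTermRecoverySketch {
    // State captured for a single term; replaced wholesale when the term changes.
    private static final class TermState {
        final long term;
        final AtomicBoolean taskSubmitted = new AtomicBoolean(false);

        TermState(long term) {
            this.term = term;
        }
    }

    private volatile TermState current;
    private final AtomicInteger submissions = new AtomicInteger();

    // Assumed to be called from a single thread whenever recovery conditions look satisfied.
    void maybeSubmit(long term) {
        TermState state = current;
        if (state == null || state.term != term) {
            state = new TermState(term);
            current = state;
        }
        // Only the first caller for this term flips the flag and submits.
        if (state.taskSubmitted.compareAndSet(false, true)) {
            submissions.incrementAndGet(); // stand-in for submitting the update task
        }
    }

    int submissionCount() {
        return submissions.get();
    }
}
```

Because the flag lives inside the per-term object, a new term starts with a fresh flag and nothing needs to be reset by hand.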
Pushed 25bbeba to add a new state variable (`AtomicBoolean`), which solved multiple issues that I had. Thanks!

Please let me know if the main code looks good to you. I'll proceed to add some more tests if you are happy with the main code changes. Please also let me know if you have any ideas for what kinda tests we might need. Thanks!
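As a rough illustration of why an `AtomicBoolean` can close this kind of race, the sketch below lets several threads compete to submit and counts how many succeed. The class, the `race` helper, and the counter are hypothetical; only `AtomicBoolean.compareAndSet` and the standard `java.util.concurrent` utilities are real APIs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: concurrent callers, exactly one submission.
final class OnceOnlySubmissionSketch {
    private final AtomicBoolean taskSubmitted = new AtomicBoolean(false);
    private final AtomicInteger submitted = new AtomicInteger();

    void submitOnce() {
        // compareAndSet performs the check and the update as one atomic step.
        if (taskSubmitted.compareAndSet(false, true)) {
            submitted.incrementAndGet(); // stand-in for the real task submission
        }
    }

    int submittedCount() {
        return submitted.get();
    }

    // Releases all threads at once and returns how many submissions happened.
    static int race(int threads) {
        OnceOnlySubmissionSketch sketch = new OnceOnlySubmissionSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                sketch.submitOnce();
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sketch.submittedCount();
    }
}
```

Calling `OnceOnlySubmissionSketch.race(8)` returns 1 regardless of thread interleaving, since only one `compareAndSet(false, true)` can succeed.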
I have now added multiple tests to cover different scenarios in bf8f21a.
Now the whole thing looks ready to me.