Ensure Node Shutdown Waits for Running Restores to Complete #76070
Conversation
We must wait for ongoing restores to complete before shutting down the repositories service. Otherwise we may leak file descriptors, because tasks for releasing the store are submitted to the `SNAPSHOT` pool or one of the searchable snapshot pools, which quietly accept but never reject/fail tasks after shutdown. This is the same bug we had in recoveries (elastic#46178). Closes elastic#75686
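To make the fix concrete, here is a minimal, hypothetical sketch of the pattern (the names `RestoreTracker`, `startRestore`, `endRestore`, and `awaitIdle` are illustrative, not the actual Elasticsearch classes or methods): restores register in a concurrent set, and shutdown blocks until that set drains instead of racing the pools that silently drop tasks.

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the actual Elasticsearch implementation.
final class RestoreTracker {
    private final Set<String> ongoingRestores = ConcurrentHashMap.newKeySet();
    private volatile CountDownLatch drained; // non-null once shutdown has begun

    // Called when a shard restore starts; rejected once shutdown has begun.
    boolean startRestore(String shardId) {
        if (drained != null) {
            return false; // shutting down, accept no new restores
        }
        final boolean added = ongoingRestores.add(shardId);
        assert added : "restore already running for shard [" + shardId + "]";
        return true;
    }

    // Called when a shard restore finishes, whether it succeeded or failed.
    void endRestore(String shardId) {
        ongoingRestores.remove(shardId);
        final CountDownLatch latch = drained;
        if (latch != null && ongoingRestores.isEmpty()) {
            latch.countDown();
        }
    }

    // Called from shutdown: blocks until all ongoing restores have completed.
    void awaitIdle(long timeout, TimeUnit unit) throws InterruptedException {
        final CountDownLatch latch = new CountDownLatch(1);
        drained = latch;
        if (ongoingRestores.isEmpty()) {
            return; // nothing in flight
        }
        latch.await(timeout, unit);
    }
}

The point is the ordering: `awaitIdle` runs before the repositories service (and its thread pools) shuts down, so the store-release tasks still have a live executor to run on.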
Pinging @elastic/es-distributed (Team:Distributed)
@@ -2981,6 +3030,9 @@ void ensureNotClosing(final Store store) throws AlreadyClosedException {
         if (store.isClosing()) {
             throw new AlreadyClosedException("store is closing");
         }
+        if (lifecycle.started() == false) {
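Reconstructed from the hunk above, the method after the change looks roughly like this (the exception message in the new branch is a guess; the real code may differ):

void ensureNotClosing(final Store store) throws AlreadyClosedException {
    if (store.isClosing()) {
        throw new AlreadyClosedException("store is closing");
    }
    if (lifecycle.started() == false) {
        // Guessed message: fail fast once the node has left the started state,
        // instead of queuing work on pools that drop tasks after shutdown.
        throw new AlreadyClosedException("node is shutting down");
    }
}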
Added this here as well since we close the repositories service before the indices service, and would otherwise have to wait for the restores to actually complete in some cases, I think.
I think so.
LGTM
        return;
    }
    final boolean added = ongoingRestores.add(shardId);
    assert added;
Maybe add some context?
++
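One way to act on the suggestion, as a sketch (the exact message wording is made up): give the bare assertion a message so a failure identifies the offending shard.

final boolean added = ongoingRestores.add(shardId);
assert added : "restore for [" + shardId + "] is already tracked";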
Thanks Tanguy!
…76070) We must wait for ongoing restores to complete before shutting down the repositories service. Otherwise we may leak file descriptors, because tasks for releasing the store are submitted to the `SNAPSHOT` pool or one of the searchable snapshot pools, which quietly accept but never reject/fail tasks after shutdown. This is the same bug we had in recoveries (elastic#46178). Closes elastic#75686
…76095) We must wait for ongoing restores to complete before shutting down the repositories service. Otherwise we may leak file descriptors, because tasks for releasing the store are submitted to the `SNAPSHOT` pool or one of the searchable snapshot pools, which quietly accept but never reject/fail tasks after shutdown. This is the same bug we had in recoveries (#46178). Closes #75686
…76092) We must wait for ongoing restores to complete before shutting down the repositories service. Otherwise we may leak file descriptors, because tasks for releasing the store are submitted to the `SNAPSHOT` pool or one of the searchable snapshot pools, which quietly accept but never reject/fail tasks after shutdown. This is the same bug we had in recoveries (#46178). Closes #75686