-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add RemoveCorruptedShardDataCommand #32281
add RemoveCorruptedShardDataCommand #32281
Conversation
Pinging @elastic/es-distributed |
Such a tool would be very helpful when no sane shards are not available anymore as it is often better to lose a single segment than an entire shard! I left some comment on #31389 about the API. |
5828119
to
843f977
Compare
183b3da
to
4116084
Compare
confirm("Continue and clean up Lucene index files ?", terminal); | ||
// TODO: waiting for Lucene fixes or IndexWriter (to be able to create a new segment while other files there) | ||
// or wait while Lucene CheckIndex makes files non corrupted | ||
Lucene.cleanLuceneIndex(dir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just had a chat with @vladimirdolzhenko and dropping the whole index is probably the safest option at this point when CheckIndex.Status.missing is true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a link to LUCENE-6762 to the comment
4116084
to
2f116d8
Compare
9b6aece
to
8e0891f
Compare
…elasticsearch-shard
8e0891f
to
a8f1488
Compare
@lcawl could you please have a look into documentation part change ? |
I've added a "Command-line tools" section in the Index Modules page, rather than including that link under the settings. @debadair will also provide some feedback. |
@DaveCTurner I've addressed your comments, could you please have another look ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor stuff now, and I asked for another reviewer.
] | ||
} | ||
|
||
You must accept the possibility of data loss changing parameter `accept_data_loss` to `true`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits, sorry:
You must accept the possibility of data loss by changing the parameter
accept_data_loss
totrue
.
for (int possibleLockId = fromNodeId; possibleLockId < toNodeId; possibleLockId++) { | ||
final NodeEnvironment.NodeLock nodeLock; | ||
try { | ||
nodeLock = new NodeEnvironment.NodeLock(possibleLockId, logger, environment, p -> {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this just be try (Releasable nodeLock = new NodeEnvironment.NodeLock(possibleLockId, logger, environment, p -> {})) {...
with the catch
at the bottom of this method?
Also, if the directory doesn't exist, I think that FSDirectory.open()
creates it, which is not what we want here, so the consumer passed to NodeLock
's constructor needs to check for this and bail out sooner. See the protected FSDirectory(Path path, LockFactory lockFactory)
constructor for instance. I think this warrants a test that we don't accidentally create a bunch of directories if we can't find the index we were looking for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
private final Lock[] locks; | ||
private final NodePath[] nodePaths; | ||
|
||
public NodeLock(final int nodeId, final Logger logger, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me but I would like another pair of eyes, as it's quite important not to break this. @ywelsch WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NodeLock change looks ok to me. I would also like to revive the discussion that was initiated here about removing the max-local-storage "feature".
server/src/main/java/org/elasticsearch/env/NodeEnvironment.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/shard/RemoveCorruptedShardDataCommand.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great work @vladimirdolzhenko
The docs are much improved (and much shorter) compared with the version that was reviewed
thank you @DaveCTurner for the valuable comments and review 💯 |
* master: (46 commits) Fixing assertions in integration test (elastic#33833) [CCR] Rename idle_shard_retry_delay to poll_timout in auto follow patterns (elastic#33821) HLRC: Delete ML calendar (elastic#33775) Move DocsStats into Engine (elastic#33835) [Docs] Clarify accessing Date methods in painless (elastic#33560) add elasticsearch-shard tool (elastic#32281) Cut over to unwrap segment reader (elastic#33843) SQL: Fix issue with options for QUERY() and MATCH(). (elastic#33828) Emphasize that filesystem-level backups don't work (elastic#33102) Use the global doc id to generate a random score (elastic#33599) Add minimal sanity checks to custom/scripted similarities. (elastic#33564) Profiler: Don’t profile NEXTDOC for ConstantScoreQuery. (elastic#33196) [CCR] Change FollowIndexAction.Request class to be more user friendly (elastic#33810) SQL: day and month name functions tests locale providers enforcement (elastic#33653) TESTS: Set SO_LINGER = 0 for MockNioTransport (elastic#32560) Test: Relax jarhell gradle test (elastic#33787) [CCR] Fail with a descriptive error if leader index does not exist (elastic#33797) Add ES version 6.4.2 (elastic#33831) MINOR: Remove Some Dead Code in Scripting (elastic#33800) Ensure realtime `_get` and `_termvectors` don't run on the network thread (elastic#33814) ...
Relates elastic#31389 (cherry picked from commit a3e8b83)
add
elasticsearch-shard
command-line toolThe tool will refuse to run if there are no existing corruption markers (i.e., it will only work on known corrupted shards)
elasticsearch-shard remove-corrupted-segments
tool instead of dropped in dropindex.shard.check_on_startup: fix
#32279index.shard.check_on_startup: fix
setting-dry-run
option could be there to provide an overviewRelates #31389