elasticsearch-shard remove-corrupted-data doesn't work on missing metadata #47435
Comments
Pinging @elastic/es-distributed (:Distributed/Cluster Coordination) |
Sorry @ct0br0 I cannot reproduce this from the instructions that you have given because they are too vague. What do you mean "reproduce the directory structure"? Can you share a sequence of the specific commands you are running? |
that last part i had to make because it is either not created or deleted and the metadata isn't deleted with it
|
You should absolutely never modify the contents of the data path yourself. Can you go back a few steps and describe why you are doing this? |
if you were to take a directory that has shard data in it and nuke everything, then re-create the directories, that is what we are getting when we have planned or unplanned outages (you can stop the elasticsearch service) |
i don't know how else to explain it...the data isn't there. it is either never there and bad metadata is created, or it is in the middle of being deleted and the metadata isn't updated. usually it happens when elastic stops, but has happened in the middle of running and been the reason for elastic stopping before too. physical drives, networked drives, doesn't matter. |
I'm struggling to follow what you are trying to describe. Please slow down. Are you deleting things from the data path yourself or are you saying that this happens on its own? If it happens on its own then that's unexpected and we should address that. Can you share the logs from such a case? |
yes, on its own.
the fact that it's happening is most likely a different issue than the behavior of |
Ok, this means that some metadata is not where it should be. I wouldn't expect Can you share a diagnostics bundle from your cluster please? |
it looks like you have to build it? if this is the case i cannot run it in our environment. can this be a feature request then? add a |
Ok, if you can't provide diagnostics then can you tell us a lot more detail about your cluster, and about the node that's affected, and about the The fix for this isn't to add a tool to clean up some mess, it's to prevent the mess from happening in the first place. And for that we need to understand how it's happening. |
sure thing. i'm not sure i can upload the node information with hostnames and IPs on here though.
was 73 data nodes, a few went bad and we are currently at 69.
the dce_rpc is from bro/zeek https://github.com/zeek fed in by filebeat, but various other indexes have had this issue.
elasticsearch.yml
dce_rpc settings
|
Thanks, that is very helpful. We think this could be another instance of #47276, because this index is |
we just have to remove the |
hey, wanted to let you know that works fantastically. |
Closing as the user confirmed that the workaround worked for them. |
elasticsearch-shard appears to be the tool for removing corrupted metadata.
This has happened several times to us after updating past 7.0.0.
Issue: directory structure and files are either deleted or never created, and elasticsearch-shard (remove-corrupted-data) cannot remove it from the metadata.
Steps:
recreate directory structure (as elasticsearch-shard errors out "directory must exist" if it does not)
run elasticsearch-shard (hits null pointer exception, because only directories exist)
What I'd expect:
Kill the shard and not have to rm -rf the entire node and rely on replicas.
Hopefully there's an error in my steps.
elastic 7.3.0 (no plugins)
oracle linux 7.6
network drives (vSAN) for elastic storage (though this happens on physical boxes with docker containers too)
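For anyone trying to confirm the same symptom before filing logs, here is a minimal read-only sketch that lists shard directories whose contents are missing or empty. It is not part of Elasticsearch or of elasticsearch-shard. The function name find_hollow_shards is hypothetical, and the layout it assumes (nodes/<n>/indices/<index-uuid>/<shard-id>/ containing index, translog and _state) is an assumption about the 7.x data path that may not match every setup.

```python
import sys
from pathlib import Path

# ASSUMPTION: 7.x-style data path layout,
#   <data>/nodes/<n>/indices/<index-uuid>/<shard-id>/{index,translog,_state}
# Adjust EXPECTED_SUBDIRS and the glob pattern if your layout differs.
EXPECTED_SUBDIRS = ("index", "translog", "_state")


def find_hollow_shards(data_path):
    """Return (shard_dir, problems) pairs for shard directories that exist
    but have missing or empty expected subdirectories. Read-only."""
    hollow = []
    for shard_dir in sorted(Path(data_path).glob("nodes/*/indices/*/*")):
        # Shard directories are named by numeric shard id; this skips
        # siblings such as the index-level _state directory.
        if not (shard_dir.is_dir() and shard_dir.name.isdigit()):
            continue
        problems = []
        for name in EXPECTED_SUBDIRS:
            sub = shard_dir / name
            if not sub.is_dir():
                problems.append(f"{name}: missing")
            elif not any(sub.iterdir()):
                problems.append(f"{name}: empty")
        if problems:
            hollow.append((shard_dir, problems))
    return hollow


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python find_hollow_shards.py /path/to/data
    for shard_dir, problems in find_hollow_shards(sys.argv[1]):
        print(shard_dir, "->", ", ".join(problems))
```

The script only reads directory listings and never writes to the data path, in line with the advice in the comments not to modify it by hand. Its output is meant to go alongside logs in a report, not to drive any cleanup.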