Add elasticsearch-node tool docs #37812

Merged on Mar 12, 2019 (31 commits)

andrershov
Contributor

This PR adds documentation for the elasticsearch-node tool (#37696).

@andrershov andrershov added >docs General docs changes :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Jan 24, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

<<path-settings,`path.data` setting>>. This means that in a disaster you can
also restart a node by moving its data directories to another host, presuming
that those data directories can be recovered from the faulty host. Note that it
is not possible to restore the data directory from a backup because this will
Contributor Author

@DaveCTurner I'm not sure what you mean by "is not possible to restore the data directory from a backup".
If you copy the full data folder to another node, that node will be indistinguishable from the previous one.

Contributor

Copying the literal data directory of the dead node somewhere else, post-mortem, is ok, but restoring from a backup (which could be who-knows-how stale) is decidedly not. We occasionally see people taking backups of their data directories and causing hassle when they discover that they can't restore from such things, and I saw a risk that this paragraph might be interpreted wrongly by those kinds of people.

Contributor Author

@DaveCTurner Now I see what you mean. Can we rephrase it like this: "Note that if you have previously taken a backup of the data folder of the stopped node, you cannot restore from it; you need the state of the data folder at the moment this node was stopped"?

Contributor

The trouble with the phrase "if you have previously taken a backup of the data folder" is that it suggests this is a thing you might try and do, and I think we should avoid making that suggestion. Technically you can't take a backup of the data folder: a backup from which you cannot ever safely restore isn't really a backup at all 🤔.

Contributor Author

@DaveCTurner I agree. In that case, does it make sense to remove this sentence altogether, since it's confusing?
Also, a better place to explain that there is no point in copying data folders, because it's not possible to restore from them, is the snapshot page.

@andrershov
Contributor Author

andrershov commented Jan 24, 2019

@DaveCTurner thanks for your rework! I've left one question and pushed one commit that adds (term, version) instructions and updates the sample run.
Let me know what you think about the procedure of running the tool and answering no on each surviving master-eligible node. We probably want to add an --output-cluster-state flag instead.

@DaveCTurner
Contributor

I was also thinking of a --dry-run flag. But in most cases people won't be running this with >1 node remaining (since by far the most common case is 3 master-eligible nodes) so I didn't want to make too big a deal of it. I pushed a change that explains lexicographic ordering using shorter words than "lexicographic" and which pulls this out to a separate paragraph so as not to clutter up the step-by-step instructions. WDYT?
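
For readers following the thread, the ordering discussed here can be sketched as follows. This is only an illustration, not the tool's code: it assumes the goal is to pick the surviving node with the largest (term, version) pair, comparing the term first and using the version only to break ties. The node names and numbers are made up.

```python
# Hypothetical (term, version) pairs, as each surviving master-eligible
# node might report them. Node names and values are invented for the example.
reported = {
    "node-a": (4, 12),
    "node-b": (5, 3),
    "node-c": (4, 27),
}


def freshest_node(states):
    """Return the name of the node whose (term, version) pair is largest.

    Python compares tuples element by element, so the term is compared
    first and the version only matters when two terms are equal.
    """
    return max(states, key=lambda name: states[name])


best = freshest_node(reported)
print(best, reported[best])
```

In this sketch node-b is chosen even though its version is the smallest, because its term is the highest; between node-a and node-c, which share a term, node-c would win on version.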

@DaveCTurner
Contributor

@elasticmachine please run elasticsearch-ci/oss-distro-docs

andrershov added a commit that referenced this pull request Jan 24, 2019
The elasticsearch-node tool helps to restore a cluster if half or more of
the master-eligible nodes are lost. Of course, all bets are off regarding
data consistency.

The tool has two parts: unsafe-bootstrap, to be used when at least one
master-eligible node is still alive, and detach-cluster, to be used when
no master-eligible nodes are left.
This commit implements the first part.

Docs for the tool will be added separately as a part of #37812.

@DaveCTurner
Contributor

I looked at this with a fresh set of eyes this morning and tidied up a couple of things that stood out to me. I would also like Lisa to review (or to delegate to someone else on the docs team).

@DaveCTurner DaveCTurner removed their request for review January 25, 2019 09:28
@ywelsch ywelsch self-requested a review January 25, 2019 09:50
Contributor Author

@andrershov andrershov left a comment

@DaveCTurner I still think we need to rephrase the part about the data folder backup; other than that, it looks good to me.

@DaveCTurner DaveCTurner removed the request for review from lcawl January 25, 2019 11:28
@DaveCTurner
Contributor

Andrey rightly points out that we've not finished the other half of this tool, or its docs, so my request for review from the docs team was premature. Sorry for the noise.

@lcawl
Contributor

lcawl commented Jan 25, 2019

One thing I'd suggest adding is the Synopsis section that lists all the options, per https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-passwords.html
I can take a deeper look when you're ready for doc review.

@ywelsch ywelsch mentioned this pull request Jan 26, 2019
61 tasks
@andrershov
Contributor Author

andrershov commented Feb 1, 2019

@DaveCTurner I've added detach-cluster sample output and updated the unsafe-bootstrap sample output.
One more thing: I have reformatted the messages displayed to the user so that they fit within 72 characters (the delimiter has the same length).
Please note that I've removed the delimiter from the sample output. The docs build fails otherwise, because it confuses this delimiter with the one that marks the beginning and end of the source section.
I've merged master into this branch, so these tools are now available on it.

Contributor

@lcawl lcawl left a comment

I added minor suggestions, but overall LGTM.

@andrershov andrershov merged commit 09425d5 into elastic:master Mar 12, 2019
andrershov pushed a commit that referenced this pull request Mar 12, 2019
This commit, mostly authored by @DaveCTurner,
adds documentation for elasticsearch-node tool #37696.

(cherry picked from commit 09425d5)
andrershov pushed a commit that referenced this pull request Mar 12, 2019
This commit, mostly authored by @DaveCTurner,
adds documentation for elasticsearch-node tool #37696.

(cherry picked from commit 09425d5)
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >docs General docs changes v7.0.0-rc1 v7.2.0 v8.0.0-alpha1