Add elasticsearch-node tool docs #37812

andrershov · 2019-01-24T11:38:34Z

This PR adds documentation for the elasticsearch-node tool (#37696).

elasticmachine · 2019-01-24T11:38:37Z

Pinging @elastic/es-distributed

andrershov · 2019-01-24T16:01:21Z

docs/reference/commands/node-tool.asciidoc

+<<path-settings,`path.data` setting>>. This means that in a disaster you can
+also restart a node by moving its data directories to another host, presuming
+that those data directories can be recovered from the faulty host. Note that it
+is not possible to restore the data directory from a backup because this will


@DaveCTurner Not sure what you mean by "is not possible to restore the data directory from a backup".
If you copy the full data folder to another node, that node will be indistinguishable from the previous one.

Copying the literal data directory of the dead node somewhere else, post-mortem, is ok, but restoring from a backup (which could be who-knows-how stale) is decidedly not. We occasionally see people taking backups of their data directories and causing hassle when they discover that they can't restore from such things, and I saw a risk that this paragraph might be interpreted wrongly by those kinds of people.

@DaveCTurner Now I see what you mean. Can we rephrase it like "Note that if you have previously taken a backup of the data folder of the stopped node, you cannot restore from it; you need the data folder as it was at the moment this node was stopped"?

The trouble with the phrase "if you have previously taken a backup of the data folder" is that it suggests this is a thing you might try and do, and I think we should avoid making that suggestion. Technically you can't take a backup of the data folder: a backup from which you cannot ever safely restore isn't really a backup at all 🤔.

@DaveCTurner I agree. In that case, it probably makes sense to remove this sentence altogether, since it's confusing?
Also, a better place to explain that there is no reason to copy data folders, because it's not possible to restore from them, is the snapshot page.

andrershov · 2019-01-24T16:31:07Z

@DaveCTurner thanks for your rework! I've left one question and pushed one commit that adds (term, version) instructions and updates the sample run.
Let me know what you think about the algorithm of running the tool and answering no on each surviving master-eligible node. We probably want to add an --output-cluster-state flag instead.

DaveCTurner · 2019-01-24T17:02:24Z

I was also thinking of a --dry-run flag. But in most cases people won't be running this with >1 node remaining (since by far the most common case is 3 master-eligible nodes) so I didn't want to make too big a deal of it. I pushed a change that explains lexicographic ordering using shorter words than "lexicographic" and which pulls this out to a separate paragraph so as not to clutter up the step-by-step instructions. WDYT?
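For illustration, the ordering discussed here can be sketched in Python: comparing (term, version) pairs means comparing terms first and falling back to versions only when the terms are equal, which is exactly how Python compares tuples. The node names and numbers below are hypothetical, and the helper `freshest_node` is not part of the tool; this is only a sketch of the comparison, assuming the (term, version) pair has already been read from each surviving master-eligible node.

```python
from typing import Dict, Tuple


def freshest_node(states: Dict[str, Tuple[int, int]]) -> str:
    """Return the name of the node with the largest (term, version) pair.

    Tuples compare element by element: the term decides first, and the
    version is only consulted when two terms are equal.
    """
    if not states:
        raise ValueError("no candidate nodes given")
    return max(states, key=lambda name: states[name])


# Hypothetical values, for illustration only.
candidates = {
    "master-a": (4, 12),
    "master-b": (4, 15),
    "master-c": (3, 40),  # higher version, but an older term
}

print(freshest_node(candidates))
```

Note that "master-c" loses despite having the largest version, because its term is lower.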

DaveCTurner · 2019-01-24T17:38:36Z

@elasticmachine please run elasticsearch-ci/oss-distro-docs

The elasticsearch-node tool helps to restore a cluster if half or more of the master-eligible nodes are lost. Of course, all bets are off regarding data consistency. The tool has two parts: unsafe-bootstrap, to be used when at least one master-eligible node is still alive, and detach-cluster, to be used when no master-eligible nodes are left. This commit implements the first part. Docs for the tool will be added separately as part of #37812.
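The decision described above can be written down as a small Python sketch. It only encodes the wording of this description (half or more of the master-eligible nodes lost, then at least one survivor versus none); the function name `recovery_subcommand` and its parameters are hypothetical and do not correspond to anything in the tool itself.

```python
from typing import Optional


def recovery_subcommand(total_master_eligible: int, surviving: int) -> Optional[str]:
    """Pick the elasticsearch-node subcommand suggested by the description.

    Returns None when fewer than half of the master-eligible nodes are
    lost, since the tool is not needed in that case.
    """
    if total_master_eligible <= 0 or not 0 <= surviving <= total_master_eligible:
        raise ValueError("invalid node counts")
    lost = total_master_eligible - surviving
    if 2 * lost < total_master_eligible:
        return None  # fewer than half lost: the tool does not apply
    if surviving >= 1:
        return "unsafe-bootstrap"  # at least one master-eligible node alive
    return "detach-cluster"  # no master-eligible nodes left


print(recovery_subcommand(3, 1))
```

With the common case of three master-eligible nodes, losing two of them leads to unsafe-bootstrap on the single survivor.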

DaveCTurner · 2019-01-25T09:27:57Z

I looked at this with a fresh set of eyes this morning and tidied up a couple of things that stood out to me. I would also like Lisa to review (or to delegate to someone else on the docs team).

andrershov

@DaveCTurner I still think we need to re-phrase the part about data folder backup, other than that looks good to me.

andrershov · 2019-01-25T10:24:00Z

@DaveCTurner Now I see what you mean. Can we rephrase it like "Note that if you have previously taken a backup of the data folder of the stopped node, you cannot restore from it; you need the data folder as it was at the moment this node was stopped"?

DaveCTurner · 2019-01-25T11:30:00Z

Andrey rightly points out that we've not finished the other half of this tool, or its docs, so my request for review from the docs team was premature. Sorry for the noise.

lcawl · 2019-01-25T17:36:43Z

One thing I'd suggest adding is the Synopsis section that lists all the options, per https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-passwords.html
I can take a deeper look when you're ready for doc review.

andrershov · 2019-02-01T15:10:38Z

@DaveCTurner I've added detach-cluster sample output and updated the unsafe-bootstrap sample output.
One more thing: I have reformatted the messages displayed to the user so that they fit in 72 chars (the delimiter has the same length).
Please note that I've removed the delimiter from the sample output. The docs build fails if it is left in, because it confuses this delimiter with the one that marks the beginning and end of a source section.
I've merged master into this branch, so these tools are now available on it.
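As a rough illustration of the 72-character formatting mentioned above, here is a minimal Python sketch using the standard `textwrap` module. It is not the tool's own code: the delimiter character, the sample message, and the helper `format_message` are assumptions made up for this example.

```python
import textwrap

WIDTH = 72
# Assumption: the real delimiter character is not shown in this thread.
DELIMITER = "-" * WIDTH


def format_message(text: str) -> str:
    """Wrap a user-facing message so that no line exceeds WIDTH chars."""
    return textwrap.fill(text, width=WIDTH)


# Hypothetical message text, for illustration only.
sample = (
    "You should only run this tool if you have permanently lost half or "
    "more of the master-eligible nodes in this cluster, and you cannot "
    "restore the cluster from a snapshot."
)

print(DELIMITER)
print(format_message(sample))
print(DELIMITER)
```

Keeping the delimiter the same width as the wrapped text makes the message block line up in a terminal.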

docs/reference/commands/node-tool.asciidoc

lcawl

I added minor suggestions but overall LGTM

Co-Authored-By: andrershov <[email protected]>

@DaveCTurner

This commit, mostly authored by @DaveCTurner, adds documentation for elasticsearch-node tool #37696. (cherry picked from commit 09425d5)

node-tool docs

2c61bdf

andrershov added the >docs and :Distributed Coordination/Cluster Coordination labels Jan 24, 2019

andrershov requested a review from DaveCTurner January 24, 2019 11:38

DaveCTurner mentioned this pull request Jan 24, 2019

Add tool elasticsearch-node unsafe-bootstrap #37696

Merged

Rework docs

769c93d

andrershov commented Jan 24, 2019

Add (term, version) instructions and example run

0d3bae3

Expand section on how to choose the freshest node

d72d1b6

DaveCTurner added 2 commits January 25, 2019 09:00

Talk about _cluster_ bootstrapping and avoid talking about bootstrapping the node itself

af8d49f

Rewording

23b3c9b

DaveCTurner requested a review from lcawl January 25, 2019 09:26

DaveCTurner removed their request for review January 25, 2019 09:28

ywelsch self-requested a review January 25, 2019 09:50

andrershov commented Jan 25, 2019

DaveCTurner removed the request for review from lcawl January 25, 2019 11:28

ywelsch mentioned this pull request Jan 26, 2019

A new cluster coordination layer #32006

Closed

61 tasks

Andrey Ershov added 4 commits February 1, 2019 15:34

Merge branch 'master' into zen2_node_tool_docs

b76a132

Adjust messages and delimiter to 72 chars

f8f3bf8

Update sample output of unsafe-bootstrap and add detach-cluster

ca5d84e

Merge branch 'zen2_node_tool_docs' into zen2_node_tool_docs

0311381

# Conflicts: # docs/reference/commands/node-tool.asciidoc