-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent downgrades from 8.x to 7.x #78586
Prevent downgrades from 8.x to 7.x #78586
Conversation
Users sometimes attempt to downgrade a node in place, but downgrades are totally untested and unsupported and generally don't work. We protect against this by recording the node version in the metadata and refusing to start if we encounter metadata written by a future version. However in 8.0 (elastic#42489) we changed the directory layout so that a 7.x node won't find the upgraded metadata, or indeed any other data, and will proceed as if it's a fresh node. That's almost certainly not what the user wants, so with this commit we create a file at `${path.data}/nodes` at each startup, preventing an older node from starting. Closes elastic#52414
Pinging @elastic/es-distributed (Team:Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opening this for discussion, but won't merge it before the relevant bits of #78525 are done since it'll just add more conflicts.
This approach is super-simple but doesn't generate the nicest message on an attempted downgrade:
[2021-10-02T14:48:42,219][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [node-0] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: ElasticsearchException[failed to bind service]; nested: FileSystemException[/Users/davidturner/src/elasticsearch-master/elasticsearch-7.15.0/data-0/nodes/0: Not a directory];
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:171) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:158) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:75) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:114) ~[elasticsearch-cli-7.15.0.jar:7.15.0]
at org.elasticsearch.cli.Command.main(Command.java:79) ~[elasticsearch-cli-7.15.0.jar:7.15.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:123) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:81) ~[elasticsearch-7.15.0.jar:7.15.0]
Caused by: org.elasticsearch.ElasticsearchException: failed to bind service
at org.elasticsearch.node.Node.<init>(Node.java:825) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.node.Node.<init>(Node.java:288) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:219) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:219) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:399) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:167) ~[elasticsearch-7.15.0.jar:7.15.0]
... 6 more
Caused by: java.nio.file.FileSystemException: /Users/davidturner/src/elasticsearch-master/elasticsearch-7.15.0/data-0/nodes/0: Not a directory
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:396) ~[?:?]
at java.nio.file.Files.createDirectory(Files.java:694) ~[?:?]
at java.nio.file.Files.createAndCheckIsDirectory(Files.java:801) ~[?:?]
at java.nio.file.Files.createDirectories(Files.java:787) ~[?:?]
at org.elasticsearch.env.NodeEnvironment.lambda$new$0(NodeEnvironment.java:265) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.env.NodeEnvironment$NodeLock.<init>(NodeEnvironment.java:202) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:262) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.node.Node.<init>(Node.java:383) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.node.Node.<init>(Node.java:288) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:219) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:219) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:399) ~[elasticsearch-7.15.0.jar:7.15.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:167) ~[elasticsearch-7.15.0.jar:7.15.0]
... 6 more
I also experimented with leaving something that looks like node metadata under ${path.data}/nodes/0/_state
which would yield a message indicating more clearly that the problem was a downgrade, but this messes with the logic that detects whether an upgrade is needed and I'd rather keep that logic as simple as possible.
It also doesn't address the question of what to do if we reorganise the data path and then discover that the cluster metadata is somehow invalid and requires a downgrade to fix it. A downgrade in that situation is technically ok since we fail before writing the cluster metadata in a new version, but won't work since we reorganise the data path before loading the cluster metadata for the first time.
Not being able to read the older metadata is clearly a bug but I don't think we've seen any such bugs for quite a while, and it would be much messier to reorganise the data path later on in node startup, so IMO we should just accept this. If we do encounter a bug that requires a downgrade then the user can recover by manually moving the data back.
Also NB we delete the nodes
directory and then create the file in its place; if we crashed in between those two actions then a downgrade would not be blocked and wouldn't find any data. IMO we should also just accept this.
/** | ||
* Converts an array of Strings to an array of Paths, adding an additional child if specified | ||
*/ | ||
private Path[] stringsToPaths(String[] strings, String additional) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated, but unused.
@@ -408,12 +419,11 @@ private static boolean upgradeLegacyNodeFolders(Logger logger, Settings settings | |||
IOUtils.fsync(nodePath.path, true); | |||
}); | |||
|
|||
// now do the actual upgrade. start by upgrading the node metadata file before moving anything, since a downgrade in an | |||
// intermediate state would be pretty disastrous | |||
loadNodeMetadata(settings, logger, legacyNodeLock.getNodePath()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't actually do anything any more, the node metadata is in the persisted cluster state which doesn't get written until much later.
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
// versions 7.x and earlier put their data under ${path.data}/nodes/; leave a file at that location to prevent downgrades | ||
final Path legacyNodesPath = environment.dataFile().resolve("nodes"); | ||
if (Files.isRegularFile(legacyNodesPath) == false) { | ||
final String content = "written by Elasticsearch v" + Version.CURRENT + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should follow-up with a 7.16 change to log the first 256 bytes of the file if nodes
is a file and not a directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heh yeah I suppose we could at least help 7.16 users here :)
Bah sorry I forgot that I was holding off on merging this until the relevant bits of #78525 are done. I'll revert and reopen. |
This reverts commit 02b7b17.
Users sometimes attempt to downgrade a node in place, but downgrades are
totally untested and unsupported and generally don't work. We protect
against this by recording the node version in the metadata and refusing
to start if we encounter metadata written by a future version.
However in 8.0 (#42489) we changed the directory layout so that a 7.x
node won't find the upgraded metadata, or indeed any other data, and
will proceed as if it's a fresh node. That's almost certainly not what
the user wants, so with this commit we create a file at
${path.data}/nodes
at each startup, preventing an older node fromstarting.
Closes #52414