Features not matching version after an upgrade to 8.13+ #109254
Comments
Pinging @elastic/es-core-infra (Team:Core/Infra)
@thbkrkr To help debug this, it'll be really helpful to have the order in which nodes were upgraded the next time this occurs, as well as logs from all the upgraded nodes.
We already have some accounting to make sure features are set properly when a master node is upgraded, but it looks like this is going awry.
This does not reproduce readily; I suspect it very much depends on the order nodes are restarted. Continuing to investigate.
I encountered this issue today while migrating a cluster into k8s using ECK and version 8.14.1 of Elasticsearch. If you can give me an idea of what logs you would like, I can get them for you. For context, this is a 15-node cluster running on GKE. The three dedicated master nodes did have the […]. In the screenshot below, lXbHD9JaRlSP51wL2D442Q is a node with a data role (no master) and tTBB9pOuRN6Q1LpVUJvujQ was a dedicated master node. Note: this was an empty cluster (v8.14.1) to which I then restored a snapshot from a v7.17.22 cluster. ECK v2.13.0.
I just need all the logs of the nodes from when the upgrade started to when it completed.
@mikeprince3 Just spotted you're not an Elastician, so you don't have access to our infrastructure. If you can provide a tarball of the log files somewhere, that would be very helpful; if you don't have anywhere to upload them to, please let me know and we can sort something out.
@thecoop Everything logs into GCP, but I'll see what I can do to share or export that. I haven't tried attaching to the pods to pull the logs directly either, but I will give that a shot too. I assume you're looking for logs from both ECK and the ES nodes?
Just the ES nodes will do - this is a bug in Elasticsearch.
@thecoop I pinged you on LinkedIn. I can send you a link to the logs through a private message there. I'm open to alternatives if you prefer.
@thecoop Uploaded as requested. There are 15 nodes in the cluster, so I only included logs from the two node IDs in my screenshot above. The […]
@mikeprince3 Thanks for the logs. This bug is around the exact order in which nodes are restarted and get elected to master, so could you send the logs for the other two master nodes, and maybe one node that was unaffected by this bug for comparison?
@thecoop No problem. I was able to combine the logs across the three masters and export them as a single file. Hopefully you'll be able to view the logs as they happened instead of jumping between log files (plus this was way easier for me). I was also able to pull the first 10k log records for the entire cluster during the initial startup, so maybe that context can help as well. Regarding the unaffected nodes, the three masters were the only ones that seemed to have any entries in their […]. Note: I don't think this part is relevant, but I wanted to explain what you will see in the logs. Our cluster has 6 nodes called […]
Pinging @elastic/es-distributed (Team:Distributed)
Looks like this bug happens when there are non-master-eligible nodes in the cluster when it is upgraded. The non-master-eligible nodes are not going through the same codepath in […]
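To make that failure mode easier to picture, here is a small, purely illustrative Python model; it is not Elasticsearch's implementation, and the function and node names are invented. The idea: a feature only becomes cluster-wide once every known node advertises it, so if an upgraded non-master-eligible node's recorded feature set is never refreshed through the same codepath the master-eligible nodes use, one stale entry is enough to keep the feature disabled.

```python
# Illustrative model only -- not Elasticsearch code. It shows why one stale
# entry for a non-master-eligible node can keep a feature off cluster-wide.

FEATURE = "desired_node.version_deprecated"

def cluster_features(node_features: dict[str, set[str]]) -> set[str]:
    """A feature is cluster-wide only if every known node advertises it."""
    sets = list(node_features.values())
    return set.intersection(*sets) if sets else set()

# After a rolling upgrade: the master-eligible nodes had their feature sets
# re-registered when they (re)joined...
known = {
    "master-0": {FEATURE},
    "master-1": {FEATURE},
    "master-2": {FEATURE},
    # ...but the upgraded data-only node's entry was never refreshed, so the
    # elected master still believes it lacks the feature.
    "data-0": set(),
}

print(cluster_features(known))  # set() -> the feature never becomes cluster-wide
```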
Thanks for the logs @mikeprince3, we've merged a fix that will be in 8.15.
Elasticsearch Version
8.13.x
Java Version: bundled
OS Version: N/A (different k8s versions)
Problem Description
The ECK operator 2.12.1 fails to upgrade Elasticsearch 8.13+, as it is stalled on the following error when calling the desired nodes API: […]
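For reference, here is a minimal sketch of the kind of desired-nodes update involved; the URL, credentials, history id, and node spec are assumptions for illustration, and the real call is made by the ECK operator rather than a script. It mirrors the 8.13+ payload shape, where the deprecated per-node node_version field is omitted, which, as I understand the report, is what the upgraded cluster rejects when it does not yet report the desired_node.version_deprecated feature.

```python
# Minimal sketch (assumptions: local HTTPS endpoint, basic auth, made-up
# history id and node spec). 8.13+ payloads no longer send "node_version".
import requests

ES_URL = "https://localhost:9200"
HISTORY_ID = "eck-managed"   # arbitrary identifier chosen by the caller
VERSION = 42                 # must increase with every update for a history id

body = {
    "nodes": [
        {
            "settings": {
                "node.name": "data-0",
                "node.roles": ["data"],
            },
            "processors": 8.0,
            "memory": "32gb",
            "storage": "1tb",
            # pre-8.13 payloads also carried "node_version": "<es version>"
        }
    ]
}

resp = requests.put(
    f"{ES_URL}/_internal/desired_nodes/{HISTORY_ID}/{VERSION}",
    json=body,
    auth=("elastic", "changeme"),  # assumption: basic auth credentials
    verify=False,                  # assumption: self-signed dev certificate
)
print(resp.status_code, resp.text)
```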
Steps to Reproduce
The steps should be (not tested yet): […]
Notes
Note: it is as of ECK 2.12.1 that the operator stops using the deprecated node_version field if the cluster is 8.13+: […] (a sketch of this decision is included at the end of this section).

Two occurrences of this issue have been reported for these version upgrades: […]
Each time, the users confirmed that: […] desired_node.version_deprecated feature […]
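As a closing illustration of the note above (ECK 2.12.1 only sends the deprecated node_version field to clusters older than 8.13), here is a hedged Python sketch of that decision. The function name, spec fields, and version check are invented for illustration and are not ECK's actual code.

```python
# Hypothetical sketch of the operator-side decision described in the Notes:
# include the deprecated "node_version" field only for pre-8.13 clusters.
# Names and fields are illustrative, not ECK's implementation.

def build_desired_node(es_version: str, spec: dict) -> dict:
    node = {
        "settings": spec["settings"],
        "processors": spec["processors"],
        "memory": spec["memory"],
        "storage": spec["storage"],
    }
    major, minor = (int(p) for p in es_version.split(".")[:2])
    if (major, minor) < (8, 13):
        # Older clusters still expect the deprecated per-node version.
        node["node_version"] = es_version
    return node

spec = {"settings": {"node.name": "data-0"}, "processors": 8.0,
        "memory": "32gb", "storage": "1tb"}
print(build_desired_node("8.12.2", spec))  # includes node_version
print(build_desired_node("8.13.4", spec))  # omits node_version
```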