Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconciler error - elasticsearch client failed - [update_desired_nodes_request] failed to parse field [nodes] #7664

Closed
thbkrkr opened this issue Mar 27, 2024 · 3 comments · Fixed by #7663
Labels
>bug Something isn't working v2.12.1

Comments

@thbkrkr
Copy link
Contributor

thbkrkr commented Mar 27, 2024

After upgrading ECK 2.12.0, if CPU and memory resources are set for the Elasticsearch container, upgrading Elasticsearch from 8.12.2 to 8.13.0 is stalled on the following error when calling the desired nodes API:

2024-03-27T11:48:26.479+0100    ERROR   manager.eck-operator    Reconciler error        {
  "service.version": "2.12.0-SNAPSHOT+dbdcc1c5", 
  "controller": "elasticsearch-controller", 
  "object": {"name":"q","namespace":"test"}, "namespace": "test", "name": "q", 
  "reconcileID": "f7ab9df2-ab31-4a34-b07e-c0c56c36e5f0", 
  "error": "elasticsearch client failed for https://q-es-internal-http.test.svc:9200/_internal/desired_nodes/4859b6c6-4d16-41a2-a59b-563dcb9cf34c/3?error_trace=true: 
  400 Bad Request: {Status:400 Error:{CausedBy:{Reason:Required [node_version] Type:illegal_argument_exception}
  Reason:[1:1387] [update_desired_nodes_request] failed to parse field [nodes] Type:x_content_parse_exception
  StackTrace:org.elasticsearch.xcontent.XContentParseException: [1:1387] [update_desired_nodes_request] failed to parse field [nodes]
...
@thbkrkr thbkrkr added the >bug Something isn't working label Mar 27, 2024
@thbkrkr
Copy link
Contributor Author

thbkrkr commented Mar 27, 2024

The bug was introduced in:

The Elasticsearch version from the Spec is used to check if the deprecated field needs to be removed or not, instead of the minimum Elasticsearch version in the cluster.

@thbkrkr
Copy link
Contributor Author

thbkrkr commented Mar 27, 2024

Manifest to reproduce:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: test
spec:
  version: 8.12.0
  nodeSets:
  - name: master
    count: 1
    config:
      node.store.allow_mmap: false
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 4Gi
              cpu: 2
            limits:
              memory: 4Gi

@thbkrkr
Copy link
Contributor Author

thbkrkr commented Mar 28, 2024

We should have caught this bug with the e2e-tests suite. We need to update at least one TestVersionUpgrade with container CPU resources, so that the desired node API is used during this upgrade.

// DefaultResources for the Elasticsearch container. The JVM default heap size is 1Gi, so we
// request at least 2Gi for the container to make sure ES can work properly.
// Not applying this minimum default would make ES randomly crash (OOM) on small machines.
// Similarly, we apply a default memory limit of 2Gi, to ensure the Pod isn't the first one to get evicted.
// No CPU requirement is set by default.
DefaultResources = corev1.ResourceRequirements{
Requests: map[corev1.ResourceName]resource.Quantity{
corev1.ResourceMemory: DefaultMemoryLimits,
},
Limits: map[corev1.ResourceName]resource.Quantity{
corev1.ResourceMemory: DefaultMemoryLimits,
},

// Neither the limit nor the request is set
return n.addReason("no CPU request or limit set")

// Stop here if there is at least one reason to not compute the desired state.
if err := n.toError(); err != nil {
return nil, err
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug Something isn't working v2.12.1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant