This repository has been archived by the owner on May 16, 2023. It is now read-only.

[elasticsearch] use new node.roles settings #1186

Merged
merged 10 commits into from
May 25, 2021
2 changes: 1 addition & 1 deletion elasticsearch/README.md
@@ -154,7 +154,7 @@ support multiple versions with minimal changes.
| `readinessProbe` | Configuration fields for the readiness [probe][] | see [values.yaml][] |
| `replicas` | Kubernetes replica count for the StatefulSet (i.e. how many pods) | `3` |
| `resources` | Allows you to set the [resources][] for the StatefulSet | see [values.yaml][] |
-| `roles` | A hash map with the specific [roles][] for the `nodeGroup` | see [values.yaml][] |
+| `roles` | A list with the specific [roles][] for the `nodeGroup` | see [values.yaml][] |
| `schedulerName` | Name of the [alternate scheduler][] | `""` |
| `secretMounts` | Allows you easily mount a secret as a file inside the StatefulSet. Useful for mounting certificates and other secrets. See [values.yaml][] for an example | `[]` |
| `securityContext` | Allows you to set the [securityContext][] for the container | see [values.yaml][] |
8 changes: 2 additions & 6 deletions elasticsearch/examples/migration/client.yaml
@@ -1,19 +1,15 @@
---

replicas: 2

clusterName: "elasticsearch"
nodeGroup: "client"

esMajorVersion: 6

-roles:
-  master: "false"
-  ingest: "false"
-  data: "false"
+roles: []
Member Author

@mark-vieira
This configuration doesn't work for setting a coordinating-only node; instead, the node gets all the default roles.

I followed https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#coordinating-only-node, but the chart defines node.roles as an environment variable instead of configuring it in elasticsearch.yml, so this configuration translates to a node.roles=, environment variable in the Elasticsearch container.

I tried a few variations but couldn't find the correct syntax for setting a client node with an environment variable.

Here are some examples reproduced with an Elasticsearch container outside of the Helm charts:

~ docker run -e "node.roles=," docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT|grep role
{"@timestamp":"2021-05-12T16:50:06.917Z", "log.level": "INFO", "message":"node name [afa554bbd8cc], node ID [8TGL3o2qRkeuNw3TCbaxFg], cluster name [docker-cluster], roles [data, remote_cluster_client, ingest, master, ml]", "service.name":"ES_ECS","process.thread.name":"main","log.logger":"org.elasticsearch.node.Node","type":"server","node.name":"afa554bbd8cc","cluster.name":"docker-cluster"}
...
~ docker run -e "node.roles=" docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT|grep role
{"@timestamp":"2021-05-12T16:50:50.805Z", "log.level": "INFO", "message":"node name [574604b8f312], node ID [K9RDU2WUSHGVZRMWlAeGqQ], cluster name [docker-cluster], roles [remote_cluster_client, master, ml, ingest, transform, data]", "service.name":"ES_ECS","process.thread.name":"main","log.logger":"org.elasticsearch.node.Node","type":"server","node.name":"574604b8f312","cluster.name":"docker-cluster"}
...
~ docker run -e "node.roles=[]" docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT|grep role
uncaught exception in thread [main]
{"@timestamp":"2021-05-12T16:51:19.595Z", "log.level":"ERROR", "message":"uncaught exception in thread [main]", "service.name":"ES_ECS","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.ElasticsearchUncaughtExceptionHandler","type":"server","node.name":"cbd6de59c5ee","cluster.name":"docker-cluster","error.type":"org.elasticsearch.bootstrap.StartupException","error.message":"java.lang.IllegalArgumentException: unknown role [[]]","error.stack_trace":"java.lang.IllegalArgumentException: unknown role [[]]\n\tat org.elasticsearch.cluster.node.DiscoveryNode.getRoleFromRoleName(DiscoveryNode.java:454)\n\tat java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)\n\tat java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)\n\tat java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)\n\tat java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)\n\tat java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)\n\tat java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)\n\tat java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)\n\tat org.elasticsearch.common.settings.Setting.lambda$listSetting$44(Setting.java:1474)\n\tat org.elasticsearch.common.settings.Setting.get(Setting.java:441)\n\tat org.elasticsearch.common.settings.Setting.get(Setting.java:435)\n\tat org.elasticsearch.cluster.node.DiscoveryNode.getRolesFromSettings(DiscoveryNode.java:222)\n\tat org.elasticsearch.cluster.node.DiscoveryNode.isDataNode(DiscoveryNode.java:82)\n\tat org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:289)\n\tat org.elasticsearch.node.Node.<init>(Node.java:335)\n\tat org.elasticsearch.node.Node.<init>(Node.java:268)\n\tat org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:225)\n\tat org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:225)\n\tat 
org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:387)\n\tat org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170)\n\tat org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161)\n\tat org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)\n\tat org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127)\n\tat org.elasticsearch.cli.Command.main(Command.java:90)\n\tat org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126)\n\tat org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92)\nFor complete error details, refer to the log at /usr/share/elasticsearch/logs/docker-cluster.log\n"}
java.lang.IllegalArgumentException: unknown role [[]]
...
~ docker run -e "node.roles=null" docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT|grep role
{"@timestamp":"2021-05-12T16:52:18.560Z", "log.level":"ERROR", "message":"uncaught exception in thread [main]", "service.name":"ES_ECS","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.ElasticsearchUncaughtExceptionHandler","type":"server","node.name":"e7767eaf1286","cluster.name":"docker-cluster","error.type":"org.elasticsearch.bootstrap.StartupException","error.message":"java.lang.IllegalArgumentException: unknown role [null]","error.stack_trace":"java.lang.IllegalArgumentException: unknown role [null]\n\tat org.elasticsearch.cluster.node.DiscoveryNode.getRoleFromRoleName(DiscoveryNode.java:454)\n\tat java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)\n\tat java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)\n\tat java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)\n\tat java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)\n\tat java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)\n\tat java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)\n\tat java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)\n\tat org.elasticsearch.common.settings.Setting.lambda$listSetting$44(Setting.java:1474)\n\tat org.elasticsearch.common.settings.Setting.get(Setting.java:441)\n\tat org.elasticsearch.common.settings.Setting.get(Setting.java:435)\n\tat org.elasticsearch.cluster.node.DiscoveryNode.getRolesFromSettings(DiscoveryNode.java:222)\n\tat org.elasticsearch.cluster.node.DiscoveryNode.isDataNode(DiscoveryNode.java:82)\n\tat org.elasticsearch.env.NodeEnvironment.<init>(NodeEnvironment.java:289)\n\tat org.elasticsearch.node.Node.<init>(Node.java:335)\n\tat org.elasticsearch.node.Node.<init>(Node.java:268)\n\tat org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:225)\n\tat 
org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:225)\n\tat org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:387)\n\tat org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170)\n\tat org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161)\n\tat org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)\n\tat org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127)\n\tat org.elasticsearch.cli.Command.main(Command.java:90)\n\tat org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126)\n\tat org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92)\nFor complete error details, refer to the log at /usr/share/elasticsearch/logs/docker-cluster.log\n"}
uncaught exception in thread [main]
java.lang.IllegalArgumentException: unknown role [null]
...

Can you help find the right syntax? 🙏🏻
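The four attempts above can be summarized with a toy model. This is only a hypothesis that reproduces the observed outcomes, not Elasticsearch's actual settings code: it assumes the value is split on commas, blank items are dropped, and an empty result is treated like an unset setting. `roles_from_env`, `DEFAULT_ROLES` and `KNOWN_ROLES` are hypothetical names, and the role lists are illustrative.

```python
from typing import List, Optional

# Hypothetical stand-ins; the real default role set depends on the version
# (compare the two "roles [...]" log lines above, which differ).
DEFAULT_ROLES = ["master", "data", "ingest", "ml", "remote_cluster_client", "transform"]
KNOWN_ROLES = set(DEFAULT_ROLES) | {"voting_only"}


def roles_from_env(value: Optional[str]) -> List[str]:
    """Toy model (not Elasticsearch's implementation) of a node.roles env value.

    Assumption: split on commas, drop blank items, and treat an empty
    result exactly like "not set", so the defaults apply.
    """
    if value is None:
        return list(DEFAULT_ROLES)
    items = [item.strip() for item in value.split(",") if item.strip()]
    if not items:
        # "" and "," both collapse to nothing, i.e. "unset" in this model.
        return list(DEFAULT_ROLES)
    for item in items:
        if item not in KNOWN_ROLES:
            # "[]" and "null" are not list syntax here, just unknown role names.
            raise ValueError(f"unknown role [{item}]")
    return items
```

Under this model no environment-variable spelling yields an empty list, which is the gap the rest of the thread discusses.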


Ah, so the Helm charts don't actually generate an elasticsearch.yml file? Instead we rely exclusively on environment variables?

@rjernst the -E stuff is handled by the core settings stuff, isn't it? I don't think that's Docker-specific. How would one denote an empty list via that syntax?

Member Author

We allow users to define elasticsearch.yml config using the esConfig value, but for everything the chart configures itself from custom values we have to use environment variables, for a few reasons:

  • Helm can't parse the content of esConfig.elasticsearch.yml, so it can't, for example, detect that the node has the master role and reuse that in Helm logic
  • The content of esConfig.elasticsearch.yml is mounted directly as a file inside the container, so we can't merge it with custom config generated by Helm


Yeah, given that issue is still open, I'm not sure we have a real solution here for coordinating nodes. My only other thought is to just use voting_only, as that's pretty much functionally equivalent, the only difference being that the node will also participate in master node elections, which is probably OK. The real purpose of a coordinating node is to ensure its only workload is handling requests.

Member

@jasontedor might have thoughts. I don't see a way to set an empty list via -E.

Member Author

I found a workaround in e1bf227: add an empty roles list both in the roles chart value (to override the default value) and in the node.roles setting in elasticsearch.yml.

I think we can use this workaround for now, but it would be great to find a way to define a coordinating node properly using environment variables only.
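A sketch of the workaround for a client nodeGroup, written as a Python dict purely for illustration. It assumes, as the result reported here suggests, that the empty node.roles environment variable rendered from roles: [] does not override node.roles: [] from the elasticsearch.yml mounted through esConfig. `client_values` and `roles_env_value` are hypothetical names.

```python
# Hypothetical values for a coordinating-only ("client") nodeGroup with the workaround applied.
client_values = {
    "clusterName": "elasticsearch",
    "nodeGroup": "client",
    # Empty list overrides the chart's default roles list.
    "roles": [],
    # esConfig content is mounted as a file in the container; this sets node.roles in elasticsearch.yml.
    "esConfig": {"elasticsearch.yml": "node.roles: []\n"},
}


def roles_env_value(roles):
    """Build the node.roles env value like the elasticsearch.roles helper ("role," per entry)."""
    return "".join(f"{role}," for role in roles)


# With no roles the chart renders an empty node.roles variable, so (by assumption)
# the node.roles: [] line from elasticsearch.yml is what takes effect.
print(repr(roles_env_value(client_values["roles"])))  # ''
```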


volumeClaimTemplate:
-  accessModes: [ "ReadWriteOnce" ]
+  accessModes: ["ReadWriteOnce"]
  storageClassName: "standard"
  resources:
    requests:
5 changes: 1 addition & 4 deletions elasticsearch/examples/migration/data.yaml
@@ -1,5 +1,4 @@
---

replicas: 2

esMajorVersion: 6
@@ -12,6 +11,4 @@ clusterName: "elasticsearch"
nodeGroup: "data"

roles:
-  master: "false"
-  ingest: "false"
-  data: "true"
+  - data
7 changes: 2 additions & 5 deletions elasticsearch/examples/migration/master.yaml
@@ -1,5 +1,4 @@
---

# Temporarily set to 3 so we can scale up/down the old a new cluster
# one at a time whilst always keeping 3 masters running
replicas: 1
@@ -14,12 +13,10 @@ clusterName: "elasticsearch"
nodeGroup: "master"

roles:
-  master: "true"
-  ingest: "false"
-  data: "false"
+  - master

volumeClaimTemplate:
-  accessModes: [ "ReadWriteOnce" ]
+  accessModes: ["ReadWriteOnce"]
  storageClassName: "standard"
  resources:
    requests:
8 changes: 1 addition & 7 deletions elasticsearch/examples/multi/client.yaml
@@ -1,14 +1,8 @@
---

clusterName: "multi"
nodeGroup: "client"

-roles:
-  master: "false"
-  ingest: "false"
-  data: "false"
-  ml: "false"
-  remote_cluster_client: "false"
+roles: []

persistence:
  enabled: false
7 changes: 1 addition & 6 deletions elasticsearch/examples/multi/data.yaml
@@ -1,11 +1,6 @@
---

clusterName: "multi"
nodeGroup: "data"

roles:
-  master: "false"
-  ingest: "true"
-  data: "true"
-  ml: "false"
-  remote_cluster_client: "false"
+  - data


I believe this should have the ingest role as well.
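Following up on the ingest remark: a mechanical conversion of the old hash map keeps every role whose value is "true". A minimal sketch, assuming the legacy values are the "true"/"false" strings used in these examples (`legacy_roles_to_list` is a hypothetical helper, not part of the chart), shows that the old multi/data.yaml enabled both ingest and data:

```python
from typing import Dict, List


def legacy_roles_to_list(roles: Dict[str, str]) -> List[str]:
    """Keep every role whose legacy value is "true" (case-insensitive), preserving order."""
    return [role for role, enabled in roles.items() if str(enabled).lower() == "true"]


# Old elasticsearch/examples/multi/data.yaml roles, as shown in the diff above
old_multi_data = {
    "master": "false",
    "ingest": "true",
    "data": "true",
    "ml": "false",
    "remote_cluster_client": "false",
}
new_multi_data = ["data"]

migrated = legacy_roles_to_list(old_multi_data)
print(migrated)                                     # ['ingest', 'data']
print(sorted(set(migrated) - set(new_multi_data)))  # ['ingest']
```

Using `str(enabled).lower()` also accepts unquoted YAML booleans such as `true`, which parse to Python `True`.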

7 changes: 1 addition & 6 deletions elasticsearch/examples/multi/master.yaml
@@ -1,11 +1,6 @@
---

clusterName: "multi"
nodeGroup: "master"

roles:
-  master: "true"
-  ingest: "false"
-  data: "false"
-  ml: "false"
-  remote_cluster_client: "false"
+  - master
6 changes: 3 additions & 3 deletions elasticsearch/examples/security/values.yaml
@@ -3,9 +3,9 @@ clusterName: "security"
nodeGroup: "master"

roles:
-  master: "true"
-  ingest: "true"
-  data: "true"
+  - master
+  - ingest
+  - data

protocol: https

6 changes: 6 additions & 0 deletions elasticsearch/templates/_helpers.tpl
@@ -51,6 +51,12 @@ We truncate at 63 chars because some Kubernetes name fields are limited to this
{{- end -}}
{{- end -}}

{{- define "elasticsearch.roles" -}}
{{- range $.Values.roles -}}
{{ . }},
{{- end -}}
{{- end -}}

{{- define "elasticsearch.esMajorVersion" -}}
{{- if .Values.esMajorVersion -}}
{{ .Values.esMajorVersion }}
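The new `elasticsearch.roles` helper ranges over `.Values.roles` and emits each role followed by a comma, so the default list renders with a trailing comma, which is the value `test_defaults` expects. A minimal Python sketch of the same string-building, assuming `roles` is a plain list of strings (`render_roles` is a hypothetical name):

```python
from typing import Iterable


def render_roles(roles: Iterable[str]) -> str:
    """Mimic the elasticsearch.roles helper: emit "<role>," for every entry.

    Like the Go template's range, this keeps list order and leaves a
    trailing comma; an empty list renders as an empty string.
    """
    return "".join(f"{role}," for role in roles)


default_roles = ["master", "ingest", "data", "remote_cluster_client", "ml"]
print(render_roles(default_roles))  # master,ingest,data,remote_cluster_client,ml,
print(repr(render_roles([])))       # ''
```

An empty roles list therefore becomes an empty node.roles value, which is the coordinating-node case debated in the review thread.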
13 changes: 3 additions & 10 deletions elasticsearch/templates/statefulset.yaml
@@ -295,15 +295,12 @@ spec:
valueFrom:
fieldRef:
fieldPath: metadata.name
{{- if eq .Values.roles.master "true" }}


This is coming from the Helm template right? Not elasticsearch.yml? So effectively we've changed how you specify roles to only support the new node.roles setting? If so, how do we handle compatibility with versions of Elasticsearch that don't have the node.roles setting?

Member Author

Yes, this comes from the Helm template, which defines some environment variables in the Elasticsearch container.

> If so, how do we handle compatibility with versions of Elasticsearch that don't have the node.roles setting?

The helm-charts repo uses release branches following the same model as elastic/elasticsearch. We expect to merge this PR only into the master branch, so it will only be released with helm-charts 8.x, which will use Elasticsearch 8.x Docker images.

Currently the master branch uses the 8.0.0-SNAPSHOT image, which is why CI tests are failing, while all [released versions](https://github.com/elastic/helm-charts/releases) use aligned Elasticsearch versions.

Note that anybody can override the imageTag value, for example to use the chart from the master branch with an older Elasticsearch version. However, this is not something we support, and we don't try to handle backward compatibility in this scenario.

{{- if ge (int (include "elasticsearch.esMajorVersion" .)) 7 }}
{{- if has "master" .Values.roles }}
- name: cluster.initial_master_nodes
value: "{{ template "elasticsearch.endpoints" . }}"
{{- else }}
- name: discovery.zen.minimum_master_nodes
value: "{{ .Values.minimumMasterNodes }}"
{{- end }}
{{- end }}
- name: node.roles
value: "{{ template "elasticsearch.roles" . }}"
{{- if lt (int (include "elasticsearch.esMajorVersion" .)) 7 }}
- name: discovery.zen.ping.unicast.hosts
value: "{{ template "elasticsearch.masterService" . }}-headless"
@@ -319,10 +316,6 @@ spec:
- name: ES_JAVA_OPTS
value: "{{ .Values.esJavaOpts }}"
{{- end }}
{{- range $role, $enabled := .Values.roles }}
- name: node.{{ $role }}
value: "{{ $enabled }}"
{{- end }}
{{- if .Values.extraEnvs }}
{{ toYaml .Values.extraEnvs | indent 10 }}
{{- end }}
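The statefulset hunks drop the per-role node.<role> loop and the `eq .Values.roles.master "true"` check, and add a single node.roles variable plus a `has "master" .Values.roles` check. A minimal Python sketch contrasting the two shapes, assuming `roles` is a string-valued hash map in the old chart and a list of strings in the new one (all function names are hypothetical):

```python
from typing import Dict, List


def legacy_role_envs(roles: Dict[str, str]) -> List[Dict[str, str]]:
    """Old template: one node.<role> variable per hash-map entry."""
    return [{"name": f"node.{role}", "value": enabled} for role, enabled in roles.items()]


def list_role_envs(roles: List[str]) -> List[Dict[str, str]]:
    """New template: a single node.roles variable, built like elasticsearch.roles."""
    return [{"name": "node.roles", "value": "".join(f"{role}," for role in roles)}]


def legacy_is_master(roles: Dict[str, str]) -> bool:
    # Mirrors {{- if eq .Values.roles.master "true" }}; a missing key counts as "not master" here.
    return roles.get("master") == "true"


def list_is_master(roles: List[str]) -> bool:
    # Mirrors {{- if has "master" .Values.roles }}
    return "master" in roles


print(list_role_envs(["master", "ingest", "data"]))
# [{'name': 'node.roles', 'value': 'master,ingest,data,'}]
```

For `{"master": "true", "data": "true", "ingest": "true"}`, `legacy_role_envs` yields the same three node.* entries that the test diff removes from `test_defaults`.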
21 changes: 4 additions & 17 deletions elasticsearch/tests/elasticsearch_test.py
@@ -45,9 +45,7 @@ def test_defaults():
        {"name": "discovery.seed_hosts", "value": uname + "-headless"},
        {"name": "network.host", "value": "0.0.0.0"},
        {"name": "cluster.name", "value": clusterName},
-        {"name": "node.master", "value": "true"},
-        {"name": "node.data", "value": "true"},
-        {"name": "node.ingest", "value": "true"},
+        {"name": "node.roles", "value": "master,ingest,data,remote_cluster_client,ml,"},
    ]

    c = r["statefulset"][uname]["spec"]["template"]["spec"]["containers"][0]
@@ -174,7 +172,7 @@ def test_overriding_the_image_and_tag():
def test_set_initial_master_nodes():
    config = """
roles:
-  master: "true"
+  - master
"""
    r = helm_template(config)
    env = r["statefulset"][uname]["spec"]["template"]["spec"]["containers"][0]["env"]
@@ -192,7 +190,7 @@ def test_set_initial_master_nodes():
def test_dont_set_initial_master_nodes_if_not_master():
    config = """
roles:
-  master: "false"
+  - data
"""
    r = helm_template(config)
    env = r["statefulset"][uname]["spec"]["template"]["spec"]["containers"][0]["env"]
@@ -203,7 +201,7 @@ def test_dont_set_initial_master_nodes_if_not_master():
def test_set_discovery_seed_host():
    config = """
roles:
-  master: "true"
+  - master
"""
    r = helm_template(config)
    env = r["statefulset"][uname]["spec"]["template"]["spec"]["containers"][0]["env"]
@@ -216,17 +214,6 @@ def test_set_discovery_seed_host():
        assert e["name"] != "discovery.zen.ping.unicast.hosts"


-def test_enabling_machine_learning_role():
-    config = """
-roles:
-  ml: "true"
-"""
-    r = helm_template(config)
-    env = r["statefulset"][uname]["spec"]["template"]["spec"]["containers"][0]["env"]
-
-    assert {"name": "node.ml", "value": "true"} in env
-
-
def test_adding_extra_env_vars():
    config = """
extraEnvs:
12 changes: 6 additions & 6 deletions elasticsearch/values.yaml
@@ -7,13 +7,13 @@ nodeGroup: "master"
masterService: ""

# Elasticsearch roles that will be applied to this nodeGroup
-# These will be set as environment variables. E.g. node.master=true
+# These will be set as environment variables. E.g. node.roles=master
roles:
-  master: "true"
-  ingest: "true"
-  data: "true"
-  remote_cluster_client: "true"
-  ml: "true"
+  - master
+  - ingest
+  - data
+  - remote_cluster_client
+  - ml

replicas: 3
minimumMasterNodes: 2