-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade to ES 5 #8
base: master
Are you sure you want to change the base?
Conversation
# Create and take ownership over required directories
RUN mkdir -p /opt/consul/config \
&& mkdir -p /opt/consul/data \
&& chmod 770 /opt/consul/data \
&& chown -R elasticsearch:elasticsearch /opt/consul \
&& mkdir -p /etc/containerpilot \
&& chmod -R g+w /etc/containerpilot \
&& chmod +x /usr/local/bin/elastic-server.sh \
&& chown -R elasticsearch:elasticsearch /etc/containerpilot |
Dockerfile
Outdated
COPY /bin/* /usr/local/bin/ | ||
|
||
# Should we remove unzip? | ||
# RUN apk del unzip tar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing anything in a different layer doesn't reduce total image size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, does it if you squash it?
@@ -0,0 +1,6 @@ | |||
#!/bin/sh -xe | |||
manage.sh onStart #|| exit $? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything like this should be in a preStart
, not wrapping the main app. That would also eliminate the need for this shell script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied this from the redis autopilot example.
https://github.com/autopilotpattern/redis/blob/master/bin/redis-server-sentinel.sh Where it obviously does a bit more.
Does preStart fire BEFORE you start the co-processes? I'm asking because we need the co-process consul agent to start before we start the onStart
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some research and unfortunately this has to be a separate wrapper as the coprocess doesn't start until immediately after preStart. We need the coprocess (consul agent) to start prior to onStart. onStart's job is there to check whether it has joined the consul cluster at which point we can verify if other services exist. If we did this in preStart, the consul agent won't be running and will assume every container should become the master.
Coprocesses are started immediately following the exit of the preStart hook, before polling for health or backend hooks, and before the main application starts. - Containerpilot Docs
I think this could be confusing for developers like me who are just getting started with containerpilot and consul agent running as a coprocess. Is there a reason you guys use CONSUL_AGENT=1
in most of the examples. What is the purpose of making that optional? Meaning, why wouldn't you always use consul agent on each container?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bah, no you're right. This will be fixed in CPv3 because we're forcing people to admit they need a real init system. But in the meantime we're stuck with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would a "real" init system change this? I'm not sure what you mean by real init though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In CPv3 we're supporting multiple processes as first-class, rather than having a "main" process and coprocesses. The end user will create the chain of dependencies explicitly, so in this case we'll be able to have everything wait on the consul agent. This is pretty much how every modern init system works, but was something we were avoiding in ContainerPilot originally because of community... obstinance(?) around the notion of there being more than one process in a container. Turns out that's exactly what you want a lot of the time. 😀
Dockerfile
Outdated
@@ -1,54 +1,65 @@ | |||
FROM debian:jessie | |||
FROM docker.elastic.co/elasticsearch/elasticsearch:5.3.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why swap out for the official image from Elastic? We were using the debian image because it's lean and gives us a good starting point.
There are a couple problems with the official image for us here:
- according to the Elastic docs this image includes X-Pack, which has only a trial license. I don't even want to start down the road of reconciling this repo's MPL license with that. 😀 So I don't think this is appropriate for us to distribute.
- I can't find the Dockerfile for this image anywhere, or at least not easily, so even if we did use this image we should include a link on where to find the Dockerfile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I had a feeling this would be an issue. This is due to the "official" elasticsearch image being depreciated.
This image will receive no further updates after 2017-06-20 (June 20, 2017). Please adjust your usage accordingly. - Docker Hub
X-Pack is definitely part of the official image and has been turned off via env vars. I can remove it if that will help? Or obviously drop back to debian and build it.
Funny you mentioned not being able to find the Dockerfiles, I remember coming across a github issue mentioning the same thing. I can't seem to find that issue right now though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
X-Pack is definitely part of the official image and has been turned off via env vars. I can remove it if that will help?
Removing it doesn't remove it from the shipped binary blobs because of the layered file system. If we going to try to use their current build I'd say we should just fork from what they have in their Dockerfiles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, would you like to fork based on their dockerfile? I would think copying the "official" repository from Elastic would be the best option for long term support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use FROM docker.elastic.co/elasticsearch/elasticsearch-alpine-base:latest
but after that I imagine we could just copy in the contents of https://github.com/elastic/elasticsearch-docker/blob/master/build/elasticsearch/Dockerfile into this directory and remove the bits we don't want. We'd need to put the LICENSE notice for their repo at the top of that file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I'll work on that tomorrow morning
# leave this here then we get the ports as well-known environment variables | ||
# for purposes of linking. | ||
EXPOSE 9200 | ||
EXPOSE 9300 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were these removed just because the official image has them already? (Again, not sure because I can't find the Dockerfile.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the Dockerfile includes them.
https://github.com/elastic/elasticsearch-docker/blob/master/build/elasticsearch/Dockerfile#L44
bin/manage.sh
Outdated
consulCommand() { | ||
consul-cli --quiet --consul="${CONSUL_HOST}:8500" $* | ||
} | ||
|
||
preStart() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm missing something this preStart
doesn't do anything, right? Almost everything you have in the onStart
should really be in the preStart
I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing you mean preStart not the highlighted consulCommand... But correct, I just left it in there as a placeholder, will remove it now.
bin/manage.sh
Outdated
|
||
if [[ -z ${CONSUL} ]]; then | ||
readonly lockPath=service/elasticsearch-master/locks/master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this for? It's unused anywhere else I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to pull it, it was left in there when I was testing using a lock
local-compose.yml
Outdated
- ES_NODE_DATA=true | ||
|
||
consul: | ||
image: autopilotpattern/consul:0.7.2-r0.8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing warnings for consol 0.8.0 now, any blockers to upgrading that you are aware of?
ContainerPilot has only 0.7 bindings, so that shouldn't be a problem but it's a possible source of problems. Also we don't have our Consul image updated yet, so we'd want to verify that the updated 0.8 Consul agent works with 0.7 first too. I'd suggest not doing this in this PR to keep the changeset manageable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌🏼
local-compose.yml
Outdated
- ES_NODE_DATA=true | ||
links: | ||
- consul:consul | ||
elasticsearch_master: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really a big fan of elasticsearch-master, elasticsearch-data and elasticsearch (with a service name of elasticsearch-master). I'm guessing this has to do with elasticsearch_master being a "gateway" where you would expose the ports? Wouldn't it be better to have HA running and reverse proxy to the elastic masters?
The reason we have 3 variants is for flexibility in deploying the 3 roles. In a small-to-middling-size environment one would just deploy all the ES instances as master+data. In a large production environment typically you'll want to send all the writes thru masters and do all the reads on data-only instances. We didn't want to be overly opinionated about the topology in this blueprint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know, I've never run a big ES environment. My experience is limited to < 1 TB.
bin/manage.sh
Outdated
n=$((n+1)) | ||
else | ||
|
||
# wait for a healthy master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is ugly how it just keeps hitting consul every second to see if there are any other services registered as 'elasticsearch-master'. Maybe creating a lock is a good idea?
You mean here? That's only during startup, so while I suppose it's ugly it's also rather simple and it goes away after the initial startup. How does creating a lock help us out here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't now that I've had a chance to sleep on it.
etc/elasticsearch.yml
Outdated
# | ||
# gateway.recover_after_nodes: 3 | ||
discovery.zen.ping.unicast.hosts: ["${ES_BOOTSTRAP_HOST}"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You had ES_BOOTSTRAP_HOST as an env var for discovery.zen.ping.unicast.hosts: ["${ES_BOOTSTRAP_HOST}"] but it isn't defined in the project.
True, and this should be better documented. It's not used by default but it's an option for a user who has an existing cluster they want to bootstrap from. We didn't end up using it in https://github.com/autopilotpattern/elk and it seems to be of dubious use. I'd be fine with eliminating that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'll comment out that line and the function replace
(which has been renamed replaceZenHosts
to be a little more verbose) will place the host and uncomment it.
Something I just realized is that we only populate it with a single host. If that container restarts and the referenced node is no longer part of the cluster it will fail to join the cluster.
@tgross I've implemented several changes based on your feedback. Thanks for being so detailed and giving it a thorough review. |
Joyent issue: + exec /usr/share/elasticsearch/bin/es-docker
time="2017-04-08T20:57:01Z" level=info msg=" 2017/04/08 20:57:01 [INFO] agent: Synced node info"
[2017-04-08T20:57:06,064][WARN ][o.e.b.JNANatives ] unable to install syscall filter:
java.lang.UnsupportedOperationException: seccomp unavailable: requires kernel 3.5+ with CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER compiled in
at org.elasticsearch.bootstrap.SystemCallFilter.linuxImpl(SystemCallFilter.java:350) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.SystemCallFilter.init(SystemCallFilter.java:638) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.JNANatives.tryInstallSystemCallFilter(JNANatives.java:215) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Natives.tryInstallSystemCallFilter(Natives.java:99) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:111) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:204) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:58) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91) [elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84) [elasticsearch-5.3.0.jar:5.3.0] This error does NOT show up when running on Docker for Mac (Version 17.03.1-ce-mac5 (16048)). Update: https://www.elastic.co/guide/en/elasticsearch/reference/current/system-call-filter-check.html |
Thanks. Let me open a ticket with the appropriate team here at Joyent and see what they say. |
Taking a deeper look I would not expect their bootstrap check to work. Seccomp isn't currently supported by LX/sdc-docker. It's not even supported in all Linux distros, so more enterprisey Linux has a similar problem. (See also elasticsearch GH #23159, puppet GH #798, and elk-docker GH #107)
Yeah looks like this was a breaking change for 5.2, where they changed it from a warning to an error. For now it looks like we can disable this check. Elastic says:
|
@tgross any suggestion on how to check in But to confirm I was able to disable system call filter checks via config and have it running on Triton this weekend. |
Not sure. Maybe we could hijack whatever mechanism Elastic is using? Do we know if they're running a shell script to do that check or is it all in Java? |
This is what I found on seccomp:
Running on triton: bash-4.3$ head /proc/config.gz
head: /proc/config.gz: No such file or directory
bash-4.3$ cat /boot/config-`uname -r`
cat: can't open '/boot/config-3.13.0': No such file or directory Using |
Requires us to duplicate a few files from dockers official image. Also add in a seccomp check to disable it on systems that don’t have it
@tgross: I've changed the base docker image and added checks for seccomp. Unfortunately we have to duplicate a few files from their official repo. I wasn't sure what license you were talking about included on the top of the dockerfile so I took a guess. I have it running on Joyent and scaling up and down just fine |
@@ -0,0 +1,39 @@ | |||
#!/bin/bash | |||
|
|||
# Run Elasticsearch and allow setting default settings via env vars |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we control the command we're using to call ES along with its environment, we don't really need this shell script. We can include its effects entirely in the docker-compose.yml
's command
section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can move it to elastic-server.sh
to reduce the number of files but it doesn't really simplify it that much does it? I don't think it is smart to remove the code, anyone who has used the official docker repo for Elasticsearch will expect that functionality to continue. I would expect switching to containerpilot from the official image to work without any changes. I would want all ENV variables I used to configure elastic to continue working as is with containerpilot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would want all ENV variables I used to configure elastic to continue working as is with containerpilot.
Sure but there's literally only one line that doesn't work as-is: https://github.com/autopilotpattern/elasticsearch/pull/8/files/2f807599ea62bf41d3fc811586ffd0648ac40496#diff-35c179adb2b0c25802542935645540a7R37
exit 0 | ||
fi | ||
consulCommand() { | ||
consul-cli --quiet --consul="${CONSUL_HOST}:8500" $* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's already a consul
binary in the container. We don't need this wrapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you want me to pull this out and make the following changes?
Service addresses
Use curl
and jq
to get service addresses
https://github.com/sberryman/elasticsearch/blob/wip-v5/bin/manage.sh#L28
waitForLeader
waitForLeader using consul members -status alive | grep server
Replace: https://github.com/sberryman/elasticsearch/blob/wip-v5/bin/manage.sh#L48-L64
With: https://github.com/sberryman/autopilotpattern-nats/blob/master/bin/manage.sh#L28-L44
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that should do it.
bin/manage.sh
Outdated
replace() { | ||
REPLACEMENT=$(printf 's/^discovery\.zen\.ping\.unicast\.hosts.*$/discovery.zen.ping.unicast.hosts: ["%s"]/' ${MASTER}) | ||
sed -i "${REPLACEMENT}" /etc/elasticsearch/elasticsearch.yml | ||
# seccomp (not supported on joyent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to nitpick this comment slightly: # disable seccomp (only supported on newer Linux kernels)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, done
Please don’t merge this yet, this is a work in progress. Doing this to get feedback from you guys.