This repository has been archived by the owner on Apr 12, 2022. It is now read-only.

Add hack for obtaining cgroup stats #25

Merged
merged 2 commits into from
Jan 25, 2017

Conversation

jasontedor
Member

Docker mounts cgroup stats in the wrong place (i.e., inconsistently with the hierarchy in /proc/1/cgroup). This breaks the logic for traversing from /proc/self/cgroup to the stats for a process running inside a container. In core Elasticsearch, an undocumented hack was added to support this, for now. Thus, for cgroup stats to be available from Elasticsearch running inside a Docker container, the Elasticsearch process should be started with -Des.cgroups.hierarchy.override=/. This commit adds support for this hack.

Relates elastic/elasticsearch#22757
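The traversal the description refers to can be sketched in shell (a simplified illustration of the idea, not Elasticsearch's actual Java code; the container id is made up):

```shell
# A process discovers its cgroup by parsing /proc/self/cgroup lines of the
# form "id:controller:path", then looks for its stats under
# /sys/fs/cgroup/<controller><path>. Inside a Docker container the recorded
# path (/docker/<id>) is not mounted at that location, so the lookup fails;
# overriding the hierarchy path with "/" points it at the mount root instead.
line='3:cpu,cpuacct:/docker/abc123'          # sample /proc/self/cgroup line
controller=$(printf '%s' "$line" | cut -d: -f2)
path=$(printf '%s' "$line" | cut -d: -f3)
echo "default lookup:  /sys/fs/cgroup/${controller}${path}"
echo "with override /: /sys/fs/cgroup/${controller}/"
```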

@jasontedor jasontedor requested a review from dliappis January 24, 2017 11:42
@dliappis
Contributor

Thank you @jasontedor for this PR!

I added one comment. I'd also like to add a test once we have this in the image; what would be the right way of testing whether the option works as expected?

For the record, I did some reading to familiarize myself with the issue and found historical info in this docker issue as well as in https://fabiokung.com/2014/03/13/memory-inside-linux-containers/. There also seems to be an ongoing effort for a library that standardizes how to retrieve info normally found under /proc and /sys/fs/cgroup.

@jasontedor
Member Author

I added one comment. I'd also like to add a test once we have this in the image; what would be the right way of testing whether the option works as expected?

The best way to test this is to start a container against a build of Elasticsearch that contains the hack, and submit the request GET /_nodes/stats/os?pretty=true against the container. With the hack in place and this option used to start Elasticsearch, you should see cgroup stats in the response. Against a build of Elasticsearch that does not have the hack, or starting Elasticsearch without this option, this request will produce no cgroup stats.
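The verification described above boils down to one request from the host (this assumes port 9200 is published, as in the docker run command later in this comment):

```shell
# Ask the node for OS stats and check whether the response contains a
# cgroup section; a non-empty match means the hack and override are active.
curl -s 'http://localhost:9200/_nodes/stats/os?pretty=true' | grep '"cgroup"'
```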

I tested this in the following way. I built a package of Elasticsearch containing the hack. I started a web server in the directory containing the package with python -m SimpleHTTPServer. I built the Docker image with:

$ sudo docker build --no-cache -t elasticsearch-cgroups --build-arg ES_DOWNLOAD_URL=http://192.168.1.232:8000 --build-arg ELASTIC_VERSION=6.0.0-alpha1-SNAPSHOT .

where the web server started above is listening on 192.168.1.232. I started an image with:

$ sudo docker run -p 9200:9200 -e "http.host=0.0.0.0" -e "transport.host=127.0.0.1" elasticsearch-cgroups

and all was right in the world.

For the record, I did some reading to familiarize myself with the issue and found historical info in this docker issue as well as in: https://fabiokung.com/2014/03/13/memory-inside-linux-containers/). There also seems to be an ongoing effort for a library that standardizes how to retrieve info normally found under /proc and /sys/fs/cgroup.

Additional relevant issues are opencontainers/runtime-spec#65, opencontainers/runtime-spec#66, opencontainers/runtime-spec#397, and opencontainers/runc#781 (note that as of now, the last one is still open).

@jasontedor
Member Author

I added one comment.

I don't see a comment on the PR?

@dliappis
Contributor

dliappis commented Jan 24, 2017

@jasontedor Ah, sorry, I didn't mention your handle: 67fd123#r97553765

@@ -22,4 +22,6 @@ do
fi
done < <(env)

echo "-Des.cgroups.hierarchy.override=/" >> config/jvm.options
Contributor


I wonder if it is better to do this fix in this file (which is the CMD) or in the Dockerfile. There are pros and cons to both approaches:

Via the CMD (bin/es-docker) script, as shown above

As this is always invoked when starting the container (unless the user overrides our CMD), there is a corner case: if a user ships their own jvm.options via a bind mount (or in the Dockerfile, while building their own Docker image by forking ours), there could be a duplicate entry, or the user might generally be surprised about where that entry came from.

On the other hand, having the echo line in bin/es-docker will always make sure that the parameter is present, and we could ensure it is added only when needed with a combination of grep + echo.
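The grep + echo combination mentioned here could look something like this (a sketch; the file path is the one used by the es-docker script):

```shell
# Append the override flag to jvm.options only if it is not already there,
# so repeated container starts do not create duplicate entries.
opts_file=config/jvm.options
flag='-Des.cgroups.hierarchy.override=/'
grep -qxF -- "$flag" "$opts_file" || echo "$flag" >> "$opts_file"
```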

Via the Dockerfile

If we just add the above echo command as a RUN line in the Dockerfile (after extracting the tarball), we ensure that things are correct in the image-shipped jvm.options file, with no unexpected surprises if the user decides to bind mount jvm.options or create their own image.

The downside of this approach is that it doesn't enforce the option for the user, in case they opt to use their own config file.

I'd like to hear your opinion on this. FWIW, I usually lean in favor of solutions that don't try too hard to be clever and invasive, so I think I'd favor doing it in the Dockerfile.

Member Author


@dliappis Given your thoughts here, which I agree with, I wonder what you think of:

diff --git a/build/elasticsearch/bin/es-docker b/build/elasticsearch/bin/es-docker
index b50046e..de544fd 100755
--- a/build/elasticsearch/bin/es-docker
+++ b/build/elasticsearch/bin/es-docker
@@ -22,6 +22,6 @@ do
     fi
 done < <(env)
 
-echo "-Des.cgroups.hierarchy.override=/" >> config/jvm.options
+export ES_JAVA_OPTS="-Des.cgroups.hierarchy.override=/ $ES_JAVA_OPTS"
 
 exec bin/elasticsearch ${es_opts}

Contributor


I like this much more. Mutating that file could be really nasty, since it might be owned by the user and living on some persistent storage that they control.

If we can apply this fix via an environment variable, is there any reason not to do so with an ENV declaration in the Dockerfile? I think that's the most transparent and obvious place for future readers to understand what's happening.

Wherever we do it, let's also have a comment linking back to this thread, since it's a pretty subtle thing that's going on.

Member Author


Correct me if I'm wrong, please, but I don't think we can do it in the Dockerfile, since that bakes the value of ES_JAVA_OPTS into the image, which users can then override externally when running the image.

I will push a comment explaining the situation later this morning.
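The concern can be illustrated without Docker: an ENV declaration only sets a default, and `docker run -e ES_JAVA_OPTS=...` replaces the whole variable rather than merging, so a baked-in flag would silently vanish. A minimal sketch of the same replacement semantics in shell (variable names are illustrative):

```shell
# The Dockerfile ENV value is just a default; a user-supplied value
# replaces it wholesale, dropping the override flag.
baked='-Des.cgroups.hierarchy.override=/'   # what ENV would bake in
user='-Xmx512m'                             # what `docker run -e` supplies
ES_JAVA_OPTS=${user:-$baked}                # user value wins outright
echo "$ES_JAVA_OPTS"                        # the override flag is gone

# Prepending inside the CMD script keeps both:
ES_JAVA_OPTS="$baked $user"
echo "$ES_JAVA_OPTS"
```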

Contributor


Correct me if I'm wrong, please, but I don't think we can do it in the Dockerfile, since that bakes the value of ES_JAVA_OPTS into the image, which users can then override externally when running the image.

Yes, that is correct, and in the website docs' docker-compose.yml example we show how to use ES_JAVA_OPTS as one method of defining the JVM heap size.

@jasontedor Just so that I understand this solution better: basically, es.cgroups.hierarchy.override can be defined as a JVM option, as documented here, and hence will be honored when presented through ES_JAVA_OPTS?
I am asking because my understanding is that most parameters defined in jvm.options can also be passed as -E style CLI args. If that is true, the user could run the image using docker run ... -e es.cgroups.hierarchy.override=/, and it could be set here instead of the existing es_opts=''?

(Note that I tried the last point with 6.0.0-alpha1-SNAPSHOT and received:

uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: unknown setting [es.cgroups.hierarchy.override] please check that any required plugins are installed, or check the breaking changes documentation for removed settings

)

Member Author


Just so that I understand this solution better: basically, es.cgroups.hierarchy.override can be defined as a JVM option, as documented here, and hence will be honored when presented through ES_JAVA_OPTS?

Yes.

I am asking because my understanding is that most parameters defined in jvm.options can also be passed as -E style CLI args. If that is true, the user could run the image using docker run ... -e es.cgroups.hierarchy.override=/, and it could be set here instead of the existing es_opts=''?

This part is not right, ES_JAVA_OPTS and jvm.options control JVM flags, so options in ES_JAVA_OPTS can also be set in jvm.options and vice-versa. The -E CLI flags to Elasticsearch are Elasticsearch settings and can also be set in elasticsearch.yml.

(Note that I tried the last point with 6.0.0-alpha1-SNAPSHOT and received:

uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: unknown setting [es.cgroups.hierarchy.override] please check that any required plugins are installed, or check the breaking changes documentation for removed settings

)

This is expected, since es.cgroups.hierarchy.override is not an Elasticsearch setting; we read it as a JVM system property, which is set via the JVM flag -Des.cgroups.hierarchy.override=/, and this can be done via jvm.options or ES_JAVA_OPTS.
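The distinction can be seen in the two invocation forms (a sketch with illustrative values; paths assume the Elasticsearch install directory):

```shell
# JVM flags: read by the JVM as system properties, settable via
# jvm.options or ES_JAVA_OPTS.
ES_JAVA_OPTS="-Des.cgroups.hierarchy.override=/" bin/elasticsearch

# Elasticsearch settings: settable via elasticsearch.yml or -E CLI flags.
bin/elasticsearch -Ehttp.host=0.0.0.0

# es.cgroups.hierarchy.override is a JVM system property, not an
# Elasticsearch setting, so passing it in -E form fails with the
# "unknown setting" error quoted above.
```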

Contributor


Thanks for the clarification @jasontedor !

@dliappis
Contributor

@jasontedor Thanks for the info about the endpoint _nodes/stats/os! I think I can discern a pattern for codifying a test in our docker repo using that. Out of curiosity, what is the purpose of running the Python SimpleHTTPServer on the package dir? Just to be able to check the dir structure and retrieve files via http or is there something I am missing?

@dliappis
Contributor

@Jarpy I think I understand the issue better now:

From reading the initial comment I was under the impression that the issue was an inconsistency between /proc/self/cgroup and /proc/1/cgroup inside the container, but as we both noticed, /proc/self/cgroup and /proc/1/cgroup are present and identical inside the container.

Digging a bit more, I realize the real issue is that the /docker/<container_id>/... style paths reported in /proc/self/cgroup don't correspond to real paths under /sys/fs/cgroup/{memory|cpuset|... and so on}. On a non-Docker OS, /proc/self/cgroup reports paths that exist under /sys/fs/cgroup/.... So https://github.com/elastic/elasticsearch/pull/22757/files addresses that.

$ docker run --rm -ti docker.elastic.co/elasticsearch/elasticsearch:5.1.2 sh -c "diff -u /proc/1/cgroup /proc/self/cgroup"

$ docker run --rm -ti docker.elastic.co/elasticsearch/elasticsearch:5.1.2 sh -c "cat /proc/1/cgroup && echo && cat /proc/self/cgroup"
11:memory:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
10:cpuset:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
9:devices:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
8:freezer:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
7:hugetlb:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
6:perf_event:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
5:pids:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
4:blkio:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
3:cpu,cpuacct:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
2:net_cls,net_prio:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
1:name=systemd:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253

11:memory:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
10:cpuset:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
9:devices:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
8:freezer:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
7:hugetlb:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
6:perf_event:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
5:pids:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
4:blkio:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
3:cpu,cpuacct:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
2:net_cls,net_prio:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
1:name=systemd:/docker/0e31101f0c7f1b7c0de5e03d86472a109fb4155f86d38c65062a295033c6b253
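The mismatch described above can be checked mechanically: walk /proc/self/cgroup and test whether each recorded path exists under the mounted hierarchy. A sketch, to be run inside the container (cgroup v1 line format assumed):

```shell
# For each "id:controller:path" line in /proc/self/cgroup, check whether
# /sys/fs/cgroup/<controller><path> actually exists. Inside a Docker
# container the /docker/<id> paths typically do not.
while IFS=: read -r id controller path; do
  [ -n "$controller" ] || continue          # skip cgroup v2 "0::" lines
  dir="/sys/fs/cgroup/${controller}${path}"
  if [ -d "$dir" ]; then echo "present: $dir"; else echo "missing: $dir"; fi
done < /proc/self/cgroup
```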

@jasontedor
Member Author

From reading the initial comment I was under the impression that the issue was an inconsistency between /proc/self/cgroup and /proc/1/cgroup inside the container, but as we both noticed, /proc/self/cgroup and /proc/1/cgroup are present and identical inside the container.

Digging a bit more, I realize the real issue is that the /docker/<container_id>/... style paths reported in /proc/self/cgroup don't correspond to real paths under /sys/fs/cgroup/{memory|cpuset|... and so on}. On a non-Docker OS, /proc/self/cgroup reports paths that exist under /sys/fs/cgroup/.... So https://github.com/elastic/elasticsearch/pull/22757/files addresses that.

Correct, sorry for not being clear.

@jasontedor
Member Author

Out of curiosity, what is the purpose of running the Python SimpleHTTPServer on the package dir? Just to be able to check the dir structure and retrieve files via http or is there something I am missing?

So that I could build a Docker image from an Elasticsearch distribution that I assembled locally. The Dockerfile does this:

RUN wget ${ES_DOWNLOAD_URL}/elasticsearch-${ELASTIC_VERSION}.tar.gz && \

so setting ES_DOWNLOAD_URL to point to the web server lets the Docker build pull the package (and its sha1 file) from the distribution I had just built.

@dliappis
Contributor

jenkins, test it

@dliappis
Contributor

I've tested this manually for now, as we can't test against master since there are no published artifacts. (I will need to adjust the build framework for this to work.)

@dliappis dliappis merged commit c922253 into elastic:master Jan 25, 2017
@jasontedor jasontedor deleted the cgroups-hack branch January 25, 2017 17:06
@jasontedor
Member Author

Thank you @dliappis and @Jarpy.

dliappis pushed a commit that referenced this pull request Jan 26, 2017
Docker mounts cgroup stats inconsistently with the hierarchy 
in /proc/self/cgroup. This breaks the logic for traversing
from /proc/self/cgroup to the stats for a process running inside a
container. In core Elasticsearch, an undocumented hack was added to
support this, for now. Thus, for cgroup stats to be available from
Elasticsearch running inside a Docker container, the Elasticsearch
process should be started with `-Des.cgroups.hierarchy.override=/`.

Add hack for obtaining cgroup stats and comment it.

Relates #25 
Relates elastic/elasticsearch#22757