
Should we package an alternate zlib implementation in our distributions? #81662

Open
DJRickyB opened this issue Dec 13, 2021 · 12 comments
Labels: :Delivery/Packaging (RPM and deb packaging, tar and zip archives, shell and batch scripts), discuss, >enhancement, Team:Delivery (Meta label for Delivery team)

Comments

DJRickyB (Contributor) commented Dec 13, 2021

As discussed in #81208, using Cloudflare's zlib implementation improves performance in the following cases (a sketch of the relevant settings follows the list):

  • Compression of stored fields with index.codec: best_compression,
    which we use by default for observability and security integrations' data.
  • Request / response compression.
  • (untested, but presumed) Transport compression when using transport.compression_scheme: deflate (not the default for local transport, but the default for remote clusters when compression is enabled, as of #76580, "Remote compression scheme default to deflate")
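
For reference, a minimal sketch of the settings involved (the index name is hypothetical; syntax follows the standard Elasticsearch REST API and elasticsearch.yml):

# Stored-field compression: set at index creation (index name is illustrative)
curl -X PUT 'localhost:9200/my-index' -H 'Content-Type: application/json' -d '
{
  "settings": { "index.codec": "best_compression" }
}'

# Transport compression scheme, set in elasticsearch.yml:
#   transport.compression_scheme: deflate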

Whereas #81208 scoped the initial approach to our Docker image, where we control both the distribution and the environment, this issue asks if we can go one step further and bundle it in our tarballs. Some general notes about why this is worth considering:

  1. When #81245 ("Use Cloudflare's zlib in Docker images") was implemented, it moved the concern of linking the alternate zlib into the elasticsearch startup script, removing the need to control the environment variables externally.
  2. #81245 also added a build step to the Dockerfile for the cloudflare-zlib library. If we instead hosted or fetched a prebuilt library, we could remove that build step.

If instead we packaged an improved zlib in the Elasticsearch distribution, we could reap the following advantages:

  • Potential for improved performance for more users/use-cases
  • A new performance baseline for best_compression, which could potentially be considered for more uses if it comes with a lower expected overhead
  • The change would be applicable to our nightly benchmarking environment, which currently runs the default Linux distributions (this may change to a Docker-based environment at some point, but planning has not begun on that).
  • We would not need to re-build the library every time we build the Docker image (or perhaps only when an earlier layer changes; I don't know what our build environment typically has cached).

Disadvantages:

  • The library isn't hosted anywhere; we would have to build and host it.
  • To mitigate security vulnerabilities, we would need to keep it up to date, either by monitoring versions as we currently do for JVM releases (if we choose a library with binary releases) or by periodically/automatically building new versions ourselves to bundle with our distributions.
    • Likewise, CVEs affecting the zlib implementation we choose would be inextricably linked to our releases: instead of patching the system zlib independently, users would need us to produce patched releases for every supported version of Elasticsearch (or, alternatively, mitigation functionality/notices to have the user fall back to the system zlib).
  • amd64/aarch64 support is implied not to be optimized (see: this blog)

We recently brought up this issue in a Fix-It meeting, and open questions included:

  • What implementation should we prefer? There are a multitude of choices. A nice option is zlib-ng, which is actively maintained and spells out a few of its own advantages over Cloudflare's fork in zlib-ng/zlib-ng#871 (comment, "2.0.0 Benchmark comparisons").
  • Do we need it in the tarball if it's already in the Docker image? The Cloudflare zlib will currently be activated for Cloud customers, as well as users of ECE, ECK, and the Docker image. However, we have a broad install base of self-managed Elasticsearch, and the default tarball distribution is the one currently targeted in our benchmarks (such as the nightly/release results published at https://elasticsearch-benchmarks.elastic.co).
@DJRickyB DJRickyB added >enhancement discuss :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts labels Dec 13, 2021
@elasticmachine elasticmachine added the Team:Delivery Meta label for Delivery team label Dec 13, 2021
elasticmachine (Collaborator)

Pinging @elastic/es-delivery (Team:Delivery)

mark-vieira (Contributor) commented Dec 13, 2021

I don't have a lot of understanding here, but another concern is compatibility. Unlike our Docker distribution, in which we (mostly) own the runtime environment, enabling this for all Linux packages in general is tougher. To what degree does the Cloudflare zlib depend on system dynamic libraries, or can we build a statically-linked binary to minimize any system compatibility concerns?

inqueue (Member) commented Dec 14, 2021

+1 to building a statically-linked library, to make cloudflare-zlib universally available in all distributions that can support it, instead of the LD_LIBRARY_PATH solution we currently have in the Docker distribution. There are a number of articles (this one is a good start) noting that LD_LIBRARY_PATH should only be used in testing or development, not in production. I am sure it is fine in our isolated Docker distribution, but users not running Docker would presumably like to realize the benefits of Cloudflare zlib as well, and we probably should not be recommending any modification to LD_LIBRARY_PATH.
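
For context, the kind of arrangement being discussed looks roughly like this (paths are illustrative, not the actual Dockerfile contents from #81245):

# Hypothetical Dockerfile fragment: ship the alternate zlib in its own
# directory and point the dynamic linker at it via the environment.
#   COPY cloudflare-zlib/libz.so.1 /usr/local/cloudflare-zlib/libz.so.1
#   ENV LD_LIBRARY_PATH=/usr/local/cloudflare-zlib

# Equivalent effect for a tarball install, done by hand:
export LD_LIBRARY_PATH=/usr/local/cloudflare-zlib
bin/elasticsearch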

droberts195 (Contributor)

> +1 to building a statically-linked library

It's one thing to build a static library, but that can only get used if the program that currently expects a dynamic library is rebuilt to link the static library at build time.

In this case I think the program that would need to get rebuilt is java, right? So statically linking zlib is really a proposal to produce our own build of Java, using not the exact build system that comes with Java but a modified build system that statically links zlib.

I agree LD_LIBRARY_PATH is considered bad practice: if it lists directories that don't contain the shared library, then it opens up the possibility of an attacker putting a malicious library with the same name in one of those directories.

The other possibility, not quite as radical as rebuilding java from scratch, is to set an RPATH in the existing java program. This takes priority over LD_LIBRARY_PATH when looking for libraries and can be set without rebuilding, using a tool called patchelf. We do this for some libraries we ship with ml-cpp in https://github.com/elastic/ml-cpp/blob/ecb9c6c036d1104c68c90281d554312285c2ac6d/3rd_party/3rd_party.sh#L286
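
For illustration, roughly what that looks like with patchelf (the extra alt-zlib directory is hypothetical; patchelf must be installed):

# Show the current RPATH/RUNPATH entries on the bundled java binary
readelf -d "$JAVA_HOME/bin/java" | grep -E 'RPATH|RUNPATH'

# Set a DT_RPATH, which the loader searches before LD_LIBRARY_PATH.
# --force-rpath makes patchelf write RPATH rather than RUNPATH.
patchelf --force-rpath \
  --set-rpath '$ORIGIN:$ORIGIN/../lib:$ORIGIN/../lib/alt-zlib' \
  "$JAVA_HOME/bin/java"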

The way library search works is that first RPATH is searched, then LD_LIBRARY_PATH, then RUNPATH and finally the system default locations. RPATH and RUNPATH are embedded in the executable and LD_LIBRARY_PATH is an environment variable.

So if you set RPATH to $ORIGIN/../lib then a program will prefer to load its shared libraries from a lib directory parallel to the bin directory it's in.

java currently sets a RUNPATH that looks in the same directory it's in and in the parallel lib directory, for example:

$ readelf -d java | head -40

Dynamic section at offset 0x1d08 contains 33 entries:
  Tag        Type                         Name/Value
 0x0000000000000003 (PLTGOT)             0x2f88
 0x0000000000000002 (PLTRELSZ)           288 (bytes)
 0x0000000000000017 (JMPREL)             0x7d8
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000007 (RELA)               0x718
 0x0000000000000008 (RELASZ)             192 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffff9 (RELACOUNT)          4
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000006 (SYMTAB)             0x278
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000005 (STRTAB)             0x470
 0x000000000000000a (STRSZ)              381 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x5f0
 0x0000000000000004 (HASH)               0x628
 0x0000000000000001 (NEEDED)             Shared library: [libz.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libjli.so]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x8f8
 0x000000000000000d (FINI)               0xd3c
 0x000000000000001a (FINI_ARRAY)         0x2cf8
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x0000000000000019 (INIT_ARRAY)         0x2d00
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN:$ORIGIN/../lib] <-- HERE !
 0x000000000000001e (FLAGS)              BIND_NOW
 0x000000006ffffffb (FLAGS_1)            Flags: NOW PIE
 0x000000006ffffff0 (VERSYM)             0x6c8
 0x000000006ffffffe (VERNEED)            0x6f4
 0x000000006fffffff (VERNEEDNUM)         1
 0x0000000000000000 (NULL)               0x0

So I think if you put the Cloudflare zlib dynamic library in $JAVA_HOME/lib then it will get used in preference to the system version, unless LD_LIBRARY_PATH is set. Or you could make it more robust by setting an RPATH on java so that it's immune to LD_LIBRARY_PATH manipulation. And if you wanted, that RPATH could include a separate directory for Cloudflare's zlib in addition to $ORIGIN:$ORIGIN/../lib.
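
As a sanity check, one way to confirm which zlib actually gets loaded (a sketch; $ES_PID stands in for the real Elasticsearch process id):

# Static view: which libz.so.1 the dynamic linker resolves for java
ldd "$JAVA_HOME/bin/java" | grep libz

# Runtime view: which libz.so.1 a running Elasticsearch JVM has mapped
grep 'libz\.so' "/proc/$ES_PID/maps"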

mark-vieira (Contributor)

I'd like to consider another option here, which is providing guidance for folks deploying on Linux (and not using Docker) on how to install cloudflare-zlib themselves, rather than us hacking a distribution that tries to load it for them. While we certainly want to expose this optimization to as many users as possible, we don't want to sacrifice or complicate portability, a concern that grows the more native code we package in Elasticsearch.

inqueue (Member) commented Dec 17, 2021

> I'd like to consider another option here, which is providing guidance for folks deploying on Linux (and not using Docker) on how to install cloudflare-zlib themselves, rather than us hacking a distribution that tries to load it for them. While we certainly want to expose this optimization to as many users as possible, we don't want to sacrifice or complicate portability, a concern that grows the more native code we package in Elasticsearch.

I had considered this solution and I am -1, as it pushes the burden onto users to configure it properly. There could be an added support burden, as it may not always be clear which zlib is in use.

mark-vieira (Contributor)

> I had considered this solution and I am -1, as it pushes the burden onto users to configure it properly. There could be an added support burden, as it may not always be clear which zlib is in use.

Of course. I just wanted to make this explicit and ensure that we are weighing that burden against the technical complexity and maintenance cost of bundling this functionality in. Let's keep in mind this is an optimization, so it should be treated slightly differently than something that affects actual functionality.

> There could be an added support burden, as it may not always be clear which zlib is in use.

I'd also argue that issues related to us effectively "hacking" Java to use our bundled dynamic library are sure to arise, and will have a cost as well.

@DJRickyB DJRickyB changed the title Should we package Cloudflare's zlib in our distributions? Should we package an alternate zlib implementation in our distributions? Feb 1, 2022
DJRickyB (Contributor, Author) commented Feb 1, 2022

Thanks all. After recent discussion I've updated the title and the description to better reflect the broader set of concerns.

cburgess commented Mar 4, 2022

While working on a local build of elasticsearch in Docker for my company, I discovered the changes in 8.0+ to use the Cloudflare zlib (#81245, and elastic/dockerfiles#95). The problem with what has been merged is that these do not actually use the Cloudflare zlib. The Cloudflare zlib repo (https://github.com/cloudflare/zlib) is a fork of upstream zlib (https://github.com/madler/zlib). If you look, you will see that the tags present in the Cloudflare zlib repo are not from Cloudflare; they were inherited from upstream.

cfb@sandman:foo/> wget https://github.com/madler/zlib/archive/refs/tags/v1.2.8.tar.gz -O upstream
--2022-03-04 11:07:44--  https://github.com/madler/zlib/archive/refs/tags/v1.2.8.tar.gz
Resolving github.com (github.com)... 192.30.255.112
Connecting to github.com (github.com)|192.30.255.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/madler/zlib/tar.gz/refs/tags/v1.2.8 [following]
--2022-03-04 11:07:45--  https://codeload.github.com/madler/zlib/tar.gz/refs/tags/v1.2.8
Resolving codeload.github.com (codeload.github.com)... 192.30.255.120
Connecting to codeload.github.com (codeload.github.com)|192.30.255.120|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 604952 (591K) [application/x-gzip]
Saving to: ‘upstream’

upstream            100%[===================>] 590.77K  3.00MB/s    in 0.2s

2022-03-04 11:07:45 (3.00 MB/s) - ‘upstream’ saved [604952/604952]

cfb@sandman:foo/> wget https://github.com/cloudflare/zlib/archive/refs/tags/v1.2.8.tar.gz -O cloudflare
--2022-03-04 11:07:59--  https://github.com/cloudflare/zlib/archive/refs/tags/v1.2.8.tar.gz
Resolving github.com (github.com)... 192.30.255.112
Connecting to github.com (github.com)|192.30.255.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/cloudflare/zlib/tar.gz/refs/tags/v1.2.8 [following]
--2022-03-04 11:07:59--  https://codeload.github.com/cloudflare/zlib/tar.gz/refs/tags/v1.2.8
Resolving codeload.github.com (codeload.github.com)... 192.30.255.120
Connecting to codeload.github.com (codeload.github.com)|192.30.255.120|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 604952 (591K) [application/x-gzip]
Saving to: ‘cloudflare’

cloudflare          100%[===================>] 590.77K  3.11MB/s    in 0.2s

2022-03-04 11:08:00 (3.11 MB/s) - ‘cloudflare’ saved [604952/604952]

cfb@sandman:foo/> sha256sum *
e380bd1bdb6447508beaa50efc653fe45f4edc1dafe11a251ae093e0ee97db9a  cloudflare
e380bd1bdb6447508beaa50efc653fe45f4edc1dafe11a251ae093e0ee97db9a  upstream

The net effect of what has been merged at this time is that Elastic is not seeing any performance improvements, as it's not running the Cloudflare code. Additionally, by using the older upstream tag, the Docker images have the following CVEs, which would be fixed by using the zlib package provided in Ubuntu 20.04.

https://nvd.nist.gov/vuln/detail/CVE-2016-9840 - CVSS V3.x Score: 8.8
https://nvd.nist.gov/vuln/detail/CVE-2016-9841 - CVSS V3.x Score: 9.8
https://nvd.nist.gov/vuln/detail/CVE-2016-9842 - CVSS V3.x Score: 8.8
https://nvd.nist.gov/vuln/detail/CVE-2016-9843 - CVSS V3.x Score: 9.8

Since it seems there is some debate as to exactly what to provide and how, I would recommend the changes to this repo (#81245) and the public Docker build repo (elastic/dockerfiles#95) be reverted until this discussion is resolved.

mark-vieira (Contributor)

It seems to me that the issue with Cloudflare zlib is that it's very much not intended for general consumption. The only way to stay up to date with the latest upstream zlib is to consume HEAD from that repo, which is far too prone to errors, and I don't think we have sufficient test coverage of our Docker packaging format to give us strong confidence here.

My preference would be to revert this change until we can investigate a better way to ensure we are integrating an up-to-date fork of zlib and have better test coverage in this area. @jpountz thoughts?
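
If we did keep consuming the fork, a minimal sketch of pinning a specific vetted commit rather than a tag (the SHA below is a placeholder, not a recommendation):

# Placeholder: in practice this would be a reviewed cloudflare/zlib commit
CFZLIB_COMMIT=<commit-sha>

git clone https://github.com/cloudflare/zlib.git
cd zlib
git checkout "$CFZLIB_COMMIT"

# Confirm we are on Cloudflare-specific history, not just an upstream tag
git log --oneline -3

# Standard zlib-style build (assuming the fork keeps upstream's configure script)
./configure
make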

pugnascotia and elasticsearchmachine added commits referencing this issue on Mar 4 and Mar 7, 2022:
Relates to #81662. This library isn't ready for public consumption. Remove it from the Docker build.
jpountz (Contributor) commented Mar 8, 2022

@mark-vieira +1 to revert for now

b-deam (Member) commented Apr 12, 2022

We should also investigate the use of Intel's IPP zlib as another alternative option.
