Package cloudflare-zlib with the Docker image #81208

jpountz · 2021-12-01T13:55:30Z

Elasticsearch uses zlib for two purposes:

* Compression of stored fields with index.codec: best_compression, which we use for observability and security data.
* Request / response compression.

The original zlib, which is usually the one that is installed, optimizes for portability and misses a number of important optimizations such as leveraging vectorization support for x86 and ARM architectures. Several forks have been created in order to address this, notably an Intel fork, zlib-ng and a Cloudflare fork.

Historically, zlib was packaged within the JDK, so that users wouldn't have to have zlib installed for basic usage of Java. A downside of this approach is that it didn't allow using one of these faster forks, but since version 9 the JDK uses the system's zlib when available and falls back to the zlib that is packaged within the JDK if a system zlib cannot be found.

I performed testing with the Cloudflare zlib, which yielded almost 2x faster compression and decompression for JSON documents that are representative of those produced by the Elastic Observability solution. A run of the solutions/logs track with Rally yielded 2.35% faster indexing, 8.28% less cumulative merge time and 4.83% less cumulative indexing time. One particularity of the Cloudflare zlib is that compression levels retain the same semantics as the original zlib (unlike the Intel fork, which uses a compatible format but gives different semantics for some compression levels), so the space efficiency of the produced indices was essentially the same.

This issue suggests that we update our Docker image to use the Cloudflare fork of zlib instead of the original zlib so that users of the Elastic Cloud service and of the Docker image in general would get better performance out of their Elasticsearch clusters.


elasticmachine · 2021-12-01T13:55:33Z

Pinging @elastic/es-delivery (Team:Delivery)

jpountz · 2021-12-01T14:25:34Z

Note that a possible alternative would be to include the cloudflare-zlib library directly in the tarball. It felt more challenging than doing it in the Docker image (which is a more controlled environment), hence the suggestion to go with the Docker image. But I'd be equally happy with the tarball approach if the Delivery team prefers it.

pugnascotia · 2021-12-01T14:45:56Z

Cloudflare helpfully have their own package repository, with instructions for Debian, Ubuntu and RHEL. Unfortunately, their repo doesn't include Ubuntu packages for ARM / aarch64. The lib is quick to build from source, so maybe we can do something cross-platform.

pugnascotia · 2021-12-01T14:57:29Z

For the record, here are the steps that were used to test the zlib.

git clone git@github.com:cloudflare/zlib.git cloudflare-zlib
cd cloudflare-zlib
./configure --prefix=/usr/local/cloudflare-zlib
make
sudo make install

Then set the LD_LIBRARY_PATH environment variable when starting Elasticsearch, e.g. LD_LIBRARY_PATH=/usr/local/cloudflare-zlib/lib/ bin/elasticsearch. The JVM will automatically pick it up.
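One way to confirm which zlib a running JVM actually mapped, sketched for Linux only, is to list the libz entries in the process's /proc/<pid>/maps file. The function name loaded_zlib is illustrative, and the check assumes the JVM loads zlib as a shared library rather than using a bundled copy.

```shell
#!/bin/sh
# Print the distinct libz shared objects mapped by a process.
# $1 = path to a maps file, e.g. /proc/<pid>/maps (Linux only).
loaded_zlib() {
  # The sixth whitespace-separated field of each maps line is the pathname.
  awk '$6 ~ /\/libz\.so/ { print $6 }' "$1" | sort -u
}
```

For example, `loaded_zlib /proc/<pid>/maps`, where `<pid>` is the Elasticsearch process ID. If the Cloudflare build was picked up, the printed path should sit under /usr/local/cloudflare-zlib/lib/ rather than the system library directory.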

jpountz · 2021-12-01T16:37:49Z

I didn't check the Cloudflare packages, but hopefully they put the lib in a directory that ld gives higher priority than the directory where the original zlib is, so that it would get used naturally without additional measures.

DJRickyB · 2021-12-01T16:44:34Z

Or if we do package it with the tarball, bin/elasticsearch itself could check for its presence and set/extend LD_PRELOAD or LD_LIBRARY_PATH before the JVM initialization.
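A minimal sketch of such a check, covering only the LD_LIBRARY_PATH variant. The function name zlib_library_path and the idea of a bundled directory passed as the first argument are assumptions for illustration, not taken from the actual bin/elasticsearch script.

```shell
#!/bin/sh
# Hypothetical helper: print the LD_LIBRARY_PATH value to use.
# $1 = directory that may contain the alternate libz (assumed layout)
# $2 = current LD_LIBRARY_PATH (may be empty)
zlib_library_path() {
  zlib_dir=$1
  current=$2
  # Only prepend the directory when a libz shared object is actually present.
  if ls "$zlib_dir"/libz.so* >/dev/null 2>&1; then
    if [ -n "$current" ]; then
      printf '%s:%s\n' "$zlib_dir" "$current"
    else
      printf '%s\n' "$zlib_dir"
    fi
  else
    # Nothing bundled: leave the existing value untouched.
    printf '%s\n' "$current"
  fi
}
```

A launcher could assign the result to LD_LIBRARY_PATH and export it before starting the JVM. Prepending rather than replacing keeps any directories the user already configured, and skipping the change when no library is found leaves the default zlib lookup as it was.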

pugnascotia · 2021-12-01T16:47:19Z

TBH I was planning on modifying LD_LIBRARY_PATH. We can mostly do what we like in the Docker images.

Closes elastic#81208.

Elasticsearch uses zlib for two purposes:

* Compression of stored fields with `index.codec: best_compression`, which we use for observability and security data.
* Request / response compression.

Historically, zlib was packaged within the JDK, so that users wouldn't have to have zlib installed for basic usage of Java. However, the original zlib optimizes for portability and misses a number of important optimizations such as leveraging vectorization support for x86 and ARM architectures. Several forks have been created in order to address this. Since version 9, the JDK uses the system's zlib when available and falls back to the zlib that is packaged within the JDK if a system zlib cannot be found.

This commit changes the Docker image to install the Cloudflare fork of zlib, and run Java using the fork instead of the original zlib, so that users of the Docker image can get better performance.

Other ES distribution types are out of scope, since configuring the JVM to use an alternative zlib requires an environment config as well as installing another zlib, and Docker is the only distribution type where we can control both.


jpountz added the >enhancement and :Delivery/Packaging labels Dec 1, 2021

elasticmachine added the Team:Delivery label Dec 1, 2021

pugnascotia mentioned this issue Dec 2, 2021

Use Cloudflare's zlib in Docker images #81245

Merged

elasticsearchmachine closed this as completed in #81245 Dec 3, 2021

DJRickyB mentioned this issue Dec 13, 2021

Should we package an alternate zlib implementation in our distributions? #81662

Open
