Package cloudflare-zlib with the Docker image #81208
Comments
Pinging @elastic/es-delivery (Team:Delivery)
Note that a possible alternative would be to include the cloudflare-zlib library directly in the tarball. That felt more challenging than doing it in the Docker image, which is a more controlled environment, hence the suggestion to go with the Docker image. But I'd be equally happy with the tarball approach if the Delivery team prefers it.
Cloudflare helpfully have their own package repository, with instructions for Debian, Ubuntu and RHEL. Unfortunately, their repo doesn't include Ubuntu packages for ARM / aarch64. The lib is quick to build from source, so maybe we can do something cross-platform.
For the record, here are the steps that were used to test the zlib.
And then it's about setting the |
I didn't check the Cloudflare packages, but hopefully they put the lib in a directory that ld gives higher priority than the directory where the original zlib lives, so that it would get used naturally without additional measures.
Or if we do package it with the tarball, |
TBH I was planning on modifying |
Closes elastic#81208.

Elasticsearch uses zlib for two purposes:

* Compression of stored fields with `index.codec: best_compression`, which we use for observability and security data.
* Request / response compression.

Historically, zlib was packaged within the JDK, so that users wouldn't have to have zlib installed for basic usage of Java. However, the original zlib optimizes for portability and misses a number of important optimizations such as leveraging vectorization support for x86 and ARM architectures. Several forks have been created in order to address this. Since version 9, the JDK uses the system's zlib when available and falls back to the zlib that is packaged within the JDK if a system zlib cannot be found.

This commit changes the Docker image to install the Cloudflare fork of zlib, and run Java using the fork instead of the original zlib, so that users of the Docker image can get better performance.

Other ES distribution types are out of scope, since configuring the JVM to use an alternative zlib requires an environment config as well as installing another zlib, and Docker is the only distribution type where we can control both.
Elasticsearch uses zlib for two purposes:

* Compression of stored fields with `index.codec: best_compression`, which we use for observability and security data.
* Request / response compression.

The original zlib, which is usually the one that is installed, optimizes for portability and misses a number of important optimizations such as leveraging vectorization support for x86 and ARM architectures. Several forks have been created in order to address this, notably an Intel fork, zlib-ng and a Cloudflare fork.
Historically, zlib was packaged within the JDK, so that users wouldn't have to have zlib installed for basic usage of Java. A downside of this approach is that it didn't allow using one of these faster forks, but since version 9 the JDK uses the system's zlib when available and falls back to the zlib that is packaged within the JDK if a system zlib cannot be found.
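One way to see which zlib a running JVM has actually picked up is to look at the shared libraries mapped into the process. The following is a minimal sketch, assuming Linux (it reads `/proc/self/maps`) and assuming that a system zlib shows up as a separately mapped `libz` file; the class name `LoadedZlib` and its helper are hypothetical and not part of Elasticsearch or the JDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.TreeSet;
import java.util.zip.Deflater;

public class LoadedZlib {
    /** Returns the distinct libz paths found in /proc/<pid>/maps-style lines. */
    static TreeSet<String> zlibPaths(List<String> mapsLines) {
        TreeSet<String> paths = new TreeSet<>();
        for (String line : mapsLines) {
            // Fields: address, perms, offset, dev, inode, pathname (optional).
            String[] fields = line.trim().split("\\s+", 6);
            // Match libz.so* or libz-*, but not the JDK's own libzip.so.
            if (fields.length == 6 && fields[5].matches(".*/libz[.-][^/]*")) {
                paths.add(fields[5]);
            }
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        // Touch java.util.zip so the native zip support is loaded before we look.
        Deflater deflater = new Deflater();
        deflater.end();

        Path maps = Path.of("/proc/self/maps"); // Linux only (assumption)
        if (!Files.isReadable(maps)) {
            System.out.println("/proc/self/maps is not available on this platform");
            return;
        }
        TreeSet<String> paths = zlibPaths(Files.readAllLines(maps));
        System.out.println(paths.isEmpty()
            ? "no separate libz mapped (the bundled zlib is probably in use)"
            : "zlib mapped from: " + paths);
    }
}
```

The printed path only tells you which file was loaded, not which fork it was built from, so it is a sanity check rather than proof.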
I performed testing with the Cloudflare zlib, which yielded almost 2x faster compression and decompression for JSON documents that are representative of those produced by the Elastic Observability solution. A run of the `solutions/logs` track with Rally yielded 2.35% faster indexing, 8.28% less cumulative merge time and 4.83% less cumulative indexing time. One particularity of the Cloudflare zlib is that compression levels retain the same semantics as the original zlib (unlike the Intel fork, which uses a compatible format but gives different semantics for some compression levels), so the space efficiency of the produced indices was essentially the same.

This issue suggests that we update our Docker image to use the Cloudflare fork of zlib instead of the original zlib so that users of the Elastic Cloud service and of the Docker image in general would get better performance out of their Elasticsearch clusters.
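The compression and decompression comparison mentioned above could be approximated with a small timing harness like the one below. This is only a sketch under stated assumptions: the synthetic log-like JSON, the compression level (6) and the round count are arbitrary illustrative choices, the class name `ZlibTiming` is hypothetical, and it is not the benchmark that produced the figures quoted in this issue. Run it once per zlib build and compare the printed timings.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ZlibTiming {
    /** Compresses input with java.util.zip.Deflater at the given level. */
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] out = new byte[Math.max(64, input.length / 2)];
            int size = 0;
            while (!deflater.finished()) {
                if (size == out.length) out = Arrays.copyOf(out, out.length * 2);
                size += deflater.deflate(out, size, out.length - size);
            }
            return Arrays.copyOf(out, size);
        } finally {
            deflater.end(); // release native zlib state
        }
    }

    /** Decompresses data whose uncompressed length is known in advance. */
    static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] out = new byte[originalLength];
            int size = 0;
            while (size < out.length && !inflater.finished()) {
                int n = inflater.inflate(out, size, out.length - size);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported input");
                }
                size += n;
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt input", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // Synthetic log-like JSON documents (made up for illustration).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20_000; i++) {
            sb.append("{\"@timestamp\":").append(1_600_000_000_000L + i)
              .append(",\"host\":\"web-").append(i % 16)
              .append("\",\"message\":\"GET /index.html 200\"}\n");
        }
        byte[] input = sb.toString().getBytes(StandardCharsets.UTF_8);
        int level = 6;   // arbitrary choice for this sketch
        int rounds = 20; // arbitrary choice for this sketch

        byte[] compressed = compress(input, level); // warm-up
        long t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) compressed = compress(input, level);
        long t1 = System.nanoTime();
        byte[] restored = decompress(compressed, input.length);
        for (int i = 1; i < rounds; i++) restored = decompress(compressed, input.length);
        long t2 = System.nanoTime();

        if (!Arrays.equals(input, restored)) throw new IllegalStateException("round trip mismatch");
        System.out.printf("%d -> %d bytes, compress %.2f ms/round, decompress %.2f ms/round%n",
            input.length, compressed.length,
            (t1 - t0) / 1e6 / rounds, (t2 - t1) / 1e6 / rounds);
    }
}
```

A harness this small is sensitive to JIT warm-up and machine noise, so treat its numbers as a rough indication and rely on a full Rally run for anything you intend to report.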