Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Cloudflare's zlib in Docker images #81245

Merged

Conversation

pugnascotia
Copy link
Contributor

Closes #81208. Elasticsearch uses zlib for two purposes:

  • Compression of stored fields with index.codec: best_compression,
    which we use for observability and security data.
  • Request / response compression.

Historically, zlib was packaged within the JDK, so that users wouldn't
have to have zlib installed for basic usage of Java. However, the
original zlib optimizes for portability and misses a number of important
optimizations such as leveraging vectorization support for x86 and ARM
architectures. Several forks have been created in order to address this.

Since version 9, the JDK uses the system's zlib when available and falls
back to the zlib that is packaged within the JDK if a system zlib cannot
be found.

This commit changes the Docker image to install the Cloudflare fork of
zlib, and run Java using the fork instead of the original zlib, so that
users of the Docker image can get better performance.

Other ES distribution types are out-of-scope, since configuring the JVM
to use an alternative zlib requires an environment config as well as
installed another zlib, and Docker is the only distribution type where
we can control both.

Closes elastic#81208. Elasticsearch uses zlib for two purposes:

   * Compression of stored fields with `index.codec: best_compression`,
     which we use for observability and security data.
   * Request / response compression.

Historically, zlib was packaged within the JDK, so that users wouldn't
have to have zlib installed for basic usage of Java. However, the
original zlib optimizes for portability and misses a number of important
optimizations such as leveraging vectorization support for x86 and ARM
architectures. Several forks have been created in order to address this.

Since version 9, the JDK uses the system's zlib when available and falls
back to the zlib that is packaged within the JDK if a system zlib cannot
be found.

This commit changes the Docker image to install the Cloudflare fork of
zlib, and run Java using the fork instead of the original zlib, so that
users of the Docker image can get better performance.

Other ES distribution types are out-of-scope, since configuring the JVM
to use an alternative zlib requires an environment config as well as
installed another zlib, and Docker is the only distribution type where
we can control both.
@pugnascotia pugnascotia added >enhancement :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts v8.0.0 v8.1.0 labels Dec 2, 2021
@elasticmachine elasticmachine added the Team:Delivery Meta label for Delivery team label Dec 2, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-delivery (Team:Delivery)

@elasticsearchmachine
Copy link
Collaborator

Hi @pugnascotia, I've created a changelog YAML for you.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, thanks for putting up a PR so quickly! I wonder if there's a way we can test that this cloudflare-zlib is actually being picked up by the bundled JDK, such as running ldd /path/to/java | grep libz and making sure it's the cloudflare zlib and not the original zlib?

@@ -73,6 +73,10 @@ if [[ -n "$ES_LOG_STYLE" ]]; then
esac
fi

if [[ -d /usr/local/cloudflare-zlib/lib ]]; then
export LD_LIBRARY_PATH=/usr/local/cloudflare-zlib/lib
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we try to preserve existing values of LD_LIBRARY_PATH, e.g.

Suggested change
export LD_LIBRARY_PATH=/usr/local/cloudflare-zlib/lib
export LD_LIBRARY_PATH=/usr/local/cloudflare-zlib/lib:$LD_LIBRARY_PATH

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered about that, but I also struggled to think of another reason why someone else would also be setting LD_LIBRARY_PATH with our image. Any ideas, anyone?

Also, I now think that this isn't the right place for this change, because Cloud do something different with the Docker entrypoint. I'll move this into the elasticsearch script and put a guard on it with $ES_DISTRIBUTION_TYPE.

@pugnascotia
Copy link
Contributor Author

pugnascotia commented Dec 2, 2021

Looks like it gets picked up OK:

$ docker run -u root -it --rm elasticsearch:test bash
root@443cf81c2b16:/usr/share/elasticsearch# export LD_LIBRARY_PATH=/usr/local/cloudflare-zlib/lib
root@443cf81c2b16:/usr/share/elasticsearch# ldd jdk/bin/java
        linux-vdso.so.1 (0x0000ffff851b9000)
        libz.so.1 => /usr/local/cloudflare-zlib/lib/libz.so.1 (0x0000ffff85140000)
        libjli.so => /usr/share/elasticsearch/jdk/bin/../lib/libjli.so (0x0000ffff8511f000)
        libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000ffff850ef000)
        libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000ffff850db000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffff84f68000)
        /lib/ld-linux-aarch64.so.1 (0x0000ffff85189000)

@pugnascotia
Copy link
Contributor Author

@mark-vieira the packaging tests on SLES are reporting failure again, with what looks like a failure to talk to HOMER?

@DJRickyB
Copy link
Contributor

DJRickyB commented Dec 2, 2021

A few thoughts on scope:

Other ES distribution types are out-of-scope, since configuring the JVM
to use an alternative zlib requires an environment config as well as
installed another zlib, and Docker is the only distribution type where
we can control both.

Now since the environment config is being applied from within bin/elasticsearch this is slightly less true. Likewise we are adding a build step for a tiny binary, the version being > 8 years old. Would we consider instead building/persisting the platform-specific libraries somewhere (possibly even in this repo) if they are small enough, rather than adding a build step to our process?

If so:

  • the tarball could package the library, meaning the enhancement comes on every supported platform
  • our nightly benchmarks could pick up the change, as they do not currently test using the docker distribution
  • the added library gets scoped into more integrations tests, including those driven by the rest of the stack/solutions
  • we can simplify the elasticsearch script condition to simply look for the library in $ES_HOME/wherever, or generically add a place for native libraries in ES_HOME and always set this

@DJRickyB
Copy link
Contributor

DJRickyB commented Dec 2, 2021

a note on testing in docker, i think at this point we'd want to see the following show cloudflare-zlib in use, given setting the library path has moved out of the entrypoint script:

# bin/elasticsearch -d
# pmap -p $(pidof java)

@pugnascotia
Copy link
Contributor Author

Thanks @DJRickyB, I added a test case to cover this.


final boolean matches = sh.run("bash -c 'pmap -p $(pidof java)'").stdout.lines().anyMatch(line -> line.contains("cloudflare-zlib"));

assertTrue("Expect java to be using cloudflare-zlib", matches);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it would be nice to have the output of the command in the error message in case the problem is not fully reproducible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea @jpountz, I've modified the test.

@mark-vieira
Copy link
Contributor

@mark-vieira the packaging tests on SLES are reporting failure again, with what looks like a failure to talk to HOMER?

Infra merged a fix for this which might take a day to percolate through CI as the images get rebuilt.

@pugnascotia
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/packaging-tests-unix

@pugnascotia pugnascotia added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Dec 3, 2021
@elasticsearchmachine elasticsearchmachine merged commit 1f5a0ed into elastic:master Dec 3, 2021
pugnascotia added a commit that referenced this pull request Dec 3, 2021
Closes #81208. Elasticsearch uses zlib for two purposes:    *
Compression of stored fields with `index.codec: best_compression`,     
which we use for observability and security data.    * Request /
response compression. Historically, zlib was packaged within the JDK, so
that users wouldn't have to have zlib installed for basic usage of Java.
However, the original zlib optimizes for portability and misses a number
of important optimizations such as leveraging vectorization support for
x86 and ARM architectures. Several forks have been created in order to
address this. Since version 9, the JDK uses the system's zlib when
available and falls back to the zlib that is packaged within the JDK if
a system zlib cannot be found. This commit changes the Docker image to
install the Cloudflare fork of zlib, and run Java using the fork instead
of the original zlib, so that users of the Docker image can get better
performance. Other ES distribution types are out-of-scope, since
configuring the JVM to use an alternative zlib requires an environment
config as well as installed another zlib, and Docker is the only
distribution type where we can control both.
@pugnascotia
Copy link
Contributor Author

Backported to 8.0 in e807fb3.

pugnascotia added a commit that referenced this pull request Dec 3, 2021
Closes #81208. Elasticsearch uses zlib for two purposes:    *
Compression of stored fields with `index.codec: best_compression`,
which we use for observability and security data.    * Request /
response compression. Historically, zlib was packaged within the JDK, so
that users wouldn't have to have zlib installed for basic usage of Java.
However, the original zlib optimizes for portability and misses a number
of important optimizations such as leveraging vectorization support for
x86 and ARM architectures. Several forks have been created in order to
address this. Since version 9, the JDK uses the system's zlib when
available and falls back to the zlib that is packaged within the JDK if
a system zlib cannot be found. This commit changes the Docker image to
install the Cloudflare fork of zlib, and run Java using the fork instead
of the original zlib, so that users of the Docker image can get better
performance. Other ES distribution types are out-of-scope, since
configuring the JVM to use an alternative zlib requires an environment
config as well as installed another zlib, and Docker is the only
distribution type where we can control both.
@pugnascotia
Copy link
Contributor Author

Backported to 7.16 in 6582acf.

@pugnascotia pugnascotia deleted the use-cloudflare-zlib-in-docker branch December 3, 2021 11:37
@pugnascotia
Copy link
Contributor Author

Hmm, we may have missed the cut-off for 7.16.0, meaning that we'd either release this in 7.16.1 (which is questionable) or wait until 7.17.0.

@mark-vieira
Copy link
Contributor

Would this be considered breaking in any way? What are the user facing implications here?

@jpountz
Copy link
Contributor

jpountz commented Dec 3, 2021

FWIW my expectation was that this change would make it to 8.1 since it's an enhancement and we're past feature freeze for both 7.16 and 8.0.

@mark-vieira
Copy link
Contributor

If we back it out of 7.16 we definitely want it in for 7.17 though. That said, we haven't created a branch for that release yet so we'd have to hold off on the backport until that is done.

pugnascotia added a commit that referenced this pull request Dec 3, 2021
@pugnascotia
Copy link
Contributor Author

OK, I've yanked it from 7.16 in 4eb9667 for now. I can apply it to 7.17 once it exists.

@jpountz
Copy link
Contributor

jpountz commented Dec 9, 2021

I can apply it to 7.17 once it exists.

Given recent communication that 7.17 should effectively be treated as a patch release, I think we should only target 8.x with this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts >enhancement Team:Delivery Meta label for Delivery team v7.16.0 v8.0.0-rc1 v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Package cloudflare-zlib with the Docker image
6 participants