Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature request: Java 17 support #14433

Open
pettyalex opened this issue Apr 1, 2024 · 2 comments
Open

feature request: Java 17 support #14433

pettyalex opened this issue Apr 1, 2024 · 2 comments
Assignees

Comments

@pettyalex
Copy link

pettyalex commented Apr 1, 2024

What happened?

Hello,

It looks like Hail has a hard coded check to only run on Java 8 and 11, despite Spark supporting Java 17 for a couple years now, including on spark 3.3.x, which is the currently used release for pip install hail: https://spark.apache.org/releases/spark-release-3-3-0.html#build

Would it be possible to add Java 17 support, or possibly even remove the Java version check in general so that it can track what underlying Spark does without additional updates?

There are a bunch of benefits of moving to Java 17, including:

  1. https://kstefanj.github.io/2021/11/24/gc-progress-8-17.html - Significant garbage collector improvements that will likely improve throughput and reduce costs
  2. https://vmnotescom.wordpress.com/2021/09/14/java-17-whats-new-removed-and-preview-in-jdk-17/ - Better Apple Silicon support. I know that darwin-aarch64 has been backported to 8 and 11, but 17 is faster on that platform.
  3. https://spark.apache.org/releases/spark-release-3-5-0.html#removals-behavior-changes-and-deprecations - The next release of Spark will require Java 17 as a minimum version, and making the change now is easier than making more changes all at once in the future.

The following features will be removed in the next Spark major release

Support for Java 8 and Java 11, and the minimal supported Java version will be Java 17
Support for Scala 2.12, and the minimal supported Scala version will be 2.13

Also, requiring specifically Java 8 or 11 has led to some friction for students and researchers who are first evaluating hail. In the past few weeks, I've talked to a lot of students and researchers who wanted to evaluate hail, followed the documentation to install Azul Java 8 but already had an existing Java install and did not update their PATH or JAVA_HOME. Most of their existing Java versions were 17, as 17 is the current default on most Linux distros and a common one to have been installed via Brew in the past few years on Mac.

Alternatively, if you don't want to allow Java 17, potentially Hail could bundle a JRE with it in the PyPI distributable? Using jdeps on the Hail shadow jar, I saw that it only needs these modules in a JRE:

java.base,java.compiler,java.desktop,java.management,java.naming,java.rmi,java.scripting,java.security.jgss,java.sql,jdk.httpserver,jdk.unsupported

That means that a JRE created with jlink like this:

jlink --add-modules java.base,java.compiler,java.desktop,java.management,java.naming,java.rmi,java.scripting,java.security.jgss,java.sql,jdk.httpserver,jdk.unsupported --output minimaljre --strip-debug --no-man-pages --no-header-files --compress=2

comes in at under 30MB gzipped, which would increase the PyPI package by about 20% in size, while allowing users to install and run Hail in any supported python environment without having to consider Java versions at all. Alternatively, have you ever considered distributing Hail through conda-forge or bioconda, where you could specify a JRE that should be installed with it and automatically linked?

Is there a better channel than Github Issues for feature requests? I realize this is not a bug report, and if you want to just close it and say "nope" that's fine, but I've seen a good number of first-time hail users get a bad impression because of this.

Ramble about other nitpicks below

I don't want to spam this repo with issues, but I also noticed while poking around at hail:

  1. It seems to use the default Java GC, which is now G1 in Java 9+. Performance on newer Javas would likely improve with -XX:+UseParallelGC in java opts
  2. The Hail jar includes module-info.class from azure storage, this broke my first attempt to use jdeps to see what modules it needs. Specifically, hail-all-spark.jar says it exports:
backend % jar --describe-module --file=hail-all-spark.jar
releases: 9

[email protected] jar:file:///Users/alex/src/hail/hail/build/deploy/hail/backend/hail-all-spark.jar!/module-info.class
exports com.azure.storage.blob
exports com.azure.storage.blob.models
exports com.azure.storage.blob.options
exports com.azure.storage.blob.sas
exports com.azure.storage.blob.specialized
requires com.azure.storage.common transitive
requires com.azure.storage.internal.avro
requires com.fasterxml.jackson.dataformat.xml
requires java.base mandated
qualified exports com.azure.storage.blob.implementation to com.azure.storage.blob.batch com.azure.storage.blob.cryptography com.azure.storage.file.datalake
qualified exports com.azure.storage.blob.implementation.models to com.azure.storage.blob.batch com.azure.storage.blob.cryptography
qualified exports com.azure.storage.blob.implementation.util to com.azure.storage.blob.batch com.azure.storage.blob.changefeed com.azure.storage.blob.cryptography com.azure.storage.blob.nio com.azure.storage.file.datalake com.fasterxml.jackson.databind
qualified opens com.azure.storage.blob.implementation to com.azure.core com.fasterxml.jackson.databind
qualified opens com.azure.storage.blob.implementation.models to com.azure.core com.fasterxml.jackson.databind
qualified opens com.azure.storage.blob.models to com.azure.core com.fasterxml.jackson.databind
contains com.azure.storage.blob.implementation.accesshelpers

You probably don't want this, shadow jars are usually not modular and you can exclude the module-info from them: GradleUp/shadow#352

Version

0.2.128

Relevant log output

No response

@pettyalex pettyalex added the needs-triage A brand new issue that needs triaging. label Apr 1, 2024
@daniel-goldstein
Copy link
Contributor

daniel-goldstein commented Apr 1, 2024

Hi @pettyalex, thank you for the detailed and thoughtful issue. Hopefully I can shed some light and address all your concerns.

I think the assertion on Java 8 and 11 was an overly defensive precaution put in place some time ago, as hail uses some unsafe JVM APIs that have been deprecated for a while. But as you noted, the world goes on in Java 17 and I don't see a reason Hail shouldn't be compatible. Since most of our closest users use Hail on GCP Dataproc, we generally keep in lock-step with their platform which is unfortunately still on Java 11 so that is what we test against and officially support. Nevertheless, we should remove the restriction and add some light validation in CI against Java 17 and advertise it as unofficially supported until such a time that Dataproc moves to Java 17. Hopefully Spark 3.6 will force their hand. The release process for 0.2.129 is already underway but expect this to be resolved in 0.2.130.

Thanks for your suggestions regarding bundling the JRE and the GC options, we'll definitely consider them. Regarding the module-info.class nonsense, my apologies. That just seems like a bug we should fix. I will create a separate tracking issue for that but I'm not yet sure where that will get prioritized. If it is more than an annoyance for you, please let us know.

Regarding conda-forge, I don't think we currently have the bandwidth or demand (that we know of) to add more distribution systems. Again, this is something where hearing from the community is the best way to figure out how to direct our efforts.

Hopefully this addresses your concerns. Please do follow up if I've missed anything or open more issues if you encounter new problems.

@pettyalex
Copy link
Author

Thanks,

The module-info.class thing is incredibly unlikely to break any user. I only discovered it because it broke jdeps when I was testing generating a bundled JRE.

None of this directly impacts me or established users of Hail in my group, but I have seen Java version be the single biggest pain point for new users wanting to install and try Hail for the first time, which is why I posted this.

@daniel-goldstein daniel-goldstein added query enhancement new-feature and removed needs-triage A brand new issue that needs triaging. enhancement labels Apr 1, 2024
@daniel-goldstein daniel-goldstein removed their assignment Apr 25, 2024
@ehigham ehigham self-assigned this Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants