feature request: Java 17 support #14433
Hi @pettyalex, thank you for the detailed and thoughtful issue. Hopefully I can shed some light and address all your concerns.

I think the assertion on Java 8 and 11 was an overly defensive precaution put in place some time ago, as Hail uses some unsafe JVM APIs that have been deprecated for a while. But as you noted, the world moves on, and I don't see a reason Hail shouldn't be compatible with Java 17. Since most of our closest users run Hail on GCP Dataproc, we generally keep in lockstep with their platform, which is unfortunately still on Java 11, so that is what we test against and officially support. Nevertheless, we should remove the restriction, add some light validation in CI against Java 17, and advertise it as unofficially supported until such time as Dataproc moves to Java 17. Hopefully a future Spark release will force their hand. The release process for 0.2.129 is already underway, but expect this to be resolved in 0.2.130.

Thanks for your suggestions regarding bundling the JRE and the GC options; we'll definitely consider them. Regarding conda-forge, I don't think we currently have the bandwidth or demand (that we know of) to add more distribution systems. Again, this is something where hearing from the community is the best way to figure out how to direct our efforts.

Hopefully this addresses your concerns. Please do follow up if I've missed anything, or open more issues if you encounter new problems.
Thanks! None of this directly impacts me or established users of Hail in my group, but I have seen the Java version be the single biggest pain point for new users wanting to install and try Hail for the first time, which is why I posted this.
What happened?
Hello,
It looks like Hail has a hard-coded check to only run on Java 8 and 11, despite Spark supporting Java 17 for a couple of years now, including on Spark 3.3.x, which is the release currently used by `pip install hail`: https://spark.apache.org/releases/spark-release-3-3-0.html#build

Would it be possible to add Java 17 support, or possibly even remove the Java version check altogether, so that Hail can track whatever the underlying Spark supports without additional updates?
There are a bunch of benefits to moving to Java 17, including several years of JVM performance and garbage collection improvements.
Also, requiring specifically Java 8 or 11 has created friction for students and researchers evaluating Hail for the first time. In the past few weeks, I've talked to a number of students and researchers who wanted to try Hail, followed the documentation to install Azul Java 8, but already had an existing Java install and did not update their PATH or JAVA_HOME. Most of their existing Java versions were 17, as 17 is the current default on most Linux distros and a common version to have installed via Homebrew on a Mac in the past few years.
Alternatively, if you don't want to allow Java 17, could Hail bundle a JRE with it in the PyPI distributable? Using jdeps on the Hail shadow jar, I saw that it only needs a small set of modules in a JRE (the invocation is sketched below).
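For reference, a rough sketch of the jdeps invocation (the exact flags here are an assumption on my part):

```sh
# Sketch: ask jdeps which JDK modules the shadow jar actually requires.
# --ignore-missing-deps skips dependencies resolved elsewhere at runtime;
# add --multi-release 17 if the jar is multi-release.
jdeps --print-module-deps --ignore-missing-deps hail-all-spark.jar
```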
That means that a JRE created with jlink along these lines (a sketch; the exact flags are assumed):
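```sh
# Sketch (flags assumed): build a trimmed runtime containing only the modules
# jdeps reported, with debug symbols, man pages, and headers stripped.
# "hail-jre" is a hypothetical output directory name.
jlink \
  --add-modules "$(jdeps --print-module-deps --ignore-missing-deps hail-all-spark.jar)" \
  --strip-debug --no-man-pages --no-header-files --compress=2 \
  --output hail-jre
```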
comes in at under 30 MB gzipped, which would increase the PyPI package size by about 20% while allowing users to install and run Hail in any supported Python environment without having to think about Java versions at all. Alternatively, have you ever considered distributing Hail through conda-forge or bioconda, where you could specify a JRE to be installed alongside it and automatically linked?
Is there a better channel than GitHub Issues for feature requests? I realize this is not a bug report, and if you want to just close it and say "nope", that's fine, but I've seen a good number of first-time Hail users get a bad impression because of this.
Ramble about other nitpicks below
I don't want to spam this repo with issues, but I also noticed while poking around at Hail:

- Hail sets `-XX:+UseParallelGC` in its java opts, which may be worth revisiting.
- You can run `jdeps` on the jars to see what modules they need. Specifically, `hail-all-spark.jar` says it exports packages as a Java module. You probably don't want this: shadow jars are usually not modular, and you can exclude the module-info from them (GradleUp/shadow#352; see the sketch below).
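A minimal sketch of that exclusion, assuming the standard Gradle Shadow plugin configuration:

```groovy
// build.gradle sketch (assumes the Shadow plugin is applied): drop the compiled
// module descriptor so the shadow jar is treated as a plain, non-modular jar.
shadowJar {
    exclude 'module-info.class'
}
```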
Version
0.2.128
Relevant log output
No response