[Java] Remove Java 8 support in Arrow v18 #38051

Closed
7 tasks done
danepitkin opened this issue Oct 5, 2023 · 21 comments

Comments

@danepitkin
Member

danepitkin commented Oct 5, 2023

Describe the enhancement requested

  1. Java 8 is holding back adoption of newer Java features, such as the Java Platform Module System (JPMS)[1], which was introduced in Java 9.
  2. Java 8 is preventing Arrow from using latest packages/dependencies in some places. See examples[2][3][4].
  3. Arrow Java is quite stable, so Java 8 users can probably be fine pinning the Arrow dependency if they aren't interested in upgrading Java versions.
  4. Java 8 is on the decline, and is not the most used Java version in 2023[5].

[1]https://en.wikipedia.org/wiki/Java_Platform_Module_System
[2]https://github.com/apache/arrow/blob/main/dev/release/verify-release-candidate.sh#L571
[3]#37723 (comment)
[4]#13072 (comment)
[5]https://newrelic.com/sites/default/files/2023-04/new-relic-2023-state-of-the-java-ecosystem-2023-04-20.pdf

Post-upgrade tasks

Component(s)

Java

@kou
Member

kou commented Oct 5, 2023

@davisusanibar
Contributor

In addition, the following dependencies are pinned for JDK8:

@danepitkin
Member Author

danepitkin commented Oct 5, 2023

Apache Spark has dropped support for Java 8 and 11 on the main branch (targeting a 4.0 release) apache/spark#43005

Edit: Spark 4.0 release timeframe is 2024-06[1]

[1]https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6

@danepitkin
Member Author

Netty 5.0 will remove support for Java 8 netty/netty#10650

@danepitkin
Member Author

The current consensus on the Arrow mailing list[1] is to postpone Java 8 deprecation and to revisit it when Spark releases 4.0, which deprecates Java 8 (~2024-06).

[1] https://lists.apache.org/thread/kml53f81z1oskcf00xl7wlbcjssmn91g

@danepitkin
Member Author

Apache Derby continuously drops support for older JDK versions #38813

@kevingurney
Member

My apologies!

I accidentally unpinned this issue because I thought I had pinned it just for me, by accident. I just repinned it.

@danepitkin
Member Author

Apache Iceberg is considering dropping Java 8 support: https://lists.apache.org/thread/ntrk2thvsg9tdccwd4flsdz9gg743368

@danepitkin
Member Author

New mailing list discussion: https://lists.apache.org/thread/65vqpmrrtpshxo53572zcv91j1lb2y8g

@danepitkin danepitkin changed the title [Java] Remove Java 8 support [Java] Remove Java 8 support in Arrow v18 May 16, 2024
@thisisnic thisisnic unpinned this issue May 17, 2024
@thisisnic
Member

Apologies, I also unpinned it thinking this was just my GitHub view 😂

@normanj-bitquill
Contributor

I've looked into this and have some notes.

Java Modules

When compiling Java code in Java 9 or higher, you can use both the classpath and the module-path.

  • All libraries in the classpath are considered to be part of the UNNAMED module.
  • All libraries in the module path that contain a module-info.java file will be a Java module as expected.
  • All libraries in the module path that do not contain a module-info.java file will be treated as automatic Java modules. The names of these modules depend on the name of the Jar file. This creates deployment issues.
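
The file-name dependence described in the last bullet can be observed with the JDK's own `java.lang.module.ModuleFinder`. The following is a minimal sketch, not part of the Arrow build: the class name `AutomaticModuleNameDemo`, the helper `automaticModuleName`, and the JAR file names (including the version `1.12.0`) are hypothetical and chosen only for illustration. It writes an empty JAR with no module descriptor under a given file name and asks the JDK which module name it derives.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReference;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical demo class; requires Java 9 or newer.
class AutomaticModuleNameDemo {

    // Creates an empty JAR (manifest only, no module descriptor) with the given
    // file name and returns the automatic module name the JDK derives for it.
    static String automaticModuleName(String jarFileName) {
        try {
            Path dir = Files.createTempDirectory("automatic-module-demo");
            dir.toFile().deleteOnExit();
            Path jar = dir.resolve(jarFileName);
            jar.toFile().deleteOnExit();

            Manifest manifest = new Manifest();
            manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
            try (JarOutputStream jos = new JarOutputStream(Files.newOutputStream(jar), manifest)) {
                // No entries besides the manifest: the JAR has no module descriptor
                // and no Automatic-Module-Name attribute.
            }

            // ModuleFinder treats a JAR without a descriptor as an automatic module.
            ModuleReference ref = ModuleFinder.of(jar).findAll().iterator().next();
            return ref.descriptor().name();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The version suffix is stripped and '-' becomes '.'.
        System.out.println(automaticModuleName("flatbuffers-java-1.12.0.jar")); // flatbuffers.java
        // Same content, different file name: the module gets a different name.
        System.out.println(automaticModuleName("fb.jar")); // fb
    }
}
```

A `requires` clause written against the first name would no longer resolve if the JAR were deployed under the second file name, which is the deployment issue mentioned above.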

Maven with Java Modules

Maven may choose to use both the classpath and module-path.

  • If the Java target is 9 or greater and the current Maven module contains a module-info.java file, then all libraries with a module-info.java file will be placed in the module-path. All other libraries will be on the classpath (this can be configured).
  • Maven can be told to also place libraries without a module-info.java file in the module-path. This will cause them to become automatic Java modules.

Getting Started

A first step in migrating to Java 11 would be to remove (or hide) the module-info.java files. This would cause Maven to put everything on the classpath and would avoid build issues. We would not be distributing any module information, so consumers would have to either treat Arrow modules as automatic Java modules or put them on the classpath.

Without the module-info.java files, IntelliJ can easily resolve dependencies and is able to run unit tests.

Longer Term

Longer term, we should include proper module-info.java files in all Arrow modules. Not all of Arrow's dependencies have a module-info.java file, such as flatbuffers-java. It is not reliable to treat these as automatic Java modules during build, since that depends on the file name. We could either shade in the java classes or keep such dependencies on the classpath. If they are on the classpath, then we cannot declare any dependency on them in the module-info.java file and consumers may need extra flags when compiling/running projects depending on Arrow.

I recommend shading in legacy dependencies. This eases the burden for consumers of Arrow libraries. We would not expose packages from those libraries. Consumers can simply add Arrow libraries to the module path without needing flags to grant Arrow modules access to the UNNAMED module.

Some dependencies are obsolete, such as jsr305. We should migrate away from obsolete dependencies. The ThreadSafe annotation could have use, but it is becoming increasingly unlikely that anyone would consume it.

@laurentgo
Collaborator

Do you know why module-info.java files were added in the first place? It seems weird to have to remove them because arrow is moving to java 9+, and I guess it could be considered as a public api breakage?

I also haven't observed any change of behavior from "Maven" based on the presence or absence of module-info.java either. Maybe it's a plugin thing? Do you have pointers?

@jduo
Member

jduo commented Jun 12, 2024

> Do you know why module-info.java files were added in the first place? It seems weird to have to remove them because arrow is moving to java 9+, and I guess it could be considered as a public api breakage?
>
> I also haven't observed any change of behavior from "Maven" based on the presence or absence of module-info.java either. Maybe it's a plugin thing? Do you have pointers?

The module-info.java files were added to support JPMS in Arrow 17.

When running surefire and failsafe, maven will put JARs with a module-info.class file in the module-path instead of the classpath (when running >JDK8). IIRC there's an option to force using the classpath instead.
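
Whether a given class ended up on the module-path or the classpath can be checked at runtime through `Class.getModule()`. This is a minimal sketch rather than anything taken from the Arrow test suite; the class name `ModuleCheck` and the helper `describe` are hypothetical.

```java
// Hypothetical demo class; requires Java 9 or newer.
class ModuleCheck {

    // Classes loaded from the classpath belong to an unnamed module;
    // classes loaded from the module-path belong to a named module.
    static String describe(Class<?> type) {
        Module module = type.getModule();
        return module.isNamed()
                ? "named module " + module.getName()
                : "unnamed module (classpath)";
    }

    public static void main(String[] args) {
        // JDK classes always live in named modules.
        System.out.println(describe(String.class)); // named module java.base
        // The result for this class depends on how the program was launched.
        System.out.println(describe(ModuleCheck.class));
    }
}
```

Calling `describe` on an Arrow class from inside a test would show which of the two paths the test runner actually used for that JAR.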

@laurentgo
Collaborator

laurentgo commented Jun 13, 2024

> The module-info.java files were added to support JPMS in Arrow 17.

Did you mean Arrow 16? Still, why was JPMS support needed? Other projects like Iceberg and Parquet do not provide JPMS support. The description of #13072 goes over some of the supposed benefits of JPMS, but it does not name a concrete issue the project is trying to solve, and now it seems we are discussing (temporarily) removing JPMS support in order to move to Java 11? Something doesn't add up.

@normanj-bitquill
Contributor

@jduo There is no option to force using the classpath. You are probably thinking of "useModulePath", which can be true or false. When you target Java 9 or higher, that only controls what happens to dependencies that do not have a module-info.java file. Maven will always use the module-path for dependencies with a module-info.java file.

@normanj-bitquill
Contributor

This work is intended for Arrow 18. I was looking for a way to split up the work. I am not suggesting removing a feature from Arrow for Arrow 18.

There are issues with the current module-info.java files. They are making use of automatic module names, which are based off the name of the Jar file. This is not reliable, and also needs to be fixed.

Given the sensitivity here, it looks like everything must be solved in one commit.

@laurentgo
Collaborator

> @jduo There is no option to force using the classpath. You are probably thinking of "useModulePath", which can be true or false. When you target Java 9 or higher, that only controls what happens to dependencies that do not have a module-info.java file. Maven will always use the module-path for dependencies with a module-info.java file.

But since code is tested with Java 11 and higher, doesn't it mean that this already works?

> There are issues with the current module-info.java files. They are making use of automatic module names, which are based off the name of the Jar file. This is not reliable, and also needs to be fixed.

It seems to be a separate issue from this one, isn't it?

@normanj-bitquill
Contributor

This didn't show up yet since the target version of Java is 1.8.

The Maven compiler plugin cares about what the target version of Java is. Currently Arrow targets Java 1.8, so all libraries are placed on the classpath (even if using JDK 11). When targeting Java 9 or higher, Maven compiler plugin will start to look for "module-info.java" files and decide on whether libraries belong in the classpath or module-path.

Use of automatic modules is a separate issue, but may get higher visibility once Java 11 is the minimum for Arrow. More users may start to make use of the JPMS modules.

Switching Arrow to Java 11 is not as simple as changing only the target version of Java. That will cause the Maven compiler plugin to use the module-path for most dependencies and will expose issues with the existing module-info.java files. I suspect that the module-info.java files were only tested at runtime (with unit tests) and not at compile time, since the target version of Java was always 1.8. I am trying to verify this.

@normanj-bitquill
Contributor

I've looked into the CI builds using JDK 11. Those builds still target Java 1.8 when compiling Java code.

laurentgo added a commit to laurentgo/arrow that referenced this issue Jul 3, 2024
Remove build support for Java 8 and make Java 11 the minimum version to
use to build Arrow in github actions and ci tasks
@laurentgo
Collaborator

laurentgo commented Jul 3, 2024

As the proof is in the pudding, I took a stab at dropping JDK 8 support and created a pull request

laurentgo added a commit to laurentgo/arrow that referenced this issue Jul 10, 2024
Remove build support for Java 8 and make Java 11 the minimum version to
use to build Arrow in github actions and ci tasks
laurentgo added a commit to laurentgo/arrow that referenced this issue Jul 15, 2024
Remove build support for Java 8 and make Java 11 the minimum version to
use to build Arrow in github actions and ci tasks
laurentgo added a commit to laurentgo/arrow that referenced this issue Jul 16, 2024
Remove build support for Java 8 and make Java 11 the minimum version to
use to build Arrow in github actions and ci tasks
laurentgo added a commit to laurentgo/arrow that referenced this issue Jul 17, 2024
Remove build support for Java 8 and make Java 11 the minimum version to
use to build Arrow in github actions and ci tasks
danepitkin pushed a commit that referenced this issue Jul 17, 2024
### What changes are included in this PR?

* Remove support for Java 8 in GitHub Actions and other CI/CD tasks and make Java 11 the default version
* Make Java 11 the minimum version required to build and run Arrow by changing the Maven project configuration:
  - Change minimum java version and source/target/release compiler properties to 11
  - Remove `maven` modules
  - Remove jdk11+ profiles and integrate their content into the main section
  - Let maven-compiler-plugin process `module-info.java` files and address several declaration issues
  - Exclude non modularized modules from javadoc aggregate tasks
  - Exclude module-info.class files from shaded jars, as such a file is not representative of the whole content and may actually come directly from a 3rd party dependency.
* Update documentation

### Are these changes tested?

Through CI/CD.

### Are there any user-facing changes?

Yes. Java 11 is now required to run any Arrow code

**This PR includes breaking changes to public APIs.**

* GitHub Issue: #38051

Authored-by: Laurent Goujon <[email protected]>
Signed-off-by: Dane Pitkin <[email protected]>
@danepitkin danepitkin added this to the 18.0.0 milestone Jul 17, 2024
@danepitkin
Member Author

Issue resolved by pull request 43139
#43139

@raulcd raulcd unpinned this issue Jul 22, 2024
dongjoon-hyun added a commit to apache/spark that referenced this issue Oct 31, 2024
### What changes were proposed in this pull request?

This PR aims to upgrade `Arrow` to 18.0.0 for Apache Spark 4.0.0.

### Why are the changes needed?

To bring the latest improvements and bug fixes,
- https://arrow.apache.org/release/18.0.0.html  (28 October 2024)

Note that `Arrow 18` is the first release that dropped Java 8, like Apache Spark 4.0.0.
- apache/arrow#38051

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48708 from dongjoon-hyun/SPARK-50177.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>